English Pages 767 Year 2015
The Tera-Tom Video Series
Lessons with Tera-Tom: Teradata Architecture and SQL Video Series. These exciting videos make learning and certification much easier.
Three ways to view them:
1. Safari (look up Coffing Studios)
2. CoffingDW.com (sign up on our website)
3. Your company can buy them all for everyone to see (contact [email protected])
Current Books in the Tera-Tom Genius Series
Our Recommended Book In The Tera-Tom Genius Series
Tera-Tom: Author of over 50 Books
Tera-Tom books have been the primary source of Teradata learning for over 20 years. They have helped to teach millions of people all aspects of Teradata. What people love most about the Tera-Tom books is how easy they are to understand. They are so easy that a seven-year-old boy (raised by wolves) can understand them!
The Best Query Tool Works on all Systems
When you possess a tool like Nexus, you have access to every system in your enterprise! The Nexus Query Chameleon is the only tool that works on all systems. Its Super Join Builder allows the ERwin Logical Model to be loaded, and then Nexus shows tables and views visually. It then guides users by showing which tables join to which. As users choose the tables and columns they want in their report, Nexus builds the SQL for them with each click of the mouse. Nexus was designed for Teradata and Hadoop, but works on all platforms. Nexus even converts table structures between vendors, so querying and managing multi-vendor platforms is transparent. Even if you only work with one system, you will find that Nexus is the best query tool you have ever used. If you work with multiple systems, you will be even more amazed. Download a free trial at www.CoffingDW.com.
Trademarks and Copyrights

Microsoft Windows, Windows 2003 Server, SQL Server 2012, SQL Server Compact Edition, .NET, PDW, SQL Server, T-SQL, Azure SQL Data Warehouse and Azure Cloud are trademarks of Microsoft. Teradata, NCR, BYNET and SQL Assistant are registered trademarks of Teradata Corporation, Dayton, Ohio, U.S.A. IBM, DB2 and Netezza are registered trademarks of IBM Corporation. ANSI is a registered trademark of the American National Standards Institute. Ethernet is a trademark of Xerox. UNIX is a trademark of The Open Group. Linux is a trademark of Linus Torvalds. Java and Oracle are trademarks of Oracle. ParAccel is a trademark of ParAccel. Kognitio is a trademark of Kognitio. Greenplum is a trademark of EMC Corporation. Nexus Query Chameleon is a trademark of Coffing Data Warehousing.

Coffing Data Warehousing shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book or from the use of programs or program segments that are included. The manual is not a publication of Microsoft Corporation, nor was it produced in conjunction with Microsoft Corporation.

Copyright © May 2015 by Coffing Publishing
ISBN 978-1-940540-32-0

All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, neither is any liability assumed for damages resulting from the use of information contained herein.
About Tom Coffing
Tom Coffing, better known as Tera-Tom, is the founder of Coffing Data Warehousing where he has been CEO for the past 20 years. Tom has written over 50 books on all aspects of Teradata, Netezza, Kognitio, Redshift, ParAccel, Vertica, SQL Server, and Greenplum. Tom has taught over 1,000 Teradata classes in places such as India, Africa, Europe, China, Malaysia, and throughout North America.

Tom is also the owner and designer of the Nexus Query Chameleon, the most sophisticated enterprise query tool in the industry. The Nexus works on all platforms, including Hadoop, converts table structures between all systems, and allows companies to load their ERwin logical model inside Nexus. The Nexus guides users like a GPS system. Users point and click on any table or view from any system, and they are guided to what joins to what. As users choose the columns they want on their report, the SQL is built automatically.

In high school, Tom was the first athlete from his school to ever place at state. He was selected by his school to represent them at Buckeye Boys State, and Tom was inducted into the first class of the Lakota High School Hall of Fame. At the University of Arizona and University of Nevada Las Vegas, Tom was a two-time All-American wrestler, Sophomore Athlete of the Year, and a two-time winner of the 1980 Olympic wrestling trials. Tom graduated with a Bachelor’s degree in Speech Communications. After college, Tom became a state and national champion speech winner for Toastmasters and won two orchid awards as an actor.

Tom is the proud father of three wonderful children and has been married for the past 32 years. You can contact Tom at 513 300-0341 or at [email protected].
About Todd Wilson
Todd Wilson is the Chief Technology Officer of Coffing Data Warehousing. As CTO, Todd has overseen the development of CoffingDW's premier data analytics tool, the Nexus Query Chameleon. Under Todd’s leadership, the Nexus has expanded to support data sources from across the data warehousing spectrum, such as Kognitio (in-memory analytics), HP Vertica (columnar data), Hadoop (including such “Big Data” companies as Cloudera and Hortonworks), cloud service data sources (Amazon’s Redshift), as well as traditional data sources such as Oracle, Teradata, SQL Server, Greenplum, and DB2. An experienced .NET developer, Todd has developed data replication tools, data movement tools, data visualization tools, database management tools, and DDL conversion tools, and is currently overseeing the development of the Nexus Logical Model Loader. As a technical consultant, he has worked with multiple Fortune 500 companies in fields such as telecommunication, PC manufacturing, and health care. Todd is a Teradata Certified Master and a graduate of Pepperdine University.
Table of Contents

Chapter 1 – Introduction to the Azure SQL Data Warehouse ... 1
Introduction to the Family of SQL Server Products ... 2
Introduction to the Family Continued ... 3
Microsoft Azure SQL Data Warehouse ... 4
Symmetric Multi-Processing (SMP) ... 5
What is Parallel Processing? ... 6
The Basics of a Single Computer ... 7
Data in Memory is Fast as Lightning ... 8
Parallel Processing of Data ... 9
A Table has Columns and Rows ... 10
The Azure SQL Data Warehouse has Linear Scalability ... 11
The Architecture of the Azure SQL Data Warehouse ... 12
Nexus is now Available on the Microsoft Azure Cloud ... 13
The MPP Engine is the Optimizer ... 14
The Azure SQL Data Warehouse System ... 15
The Azure SQL Data Warehouse System is Scalable ... 16
The Control Node ... 17
The Data Rack ... 18
The Landing Zone ... 19
The Backup Node ... 20
Software as a Service (SaaS) and the Elastic Database ... 21
Azure Data Lake ... 22
Azure Disaster Recovery ... 23
Security and Compliance ... 24
How to Get an EXPLAIN Plan ... 25

Chapter 2 – The Azure SQL Data Warehouse Table Structures ... 27
The 5 Concepts of Azure SQL Data Warehouse Tables ... 28
Tables are Either Distributed by Hash or Replicated (1 of 5) ... 29
Table Rows are Either Sorted or Unsorted (2 of 5) ... 30
Tables are Stored in Either Row or Columnar Format (3 of 5) ... 31
Tables can be Partitioned (4 of 5) ... 32
There are Permanent, Temporary and External Tables (5 of 5) ... 33
Creating a Table With a Distribution Key ... 34
Creating a Table that is Replicated ... 35
Distributed by Hash vs. Replication ... 36
The Concept is All About the Joins ... 37
Creation of a Hash Distributed Table with a Clustered Index ... 38
A Clustered Index Sorts the Data Stored on Disk ... 39
Each Node Has 8 Distributions ... 40
How Hashed Tables are Stored Among a Single Node ... 41
Hashed Tables Will Be Distributed Among All Distributions ... 42
Creation of a Replicated Table ... 43
How Replicated Tables are Stored Among a Single Node ... 44
Replicated Table will be Duplicated among Each Node ... 45
Distributed by Replication ... 46
How Hashed and Replicated Tables Work Together ... 47
Tables are Stored as Row-based or Column-based ... 48
Creation of a Columnar Table that is Hashed ... 49
How Hashed Columnar Tables are Stored on a Single Node ... 50
How Hashed Columnar Tables are Stored on All Distributions ... 51
Comparing Normal Table Vs. Columnar Tables ... 52
Columnar can move just One Segment to Memory ... 53
Segments on Distributions are Aligned to Rebuild a Row ... 54
Why Columnar? ... 55
Columnar Tables Store Each Column in Separate Pages ... 56
Visualize the Data – Rows vs. Columns ... 57
Creation of a Columnar Table that is Replicated ... 58
Creating a Partitioned Table Per Month ... 59
A Visual of One Year of Data with Range Per Month ... 60
Another Create Example of a Partitioned Table ... 61
Creating a Partitioned Table Per Month That is a Columnstore ... 62
Visual of Row Partitioning and Columnar Storage ... 63
CREATE TABLE AS (CTAS) Example ... 64
Creating a Temporary Table ... 65
Facts About Tables ... 66

Chapter 3 – Hashing and Data Distribution ... 68
Distribution Keys Hashed on Unique Values Spread Evenly ... 69
Distribution Keys With Non-Unique Values Spread Unevenly ... 70
Best Practices for Choosing a Distribution Key ... 71
The Hash Map Determines which Distribution owns the Row ... 72
The Hash Map Determines which Node will Own the Row ... 73
The Hash Map Determines which Node will Own the Row ... 74
The Hash Map Determines which Node will Own the Row ... 75
The Hash Map Determines which Node will Own the Row ... 76
A Review of the Hashing Process ... 77
Non-Unique Distribution Keys have Skewed Data ... 78

Chapter 4 – The Technical Details ... 80
Every Node has the Exact Same Tables ... 81
Hashed Tables are spread across All Distributions ... 82
The Table Header and the Data Rows are Stored Separately ... 83
A Distribution Stores the Rows of a Table inside a Data Block ... 84
To Read a Data Block a Node Moves the Block into Memory ... 85
A Full Table Scan Means All Nodes Must Read All Rows ... 86
Rows are Organized inside a Page ... 87
Moving Data Blocks is Like Checking In Luggage ... 88
As Row-Based Tables Get Bigger, the Page Splits ... 89
Data Pages are Processed One at a Time Per Unit ... 90
Creating a Table that is a Heap ... 91
Heap Page ... 92
Extents ... 93
Creating a Table that has a Clustered Index ... 94
Clustered Index Page ... 95
The Row Offset Array is the Guidance System for Every Row ... 96
The Row Offset Array Provides Two Search Options (1 of 2) ... 97
The Row Offset Array Provides Two Search Options (2 of 2) ... 98
The Row Offset Array Helps With Inserts ... 99
B-Trees ... 100
The Building of a B-Tree for a Clustered Index (1 of 3) ... 101
The Building of a B-Tree for a Clustered Index (2 of 3) ... 102
The Building of a B-Tree for a Clustered Index (3 of 3) ... 103
When Do I Create a Clustered Index? ... 104
When Do I Create a Non Clustered Index? ... 105
B-Tree for Non Clustered Index on a Clustered Table (1 of 2) ... 106
B-Tree for Non Clustered Index on a Clustered Table (2 of 2) ... 107
Adding a Non Clustered Index To A Heap ... 108
B-Tree for Non Clustered Index on a Heap Table (1 of 2) ... 109
B-Tree for Non Clustered Index on a Heap Table (2 of 2) ... 110
Max Levels on the Azure SQL Data Warehouse ... 111
Azure SQL Data Warehouse Data Types ... 112
Character Data Types for SQL Server ... 113
Numeric Data Types for SQL Server ... 114
Date and Time Data Types for SQL Server ... 115
Additional Data Types for SQL Server ... 116

Chapter 5 – CREATE Statistics ... 118
CREATE Statistics Syntax ... 119
CREATE Statistics on a Percentage of a Table ... 120
CREATE Statistics on a Sample by Using the System Default ... 121
CREATE Statistics on a Multi-Column Join Key ... 122
What to Column(s) to CREATE Statistics On ... 123
CREATE Statistics Using a WHERE Clause ... 124
Updating All Statistics on a Table ... 125
Updating Only Certain Statistics on a Table ... 126
Dropping Statistics on Certain Statistics on a Table ... 127
Showing the Statistics ... 128
DBCC SHOW_STATISTICS ... 129
DBCC SHOW_STATISTICS WITH HISTOGRAM ... 130

Chapter 6 – The Basics of SQL ... 132
Introduction ... 133
Naming of Objects ... 134
Setting Your Default Database ... 135
SELECT * (All Columns) in a Table ... 136
Fully Qualifying a Database, Schema and Table ... 137
SELECT Specific Columns in a Table ... 138
Commas in the Front or Back? ... 139
Place your Commas in front for better Debugging Capabilities ... 140
Sort the Data with the ORDER BY Keyword ... 141
ORDER BY Defaults to Ascending ... 142
Use the Name or the Number in your ORDER BY Statement ... 143
Two Examples of ORDER BY using Different Techniques ... 144
Changing the ORDER BY to Descending Order ... 145
NULL Values sort First in Ascending Mode (Default) ... 146
NULL Values sort Last in Descending Mode (DESC) ... 147
Major Sort vs. Minor Sorts ... 148
Multiple Sort Keys using Names vs. Numbers ... 149
Sorts are Alphabetical, NOT Logical ... 150
Using A CASE Statement to Sort Logically ... 151
An Order By That Uses an Expression ... 152
How to ALIAS a Column Name ... 153
Aliasing a Column Name with Spaces or Reserved Words ... 154
A Missing Comma can by Mistake become an Alias ... 155
Comments using Double Dashes are Single Line Comments ... 156
Comments for Multi-Lines ... 157
Comments for Multi-Lines as Double Dashes Per Line ... 158
A Great Technique for Comments to Look for SQL Errors ... 159
sp_help at the Database Level ... 160
sp_help at the Object Level ... 161
Getting System Information ... 162
Getting Additional System Information ... 163

Chapter 7 – The Where Clause ... 165
The WHERE Clause limits Returning Rows ... 166
Double Quoted Aliases are for Reserved Words and Spaces ... 167
Using A Column ALIAS in a WHERE Clause ... 168
Using A Column ALIAS in a ORDER BY Clause ... 169
In What Order Does SQL Server Process A Query? ... 170
Character Data needs Single Quotes in the WHERE Clause ... 171
Table of Contents Character Data needs Single Quotes, but Numbers Don’t..................................................................................... 172 Declaring a Variable .............................................................................................................................................. 173 Comparisons against a Null Value ......................................................................................................................... 174 NULL means UNKNOWN DATA so Equal (=) won’t Work .............................................................................. 175 Use IS NULL or IS NOT NULL when dealing with NULLs ............................................................................... 176 NULL is UNKNOWN DATA so NOT Equal won’t Work .................................................................................. 177 Use IS NULL or IS NOT NULL when dealing with NULLs ............................................................................... 178 Using Greater Than or Equal To (>=).................................................................................................................... 179 AND in the WHERE Clause .................................................................................................................................. 180 Troubleshooting AND ............................................................................................................................................ 181 OR in the WHERE Clause ..................................................................................................................................... 182 Troubleshooting Or ................................................................................................................................................ 183 Troubleshooting Character Data ............................................................................................................................ 
Using Different Columns in an AND Statement
Quiz – How many rows will return?
Answer to Quiz – How many rows will return?
LIKE command Underscore is Wildcard for one Character
LIKE command using a Range of Values
LIKE command Using a NOT Range of Values
LIKE Command Works Differently on Char Vs Varchar
Troubleshooting LIKE Command on Character Data
Introducing the RTRIM Command
Quiz – What Data is Left Justified and What is Right?
Numbers are Right Justified and Character Data is Left
Answer – What Data is Left Justified and What is Right?
An Example of Data with Left and Right Justification
A Visual of CHARACTER Data vs. VARCHAR Data
RTRIM command Removes Trailing spaces on CHAR Data
Using Like with an AND Clause to Find Multiple Letters
Using Like with an OR Clause to Find Either Letters
Declaring a Variable and Using it with the LIKE Command
Escape Character in the LIKE Command changes Wildcards
Escape Characters Turn off Wildcards in the LIKE Command
Quiz – Turn off that Wildcard
ANSWER – To Find that Wildcard

Chapter 8 – Distinct, Group By and TOP
The Distinct Command
Distinct vs. GROUP BY
Quiz – How many rows come back from the Distinct?
Answer – How many rows come back from the Distinct?
TOP Command
TOP Command is brilliant when ORDER BY is used!
TOP Command with Ties
TOP Command Using a Variable

Chapter 9 – Aggregation
Quiz – You calculate the Answer Set in your own Mind
Answer – You calculate the Answer Set in your own Mind
The 3 Rules of Aggregation
There are Five Aggregates
Quiz – How many rows come back?
Answer – How many rows come back?
Troubleshooting Aggregates
GROUP BY when Aggregates and Normal Columns Mix
GROUP BY delivers one row per Group
Count_Big
Limiting Rows and Improving Performance with WHERE
WHERE Clause in Aggregation limits unneeded Calculations
Keyword HAVING tests Aggregates after they are Totaled
Group By Grouping Sets
Group By Rollup
Answer Set for Group By Rollup Query
Creating a Cube
Answer Set for Cube Query
An Easy Example of Creating a Cube
Quiz - GROUP BY GROUPING SETS Challenge
Answer To Quiz - GROUP BY GROUPING SETS Challenge
Getting the Average Values Per Column
Average Values per Column for all Columns in a Table

Chapter 10 - Join Functions
The Azure SQL Data Warehouse Join Quiz
The Azure SQL Data Warehouse Join Quiz Answer
Redistribution
Big Table Small Table Join Strategy
Duplication of the Smaller Table across All-Distributions
If the Join Condition is the Distribution Key no Movement
Matching Rows That Are On The Same Node Naturally
What if the Join Condition Columns are Not Primary Indexes
Strategy 1 of 4 – The Merge Join
Quiz – Redistribute the Employees by their Dept_No
Quiz – Dept_No landed on Distribution with Matches
Quiz – Redistribute the Orders to the Proper Distribution
Answer to Redistribute the Employees by their Dept_No Quiz
Strategy 2 of 4 – The Hash Join
Strategy 4 of 4 – The Product Join
A Two-Table Join Using Traditional Syntax
A two-table join using Non-ANSI Syntax with Table Alias
You Can Fully Qualify All Columns
A two-table join using ANSI Syntax
Both Queries have the same Results and Performance
Quiz – Can You Finish the Join Syntax?
Answer to Quiz – Can You Finish the Join Syntax?
Quiz – Can You Find the Error?
Answer to Quiz – Can You Find the Error?
Super Quiz – Can You Find the Difficult Error?
Answer to Super Quiz – Can You Find the Difficult Error?
Quiz – Which rows from both tables won’t Return?
Answer to Quiz – Which rows from both tables Won’t Return?
LEFT OUTER JOIN
LEFT OUTER JOIN Results
RIGHT OUTER JOIN
RIGHT OUTER JOIN Example and Results
FULL OUTER JOIN
FULL OUTER JOIN Results
Which Tables are the Left and which Tables are Right?
Answer - Which Tables are the Left and Which are the Right?
INNER JOIN with Additional AND Clause
ANSI INNER JOIN with Additional AND Clause
ANSI INNER JOIN with Additional WHERE Clause
OUTER JOIN with Additional WHERE Clause
OUTER JOIN with Additional AND Clause
OUTER JOIN with Additional AND Clause Results
Quiz – Why is this considered an INNER JOIN?
Evaluation Order for Outer Queries
The DREADED Product Join
The DREADED Product Join Results
The Horrifying Cartesian Product Join
The ANSI Cartesian Join will ERROR
Quiz – Do these Joins Return the Same Answer Set?
Answer – Do these Joins Return the Same Answer Set?
The CROSS JOIN
The CROSS JOIN Answer Set
The Self Join
The Self Join with ANSI Syntax
Quiz – Will both queries bring back the same Answer Set?
Answer – Will both queries bring back the same Answer Set?
Quiz – Will both queries bring back the same Answer Set?
Answer – Will both queries bring back the same Answer Set?
How would you Join these two tables?
An Associative Table is a Bridge that Joins Two Tables
Quiz – Can you write the 3-Table Join?
Answer to Quiz – Can you Write the 3-Table Join?
Quiz – Can you write the 3-Table Join to ANSI Syntax?
Answer – Can you Write the 3-Table Join to ANSI Syntax?
Quiz – Can you Place the ON Clauses at the End?
Answer – Can you Place the ON Clauses at the End?
The 5-Table Join – Logical Insurance Model
Quiz - Write a Five Table Join Using ANSI Syntax
Answer - Write a Five Table Join Using ANSI Syntax
Quiz - Write a Five Table Join Using Non-ANSI Syntax
Answer - Write a Five Table Join Using Non-ANSI Syntax
Quiz – Re-Write this putting the ON clauses at the END
Answer – Re-Write this putting the ON clauses at the END

Chapter 11 – Date Function
Current_Timestamp
Getdate
Date and Time Keywords
SYSDATETIMEOFFSET Provides the Timezone Offset
SYSDATETIMEOFFSET Provides the Timezone Offset
Using both CAST and CONVERT in Literal Values
Using Both CAST and CONVERT in Literal Values
Using both CAST and CONVERT in Literal Values
The DATEADD Function
The DATEDIFF Function
DATEADD Function
A Real World Example for DateAdd Using the Order Table
DATEPART Function
DATEPART Function Examples
YEAR, MONTH, and DAY Functions
A Better Technique for YEAR, MONTH, and DAY Functions
DATENAME Function
ISDATE Function

Chapter 12 - Temporary Tables
Temporary Tables
CREATING A Derived Table
Naming the Derived Table
Aliasing the Column Names in the Derived Table
Multiple Ways to Alias the Columns in a Derived Table
CREATING a Derived Table using the WITH Command
The Same Derived Query shown Three Different Ways
MULTIPLE Derived Tables using the WITH Command
Column Alias Can Default For Normal Columns
Most Derived Tables Are Used To Join To Other Tables
A Join Example Showing Different Column Alias Styles
The Three Components of a Derived Table
Visualize This Derived Table
Our Join Example With The WITH Syntax
Quiz - Answer the Questions
Answer to Quiz - Answer the Questions
Clever Tricks on Aliasing Columns in a Derived Table
A Derived Table lives only for the lifetime of a single query
An Example of Two Derived Tables in a Single Query
RECURSIVE Derived Table Hierarchy
RECURSIVE Derived Table Query
RECURSIVE Derived Table Definition
WITH RECURSIVE Derived Table Seeding
WITH RECURSIVE Derived Table Looping
RECURSIVE Derived Table Looping in Slow Motion
RECURSIVE Derived Table Looping Continued
RECURSIVE Derived Table Looping Continued
Six rows are added in the third loop. RECURSIVE Derived Table Ends the Looping
RECURSIVE Derived Table Ends the Looping
RECURSIVE Derived Table Definition
RECURSIVE Derived Table Answer Set
What is TEMPDB?
Creating a Temporary Table
The Three Steps to Use a Private Temporary Table
Creating a Temporary Table With a Clustered Index
Creating a Columnstore Temporary Table From a CTAS

Chapter 13 – Sub-query Functions
An IN List is much like a Subquery
An IN List Never has Duplicates – Just like a Subquery
An IN List Ignores Duplicates
The Subquery
The Three Steps of How a Basic Subquery Works
These are Equivalent Queries
The Final Answer Set from the Subquery
Quiz- Answer the Difficult Question
Answer to Quiz- Answer the Difficult Question
Should you use a Subquery or a Join?
Quiz- Write the Subquery
Answer to Quiz- Write the Subquery
Quiz- Write the More Difficult Subquery
Answer to Quiz- Write the More Difficult Subquery
Quiz – Write the Extreme Subquery
Answer to Quiz – Write the Extreme Subquery
Quiz- Write the Subquery with an Aggregate
Answer to Quiz- Write the Subquery with an Aggregate
Quiz- Write the Correlated Subquery
Answer to Quiz- Write the Correlated Subquery
The Basics of a Correlated Subquery
The Top Query always runs first in a Correlated Subquery
Correlated Subquery Example vs. a Join with a Derived Table
Quiz- A Second Chance to Write a Correlated Subquery
Answer - A Second Chance to Write a Correlated Subquery
Quiz- A Third Chance to Write a Correlated Subquery
Answer - A Third Chance to Write a Correlated Subquery
Quiz- Last Chance To Write a Correlated Subquery
Answer – Last Chance to Write a Correlated Subquery
Quiz – Write the Extreme Correlated Subquery
Answer To Quiz – Write the Extreme Correlated Subquery
Quiz- Write the NOT Subquery
Answer to Quiz- Write the NOT Subquery
Quiz- Write the Subquery using a WHERE Clause
Answer - Write the Subquery using a WHERE Clause
Quiz – Write the Triple Subquery
Answer to Quiz – Write the Triple Subquery
Quiz – How many rows return on a NOT IN with a NULL?
Answer – How many rows return on a NOT IN with a NULL?
How to handle a NOT IN with Potential NULL Values
Using a Correlated Exists
How a Correlated Exists matches up
The Correlated NOT Exists
The Correlated NOT Exists Answer Set
Quiz – How many rows come back from this NOT Exists?
Answer – How many rows come back from this NOT Exists?

Chapter 14 – Window Functions OLAP
The Row_Number Command
Quiz – How did the Row_Number Reset?
Quiz – How did the Row_Number Reset?
Using a Derived Table and Row_Number
Ordered Analytics OVER
RANK and DENSE RANK
RANK Defaults to Ascending Order
Getting RANK to Sort in DESC Order
RANK() OVER and PARTITION BY
Cumulative Sum
The ANSI CSUM – Getting a Sequential Number
Troubleshooting The ANSI OLAP on a GROUP BY
Reset with a PARTITION BY Statement
PARTITION BY only Resets a Single OLAP not ALL of them
Sorting in DESC Order
Moving Average
Casting a Moving Average
Partition By Resets an ANSI OLAP
COUNT OVER for a Sequential Number
Quiz – What caused the COUNT OVER to Reset?
Answer to Quiz – What caused the COUNT OVER to Reset?
The MAX OVER Command
MAX OVER with PARTITION BY Reset
MAX OVER Without Rows Unbounded Preceding
The MIN OVER Command
Quiz – Fill in the Blank
Answer – Fill in the Blank
How Ntile Works
Ntile
Ntile Continued
Ntile Percentile
Another Ntile Example
Using Quartiles (Partitions of Four)
NTILE Buckets
NTILE Using a Value of 10
NTILE With a Partition
Using LAG and LEAD
Using LEAD
Using LEAD With an Offset of 2
LEAD
LEAD With Partitioning
Using LAG
Using LAG With an Offset of 2
LAG
LAG with Partitioning
SUM(SUM(n))

Chapter 15 - Working with Strings
The ASCII Function
The CHAR Function
The UNICODE Function
The NCHAR Function
The LEN Function
The DATALENGTH Function
Concatenation
The RTRIM and LTRIM Command trims Spaces
The SUBSTRING Command
Using SUBSTRING to move Backwards
How SUBSTRING Works with a Starting Position of -1
How SUBSTRING Works with an Ending Position of 0
Concatenation and SUBSTRING
SUBSTRING and Different Aliasing
The LEFT and RIGHT Functions
Four Concatenations Together
The DATALENGTH Function and RTRIM
A Visual of the TRIM Command Using Concatenation
CHARINDEX Function Finds a Letter(s) Position in a String
The CHARINDEX Command is brilliant with SUBSTRING
The CHARINDEX Command Using a Literal
PATINDEX Function
PATINDEX Function to Find a Character Pattern
SOUNDEX Function to Find a Sound
DIFFERENCE Function to Quantile a Sound
The REPLACE Function
LEN and REPLACE Functions for Number of Occurrences
REPLICATE Function
STUFF Function
STUFF without Deleting Function
UPPER and lower Functions

Chapter 16 - Interrogating the Data
Quiz – What would the Answer be?
Answer to Quiz – What would the Answer be?
The NULLIF Command
Quiz – Fill in the Answers for the NULLIF Command
Answer – Fill in the Answers for the NULLIF Command
The COALESCE Command – Fill In the Answers
The COALESCE Answer Set
COALESCE is Equivalent to This CASE Statement
The Basics of CAST (Convert and Store)
Some Great CAST (Convert and Store) Examples
Some Great CAST (Convert and Store) Examples
A Rounding Example
Quiz - CAST Examples
Answer to Quiz - CAST Examples
Quiz - The Basics of the CASE Statements
Answer to Quiz - The Basics of the CASE Statements
Using an ELSE in the Case Statement
Using an ELSE as a Safety Net
Rules For a Valued Case Statement
Rules for a Searched Case Statement
Valued Case Vs. A Searched Case
Quiz - Valued Case Statement
Answer - Valued Case Statement
Quiz - Searched Case Statement
Answer - Searched Case Statement
Quiz - When NO ELSE is present in CASE Statement
Answer - When NO ELSE is present in CASE Statement
Quiz - When an Alias is NOT used in a CASE Statement
Answer - When an Alias is NOT used in a CASE Statement
Combining Searched Case and Valued Case
A Trick for getting a Horizontal Case
Nested Case
Put a CASE in the ORDER BY

Chapter 17 – Table Create and Data Types
Creating a Database
Creating a Table that is a Heap
Heap Page
Extents
Creating a Table That Has a Clustered Index
Clustered Index Page
When Do I Create a Clustered Index?
B-Trees
The Building of a B-Tree for a Clustered Index (1 of 3)
The Building of a B-Tree for a Clustered Index (2 of 3)
The Building of a B-Tree for a Clustered Index (3 of 3)
The Row Offset Array is the Guidance System For Every Row
The Row Offset Array Provides Two Search Options (1 of 2)
The Row Offset Array Provides Two Search Options (2 of 2)
The Row Offset Array Helps With Inserts
What is a Uniquefier?
Adding an Index
When Do I Create a Non Clustered Index?
B-Tree for Non Clustered Index on a Clustered Table (1 of 2)
B-Tree for Non Clustered Index on a Clustered Table (2 of 2)
Adding a Non Clustered Index To A Heap
B-Tree for Non Clustered Index on a Heap Table (1 of 2)
B-Tree for a Non Clustered Index on a Heap Table (2 of 2)
Default Values

Chapter 18 – View Functions
The Fundamentals of Views
Creating a Simple View to Restrict Sensitive Columns
Creating a Simple View to Restrict Rows
Basic Rules for Views
Two Exceptions to the ORDER BY Rule inside a View
Views sometimes CREATED for Row Security
Creating a View to Join Tables Together
You Select From a View
Another Way to Alias Columns in a View CREATE
The Standard Way Most Aliasing is done
What Happens When Both Aliasing Options Are Present
Resolving Aliasing Problems in a View CREATE
Answer to Resolving Aliasing Problems in a View CREATE
Aggregates on View Aggregates
Altering a Table
Altering a Table after a View has been Created
A View that Errors After an ALTER
Troubleshooting a View
Loading Data through a View

Chapter 19 – Data Manipulation Language (DML)
INSERT Syntax #1
INSERT Example with Syntax 1
INSERT Syntax #2
INSERT Example with Syntax 2
INSERT/SELECT Command
INSERT/SELECT Example using All Columns (*)
INSERT/SELECT Example with Less Columns
The UPDATE Command Basic Syntax
Two UPDATE Examples
Subquery UPDATE Command Syntax
Example of Subquery UPDATE Command
Join UPDATE Command Syntax
Example of an UPDATE Join Command
The DELETE Command Basic Syntax
Two DELETE Examples to DELETE ALL Rows in a Table
To DELETE or to TRUNCATE
A DELETE Example Deleting only Some of the Rows
Subquery and Join DELETE Command Syntax
Example of Subquery DELETE Command
MERGE INTO
MERGE INTO

Chapter 20 – Set Operators Functions
Rules of Set Operators
INTERSECT Explained Logically
INTERSECT Explained Logically
UNION Explained Logically
UNION Explained Logically
UNION ALL Explained Logically
UNION ALL Explained Logically
EXCEPT Explained Logically
EXCEPT Explained Logically
Another EXCEPT Example
EXCEPT Explained Logically in Reverse Order
An Equal Amount of Columns in both SELECT List
Columns in the SELECT list should be from the same Domain
The Top Query handles all Aliases
The Bottom Query does the ORDER BY
Great Trick: Place your Set Operator in a Derived Table
UNION Vs UNION ALL
Using UNION ALL and Literals
A Great Example of how EXCEPT works
USING Multiple SET Operators in a Single Request
Changing the Order of Precedence with Parentheses
Building Grouping Sets Using UNION
Three Grouping Sets Using a UNION

Chapter 21 – Stored Procedure Functions
Creating a Stored Procedure
Table of Contents Executing a Stored Procedure ................................................................................................................................ 636 There are Three Ways to Execute a Stored Procedure .......................................................................................... 637 Creating a Stored Procedure with a CASE Statement ........................................................................................... 638 Our Answer Set ...................................................................................................................................................... 639 Dropping a Stored Procedure ................................................................................................................................. 640 Passing an Input Parameter to a Stored Procedure ................................................................................................ 641 Executing With Positional Parameter vs. Named Parameters ............................................................................... 642 Passing an Output Parameter to a Stored Procedure .............................................................................................. 643 Changing a Stored Procedure with an ALTER ...................................................................................................... 644 Answer Set for the Altered Stored Procedure ........................................................................................................ 645 Using a Stored Procedure to Delete a Row ............................................................................................................ 646 A Different Method to Delete a Row ..................................................................................................................... 
647 Deleting a Row Using an Input Parameter ............................................................................................................ 648 Using Loops in Stored Procedures ......................................................................................................................... 649 Stored Procedure Workshop .................................................................................................................................. 650 Looping with a WHILE Statement ........................................................................................................................ 651 Chapter 22 – Statistical Aggregate Functions........................................................................................................... 653 The Stats Table ....................................................................................................................................................... 654 Above, is the Stats_Table data in which we will use in our statistical examples. ................................................. 654 The VAR and VARP Functions ............................................................................................................................. 655 A VAR Example .................................................................................................................................................... 656 A VARP Example .................................................................................................................................................. 657 The STDEV and STDEVP Functions .................................................................................................................... 658 A STDEV Example ................................................................................................................................................ 
659 A STDEVP Example.............................................................................................................................................. 660 Chapter 23 – Systems Views .................................................................................................................................... 662 System Views ......................................................................................................................................................... 663
Table of Contents sys.all_columns ...................................................................................................................................................... 664 sys.all_objects ........................................................................................................................................................ 665 sys.all_sql_modules ............................................................................................................................................... 666 sys.all_views .......................................................................................................................................................... 667 sys.columns ............................................................................................................................................................ 668 sys.data_spaces....................................................................................................................................................... 669 sys.database_files ................................................................................................................................................... 670 sys.database_principals .......................................................................................................................................... 671 sys.database_role_members ................................................................................................................................... 672 sys.databases .......................................................................................................................................................... 673 sys.filegroups.......................................................................................................................................................... 
674 sys.identity_columns .............................................................................................................................................. 675 sys.objects .............................................................................................................................................................. 676 sys.partition_range_values ..................................................................................................................................... 677 sys.schemas ............................................................................................................................................................ 678 sys.server_role_members ....................................................................................................................................... 679 sys.sql_logins ......................................................................................................................................................... 680 Chapter 24 – Nexus ................................................................................................................................................... 682 Nexus is Now Available on the Microsoft Azure Cloud ....................................................................................... 683 Nexus Queries Every Major System ...................................................................................................................... 684 Setup of Nexus is as easy as pie ............................................................................................................................. 685 Setup of Nexus is a Easy as 1, 2, 3 ........................................................................................................................ 686 Nexus Data Visualization ....................................................................................................................................... 
687 Nexus Data Visualization ....................................................................................................................................... 688 Nexus Data Visualization Shows What Tables Can Be Joined ............................................................................. 689 Nexus is doing a Five-Table Join ........................................................................................................................... 690 Nexus Generates the SQL Automatically .............................................................................................................. 691 Nexus Delivers the Report ..................................................................................................................................... 692
Table of Contents Cross-System Joins from Teradata, Oracle and SQL Server ................................................................................. 693 The Tab of the Super Join Builder ......................................................................................................................... 694 The 9 Tabs of the Super Join Builder – Objects Tab 1 .......................................................................................... 695 Selecting Columns in the Objects Tab ................................................................................................................... 696 The 9 Tabs of the Super Join Builder – Columns Tab 2........................................................................................ 697 Removing Columns from the Report in the Columns Tab .................................................................................... 698 The 9 Tabs of the Super Join Builder – Sorting Tab 3 .......................................................................................... 699 The 9 Tabs of the Super Join Builder – Joins Tab 4 .............................................................................................. 700 The 9 Tabs of the Super Join Builder – Where Tab 5 ........................................................................................... 701 Using the WHERE Tab For Additional WHERE or AND .................................................................................... 702 The 9 Tabs of the Super Join Builder – SQL Tab 6............................................................................................... 703 The 9 Tabs of the Super Join Builder – Answer Set Tab 7 ................................................................................... 704 The 9 Tabs of the Super Join Builder – Analytics Tab 9 ....................................................................................... 
705 Analytics Tab ......................................................................................................................................................... 706 Analytics Tab – OLAP Example ........................................................................................................................... 707 Analytics Tab – OLAP Example of SQL Generated ............................................................................................. 708 Analytics Tab – Grouping Sets Example ............................................................................................................... 709 Analytics Tab – Grouping Sets Answer Set .......................................................................................................... 710 Nexus Data Movement ........................................................................................................................................... 711 Moving a Single Table To a Different System ...................................................................................................... 712 The Single Table Data Movement Screen ............................................................................................................. 713 Moving an Entire Database To a Different System ............................................................................................... 714 The Database Mover Screen .................................................................................................................................. 715 The Database Mover Options Tab ......................................................................................................................... 716 Converting DDL Table Structures ......................................................................................................................... 
717 Converting DDL Table Structures ......................................................................................................................... 718 Converting DDL Table Structures ......................................................................................................................... 719 Compare and Synchronize ..................................................................................................................................... 720 Compare Two Different Databases From Different Systems ................................................................................ 721
Table of Contents Comparisons Down to the Column Level .............................................................................................................. 722 The Results Tab...................................................................................................................................................... 723 View Differences ................................................................................................................................................... 724 Synchronizing Differences In the Results Tab ...................................................................................................... 725 Synchronizing Differences In the Results Tab ...................................................................................................... 726 Hound Dog Compression ....................................................................................................................................... 727 Hound Dog Compression on Teradata ................................................................................................................... 728 Hound Dog Compression on Teradata ................................................................................................................... 729
Chapter 1 – Introduction to the Azure SQL Data Warehouse
“The man who has no imagination has no wings.” – Muhammad Ali
Introduction to the Family of SQL Server Products

Microsoft SQL Server Compact 4.0
Microsoft SQL Server Compact 4.0 is a compact database that is embedded inside Nexus and other desktop applications around the world. It is also ideal for embedding in web applications. SQL Server Compact 4.0 gives developers a common programming model with the other SQL Server editions, which is important for developing both native and managed applications. SQL Server Compact provides outstanding flexibility in a small footprint.

SQL Server 2014 Express Edition
Microsoft provides this edition for free! This powerful database engine is perfect for embedded applications or for redistribution with other solutions. Independent software vendors (ISVs) use it to build desktop applications. If you need support for databases larger than 10 GB, SQL Server Express is compatible with the other editions of SQL Server.

SQL Server Standard Edition
Microsoft's robust data management and business intelligence database is ideal for departments and small workgroups. It supports common development tools for both on-premises and cloud applications. This edition enables effective database management with minimal IT resources, and it is compatible with the other editions.

Above are the first three SQL Server offerings from Microsoft.
Introduction to the Family Continued

Microsoft SQL Server Web Edition
Microsoft's Web edition is a low total-cost-of-ownership option for hosting Web applications. It provides scalability, affordability, and manageability for small- to large-scale Web initiatives.

SQL Server 2014 Business Intelligence Edition
Microsoft's Business Intelligence edition delivers a comprehensive platform for the business intelligence community. It empowers organizations to build and deploy secure, scalable and manageable BI solutions. It offers browser-based data exploration and visualization and includes powerful integration capabilities.

SQL Server 2014 Enterprise Edition
Microsoft's SQL Server 2014 Enterprise edition delivers high-end datacenter capabilities, with performance that has been enhanced for virtualization, business intelligence and integration. This enables high service levels for mission-critical workloads and gives end users access to data insights.

Above are the next three SQL Server offerings from Microsoft.
Microsoft Azure SQL Data Warehouse

Azure SQL Data Warehouse
Microsoft's Azure SQL Data Warehouse is a massively parallel processing (MPP) data warehousing appliance that is built for any volume of relational data and provides integration with Hadoop. Azure SQL Data Warehouse can provide up to 100x performance gains over other SQL Server platforms. This is the MPP platform that provides linear scalability as data volumes grow and the number of users increases.

Azure SQL Data Warehouse is designed to parallelize and distribute the processing across multiple Symmetric Multi-Processing (SMP) compute nodes. Azure SQL Data Warehouse is only available as part of Microsoft’s Analytics Platform System (APS) appliance. Azure SQL Data Warehouse is a shared-nothing architecture, which means each processor has its own operating system, memory and set of disks. Nothing is shared! Data is “horizontally partitioned” across nodes. This means that each node has a subset of the rows from each table in the database. Each node is then responsible for processing only the rows on its own disks. Above is the information about Microsoft's Azure SQL Data Warehouse, which is Microsoft's MPP system.
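The idea of horizontal partitioning can be sketched in a few lines of Python. This is only an illustration and not how the product actually places rows: the node count, the helper names node_for and partition_rows, and the choice of a CRC32 hash on Order_No are all assumptions made for the example.

```python
import zlib

NODE_COUNT = 4  # assumed number of compute nodes for this sketch


def node_for(key, node_count=NODE_COUNT):
    """Pick a node for a row by hashing its key (illustrative rule only)."""
    return zlib.crc32(str(key).encode("utf-8")) % node_count


def partition_rows(rows, key_column, node_count=NODE_COUNT):
    """Return one list of rows per node; every row lands on exactly one node."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[node_for(row[key_column], node_count)].append(row)
    return nodes


# Hypothetical rows; each node ends up with a subset of the table's rows.
orders = [
    {"Order_No": 100, "Order_Total": 12347.53},
    {"Order_No": 200, "Order_Total": 8005.91},
    {"Order_No": 300, "Order_Total": 5111.47},
    {"Order_No": 400, "Order_Total": 15231.62},
]

nodes = partition_rows(orders, "Order_No")
for number, subset in enumerate(nodes):
    print(f"node {number}: {[row['Order_No'] for row in subset]}")
```

Because the same key always hashes to the same node, a node can answer questions about its own rows without consulting any other node, which is the point of sharing nothing.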
Symmetric Multi-Processing (SMP)

[Figure: four CPUs, each with its own cache, connected by a bus to shared memory and disk I/O.]
A Symmetric Multi-Processing system has multiple processors for extra power, but these processors share a single operating system and memory pool, and they share access to the disks. This is a great architecture for speed, similar to a restaurant that is quick and organized, but it lacks the ability for unlimited expansion. When there are too many cooks in the kitchen, you need an MPP system that scales many SMP systems together as one parallel processing data warehouse.
A Symmetric Multi-Processing (SMP) system is what Microsoft is known for in its SQL Server suite of products. The only product that does not use an SMP design is the new Azure SQL Data Warehouse. It uses a Massively Parallel Processing (MPP) design.
What is Parallel Processing?

“After enlightenment, the laundry” - Zen Proverb

[Figure: five panels side by side, each labeled “Tera-Tom’s Parallel Processing Wash and Dry”.]
“After parallel processing the laundry, enlightenment!” - Azure SQL Data Warehouse Zen Proverb
Two guys were having fun on a Saturday night when one said, “I’ve got to go and do my laundry.” The other said, “What?!” The man explained that if he went to the laundromat the next morning, he would be lucky to get one machine and then would be there all day. But if he went on Saturday night, he could get all the machines and do all his wash and dry in two hours. Now that’s parallel processing mixed in with a little dry humor!
The Basics of a Single Computer

[Figure: a single computer with a CPU, memory and a disk that holds the Orders table. The question “How are we doing on orders today?” is answered by the disk: “How would I know? I'm just a disk. I need to transfer the block of data to the memory, and that is a slow process.”]

Orders
Order_No  Customer_No  Order_Date  Order_Total
100       21345679     01/01/2013  12347.53
200       32456733     01/01/2013  8005.91
300       31323134     01/01/2013  5111.47
400       87323456     01/01/2013  15231.62

“When you are courting a nice girl, an hour seems like a second. When you sit on a red-hot cinder, a second seems like an hour. That’s relativity.” – Albert Einstein
Data on disk does absolutely nothing. When data is requested, the computer moves the data one block at a time from disk into memory. Once the data is in memory, it is processed by the CPU at lightning speed. All computers work this way. The "Achilles heel" of every computer is the slow process of moving data from disk to memory. The real theory of relativity is to find out how to get blocks of data from the disk into memory faster!
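The block-at-a-time movement from disk into memory can be illustrated with ordinary file I/O. The sketch below is only an analogy: the 4 KB block size, the scratch file and the helper name read_in_blocks are assumptions for the example, not details of any database engine.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size in bytes


def read_in_blocks(path, block_size=BLOCK_SIZE):
    """Yield the file's contents one block at a time."""
    with open(path, "rb") as handle:
        while True:
            block = handle.read(block_size)
            if not block:
                break
            yield block


# Write 10,000 bytes to a scratch file that stands in for data on disk.
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch.write(b"x" * 10_000)
    scratch_path = scratch.name

# Nothing can be processed until each block has been moved into memory.
blocks = list(read_in_blocks(scratch_path))
os.remove(scratch_path)

print(len(blocks), [len(block) for block in blocks])
```

The work done on each block once it is in memory is cheap; the loop spends its time waiting on the reads, which is the bottleneck the text describes.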
Data in Memory is Fast as Lightning

[Figure: the same computer with the block of Orders rows now in memory, next to the CPU, as well as on disk. Both copies hold the rows below.]

Orders
Order_No  Customer_No  Order_Date  Order_Total
100       21345679     01/01/2013  12347.53
200       32456733     01/01/2013  8005.91
300       31323134     01/01/2013  5111.47
400       87323456     01/01/2013  15231.62
“You can observe a lot by watching.” – Yogi Berra
Once the data block is moved off the disk and into memory, the processing of that block happens as fast as lightning. It is the movement of the block from disk into memory that slows down every computer. Data being processed in memory is so fast that even Yogi Berra couldn't catch it!
Parallel Processing of Data

[Figure: the Orders table spread across four distributions. Each distribution has its own memory and its own disk and holds four of the 16 rows, shown below.]

Distribution  Cust_No   Order_Date  Order_Total
1             21345679  01/01/2013  12347.53
1             32456733  01/01/2013  8005.91
1             31323134  01/01/2013  5111.47
1             87323456  01/01/2013  15231.62
2             34345699  01/01/2013  13347.51
2             41456543  01/01/2013  13005.91
2             51323154  01/01/2013  7611.57
2             67823486  01/01/2013  11671.92
3             87945679  01/01/2013  8347.53
3             98756733  01/01/2013  17005.91
3             35623134  01/01/2013  3451.47
3             97873456  01/01/2013  19871.62
4             44445679  01/01/2013  12447.53
4             32547733  01/01/2013  8055.66
4             57497134  01/01/2013  5651.47
4             87768956  01/01/2013  231.62
"If the facts don't fit the theory, change the facts." -Albert Einstein
Big Data is all about parallel processing. Parallel processing is all about taking the rows of a table and spreading them among many parallel processing units. Above, we can see a table called Orders. There are 16 rows in the table. Each parallel processor holds four rows. Now they can process the data in parallel and be four times as fast. What Albert Einstein meant to say was, “If the theory doesn't fit the dimension table, change it to a fact.”
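Here is a small Python sketch of the same idea, using the Order_Total values from the figure. Four worker threads stand in for the four distributions; the helper name local_total and the use of threads are assumptions for the illustration, not a description of how the product runs queries.

```python
from concurrent.futures import ThreadPoolExecutor

# Order_Total values held by each of the four distributions in the figure.
distributions = [
    [12347.53, 8005.91, 5111.47, 15231.62],
    [13347.51, 13005.91, 7611.57, 11671.92],
    [8347.53, 17005.91, 3451.47, 19871.62],
    [12447.53, 8055.66, 5651.47, 231.62],
]


def local_total(order_totals):
    """Each distribution adds up only the four rows it holds."""
    return sum(order_totals)


# All four distributions work on their own rows at the same time.
with ThreadPoolExecutor(max_workers=len(distributions)) as pool:
    partial_totals = list(pool.map(local_total, distributions))

# The partial answers are then combined into one answer for the whole table.
grand_total = sum(partial_totals)
print([round(total, 2) for total in partial_totals], round(grand_total, 2))
```

No distribution ever looks at more than four rows, yet together they cover all 16.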
A Table has Columns and Rows

Emp_No  Dept_No  First_Name  Last_Name  Salary
100     1001     Rafael      Minal      90000
200     1002     Maria       Gomez      80000
300     1003     Charl       Kertzel    70000
400     1004     Kyle        Stover     60000
400     1005     Rob         Rivers     50000
300     1006     Inna        Kinski     50000
200     1007     Sushma      Davis      50000
100     1008     Mo          Khan       60000
300     1009     Mo          Swartz     70000

[Figure: Employee_Table spread across three distributions, three rows each. The first distribution holds Rafael Minal, Kyle Stover and Sushma Davis; the second holds Maria Gomez, Rob Rivers and Mo Khan; the third holds Charl Kertzel, Inna Kinski and Mo Swartz.]
The table above has 9 rows. Our small system above has three parallel processing units called distributions, and each distribution holds three rows. There are eight distributions per node, so a four-node system will have 32 distributions. Double your nodes and double your speed and power. The idea of parallel processing is to take the rows of a table and spread them across the distributions so each distribution can process its portion of the data in parallel.
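The arithmetic and the spreading of rows can be sketched as follows. The eight-distributions-per-node figure comes from the text; dealing rows out in turn (round-robin) is an assumption made only for this illustration, since the text does not say how rows are assigned, and the helper names are hypothetical.

```python
DISTRIBUTIONS_PER_NODE = 8  # figure stated in the text


def total_distributions(node_count):
    """Number of distributions in a system with the given number of nodes."""
    return node_count * DISTRIBUTIONS_PER_NODE


def spread_round_robin(rows, distribution_count):
    """Deal rows out in turn so every distribution gets an even share."""
    slots = [[] for _ in range(distribution_count)]
    for position, row in enumerate(rows):
        slots[position % distribution_count].append(row)
    return slots


# The nine employees from the table, in table order.
employees = [
    "Rafael Minal", "Maria Gomez", "Charl Kertzel",
    "Kyle Stover", "Rob Rivers", "Inna Kinski",
    "Sushma Davis", "Mo Khan", "Mo Swartz",
]

slots = spread_round_robin(employees, 3)
print(total_distributions(4))
for number, rows in enumerate(slots, start=1):
    print(f"distribution {number}: {rows}")
```

With nine rows and three distributions, each distribution ends up with exactly three rows, which is the even spread that parallel processing depends on.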
The Azure SQL Data Warehouse has Linear Scalability

[Figure: the MPP Engine connected over an Infiniband network to rows of distributions, each labeled DIST.]

"A journey of a thousand miles begins with a single step." - Lao Tzu
The Azure SQL Data Warehouse was born to be parallel. With each query, a single step is performed in parallel by each distribution. An Azure SQL Data Warehouse system consists of a series of distributions that work in parallel to store and process your data. This design allows you to start small and grow infinitely. If your Azure SQL Data Warehouse system provides you with an excellent Return On Investment (ROI), then continue to invest by purchasing more nodes, which adds Distributions. Most companies start small, but after seeing what an Azure SQL Data Warehouse can do, they continue to grow their ROI from the single step of implementing an Azure SQL Data Warehouse system to millions of dollars in profits. Double your compute nodes and double your speed... forever. The Azure SQL Data Warehouse actually provides a journey of a thousand smiles!
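The "double your nodes, double your speed" claim can be written down as a simple best-case model. The sketch below assumes rows are spread perfectly evenly and that nothing other than scanning limits speed; the row count and the per-distribution scan rate are hypothetical numbers chosen to keep the arithmetic easy, and a real system will have overhead this model ignores.

```python
DISTRIBUTIONS_PER_NODE = 8  # figure stated in the text


def ideal_scan_seconds(row_count, node_count, rows_per_second_per_distribution):
    """Best-case scan time when every distribution gets an equal share of rows."""
    workers = node_count * DISTRIBUTIONS_PER_NODE
    return row_count / (workers * rows_per_second_per_distribution)


ROWS = 3_200_000_000  # hypothetical table size
RATE = 1_000_000      # hypothetical rows scanned per second by one distribution

for nodes in (4, 8, 16):
    print(nodes, "nodes:", ideal_scan_seconds(ROWS, nodes, RATE), "seconds")
```

In this model each doubling of the node count halves the scan time, which is what linear scalability means.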
The Architecture of the Azure SQL Data Warehouse

The MPP Engine manages the distribution of data and builds the plan for the nodes to follow.

[Figure: the MPP Engine above Azure SQL Data Warehouse Node 1 through Node n, each node holding eight distributions labeled DIST.]
“Be the change that you want to see in the world.” - Mahatma Gandhi
The MPP Engine is the brains behind the entire operation. The user logs into the MPP Engine, and for each SQL query, the MPP Engine comes up with a plan to retrieve the data. It passes that compiled plan to each compute node, and each of the node's 8 Distributions processes its portion of the data. If the data is spread evenly, parallel processing works perfectly. This technology is relatively inexpensive. It might not "be the change", but it will help your company "keep the change" because costs are low. Microsoft's Azure SQL Data Warehouse uses both SMP and MPP technology. Each node is an SMP system, but many nodes are linked together to become one big MPP system.
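Why an even spread matters can be shown with a little arithmetic. The sketch below assumes that a parallel step finishes only when the busiest distribution finishes and that work is proportional to row count; both are simplifying assumptions, and the row counts and the helper name skew_report are hypothetical.

```python
def skew_report(rows_per_distribution):
    """Compare the busiest distribution with the average one."""
    busiest = max(rows_per_distribution)
    average = sum(rows_per_distribution) / len(rows_per_distribution)
    return {"busiest": busiest, "average": average, "slowdown": busiest / average}


# Eight distributions on one node, holding 8,000 rows in total either way.
even = [1000] * 8
skewed = [4500, 500, 500, 500, 500, 500, 500, 500]

print(skew_report(even))
print(skew_report(skewed))
```

Both layouts hold the same number of rows, but in the skewed layout one distribution has 4.5 times the average share, so the other seven sit idle while it catches up.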
Nexus is now Available on the Microsoft Azure Cloud
Why the Nexus Chameleon should be your query tool of choice:
1) Queries every major system
2) Provides visualization and automatically writes the SQL
3) Can perform cross-system joins with a few clicks of the mouse
4) Converts table structures and moves the table and data between systems
5) Compares and synchronizes databases
6) Can move an entire database of tables or views between systems
7) Has the "Garden of Analysis" to re-query answer sets inside your PC
8) Provides a dashboard of graphs and charts for answer sets

Download the Nexus for a free trial at www.CoffingDW.com and use Nexus in-house or on the Microsoft Azure cloud.
The MPP Engine is the Optimizer

[Figure: the Control Rack, containing the Management Node (Active/Passive), the Control Node (Active/Passive), which receives user queries, the Landing Zone and the Backup Node.]

The control node receives all queries, and then the MPP Engine (Optimizer):

1. Parses the SQL text.
2. Validates that all objects exist and that the user has the required access rights.
3. Builds a plan for the nodes to follow.
4. Runs the MPP execution plan by executing SQL SELECT commands in parallel on each compute node.
5. Gathers and merges all the parallel result sets from the compute nodes.
6. Returns a single result set to the client.
The brains behind all user queries lie in the MPP Engine. The MPP Engine receives the query, checks the syntax and the security and then comes up with a plan for the nodes to follow.
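The six steps above can be acted out in miniature with Python's built-in sqlite3 module. This is a toy, not the MPP Engine: each in-memory SQLite database stands in for a compute node, the KNOWN_TABLES set stands in for the control node's metadata, parsing is left to SQLite, and the "plan" is simply the same SELECT sent to every node. All names here are assumptions for the illustration.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

KNOWN_TABLES = {"Orders"}  # stand-in for the control node's metadata


def make_node(rows):
    """Create one stand-in compute node holding its share of the Orders rows."""
    node = sqlite3.connect(":memory:", check_same_thread=False)
    node.execute("CREATE TABLE Orders (Order_No INTEGER, Order_Total REAL)")
    node.executemany("INSERT INTO Orders VALUES (?, ?)", rows)
    return node


def run_query(nodes, table, min_total):
    # Steps 1-2 (simplified): refuse requests for objects that do not exist.
    if table not in KNOWN_TABLES:
        raise ValueError(f"unknown table: {table}")
    # Step 3: build the plan, here the same SELECT for every node.
    plan = f"SELECT Order_No, Order_Total FROM {table} WHERE Order_Total >= ?"
    # Step 4: run the plan on every node at the same time.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(
            pool.map(lambda node: node.execute(plan, (min_total,)).fetchall(), nodes)
        )
    # Steps 5-6: merge the partial result sets and return a single result set.
    return sorted(row for partial in partials for row in partial)


nodes = [
    make_node([(100, 12347.53), (200, 8005.91)]),
    make_node([(300, 5111.47), (400, 15231.62)]),
]
result = run_query(nodes, "Orders", 8000)
print(result)
```

The client sees one result set and never needs to know how many nodes produced it.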
The Azure SQL Data Warehouse System

[Figure: a Control Rack and a Data Rack joined by dual Infiniband and dual Fibre Channel connections. The Control Rack holds the Management Node (Active/Passive), the Control Node (Active/Passive) for user queries, the Landing Zone for data loading and the Backup Node for data backup. The Data Rack holds active servers running SQL, each with dedicated storage, plus a passive server.]
Above is a pictorial of an Azure SQL Data Warehouse system. There is one Control Rack and many Data Racks.
The Azure SQL Data Warehouse System is Scalable

[Figure: one Control Rack (Management, Control Node, Landing Zone, Backup Node) alongside two Data Racks, each with active servers, storage and a passive server.]
The Azure SQL Data Warehouse will take up at least two full racks of space, and you can add storage and compute capacity one data rack at a time. A data rack will contain 8 to 10 compute servers. A great asset of the Azure SQL Data Warehouse is that it works on a wide variety of hardware. Vendors such as Bull, Dell, HP, and IBM provide the hardware, and Fibre Channel storage arrays come from vendors like EMC, HP, and IBM. The control node controls the physical servers and guides them to work together in parallel. The control node acts as the optimizer: it accepts client query requests and then creates the plan. It then calls upon one or more compute nodes to execute different parts of the query, often in parallel. The result set is then sent back to the user.
The Control Node
[Figure: the Control Rack, containing the Management Node (active/passive), the Control Node (active/passive), the Landing Zone and the Backup Node.]
The Azure SQL Data Warehouse control node allows users to connect to and query the Azure SQL Data Warehouse database.
It is the control node that comes up with a parallel plan for the nodes to follow to retrieve query results. The control node has an instance of the SQL Server 2014 database for storing metadata.
The control node is also responsible for all intermediate query results in TempDB. It receives intermediate query results from multiple compute nodes, stores those results in SQL Server temporary tables, and then merges them into a single result set for final delivery to the client. The control node is an active/passive cluster server. Plus, there's a spare compute node for redundancy and failover capability.
Think of the control node as the optimizer, or a conductor in the Azure SQL Data Warehouse orchestra of servers.
The Data Rack
[Figure: a Data Rack of active servers, each with dedicated storage, plus one passive server, connected by dual Infiniband and dual Fibre Channel.]
The data rack of the Azure SQL Data Warehouse contains 8 to 10 compute nodes along with their related storage nodes, depending on the hardware vendor.
Each compute node is a physical server that runs a standalone SQL Server 2014 relational engine instance.
The storage nodes are Fibre Channel-connected storage arrays containing 10 to 12 disk drives.
Above is a picture of an Azure SQL Data Warehouse Data Rack. This is where the data is stored and the parallel processing magic occurs.
The Landing Zone
[Figure: the Control Rack, with the Landing Zone shown alongside the Management Node (active/passive), the Control Node (active/passive) and the Backup Node.]
The Landing Zone node is used to load data directly to the Azure SQL Data Warehouse. The load utility named dwloader is used for high-speed parallel loading of large data files into databases.
The brilliance of this design is that there is minimal impact on concurrent queries executing on the Azure SQL Data Warehouse. With this utility, data from a disk or SQL Server Integration Services (SSIS) pipeline can be loaded, in parallel, to all compute nodes.
The Backup Node
[Figure: the Control Rack, with the Backup Node shown alongside the Management Node (active/passive), the Control Node (active/passive) and the Landing Zone.]
The Azure SQL Data Warehouse backup node is used for backing up user databases. These databases are physically spread across all compute nodes and their related storage nodes. When backing up a user database, each compute node backs up, in parallel, its portion of the database.
Software as a Service (SaaS) and the Elastic Database
[Figure: the MPP Engine above two groups of distributions (labeled DIST), each group connected by an Infiniband network.]
Software-as-a-service (SaaS) applications need versatility and flexibility. They need to focus on end-user solutions and not have to worry about managing databases, schemas and sizing requirements. The Microsoft Azure SQL Data Warehouse is designed to provide the flexibility to grow as needed. Workloads that continually change over time have unpredictable database resource consumption. That is why the elastic database model gives users the ability to pool resources and share them among a single database or a group of databases. The idea of using the resources you need, when you need them, and not having to worry about provisioning and predicting the unpredictable is priceless. The most important thing a system can do for its users is to give them the space they need.
Azure Data Lake
[Figure: the Azure SQL Data Warehouse and the Azure Data Lake. Labels: traditional on-premise systems, cloud systems, SQL Server, HDFS file system, Hadoop, Hortonworks, Cloudera, structured data, semi-structured data, unstructured data.]
The future of computing is data! Not data on any particular system or structure, but simply data that resides anywhere in the enterprise. That is why Microsoft has created the Azure SQL Data Warehouse to store relational data in the cloud and the Azure Data Lake to store unstructured data. A data lake is composed of raw data of all types in its native format: traditional structured data, unstructured data and semi-structured data. The Azure Data Lake is a data store for big data analytics. This great idea gives users the best of all data worlds, mixing on-premise traditional systems with Hadoop HDFS file systems. In the Azure Data Lake, there are plenty of fish in the sea because this lake can store every type of data with no fixed limits on account size or file size. The Azure Data Lake is a Hadoop File System compatible with HDFS. It is integrated with Azure HDInsight, Revolution-R Enterprise, Hortonworks and Cloudera.
Azure Disaster Recovery
[Figure: Node 1 through Node n, each showing Order_Table, Sales_Table, Student_Table, Course_Table, Cust_Table and Claims_Table.]
The Microsoft Azure cloud provides data availability with built-in replicas and a competitive 99.99% Service Level Agreement at the database level. Instead of worrying about a disaster, you can count on disaster recovery objectives that are an estimated 360x lower. Microsoft uses something called active geo-replication, which gives users the ability to create up to 4 readable secondaries in any Microsoft Azure region and gives them control over when and where to fail over. Currently, users have up to 35 days of backups available for recovery.
Security and Compliance
[Figure: a Data Rack with active servers, storage and a passive server, captioned "Stop! Who goes there?"]
The Microsoft Azure cloud supports security and compliance-related tasks through a wide variety of features. Users can implement database-level security such as database auditing, views, row-level security, data masking and encryption. And of course Microsoft Azure has had its cloud security and compliance independently verified by key cloud auditors as part of key Azure compliance certifications and approvals such as HIPAA BAA, E.U. Model Clauses, ISO/IEC 27001:2005 and FedRAMP. Some companies actually feel safer in the cloud than in their own on-premises data centers.
How to Get an EXPLAIN Plan
To EXPLAIN a query, just:
1. Type EXPLAIN
2. Or press F6
3. Or click the magnifying glass on Nexus
    EXPLAIN SELECT * FROM Employee_Table;
This EXPLAIN plan shows we are utilizing a system with 2 nodes on a table that has spread the rows across 16 distributions:
    SELECT * FROM Employee_Table
    SELECT [T1_1].[Employee_No] AS [Employee_No],
           [T1_1].[Dept_No] AS [Dept_No],
           [T1_1].[Last_name] AS [Last_name],
           [T1_1].[First_name] AS [First_name],
           [T1_1].[Salary] AS [Salary]
    FROM [SQL_CLASS].[dbo].[Employee_table] AS T1_1
You can get an explain by placing the keyword EXPLAIN in front of any SQL. You can also press function key 6 (F6). If you are using the Nexus Chameleon, you can click on the magnifying glass near the EXECUTE button.
Chapter 2 – The Azure SQL Data Warehouse Table Structures
“Let me once again explain the rules. The Azure SQL Data Warehouse Rules!” - Tera-Tom Coffing
The 5 Concepts of Azure SQL Data Warehouse Tables
1. Tables are either distributed by hash or replicated
2. The rows of a table are either sorted or unsorted
3. Tables are stored physically on disk in either a row or columnar format
4. Tables can be partitioned
5. Tables are either permanent, temporary or external tables
Above are the basic concepts for Azure SQL Data Warehouse tables. The next five pages cover each point one at a time, so you can see exactly what is going on.
Tables are Either Distributed by Hash or Replicated (1 of 5)
[Figure: three distributions, each with its own memory.]
Hashed: each distribution holds different rows. Each row is hashed by the values in a certain column, such as Employee_No.
  Distribution 1: 1 Joel Davis, 4 Rick Jahns, 7 Lynn Meyer, 11 Seth Rogers
  Distribution 2: 2 Mary Lewis, 5 John Miller, 8 Rich Jones, 12 Kyle Watson
  Distribution 3: 3 Tony Brady, 6 Lana Payne, 9 Lorie Stewart, 13 Dawn Daily
Replicated: each node holds all rows of a table. The table is literally duplicated on each and every node.
  Every node: 100 Sales, 200 Marketing, 300 Finance, 400 HR
The Azure SQL Data Warehouse gives you two choices for table distribution: hash or replicated. Large fact tables are usually hashed and smaller tables are usually replicated. When a table is hashed, one of the columns is chosen as the distribution key. In our example above, the Employee_Table (top) is hashed by Employee_No. The replicated table (bottom) has only four rows in it, and all four rows are on each node.
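To make the two choices concrete, here is a minimal Python sketch of both placements. It is an illustration only: the modulo operation is a stand-in for the real hashing formula (which is internal to the system), and the function names hash_distribute and replicate are invented for this example.

```python
# Hypothetical rows; the first field is the distribution key (Employee_No).
employees = [(1, "Joel Davis"), (2, "Mary Lewis"), (3, "Tony Brady"),
             (4, "Rick Jahns"), (5, "John Miller"), (6, "Lana Payne")]
departments = [(100, "Sales"), (200, "Marketing"), (300, "Finance"), (400, "HR")]

NUM_DISTRIBUTIONS = 3

def hash_distribute(rows, n):
    """Place each row on exactly one distribution, chosen from its key."""
    placed = {d: [] for d in range(n)}
    for row in rows:
        # Stand-in hash: modulo on the key. The real hash formula is internal.
        placed[row[0] % n].append(row)
    return placed

def replicate(rows, n):
    """Give every node its own full copy of the table."""
    return {d: list(rows) for d in range(n)}

hashed = hash_distribute(employees, NUM_DISTRIBUTIONS)
replicated = replicate(departments, NUM_DISTRIBUTIONS)
```

With hashing, the six employee rows are split up so each one lives in exactly one place; with replication, all four department rows appear in every copy.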
Table Rows are Either Sorted or Unsorted (2 of 5)
Sorted: this table is sorted because it was created with a Clustered Index on Employee_No.
  Employee_No  Dept_No  Last_Name  First_Name  Salary
  1001         100      Rafael     Minal       90000
  1004         400      Kyle       Stover      60000
  1007         200      Sushma     Davis       50000
  1020         200      May        Jones       60000
Not sorted: this table is unsorted (heap) because it was NOT created with a Clustered Index.
  Employee_No  Dept_No  Last_Name  First_Name  Salary
  1001         100      Rafael     Minal       90000
  1007         200      Sushma     Davis       50000
  1020         200      May        Jones       60000
  1004         400      Kyle       Stover      60000
The rows of a table are either sorted or unsorted. If the table has a clustered index it is sorted, but if it does not have a clustered index then it is unsorted, which is referred to as a heap. You can only have one clustered index per table because you can only sort a table one way. Sorting has nothing to do with a distribution key or a replicated table, but once the rows are placed on a distribution they are then either sorted (clustered index) or unsorted (heap).
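Why does sorted versus unsorted matter? The Python sketch below compares a lookup on a heap with a lookup on sorted rows. It is a simplified picture, not how the storage engine is implemented: the lists stand in for rows on one distribution, and the function names are invented for this example.

```python
from bisect import bisect_left, bisect_right

# Employee_No values as stored on one distribution (hypothetical sample).
heap = [1001, 1007, 1020, 1004]          # no clustered index: arrival order
clustered = sorted(heap)                 # clustered index: kept in key order

def range_scan_heap(rows, low, high):
    """A heap gives no ordering, so every row must be examined."""
    return [r for r in rows if low <= r <= high], len(rows)

def range_scan_clustered(rows, low, high):
    """Sorted rows let a binary search jump straight to the range."""
    start, stop = bisect_left(rows, low), bisect_right(rows, high)
    return rows[start:stop], stop - start   # only rows inside the range are read

heap_hits, heap_examined = range_scan_heap(heap, 1004, 1007)
idx_hits, idx_examined = range_scan_clustered(clustered, 1004, 1007)
```

Both scans find the same two employees, but the heap scan has to look at all four rows while the sorted scan reads only the two that qualify.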
Tables are Stored in Either Row or Columnar Format (3 of 5)
[Figure: three distributions, each with its own memory, each holding a portion of the Employee_Row_Based table and of the Employee_Columnar table.]
A table is stored in either a row format or a columnar format. Traditionally, most systems have stored the rows of a table in a row format (row store). When a query is run on the table, the entire block of rows must be moved from disk into memory, where the rows are processed. This works well when all columns (or most columns) are needed to satisfy the query. Modern systems often also offer a column format (column store). This works extremely well on queries that don't need all columns (or most columns), such as analytics and aggregations. Only the columns needed are then transferred from disk into memory. The Azure SQL Data Warehouse gives you a choice.
Tables can be Partitioned (4 of 5)
    CREATE TABLE Ord_Tbl_Part
    ( Order_Number    integer
     ,Customer_Number integer
     ,Order_Date      date
     ,Order_Total     decimal(10,2))
    WITH
    ( DISTRIBUTION = HASH (Order_Number),
      PARTITION ( Order_Date RANGE RIGHT FOR VALUES (
        '2015-01-01','2015-02-01','2015-03-01','2015-04-01',
        '2015-05-01','2015-06-01','2015-07-01','2015-08-01',
        '2015-09-01','2015-10-01','2015-11-01','2015-12-01' )));
[Figure: Distributions 1 through 4 each hold a portion of Ord_Tbl_Part, divided into monthly partitions 01 (JAN) through 12 (DEC).]
Above is the CREATE statement for the Ord_Tbl_Part table. This table is a rowstore table that is partitioned by Order_Date. By using RANGE RIGHT and dates for the boundary values, it puts a month of data in each partition. The distributions each hold different rows, but store each month in its own block(s). This physical partitioning allows for faster loads and faster maintenance (inserts, updates and deletes). This is the design you want when users are performing range queries on dates.
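How does RANGE RIGHT decide which partition a row belongs to? The short Python sketch below mimics the rule using the twelve boundary values from the statement above. The function name partition_number is invented for this illustration, and partitions are numbered from 1 here purely for readability.

```python
from bisect import bisect_right
from datetime import date

# The twelve boundary values from the CREATE TABLE statement.
BOUNDARIES = [date(2015, m, 1) for m in range(1, 13)]

def partition_number(order_date):
    """RANGE RIGHT: a boundary value belongs to the partition on its right.

    Partition 1 holds everything before the first boundary, so twelve
    boundaries give thirteen partitions in total.
    """
    return bisect_right(BOUNDARIES, order_date) + 1

print(partition_number(date(2015, 1, 1)))    # first day of January
print(partition_number(date(2015, 1, 31)))   # last day of January
```

Both January dates land in the same partition, while any date before 2015-01-01 falls into the extra partition in front of the first boundary.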
There are Permanent, Temporary and External Tables (5 of 5)
Permanent Tables – These tables reside permanently. Only a DROP statement removes the table, and a TRUNCATE statement removes its rows.
Temporary Tables – These tables reside temporarily on the system. Here is more information:
• Global Temp tables are not supported on the Azure SQL Data Warehouse
• When creating a TEMP table you must specify LOCATION=USER_DB
• Creating NONCLUSTERED indexes is not supported on temp tables
External Tables – These tables point to data in a Hadoop cluster or Azure blob storage. External tables are used most often to:
• Query Hadoop data from within the Azure SQL Data Warehouse.
• Import and store Hadoop data into the Azure SQL Data Warehouse by using the CREATE TABLE AS SELECT statement.
The Azure SQL Data Warehouse utilizes permanent tables for permanent data, temporary tables for temporary information and external tables in order to query Hadoop and blobs.
Creating a Table With a Distribution Key
    CREATE TABLE Emp_Intl
    ( Employee_No INTEGER
     ,Dept_No     SMALLINT
     ,First_Name  VARCHAR(12)
     ,Last_Name   CHAR(20)
     ,Salary      DECIMAL(8,2) )
    WITH (DISTRIBUTION = HASH (Employee_No)) ;
[Figure: three distributions, each with its own memory. Hashed: each distribution holds different rows, and each row is hashed by the values in a certain column, such as Employee_No. The first distribution holds employees 1, 4, 7 and 11; the second holds 2, 5, 8 and 12; the third holds 3, 6, 9 and 13.]
Above is a basic CREATE TABLE statement for a table with a Distribution Key. You can only use one column as the Distribution Key in the Azure SQL Data Warehouse. The values in this column will be hashed with a hashing formula and used to distribute the rows of the table across the distributions. Picking a good key is essential. An excellent Distribution Key will spread the rows evenly among the many distributions.
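What makes a key "good"? One rough way to judge a candidate is to count how many rows would land on each distribution. The Python sketch below does that with invented sample data. The modulo hash is a stand-in for the real (internal) hashing formula, and the skew measure is simply one reasonable way to express unevenness, not an official metric.

```python
from collections import Counter

def rows_per_distribution(keys, num_distributions):
    """Count how many rows land on each distribution for a candidate key."""
    # Stand-in hash: modulo on an integer key (the real formula is internal).
    counts = Counter(k % num_distributions for k in keys)
    return [counts.get(d, 0) for d in range(num_distributions)]

def skew(counts):
    """Ratio of the fullest distribution to the average; 1.0 is perfectly even."""
    return max(counts) / (sum(counts) / len(counts))

# Hypothetical data: Employee_No is unique, Dept_No has only four values.
employee_nos = list(range(1, 1001))
dept_nos = [100, 200, 300, 400] * 250

good = rows_per_distribution(employee_nos, 8)
poor = rows_per_distribution(dept_nos, 8)
print(skew(good), skew(poor))
```

A unique column such as Employee_No spreads the rows evenly, while a column with only a handful of distinct values piles the rows onto a few distributions and leaves the rest empty.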
Creating a Table that is Replicated
    CREATE TABLE Dept_Intl
    ( Dept_No         INTEGER
     ,Department_Name VARCHAR(30) )
    WITH (DISTRIBUTION = REPLICATE) ;
[Figure: Node 1, Node 2 and Node 3, each with its own memory. Replicated: each node holds all rows of the table (100 Sales, 200 Marketing, 300 Finance, 400 HR). The table is literally duplicated on each and every node.]
Above is a basic CREATE TABLE statement for a table that is replicated across all nodes. That means that the entire table, with every row, is copied to each and every node. This should be done for relatively small tables because you are in essence duplicating the table on each node. This is done so that when a join is performed between this Dept_Intl table and a large Emp_Intl table, the matching rows will be Distribution Local. This means the matching rows are already on the same node and therefore will not have to be shuffled across nodes to make the join happen.
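The idea of a join that never leaves the node can be sketched in Python as follows. The two-node layout and the sample rows are invented for this illustration, and local_join is a hypothetical helper, not a system function.

```python
# Hypothetical layout: two nodes, Emp_Intl hashed, Dept_Intl replicated.
emp_by_node = {
    1: [(1, 100, "Joel"), (3, 300, "Tony")],     # (Employee_No, Dept_No, First_Name)
    2: [(2, 200, "Mary"), (4, 100, "Rick")],
}
dept_table = {100: "Sales", 200: "Marketing", 300: "Finance", 400: "HR"}
dept_by_node = {node: dict(dept_table) for node in emp_by_node}  # full copy per node

def local_join(node):
    """Join using only data already on this node; nothing crosses the network."""
    depts = dept_by_node[node]
    return [(first, depts[dept_no]) for _, dept_no, first in emp_by_node[node]]

joined = [pair for node in sorted(emp_by_node) for pair in local_join(node)]
print(joined)
```

Because every node carries its own complete copy of the department rows, each employee row finds its match without any row being shipped to another node.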
Distributed by Hash vs. Replication
[Figure: two nodes, each with its own memory. Replicated tables (for example 100 Sales, 200 Marketing, 300 Finance, 400 HR) are stored once on each node, in a convenient place for querying and for joining to other tables. Tables distributed by hash are generally stored across all 8 distributions on each node, thus taking advantage of I/O parallelism.]
Each node has eight distributions and each distribution has its own set of disks. So, think of this as each node having at least eight disks to place the table rows that it owns. If a table is small, then a node might have all of the rows it owns in a single distribution. This is often the case with a table that is replicated. If a table is huge, then a node might have rows stored in all eight distributions, which is often the case for tables distributed by hash.
The Concept is All About the Joins
[Figure: a star schema with the Claims fact table in the center, hashed by Claim_Id, surrounded by four replicated dimension tables.]
  Claims Fact Table (hash by Claim_Id): Claim_Id, Claim_Date, Claim_Service, Subscriber_No, Member_No, Claim_Amt, Provider_No
  Providers Dimension Table: Provider_Code, Provider_Name, P_Address, P_City, P_State, P_Zip, P_Error_Rate
  Services Dimension Table: Service_Code, Service_Desc, Service_Pay
  Subscribers Dimension Table: Subscriber_No, Member_No, Last_Name, First_Name, Gender, SSN
  Addresses Dimension Table: Subscriber_No, Street, City, State, Zip, AreaCode, Phone
The Azure SQL Data Warehouse gives you two choices for table distribution: hash or replicated. Large fact tables are usually hashed and smaller tables are usually replicated. The bottom line is that an Azure SQL Data Warehouse needs two joining rows to be on the same node. That is why, in a 5-table join, an Azure SQL Data Warehouse will join two tables at a time. If tables are replicated, then they are always on the same node as the rows they join. That is why a large fact table will often be distributed by hash and the smaller tables it joins to will be replicated. The setup of tables on MPP systems is all about the joins.
Creation of a Hash Distributed Table with a Clustered Index
[Figure: the Claims fact table (Claim_Id, Claim_Date, Claim_Service, Subscriber_No, Member_No, Claim_Amt, Provider_No), hashed by Claim_Id.]
    CREATE TABLE Claims
    ( Claim_ID      int NOT NULL
     ,Claim_Date    int NOT NULL
     ,Claim_Service int NOT NULL
     ,Subscriber_No int NOT NULL
     ,Member_No     int NOT NULL
     ,Claim_Amt     decimal(18,2) NOT NULL
     ,Provider_No   int NOT NULL )
    WITH
    ( CLUSTERED INDEX(Claim_Date),
      DISTRIBUTION = HASH(Claim_ID));
Above is the CREATE statement for the Claims table. It has DISTRIBUTION = HASH on Claim_ID. It also has a clustered index on Claim_Date. That means that each node will sort the rows by Claim_Date. This is excellent for range queries. Users will often look up claims based on a time frame, such as per day, week, month, quarter or year.
A Clustered Index Sorts the Data Stored on Disk
[Figure: Node 1 through Node n. On each node, data is sorted on disk with a Clustered Index on Claim_Date: consecutive blocks hold consecutive dates, three per block, running from 1/1/2014 through 3/1/2014.]
A Clustered Index is created to command the Azure SQL Data Warehouse to sort the actual data on disk according to the sorted order of the column values. Each table can have only one clustered index at a time. For distributed tables, a clustered index affects the way data is stored within each distribution across the nodes; however, it does not affect which rows are assigned to each distribution. For replicated tables, the clustered index affects the way the data is stored within each replicated table; however, it does not affect where the replicated tables are stored. A clustered index sorts the data on disk, which is very important for range queries. Above, we created a Clustered Index on Claim_Date, so now a full table scan won't be needed for all queries.
Each Node Has 8 Distributions
[Figure: one node with its memory and eight distributions.]
Each node has eight distributions and each distribution has its own set of disks. So, think of this as each node having at least eight disks to place the table rows that it owns. Better yet, think of this as each compute node having eight parallel processes (called distributions) with each parallel process having its own dedicated disk.
How Hashed Tables are Stored Among a Single Node
[Figure: one node with its memory and eight distributions, each holding rows of the Addresses, Subscribers, Providers, Services and Claims tables. In a perfect distribution each hashed table is distributed evenly across all eight distributions. Think of this node as eight parallel processes simultaneously processing the rows of a table that they own.]
Each node has eight distributions and each distribution has its own set of disks. So, think of this as each node having at least eight disks to place the table rows that it owns. If a table is small, then a node might have all of the rows it owns in a single distribution. If a table is huge, then a node might have rows stored in all eight distributions.
Hashed Tables Will Be Distributed Among All Distributions
[Figure: four nodes, each with its own memory and eight distributions, with rows of all five tables spread across every distribution.]
Above, we see four nodes and each node has eight distributions for a total of 32 distributions. We also see our five tables. Each table is hashed (in this example) and each table has spread different rows across all 32 distributions. All five tables above are row-based tables.
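The arithmetic of the picture (4 nodes times 8 distributions equals 32) can be written down in a few lines of Python. The contiguous numbering of distributions and the modulo hash are assumptions made only for this illustration; the function names locate and place_row are invented.

```python
NODES = 4
DISTRIBUTIONS_PER_NODE = 8
TOTAL = NODES * DISTRIBUTIONS_PER_NODE   # 32 distributions in this example

def locate(distribution_id):
    """Map a system-wide distribution number (0..31) to (node, local slot).

    The contiguous numbering is an assumption made for this illustration.
    """
    return divmod(distribution_id, DISTRIBUTIONS_PER_NODE)

def place_row(key):
    """Stand-in hash (modulo) that picks one of the 32 distributions."""
    return locate(key % TOTAL)

print(TOTAL, place_row(40))
```

Every row ends up on exactly one of the 32 distributions, and therefore on exactly one node.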
Creation of a Replicated Table
[Figure: the Addresses dimension table (Subscriber_No, Street, City, State, Zip, AreaCode, Phone).]
    CREATE TABLE Addresses
    ( Subscriber_No INTEGER
     ,Street        VARCHAR(30)
     ,City          VARCHAR(20)
     ,State         CHAR(2)
     ,Zip           INTEGER
     ,AreaCode      SMALLINT
     ,Phone         INTEGER )
    WITH (DISTRIBUTION = REPLICATE);
Above is the CREATE statement for the Addresses table. It has DISTRIBUTION = REPLICATE. This table's data will be duplicated on each node in its entirety.
How Replicated Tables are Stored Among a Single Node
[Figure: one node with its memory, holding the rows of the Addresses, Subscribers, Providers, Services and Claims tables. Replicated tables are stored only once per node.]
Each node has eight distributions and each distribution has its own set of disks. So, think of this as each node having at least eight disks to place the table rows that it owns. Replicated tables are duplicated in their entirety on each node. If a table has 20 rows and there are 4 nodes in the system, then each node has the same 20 rows. A replicated table stores all of its rows only once per node, but the Azure SQL Data Warehouse actually spreads those 20 rows across all eight distributions. This is done using file groups. Above, it appears that the entire table is on only one of the node's disks, but that is just to illustrate that the entire table is copied only once per node.
Replicated Table will be Duplicated among Each Node
[Figure: four nodes, each with its own memory and eight distributions, and each holding a full copy of the replicated tables.]
Above, we see four nodes and each node has eight distributions for a total of 32 distributions. We also see four tables. Each table is replicated, so each table is stored once on every node. All four tables above are row-based tables.
Distributed by Replication
[Figure: Node 1 through Node 4, each with its own memory and each holding a full copy of the Addresses table.]
With Replication, a table is copied in its entirety to every Azure SQL Data Warehouse compute node. Is this duplicating the table and data across each compute node? Yes! Why in the world would anyone do this? For one reason: the joins! For two rows to be joined they need to be on the same compute node. When the Addresses table joins to the Subscribers table, the replication of the Addresses table will guarantee that the rows matching the Subscribers rows will be on the same compute node. Take good advice here and replicate all small tables that join to larger tables.
How Hashed and Replicated Tables Work Together
[Figure: one node with its memory. The Addresses, Subscribers, Providers and Services table rows are replicated, and replicated tables are stored only once per node. The Claims table rows are hashed, and the hashed table is distributed evenly across all eight distributions.]
The Fact table (Claims), which is large, will be spread across all eight distributions. The dimension tables (Addresses, Subscribers, Providers and Services) are replicated once on the node. This will allow for easy joining among the five tables.
Tables are Stored as Row-based or Column-based
[Figure: three nodes. Each holds a portion of the Employee_Row_Based table (top) and a portion of the Employee_Columnar table (bottom), the latter stored as column segments.]
Above is a picture of the same table stored as a row-based (top) and column-based (bottom) design. Notice that either way the node gets the entire row, but the Azure SQL Data Warehouse gives you the option of storing it in either a row-based or column-based design. When a query selects all columns in a table, the row-based storage is faster; however, for queries that only select a few columns, the column-based storage is faster. The column-based storage also has advanced compression opportunities that save a great deal of space.
Creation of a Columnar Table that is Hashed
    CREATE TABLE Sales_Columnar_Hashed
    ( Product_ID  int NOT NULL,
      Sale_Date   date,
      Daily_Sales decimal(9,2) )
    WITH
    ( DISTRIBUTION = HASH(Product_ID),
      CLUSTERED COLUMNSTORE INDEX );
[Figure: Distribution 1 through Distribution n, each holding different rows of the table, stored column by column.]
Above is the CREATE statement for the Sales_Columnar_Hashed table. This table is a columnstore table that is hashed by the Product_ID column. The table has nine rows and three columns. The rows are hashed and the entire row is placed on a distribution, but then it is stored in separate columns. The idea is that when a query can be satisfied by using only one or two of the columns, the system only has to move those one or two columns from disk to memory.
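The two-step idea (hash the whole row to a distribution, then store it column by column) can be sketched in Python. The sample rows are invented, the modulo hash is a stand-in for the real hashing formula, and the function store is a hypothetical helper written only for this illustration.

```python
# Hypothetical sample of Sales_Columnar_Hashed rows:
# (Product_ID, Sale_Date, Daily_Sales)
sales = [
    (1000, "2015-09-28", 48850.40),
    (2000, "2015-09-28", 41888.88),
    (3000, "2015-09-28", 61301.77),
    (1000, "2015-09-29", 54500.22),
]
COLUMNS = ("Product_ID", "Sale_Date", "Daily_Sales")

def store(rows, num_distributions):
    """Hash each whole row to a distribution, then split it into column lists."""
    dists = [{c: [] for c in COLUMNS} for _ in range(num_distributions)]
    for row in rows:
        # Stand-in hash on the distribution key; the real formula is internal.
        target = dists[row[0] % num_distributions]
        for name, value in zip(COLUMNS, row):
            target[name].append(value)
    return dists

distributions = store(sales, 3)
print(distributions[1]["Daily_Sales"])
```

Both rows for product 1000 land on the same distribution, and a query that only needs Daily_Sales can read that one list without touching the others.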
How Hashed Columnar Tables are Stored on a Single Node
[Figure: one node with its memory and eight distributions holding Addresses, Subscribers and Claims table rows. In a perfect distribution each hashed table is distributed evenly across all eight distributions.]
A columnar store will store each column in its own page. This is sometimes 10X faster for certain queries, with 3X compression.
The Addresses table has four columns in it. The Subscribers table has five columns and the Claims table has nine columns. Each column is stored in its own page.
How Hashed Columnar Tables are Stored on All Distributions
[Figure: four nodes, each with its own memory and eight distributions, each distribution holding column pages for a portion of every table.]
Above, we see four nodes and each node has eight distributions for a total of 32 distributions. The Addresses table has four columns in it. The Subscribers table has five columns, and the Claims table has nine columns. All 32 distributions hold a portion of each table, and each table stores each column in a separate page.
Comparing Normal Table Vs. Columnar Tables
Employee_Normal (rows stored together on the distribution):
  Emp_No  Dept_No  First_Name  Last_Name  Salary
  1001    100      Rafael      Minal      90000.00
  1004    400      Kyle        Stover     60000.00
  1007    200      Sushma      Davis      50000.00
Employee_Columnar (each column stored separately on the distribution):
  Emp_No:     1001, 1004, 1007
  Dept_No:    100, 400, 200
  First_Name: Rafael, Kyle, Sushma
  Last_Name:  Minal, Stover, Davis
  Salary:     90000.00, 60000.00, 50000.00
Above is a picture of the same table stored as a row-based (top) and column-based (bottom) design. Notice that either way the node gets the entire row, but the Azure SQL Data Warehouse has the option of storing it in either a row-based or column-based design.
Columnar can move just One Segment to Memory
Query: SELECT Emp_No FROM Employee_Columnar ;
[Figure: on one distribution, Employee_Columnar stores Emp_No (1001, 1004, 1007), Dept_No (100, 400, 200), First_Name (Rafael, Kyle, Sushma), Last_Name (Minal, Stover, Davis) and Salary (90000.00, 60000.00, 50000.00) as separate segments. Only the Emp_No segment is moved into memory.]
Segments on Distributions are Aligned to Rebuild a Row
What if the query needed two columns?
Query: SELECT Emp_No, Salary FROM Employee_Columnar ;
[Figure: on one distribution, Employee_Columnar stores Emp_No, Dept_No, First_Name, Last_Name and Salary as separate segments. Only the Emp_No segment (1001, 1004, 1007) and the Salary segment (90000.00, 60000.00, 50000.00) are moved into memory, where they line up position by position.]
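The alignment idea can be shown with a few lines of Python. The segment values come from the figure; storing them as a dictionary of lists and the helper name select are assumptions made only for this sketch.

```python
# Column segments for Employee_Columnar on one distribution (from the figure).
segments = {
    "Emp_No": [1001, 1004, 1007],
    "Dept_No": [100, 400, 200],
    "First_Name": ["Rafael", "Kyle", "Sushma"],
    "Last_Name": ["Minal", "Stover", "Davis"],
    "Salary": [90000.00, 60000.00, 50000.00],
}

def select(columns):
    """Read only the requested segments and pair values up by position."""
    needed = [segments[c] for c in columns]      # other segments are never touched
    return list(zip(*needed))

result = select(["Emp_No", "Salary"])
print(result)
```

Because the first value in every segment belongs to the same row, and the second to the next row, matching up positions is all it takes to rebuild the part of each row the query asked for.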
Why Columnar?
“Everyone is kneaded out of the same dough but not baked in the same oven.” – Yiddish Proverb
  Emp_No  Dept_No  First_Name  Last_Name  Salary
  1001    100      Rafael      Minal      90000
  1002    200      Maria       Gomez      80000
  1003    300      Charl       Kertzel    70000
  1004    400      Kyle        Stover     60000
  1005    400      Rob         Rivers     50000
  1006    300      Inna        Kinski     50000
  1007    200      Sushma      Davis      50000
  1008    100      Mo          Khan       60000
  1009    300      Mo          Swartz     70000
Each data block holds a single column. The row can be rebuilt because everything is aligned perfectly. If someone runs a query that would return the average salary, then only one small data block is moved into memory. The salary block moves into memory where it is processed as fast as lightning. We just cut down on moving large blocks by 80%! Why columnar? Because, like our Yiddish Proverb says, "All data is not kneaded on every query, so that is why it costs so much dough."
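Where does the 80% come from? With five columns and a query that needs only Salary, four of the five column blocks stay on disk. The Python sketch below spells out that arithmetic. It assumes one equally sized block per column, which is a simplification made for this illustration, and block_reduction is an invented helper name.

```python
def block_reduction(total_columns, columns_needed):
    """Fraction of column blocks that do NOT have to move into memory.

    Assumes one block per column and equally sized blocks, which is a
    simplification made for this illustration.
    """
    return 1 - columns_needed / total_columns

salary_only = block_reduction(total_columns=5, columns_needed=1)
print(f"{salary_only:.0%} fewer blocks moved")
```

The same function shows the other side of the trade-off: when a query needs all five columns, the saving drops to zero.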
Columnar Tables Store Each Column in Separate Pages
[Figure: three nodes, each with its own memory. For an AVG Salary query, each node moves only its Salary column into memory.]
This is the same data you saw on the previous page! The difference is that the above is a columnar design. I have color coded this for you. There are 8 rows in the table and five columns. Notice that the entire row stays on the same disk, but each column is a separate block. This is a brilliant design for Ad Hoc queries and analytics because when only a few columns are needed, columnar can move just the columns it needs to. Columnar can't be beat for queries because the pages are so much smaller, and what isn't needed isn't moved.
Visualize the Data – Rows vs. Columns
[Figure, top: 24 rows (five columns) stored in 6 blocks in this row-based system.]
[Figure, bottom: 24 rows (five columns) stored in 15 blocks (each column is its own block).]
Both examples above have the same data and the same amount of data. If your applications tend to need to analyze the majority of columns or read the entire table, then a row-based system (top example) can move more data into memory. Columnar tables are advantageous when only a few columns need to be read. This is just one of the reasons that analytics goes with columnar like bread goes with butter. A row-based system must move the entire page into memory even if it only needs to read one row or even a single column. If a user above needed to analyze the Salary, the columnar system would move 80% less block mass.
Creation of a Columnar Table that is Replicated

CREATE TABLE Sales_Columnar_Replicated
(
   Product_ID   int NOT NULL,
   Sale_Date    date,
   Daily_Sales  decimal(9,2)
)
WITH
(
   DISTRIBUTION = REPLICATE,
   CLUSTERED COLUMNSTORE INDEX
);

[Figure: Node 1, Node 2 ... Node n, each holding an identical copy of the table]
Above is the CREATE statement for the Sales_Columnar_Replicated table. This table is a columnstore table that is replicated on each node. The table only has nine rows and three columns. Each node holds the exact same data. It is like looking in a mirror. That is what replicated means. The table is replicated, but the storage is a columnar (columnstore) design. This allows single columns to be placed into memory for processing.
Creating a Partitioned Table Per Month

CREATE TABLE Ord_Tbl_Part
(
   Order_Number     integer
  ,Customer_Number  integer
  ,Order_Date       date
  ,Order_Total      decimal(10,2)
)
WITH
(
   DISTRIBUTION = HASH (Order_Number),
   PARTITION ( Order_Date RANGE RIGHT FOR VALUES (
      '2015-01-01','2015-02-01','2015-03-01','2015-04-01',
      '2015-05-01','2015-06-01','2015-07-01','2015-08-01',
      '2015-09-01','2015-10-01','2015-11-01','2015-12-01' ))
);
Above is the CREATE statement for the Ord_Tbl_Part table. This table is a rowstore table that is partitioned by Order_Date. By using RANGE RIGHT and the first day of each month as the boundary values, it puts a month of data in each partition.
A Visual of One Year of Data with Range Per Month
[Figure: Distributions 1 through 4 each hold their own rows of Ord_Tbl_Part, divided into monthly partitions 01 (JAN) through 12 (DEC)]
Above is a visual of the Ord_Tbl_Part table that was created on the previous page. This table is a rowstore table that is partitioned by Order_Date. By using RANGE RIGHT and dates for the boundary values, it puts a month of data in each partition. This table is NOT replicated, but hashed. The distributions each hold different rows, but each one stores every month in its own block(s). This physical partitioning allows for faster loads and faster maintenance (Inserts, Updates, Deletes). This is the design you want when users are performing range queries on dates.
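Another Create Example of a Partitioned Table

CREATE TABLE Sales_Partitioned
(
   Product_ID   int NOT NULL,
   Sale_Date    date,
   Daily_Sales  decimal(9,2)
)
WITH
(
   PARTITION ( Product_ID RANGE LEFT FOR VALUES (100, 200, 300, 400) ),
   CLUSTERED COLUMNSTORE INDEX
);

In this example of RANGE LEFT, each boundary value belongs to the partition on its left, so data will be sorted into the following partitions:
Partition 1: col <= 100
Partition 2: 100 < col <= 200
Partition 3: 200 < col <= 300
Partition 4: 300 < col <= 400
Partition 5: col > 400

This would be the partitioning if this same table was partitioned RANGE RIGHT instead of RANGE LEFT (each boundary value belongs to the partition on its right):
Partition 1: col < 100
Partition 2: 100 <= col < 200
Partition 3: 200 <= col < 300
Partition 4: 300 <= col < 400
Partition 5: col >= 400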
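The boundary rule can be checked with a few lines of Python. This is only a sketch of the arithmetic, not how the Azure SQL Data Warehouse assigns partitions internally; the function names are made up for illustration, and the boundary list is copied from the CREATE TABLE above.

```python
from bisect import bisect_left, bisect_right

# Boundary values copied from the Sales_Partitioned example.
BOUNDARIES = [100, 200, 300, 400]

def partition_range_left(value, boundaries=BOUNDARIES):
    # RANGE LEFT: a boundary value is the upper limit of the partition on its left.
    return bisect_left(boundaries, value) + 1

def partition_range_right(value, boundaries=BOUNDARIES):
    # RANGE RIGHT: a boundary value is the lower limit of the partition on its right.
    return bisect_right(boundaries, value) + 1

for product_id in (99, 100, 101, 400, 401):
    print(product_id,
          "LEFT ->", partition_range_left(product_id),
          "RIGHT ->", partition_range_right(product_id))
```

Four boundary values always produce five partitions. The only rows that move between the two schemes are the ones sitting exactly on a boundary, such as Product_ID 100 or 400.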
SELECT * FROM Student_Table
WHERE Grade_Pt >= 3.0 ;          (>= means Greater than or Equal to)

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
231222      Wilson      Susie       SO          3.80
234121      Thomas      Wendy       FR          4.00
324652      Delaney     Danny       SR          3.35
123250      Phillips    Martin      SR          3.00
322133      Bond        Jimmy       JR          3.95
All rows returned have a Grade_Pt >= 3.0
The WHERE Clause doesn’t just deal with ‘Equals’. You can look for things that are GREATER or LESSER THAN along with asking for things that are GREATER/LESSER THAN or EQUAL to. Page 179
Chapter 7
The WHERE Clause
AND in the WHERE Clause

Student_Table
Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
231222      Wilson      Susie       SO          3.80
280023      McRoberts   Richard     JR          1.90
322133      Bond        Jimmy       JR          3.95
125634      Hanson      Henry       FR          2.88
333450      Smith       Andy        SO          2.00
324652      Delaney     Danny       SR          3.35
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00
123250      Phillips    Martin      SR          3.00

SELECT *
FROM   Student_Table
WHERE  Class_Code = 'FR'
AND    First_Name = 'Henry' ;
Notice the WHERE statement and the word AND. In this example, qualifying rows must have a Class_Code = ‘FR’ and also must have a First_Name of ‘Henry’. Notice how the WHERE and the AND clause are on their own line. Good practice! Page 180
Troubleshooting AND Student_Table Student_ID _________ 423400 231222 280023 322133 125634 333450 324652 260000 234121 123250
Last_Name First_Name Grade_Pt __________ __________ Class_Code __________ ________ Larkins Michael FR 0.00 Wilson Susie SO 3.80 McRoberts Richard JR 1.90 Bond Jimmy JR 3.95 Hanson Henry FR 2.88 Smith Andy SO 2.00 Delaney Danny SR 3.35 Johnson Stanley ? ? Thomas Wendy FR 4.00 Phillips Martin SR 3.00
SELECT * FROM Student_Table WHERE Grade_Pt = 3.0 AND Grade_Pt = 4.0; No rows qualify. How can a student have two grade points?
What is going wrong here? You are using an AND to check the same column. What you are basically asking with this syntax is to see the rows that have BOTH a Grade_Pt of 3.0 and a 4.0. That is impossible, so no rows will be returned. Page 181
OR in the WHERE Clause

SELECT *
FROM   Student_Table
WHERE  Grade_Pt = 3.0
OR     Grade_Pt = 4.0;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
234121      Thomas      Wendy       FR          4.00
123250      Phillips    Martin      SR          3.00
Notice above in the WHERE Clause we use OR. OR allows either of the conditions to be TRUE in order for the row to qualify and return.
Troubleshooting Or Student_Table Student_ID _________ 423400 231222 280023 322133 125634 333450 324652 260000 234121 123250
Last_Name First_Name Grade_Pt __________ __________ Class_Code __________ ________ Larkins Michael FR 0.00 Wilson Susie SO 3.80 McRoberts Richard JR 1.90 Bond Jimmy JR 3.95 Hanson Henry FR 2.88 Smith Andy SO 2.00 Delaney Danny SR 3.35 Johnson Stanley ? ? Thomas Wendy FR 4.00 Phillips Martin SR 3.00
SELECT * FROM Student_Table WHERE Grade_Pt = 3.0 OR 4.0; error
SELECT * FROM Student_Table WHERE Grade_Pt = 3.0 OR Grade_Pt = 4.0; perfect
Notice above in the WHERE Clause we use OR. OR allows either of the conditions to be TRUE in order for the row to qualify and return. The first example errors and is a common mistake: each side of the OR must be a complete comparison, so the column name has to be repeated. The second example is perfect.
Troubleshooting Character Data Student_Table Student_ID _________ 423400 231222 280023 322133 125634 333450 324652 260000 234121 123250
Last_Name First_Name Grade_Pt __________ __________ Class_Code __________ ________ Larkins Michael FR 0.00 Wilson Susie SO 3.80 McRoberts Richard JR 1.90 Bond Jimmy JR 3.95 Hanson Henry FR 2.88 Smith Andy SO 2.00 Delaney Danny SR 3.35 Johnson Stanley ? ? Thomas Wendy FR 4.00 Phillips Martin SR 3.00
SELECT * FROM Student_Table WHERE Grade_Pt = 3.0 AND Class_Code = SR ;
Error!!! Why?
This query errors! What is WRONG with this syntax? No Single quotes around SR.
Using Different Columns in an AND Statement Student_Table Student_ID _________ 423400 231222 280023 322133 125634 333450 324652 260000 234121 123250
Last_Name First_Name __________ __________ Class_Code __________ Grade_Pt ________ Larkins Michael FR 0.00 Wilson Susie SO 3.80 McRoberts Richard JR 1.90 Bond Jimmy JR 3.95 Hanson Henry FR 2.88 Smith Andy SO 2.00 Delaney Danny SR 3.35 Johnson Stanley ? ? Thomas Wendy FR 4.00 Phillips Martin SR 3.00
SELECT * FROM Student_Table WHERE Grade_Pt = 3.0 AND Class_Code = 'SR' ;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
123250      Phillips    Martin      SR          3.00
Notice that AND separates two different columns, and the data will come back if both are TRUE.
Quiz – How many rows will return? Student_Table Student_ID _________ 423400 231222 280023 322133 125634 333450 324652 260000 234121 123250
Last_Name First_Name __________ __________ Class_Code __________ Grade_Pt ________ Larkins Michael FR 0.00 Wilson Susie SO 3.80 McRoberts Richard JR 1.90 Bond Jimmy JR 3.95 Hanson Henry FR 2.88 Smith Andy SO 2.00 Delaney Danny SR 3.35 Johnson Stanley ? ? Thomas Wendy FR 4.00 Phillips Martin SR 3.00
SELECT * FROM Student_Table WHERE Grade_Pt = 4.0 OR Grade_Pt = 3.0 AND Class_Code = 'SR' ;

Which Seniors have a 3.0 or a 4.0 Grade_Pt average? How many rows will return?
A) 2
B) 1
C) Error
D) 3
Answer to Quiz – How many rows will return? Student_Table Student_ID _________ 423400 231222 280023 322133 125634 333450 324652 260000 234121 123250
Last_Name First_Name __________ __________ Class_Code __________ Grade_Pt ________ Larkins Michael FR 0.00 Wilson Susie SO 3.80 McRoberts Richard JR 1.90 Bond Jimmy JR 3.95 Hanson Henry FR 2.88 Smith Andy SO 2.00 Delaney Danny SR 3.35 Johnson Stanley ? ? Thomas Wendy FR 4.00 Phillips Martin SR 3.00
SELECT * FROM Student_Table WHERE Grade_Pt = 4.0 OR Grade_Pt = 3.0 AND Class_Code = 'SR' ;
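Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
234121      Thomas      Wendy       FR          4.00
123250      Phillips    Martin      SR          3.00

We had two rows return! Isn’t that a mystery? Why? Because AND is evaluated before OR, the query is read as Grade_Pt = 4.0 OR (Grade_Pt = 3.0 AND Class_Code = 'SR'). The Freshman with a 4.0 qualifies on the first condition alone, and the Senior with a 3.0 qualifies on the second.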
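To see the effect of the evaluation order, the quiz can be replayed on a small in-memory database. This is a sketch that uses SQLite through Python's sqlite3 module as a stand-in for the Azure SQL Data Warehouse (an assumption made only so the example runs anywhere); the table holds just four of the Student_Table rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student_Table (Last_Name TEXT, Class_Code TEXT, Grade_Pt REAL)")
conn.executemany(
    "INSERT INTO Student_Table VALUES (?, ?, ?)",
    [("Thomas", "FR", 4.00), ("Phillips", "SR", 3.00),
     ("Delaney", "SR", 3.35), ("Wilson", "SO", 3.80)],
)

# The quiz query as written: AND binds tighter than OR.
as_written = [r[0] for r in conn.execute(
    "SELECT Last_Name FROM Student_Table "
    "WHERE Grade_Pt = 4.0 OR Grade_Pt = 3.0 AND Class_Code = 'SR' "
    "ORDER BY Last_Name")]

# Parentheses force the OR to be evaluated first, which is what the question asked.
with_parens = [r[0] for r in conn.execute(
    "SELECT Last_Name FROM Student_Table "
    "WHERE (Grade_Pt = 4.0 OR Grade_Pt = 3.0) AND Class_Code = 'SR' "
    "ORDER BY Last_Name")]

print(as_written)   # the Freshman with a 4.0 sneaks in
print(with_parens)  # Seniors only
```

When a WHERE clause mixes AND with OR, adding parentheses makes the intent explicit and avoids this surprise.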
LIKE command Underscore is Wildcard for one Character Student_Table Student_ID _________ 423400 231222 280023 322133 125634 333450 324652 260000 234121 123250
Last_Name First_Name __________ __________ Class_Code __________ Grade_Pt ________ Larkins Michael FR 0.00 Wilson Susie SO 3.80 McRoberts Richard JR 1.90 Bond Jimmy JR 3.95 Hanson Henry FR 2.88 Smith Andy SO 2.00 Delaney Danny SR 3.35 Johnson Stanley ? ? Thomas Wendy FR 4.00 Phillips Martin SR 3.00
SELECT * FROM Student_Table WHERE Last_Name LIKE '_a%' ;

Show me anyone with an 'a' as the 2nd letter in their Last_Name

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
125634      Hanson      Henry       FR          2.88
The _ underscore sign is a wildcard for any single character. We are looking for anyone who has an 'a' as the second letter of their last name. Page 188
LIKE command using a Range of Values
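The above syntax allows us to use a range of values (a-f in this example). In T-SQL the range is written inside square brackets in the LIKE pattern, for example First_Name LIKE '[a-f]%'. Any First_Name that starts with an a, b, c, d, e or f will return. How about that for clever SQL?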
LIKE command Using a NOT Range of Values
The ^ sign (Shift 6) acts like a NOT when it is the first character inside the square brackets, for example First_Name LIKE '[^a-f]%'.
The above syntax allows us to use a NOT range of values (a-f in this example). Any First_Name that starts with the letter a, b, c, d, e or f will not return.
LIKE Command Works Differently on Char Vs Varchar Student_Table Student_ID _________ 423400 231222 280023 322133 125634 333450 324652 260000 234121 123250
Last_Name First_Name Grade_Pt __________ __________ Class_Code __________ ________ Larkins Michael FR 0.00 Wilson Susie SO 3.80 McRoberts Richard JR 1.90 Bond Jimmy JR 3.95 Hanson Henry FR 2.88 Smith Andy SO 2.00 Delaney Danny SR 3.35 Johnson Stanley ? ? Thomas Wendy FR 4.00 Phillips Martin SR 3.00
First_Name's Data Type is VARCHAR (20)
SELECT * FROM Student_Table WHERE First_Name LIKE '%y' ;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
125634      Hanson      Henry       FR          2.88
322133      Bond        Jimmy       JR          3.95
324652      Delaney     Danny       SR          3.35
333450      Smith       Andy        SO          2.00
260000      Johnson     Stanley     ?           ?
234121      Thomas      Wendy       FR          4.00
It is important that you know the data type of the column you are using with your LIKE command. VARCHAR and CHAR data differ slightly. Page 191
Troubleshooting LIKE Command on Character Data Student_Table Student_ID _________ 423400 231222 280023 322133 125634 333450 324652 260000 234121 123250
Last_Name First_Name Grade_Pt __________ __________ Class_Code __________ ________ Larkins Michael FR 0.00 Wilson Susie SO 3.80 McRoberts Richard JR 1.90 Bond Jimmy JR 3.95 Hanson Henry FR 2.88 Smith Andy SO 2.00 Delaney Danny SR 3.35 Johnson Stanley ? ? Thomas Wendy FR 4.00 Phillips Martin SR 3.00
Last_Name has a Data Type of CHAR (20)
SELECT * FROM Student_Table WHERE Last_Name LIKE '%n' ;

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________

No Rows are returned! Why?
This is a CHAR (20) data type. That means that any words under 20 characters will pad spaces behind them until they reach 20 characters. You will not get any rows back from this example because technically, no row ends in an ‘N’, but instead ends in a space. Page 192
Introducing the RTRIM Command Student_Table Student_ID _________ 423400 231222 280023 322133 125634 333450 324652 260000 234121 123250 Last_Name has a Data Type of CHAR (20)
Last_Name First_Name Grade_Pt __________ __________ Class_Code __________ ________ Larkins Michael FR 0.00 Wilson Susie SO 3.80 McRoberts Richard JR 1.90 Bond Jimmy JR 3.95 Hanson Henry FR 2.88 Smith Andy SO 2.00 Delaney Danny SR 3.35 Johnson Stanley ? ? Thomas Wendy FR 4.00 Phillips Martin SR 3.00
SELECT Last_Name FROM Student_Table WHERE RTRIM (Last_Name) LIKE '%n' ;

Last_Name
__________
Hanson
Wilson
Johnson
This is a CHAR(20) data type. That means that every Last_Name is going to be 20 characters long. Most names are not really 20 characters long, so spaces are padded at the end to ensure filling up all 20 characters. We need to do the RTRIM command to remove the trailing spaces. Once the spaces are trimmed, we can find out whose name ends in 'n'. Page 193
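The padding problem described above can be reproduced with a short script. This is a sketch that uses SQLite through Python's sqlite3 module as a stand-in; SQLite does not pad character columns on its own, so the names are padded to 20 characters by hand to imitate the CHAR(20) storage the text describes.

```python
import sqlite3

names = ["Larkins", "Wilson", "McRoberts", "Bond", "Hanson",
         "Smith", "Delaney", "Johnson", "Thomas", "Phillips"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student_Table (Last_Name TEXT)")
# Pad every name with trailing spaces to 20 characters (imitating CHAR(20)).
conn.executemany("INSERT INTO Student_Table VALUES (?)",
                 [(name.ljust(20),) for name in names])

# Every padded value ends in a space, so the pattern '%n' finds nothing.
untrimmed = conn.execute(
    "SELECT Last_Name FROM Student_Table WHERE Last_Name LIKE '%n'").fetchall()

# RTRIM strips the trailing spaces before the comparison is made.
trimmed = [r[0] for r in conn.execute(
    "SELECT RTRIM(Last_Name) FROM Student_Table "
    "WHERE RTRIM(Last_Name) LIKE '%n' ORDER BY 1")]

print(untrimmed)
print(trimmed)
```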
Quiz – What Data is Left Justified and What is Right?

SELECT *
FROM   Sample_Table
WHERE  Column1 IS NULL
AND    Column2 IS NULL ;

Answer Set
Column1   Column2
_______   _______
      ?   ?

Column1 is Right Justified (Integers are Right Justified!). Column2 is Left Justified (Character Data is Left Justified!).
Which Column from the Answer Set could have a DATA TYPE of INTEGER, and which could have Character Data?
Numbers are Right Justified and Character Data is Left

SELECT *
FROM   Sample_Table
WHERE  Column1 IS NULL
AND    Column2 IS NULL ;

Answer Set
Column1   Column2
_______   _______
      ?   ?

Column1 is Right Justified (Integers are Right Justified!). Column2 is Left Justified (Character Data is Left Justified!).

All Integers will start from the right and move left. Thus, Column1 was defined during the table create statement to hold an INTEGER. The next page shows a clear example.
Answer – What Data is Left Justified and What is Right?

SELECT Employee_No, First_Name FROM Employee_Table WHERE Employee_No = 2000000;

Answer Set
Employee_No   First_Name
____________  __________
     2000000  Squiggy

Integers are Right justified! Characters are Left justified!
All Integers will start from the right and move left. All Character data will start from the left and move to the right.
An Example of Data with Left and Right Justification

SELECT Student_ID, Last_Name FROM Student_Table ;

Student_ID  Last_Name
__________  _________
    423400  Larkins
    125634  Hanson
    280023  McRoberts
    260000  Johnson
    231222  Wilson
    234121  Thomas
    324652  Delaney
    123250  Phillips
    322133  Bond
    333450  Smith

Integers are Right justified! Characters are Left justified!
This is how a standard result set will look. Notice that the integer type in Student_ID starts from the right and goes left. Character data type in Last_Name moves left to right like we are used to seeing while reading English.
A Visual of CHARACTER Data vs. VARCHAR Data

Character Data on Disk: Last_Name as a Char(20), spaces padded at the end
Jones _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Hanson _ _ _ _ _ _ _ _ _ _ _ _ _ _
McRoberts _ _ _ _ _ _ _ _ _ _ _
Johnson _ _ _ _ _ _ _ _ _ _ _ _ _

Varchar Data on Disk: Last_Name as a Varchar(20), 2-byte VLI (Variable Length Indicator), no spaces
0 5 Jones
0 6 Hanson
0 9 McRoberts
0 7 Johnson

Character data pads spaces to the right and Varchar uses a 2-byte VLI instead.
RTRIM command Removes Trailing spaces on CHAR Data

Character Data on Disk: Last_Name as a Char(20), spaces padded at the end
Jones _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Hanson _ _ _ _ _ _ _ _ _ _ _ _ _ _
Wilson _ _ _ _ _ _ _ _ _ _ _ _ _ _
Johnson _ _ _ _ _ _ _ _ _ _ _ _ _

SELECT Last_Name FROM Student_Table WHERE RTRIM (Last_Name) LIKE '%n' ;

RTRIM removes the spaces at the back (the right side) only.

Last_Name
__________
Hanson
Wilson
Johnson

Last_Name has a Data Type of CHAR (20)

By using the RTRIM command on the Last_Name column, you are able to trim off any spaces from the end. Once we use RTRIM on Last_Name, we have eliminated any spaces at the end, so now we are set to bring back anyone with a Last_Name that truly ends in ‘n’!
Using Like with an AND Clause to Find Multiple Letters
The above uses an additional AND clause to find anyone with both an 'M' and an 'S' in their last name. Notice that the Azure SQL Data Warehouse is not case sensitive.
Using Like with an OR Clause to Find Either Letters
The above uses an additional OR clause to find anyone with either an ‘M’ or an 'S' in their last name. Notice that the Azure SQL Data Warehouse is not case sensitive.
Declaring a Variable and Using it with the LIKE Command

Addresses
Subscriber_No  Street                City      State  Zip        AreaCode  Phone
_____________  ____________________  ________  _____  _________  ________  _______
3333333        2468 Appreciate Ave.  Mytown    MI     123561111  937       3334567
2222222        123 Some St.          Sometown  CA     256781212  475       5651213
1111111        123 Any St.           Anytown   AL     456780000  435       5551213
5555555        121 Jump St.          Big City  NY     334566598  310       4531111
4444444        12 Jump St.           Big City  NY     334566576  310       4530097

DECLARE @StreetVar nvarchar(60) = 'Appreciate';
SELECT Street, City, State
FROM   Addresses
WHERE  street LIKE '%' + @StreetVar + '%';

We declare a variable and give it a value. The + sign means concatenate. Highlight and run these commands together.

Street                City    State
____________________  ______  _____
2468 Appreciate Ave.  Mytown  MI

The equivalent statement would be:
SELECT Street, City, State FROM Addresses WHERE street LIKE '%Appreciate%' ;
In the above example, we have declared a variable and placed a value in it with the word 'Appreciate'. We use it in our LIKE query in conjunction with concatenation. WARNING: The DECLARE and SELECT must be highlighted and run together. Page 202
Escape Character in the LIKE Command changes Wildcards
Student_Table
Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR          0.00
125634      Hanson      Henry       FR          2.88
280023      McRoberts   Richard     JR          1.90
260000      Johnson     Stanley     ?           ?
231222      Wilson      Susie       SO          3.80
234121      Thomas      Wendy       FR          4.00
324652      Delaney     Danny       SR          3.35
123250      Phillips    Martin      SR          3.00
322133      Bond        Jimmy       JR          3.95
333450      Smith       Andy        SO          2.00
999999      T_          S%          FR          1.90
/* We just pretended to add a new row to the Student_Table */
/* Can you use the LIKE command to find S% above? */
Here you will have to utilize a Wildcard Escape Character. Turn the page for more. Page 203
Escape Characters Turn off Wildcards in the LIKE Command Student_ID __________ 423400 125634 280023 260000 231222 234121 324652 123250 322133 333450 999999
Student_Table Last_Name First_Name __________ __________ Class_Code __________ Grade_Pt ________ Larkins Hanson McRoberts Johnson Wilson Thomas Delaney Phillips Bond Smith T_
Michael Henry Richard Stanley Susie Wendy Danny Martin Jimmy Andy S%
FR FR JR ? SO FR SR SR JR SO FR
0.00 2.88 1.90 ? 3.80 4.00 3.35 3.00 3.95 2.00 1.90
Can you use the LIKE command to find S% above?

SELECT * FROM Student_Table WHERE First_Name LIKE 'S@%' ESCAPE '@';
We can pick our Escape character and we have chosen the @ sign. This turns the wildcard off for 1 character so we find ‘S%’, without bringing back Stanley or Susie. Page 204
Quiz – Turn off that Wildcard
Student_ID __________ 423400 125634 280023 260000 231222 234121 324652 123250 322133 333450 999999
Student_Table Last_Name First_Name __________ __________ Class_Code __________ Grade_Pt ________ Larkins Hanson McRoberts Johnson Wilson Thomas Delaney Phillips Bond Smith T_
Michael Henry Richard Stanley Susie Wendy Danny Martin Jimmy Andy S%
FR FR JR ? SO FR SR SR JR SO FR
0.00 2.88 1.90 ? 3.80 4.00 3.35 3.00 3.95 2.00 1.90
Can you use the LIKE command to find the Last_Name of T_? (pronounced Tunderscore!)
This is a little trickier than you might think so be on your toes…. And get a haircut! Page 205
ANSWER – To Find that Wildcard Student_ID __________ 423400 125634 280023 260000 231222 234121 324652 123250 322133 333450 999999
Student_Table Last_Name First_Name __________ __________ Class_Code __________ Grade_Pt ________ Larkins Hanson McRoberts Johnson Wilson Thomas Delaney Phillips Bond Smith T_
Michael Henry Richard Stanley Susie Wendy Danny Martin Jimmy Andy S%
FR FR JR ? SO FR SR SR JR SO FR
0.00 2.88 1.90 ? 3.80 4.00 3.35 3.00 3.95 2.00 1.90
Can you use the LIKE command to find the Last_Name of T_? (pronounced Tunderscore!)
SELECT * FROM Student_Table WHERE RTRIM(Last_Name) LIKE 'T@_' Escape '@' ;
You didn’t really need to get a full haircut, but just a RTRIM Command and the Escape!
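Both escape examples can be tried out on a small in-memory database. This is a sketch that uses SQLite through Python's sqlite3 module as a stand-in for the Azure SQL Data Warehouse; SQLite happens to accept the same LIKE ... ESCAPE syntax. Only four of the rows are loaded, and Last_Name is padded to 20 characters by hand to imitate a CHAR(20) column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student_Table (Last_Name TEXT, First_Name TEXT)")
rows = [("Johnson", "Stanley"), ("Wilson", "Susie"), ("Thomas", "Wendy"), ("T_", "S%")]
# Pad Last_Name with trailing spaces to imitate CHAR(20) storage.
conn.executemany("INSERT INTO Student_Table VALUES (?, ?)",
                 [(last.ljust(20), first) for last, first in rows])

# Without an escape character, % is a wildcard: everything starting with S qualifies.
no_escape = sorted(r[0] for r in conn.execute(
    "SELECT First_Name FROM Student_Table WHERE First_Name LIKE 'S%'"))

# With ESCAPE '@', the % after @ is treated as a literal percent sign.
with_escape = [r[0] for r in conn.execute(
    "SELECT First_Name FROM Student_Table WHERE First_Name LIKE 'S@%' ESCAPE '@'")]

# The underscore needs the same treatment, plus RTRIM because of the padding.
t_underscore = [r[0] for r in conn.execute(
    "SELECT RTRIM(Last_Name) FROM Student_Table "
    "WHERE RTRIM(Last_Name) LIKE 'T@_' ESCAPE '@'")]

print(no_escape, with_escape, t_underscore)
```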
Chapter 8 – Distinct, Group By and TOP
“A bird does not sing because it has the answers, it sings because it has a song.” - Anonymous
The Distinct Command Student_Table Student_ID _________ 423400 231222 280023 322133 125634 333450 324652 260000 234121 123250
Last_Name First_Name Grade_Pt __________ __________ Class_Code __________ ________ Larkins Michael FR 0.00 Wilson Susie SO 3.80 McRoberts Richard JR 1.90 Bond Jimmy JR 3.95 Hanson Henry FR 2.88 Smith Andy SO 2.00 Delaney Danny SR 3.35 Johnson Stanley ? ? Thomas Wendy FR 4.00 Phillips Martin SR 3.00
SELECT Distinct Class_Code FROM Student_Table ORDER BY 1;

Class_Code
__________
?
FR
JR
SO
SR

Distinct won't repeat duplicate values.
DISTINCT eliminates duplicates from returning in the Answer Set.
Distinct vs. GROUP BY

SELECT Class_Code FROM Student_Table GROUP BY Class_Code ORDER BY 1;

SELECT Distinct Class_Code FROM Student_Table ORDER BY 1;

Both examples produce the exact same result

Class_Code
_________
?
FR
JR
SO
SR

Rules for Distinct Vs. GROUP BY
(1) Many Duplicates – use GROUP BY
(2) Few Duplicates – use DISTINCT
(3) Space Exceeded – use GROUP BY
Distinct and GROUP BY in the two examples return the same answer set. Page 210
Quiz – How many rows come back from the Distinct? Student_Table Student_ID _________ 423400 231222 280023 322133 125634 333450 324652 260000 234121 123250
Last_Name First_Name Grade_Pt __________ __________ Class_Code __________ ________ Larkins Michael FR 0.00 Wilson Susie SO 3.80 McRoberts Richard JR 1.90 Bond Jimmy JR 3.95 Hanson Henry FR 2.88 Smith Andy SO 2.00 Delaney Danny SR 3.35 Johnson Stanley ? ? Thomas Wendy FR 4.00 Phillips Martin SR 3.00
SELECT Distinct Class_Code, Grade_Pt FROM Student_Table ORDER BY Class_Code, Grade_Pt;
How many rows will come back from the above SQL? Page 211
Answer – How many rows come back from the Distinct?

SELECT Distinct Class_Code, Grade_Pt FROM Student_Table ORDER BY Class_Code, Grade_Pt ;

Class_Code  Grade_Pt
__________  ________
?           ?
FR          0.00
FR          2.88
FR          4.00
JR          1.90
JR          3.95
SO          2.00
SO          3.80
SR          3.00
SR          3.35

No Rows have the exact same values for both the Class_Code and Grade_Pt. Each row is Distinct!
How many rows will come back from the above SQL? 10. All rows came back. Why? Because there are no exact duplicates that contain a duplicate Class_Code and Duplicate Grade_Pt combined. Each row in the SELECT list is distinct. Page 212
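The point that DISTINCT applies to the whole combination of selected columns can be verified quickly. This is a sketch that uses SQLite through Python's sqlite3 module as a stand-in for the Azure SQL Data Warehouse, loaded with the Class_Code and Grade_Pt values from Student_Table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student_Table (Class_Code TEXT, Grade_Pt REAL)")
conn.executemany("INSERT INTO Student_Table VALUES (?, ?)", [
    ("FR", 0.00), ("SO", 3.80), ("JR", 1.90), ("JR", 3.95), ("FR", 2.88),
    ("SO", 2.00), ("SR", 3.35), (None, None), ("FR", 4.00), ("SR", 3.00),
])

# One column: duplicates of Class_Code collapse (the NULL counts as one value).
distinct_codes = conn.execute(
    "SELECT DISTINCT Class_Code FROM Student_Table").fetchall()

# Two columns: a row is a duplicate only if BOTH values repeat.
distinct_pairs = conn.execute(
    "SELECT DISTINCT Class_Code, Grade_Pt FROM Student_Table").fetchall()

print(len(distinct_codes), "distinct class codes")
print(len(distinct_pairs), "distinct (Class_Code, Grade_Pt) pairs")
```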
TOP Command Student_Table Student_ID _________ 423400 231222 280023 322133 125634 333450 324652 260000 234121 123250
Last_Name First_Name Grade_Pt __________ __________ Class_Code __________ ________ Larkins Michael FR 0.00 Wilson Susie SO 3.80 McRoberts Richard JR 1.90 Bond Jimmy JR 3.95 Hanson Henry FR 2.88 Smith Andy SO 2.00 Delaney Danny SR 3.35 Johnson Stanley ? ? Thomas Wendy FR 4.00 Phillips Martin SR 3.00
SELECT TOP (3) Last_Name, Class_Code, Grade_Pt FROM Student_Table ;

Last_Name   Class_Code  Grade_Pt
__________  __________  ________
Hanson      FR          2.88
Bond        JR          3.95
Smith       SO          2.00
In the above example, we brought back 3 rows only. This is because of the TOP 3 statement which means to get an answer set, and then bring back the first 3 rows in that answer set. Because this example does not have an ORDER BY statement, you can consider this example as merely bringing back 3 random rows. Page 213
TOP Command is brilliant when ORDER BY is used! Student_Table Student_ID _________ 423400 231222 280023 322133 125634 333450 324652 260000 234121 123250
Last_Name First_Name Grade_Pt __________ __________ Class_Code __________ ________ Larkins Michael FR 0.00 Wilson Susie SO 3.80 McRoberts Richard JR 1.90 Bond Jimmy JR 3.95 Hanson Henry FR 2.88 Smith Andy SO 2.00 Delaney Danny SR 3.35 Johnson Stanley ? ? Thomas Wendy FR 4.00 Phillips Martin SR 3.00
SELECT TOP (3) Last_Name, Class_Code, Grade_Pt FROM Student_Table ORDER BY Grade_Pt DESC ;
Last_Name   Class_Code  Grade_Pt
__________  __________  ________
Thomas      FR          4.00
Bond        JR          3.95
Wilson      SO          3.80
In the above example, we brought back 3 rows only. This is because of the TOP 3 statement which means to get an answer set, and then bring back the first 3 rows. Because this example uses an ORDER BY statement, the data brought back is from the top 3 students with the highest Grade_Pt. This is the real power of the TOP command. Use it with an ORDER BY! Page 214
TOP Command with Ties Student_Table Student_ID _________ 423400 231222 280023 322133 125634 333450 324652 260000 234121 123250
Last_Name First_Name Grade_Pt __________ __________ Class_Code __________ ________ 2nd Larkins Michael FR 0.00 Wilson Susie SO 3.80 McRoberts Richard JR 1.90 Bond Jimmy JR 3.95 Tie Hanson Henry FR 2.88 Smith Andy SO 2.00 Delaney Danny SR 3.35 Johnson Stanley ? ? 1st Thomas Wendy FR 4.00 Tie Phillips Martin SR 3.00
SELECT TOP (2) WITH TIES Last_Name ,Class_Code ,Grade_Pt FROM Student_Table ORDER BY Class_Code ;

Last_Name   Class_Code  Grade_Pt
__________  __________  ________
Johnson     ?           ?
Larkins     FR          0.00
Thomas      FR          4.00
Hanson      FR          2.88
By using the TOP WITH TIES Command, this will bring in the TOP amount along with ANY ties. So while you might only ask for the top 2 with ties, you might get 4 rows back. Why did 4 rows return here? Which row came back first? Four rows returned with the first row coming back as a NULL for Class_Code. Then the next row returned was one of the Freshman. There were two other Freshman that tie. All ‘FR’ come back in a tie! Page 215
TOP Command Using a Variable
Highlight the DECLARE and the SELECT statement and hit EXECUTE
You can use the TOP command in conjunction with a variable. Above, we declared a variable called @TOPVAR. We set the variable to 5. If we highlight the DECLARE and the Query we get the answer set providing the TOP 5 salaried employees.
Chapter 9 – Aggregation
“The Azure SQL Data Warehouse climbed Aggregate Mountain and delivered a better way to Sum It.” – Tera-Tom Coffing
Quiz – You calculate the Answer Set in your own Mind Student_Table Student_ID _________ 423400 231222 280023 322133 125634 333450 324652 260000 234121 123250
Last_Name First_Name Grade_Pt __________ __________ Class_Code __________ ________ Larkins Michael FR 0.00 Wilson Susie SO 3.80 McRoberts Richard JR 1.90 Bond Jimmy JR 3.95 Hanson Henry FR 2.88 Smith Andy SO 2.00 Delaney Danny SR 3.35 Johnson Stanley ? ? Thomas Wendy FR 4.00 Phillips Martin SR 3.00
SELECT Avg(Grade_Pt) AS "AVG"
      ,Count(Grade_Pt) AS "Count"
      ,Count(*) AS "Count *"
FROM   Student_Table
WHERE  Class_Code IS NULL

AVG    Count  Count *
_____  _____  _______
What would the result set be from the above query? The next slide shows answers!
Answer – You calculate the Answer Set in your own Mind Student_Table Student_ID _________ 423400 231222 280023 322133 125634 333450 324652 260000 234121 123250
Last_Name First_Name Grade_Pt __________ __________ Class_Code __________ ________ Larkins Michael FR 0.00 Wilson Susie SO 3.80 McRoberts Richard JR 1.90 Bond Jimmy JR 3.95 Hanson Henry FR 2.88 Smith Andy SO 2.00 Delaney Danny SR 3.35 Johnson Stanley ? ? Thomas Wendy FR 4.00 Phillips Martin SR 3.00
SELECT Avg(Grade_Pt) AS "AVG"
      ,Count(Grade_Pt) AS "Count"
      ,Count(*) AS "Count *"
FROM   Student_Table
WHERE  Class_Code IS NULL

AVG    Count  Count *
_____  _____  _______
?      0      1

Here are the correct answers. Aggregates ignore Null values.
The 3 Rules of Aggregation

Aggregation_Table
Employee_No  Salary
___________  _________
423400       100000.00
423401       100000.00
423402       NULL

SELECT AVG(Salary) as "AVG" ,Count(Salary) as SalCnt ,Count(*) as RowCnt FROM Aggregation_Table ;

1) Aggregates Ignore Null Values.
2) Aggregates WANT to come back in one row.
3) You CAN’T mix Aggregates with normal columns unless you use a GROUP BY.

AVG(Salary) = $100000.00
Count(Salary) = 2
Count(*) = 3
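The first rule can be confirmed with the three-row Aggregation_Table. This is a sketch that uses SQLite through Python's sqlite3 module as a stand-in for the Azure SQL Data Warehouse; the second query is an added illustration of what happens when every qualifying Salary is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Aggregation_Table (Employee_No INTEGER, Salary REAL)")
conn.executemany("INSERT INTO Aggregation_Table VALUES (?, ?)", [
    (423400, 100000.00), (423401, 100000.00), (423402, None),
])

# AVG and COUNT(Salary) skip the NULL salary; COUNT(*) counts every row.
all_rows = conn.execute(
    "SELECT AVG(Salary), COUNT(Salary), COUNT(*) FROM Aggregation_Table").fetchone()

# When only NULL salaries qualify, AVG has nothing to average and returns NULL.
null_only = conn.execute(
    "SELECT AVG(Salary), COUNT(Salary), COUNT(*) "
    "FROM Aggregation_Table WHERE Salary IS NULL").fetchone()

print(all_rows)
print(null_only)
```

The average is 100000.00 rather than 66666.67 because the NULL row is left out of both the sum and the divisor.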
There are Five Aggregates

There are FIVE AGGREGATES which are the following:
MIN – The Minimum Value.
MAX – The Maximum Value.
AVG – The Average of the Column Values.
SUM – The Sum Total of the Column Values.
COUNT – The Count of the Column Values.

SELECT MIN (Salary) ,MAX (Salary) ,SUM (Salary) ,AVG (Salary) ,Count(*) FROM Employee_Table ;

“Don’t count the days, make the days count.” – Muhammad Ali

The five aggregates are listed above. Muhammad Ali was way off in his quote. He meant to say, "Don't you count the days, make the data count for you".
Quiz – How many rows come back?

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

SELECT MIN (Salary) AS Minsal
      ,MAX (Salary) AS Maxsal
      ,SUM (Salary) AS Sumsal
      ,AVG (Salary) AS Avgsal
      ,Count(*) AS Countrows
FROM Employee_Table

How many rows will the above query produce in the result set?
Answer – How many rows come back?

SELECT MIN (Salary) AS Minsal
      ,MAX (Salary) AS Maxsal
      ,SUM (Salary) AS Sumsal
      ,AVG (Salary) AS Avgsal
      ,Count(*) AS Countrows
FROM Employee_Table

Minsal    Maxsal    Sumsal     Avgsal        Countrows
________  ________  _________  ____________  _________
32800.50  64300.00  421039.38  46782.153333  9

Only one row comes back

How many rows will the above query produce in the result set? The answer is one.
Troubleshooting Aggregates

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

SELECT Dept_No            <-- NON-Aggregate
      ,MIN (Salary)
      ,MAX (Salary)
      ,SUM (Salary)
      ,AVG (Salary)
      ,Count(*)
FROM Employee_Table ;

Error
If you have a normal column (non aggregate) in your query, you must have a corresponding GROUP BY statement. Page 225
GROUP BY when Aggregates and Normal Columns Mix

Dept_No is a NON-Aggregate column, so a GROUP BY is needed.

If you have a normal column (non-aggregate) in your query, you must have a corresponding GROUP BY statement.
GROUP BY delivers one row per Group

SELECT Dept_No                 -- NON-Aggregate
      ,MIN (Salary) AS "Min"
      ,MAX (Salary) AS "Max"
      ,SUM (Salary) AS "Sum"
      ,AVG (Salary) AS "Avg"
      ,Count(*)     AS "Count"
FROM Employee_Table
GROUP BY Dept_No               -- Group By Needed
ORDER BY Dept_No ;

Dept_No  Min       Max       Sum        Avg       Count
-------  --------  --------  ---------  --------  -----
?        32800.50  32800.50   32800.50  32800.50  1
10       64300.00  64300.00   64300.00  64300.00  1
100      48850.00  48850.00   48850.00  48850.00  1
200      41888.88  48000.00   89888.88  44944.44  2
300      40200.00  40200.00   40200.00  40200.00  1
400      36000.00  54500.00  145000.00  48333.33  3

The GROUP BY Dept_No clause allows the Aggregates to be calculated per Dept_No. The data has also been sorted with the ORDER BY clause.
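The one-row versus one-row-per-group behavior can be checked on any SQL engine. The sketch below is only an illustration: it assumes Python's built-in sqlite3 module as a local stand-in for the Azure SQL Data Warehouse, and a cut-down Employee_Table that holds just Dept_No and Salary.

```python
import sqlite3

# Hypothetical in-memory copy of Employee_Table (Dept_No and Salary only).
rows = [
    (None, 32800.50), (10, 64300.00), (100, 48850.00),
    (200, 41888.88), (200, 48000.00), (300, 40200.00),
    (400, 54500.00), (400, 36000.00), (400, 54500.00),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee_Table (Dept_No INTEGER, Salary REAL)")
conn.executemany("INSERT INTO Employee_Table VALUES (?, ?)", rows)

# Aggregates alone: the whole table collapses into a single row.
totals = conn.execute(
    "SELECT MIN(Salary), MAX(Salary), COUNT(*) FROM Employee_Table"
).fetchall()

# Aggregates plus GROUP BY: one row per distinct Dept_No (NULL forms its own group).
per_dept = conn.execute(
    """
    SELECT Dept_No, MIN(Salary), MAX(Salary), COUNT(*)
    FROM Employee_Table
    GROUP BY Dept_No
    ORDER BY Dept_No
    """
).fetchall()

print(len(totals), "row without GROUP BY")
print(len(per_dept), "rows with GROUP BY")
```

SQLite is more forgiving than the Azure SQL Data Warehouse about a bare non-aggregate column, so this sketch does not reproduce the error from the troubleshooting slide.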
Count_Big

Count_Big has a data type of BIGINT

SELECT Dept_No,
       COUNT(Salary)     AS CountSal,
       COUNT_BIG(Salary) AS CountSalBig
FROM Employee_Table
GROUP BY Dept_No
ORDER BY Dept_No;

Dept_No  CountSal  CountSalBig
-------  --------  -----------
?        1         1
10       1         1
100      1         1
200      2         2
300      1         1
400      3         3

The Count_Big command is the same as a Count, but Count_Big returns a BIGINT data type, while Count returns an Integer data type. Count_Big is for counts greater than 2,147,483,647 (roughly two billion), the largest value an Integer can hold.
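The two-billion threshold comes straight from the size of the two data types. A small sketch of the arithmetic, where the helper name fits_in_count is hypothetical and used only for illustration:

```python
# Largest values of the signed 32-bit and 64-bit integer types.
INT_MAX = 2**31 - 1      # Integer, the type returned by COUNT
BIGINT_MAX = 2**63 - 1   # BIGINT, the type returned by COUNT_BIG

def fits_in_count(row_count):
    """Return True when a row count still fits in the Integer type used by COUNT."""
    return row_count <= INT_MAX

print(INT_MAX)
print(BIGINT_MAX)
print(fits_in_count(3_000_000_000))
```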
Limiting Rows and Improving Performance with WHERE

Employee_Table

Employee_No  Dept_No  Last_Name   First_Name  Salary
-----------  -------  ----------  ----------  --------
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

SELECT Dept_No, MIN (Salary), MAX (Salary), SUM (Salary), AVG (Salary), COUNT(*)
FROM Employee_Table
WHERE Dept_No IN (200, 400)
GROUP BY Dept_No
Order by 1 ;

WHERE Clause acts as a filter before any Calculations are done.

Will Dept_No 300 be calculated? Of course you know it will…NOT!
WHERE Clause in Aggregation limits unneeded Calculations

SELECT Dept_No
      ,MIN (Salary) as "Min"
      ,MAX (Salary) as "Max"
      ,SUM (Salary) as "Sum"
      ,AVG (Salary) as "Avg"
      ,COUNT(*)     as "Count"
FROM Employee_Table
WHERE Dept_No IN (200, 400)
GROUP BY Dept_No
Order by 1 ;

WHERE Clause acts as a filter before any Calculations are done.

Dept_No  Min       Max       Sum        Avg       Count
-------  --------  --------  ---------  --------  -----
200      41888.88  48000.00   89888.88  44944.44  2
400      36000.00  54500.00  145000.00  48333.33  3

The system avoids reading any Dept_No other than 200 and 400. This means that only rows with a Dept_No of 200 or 400 will come off the disk to be calculated.
Keyword HAVING tests Aggregates after they are Totaled

SELECT Dept_No
      ,MIN (Salary) as "Min"
      ,MAX (Salary) as "Max"
      ,SUM (Salary) as "Sum"
      ,AVG (Salary) as "Avg"
      ,COUNT(*)     as "Count"
FROM Employee_Table
WHERE Dept_No IN (200, 400)
GROUP BY Dept_No
HAVING AVG(Salary) > 45000
Order by 1 ;

HAVING Clause acts as a filter on the totals after the Calculations are done.

Dept_No  Min       Max       Sum        Avg       Count
-------  --------  --------  ---------  --------  -----
200      41888.88  48000.00   89888.88  44944.44  2      (eliminated by HAVING: 44944.44 is not > 45000)
400      36000.00  54500.00  145000.00  48333.33  3

The HAVING Clause only works on Aggregate Totals. The WHERE filters rows to be excluded from calculation, but the HAVING filters the Aggregate totals after the calculations, thus eliminating certain Aggregate totals.
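The order of the two filters can be seen by running the same query with and without the HAVING clause. This is a minimal sketch, again assuming Python's sqlite3 module as a stand-in engine and a two-column Employee_Table; the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee_Table (Dept_No INTEGER, Salary REAL)")
conn.executemany(
    "INSERT INTO Employee_Table VALUES (?, ?)",
    [(None, 32800.50), (10, 64300.00), (100, 48850.00),
     (200, 41888.88), (200, 48000.00), (300, 40200.00),
     (400, 54500.00), (400, 36000.00), (400, 54500.00)],
)

# WHERE removes rows before any aggregate is calculated.
base = """
    SELECT Dept_No, ROUND(AVG(Salary), 2), COUNT(*)
    FROM Employee_Table
    WHERE Dept_No IN (200, 400)
    GROUP BY Dept_No
"""

where_only = conn.execute(base + " ORDER BY 1").fetchall()

# HAVING removes whole groups after their aggregates are calculated.
with_having = conn.execute(
    base + " HAVING AVG(Salary) > 45000 ORDER BY 1"
).fetchall()

print(where_only)
print(with_having)
```

Only Dept_No 400 survives the HAVING test, because the average for Dept_No 200 falls below 45000.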
Group By Grouping Sets

SELECT S.Product_Id
      ,DATEPART (Month, S.Sale_Date) as "Mo"
      ,DATEPART (Year, S.Sale_Date) as "Year"
      ,SUM (Daily_Sales) as Total
FROM Sales_Table as S
GROUP BY Grouping Sets (S.Product_Id
                       ,DATEPART(Month, S.Sale_Date)
                       ,DATEPART(Year, S.Sale_Date))
ORDER BY 1, 2, 3

Product_Id  Mo  Year  Total
----------  --  ----  ---------
?           ?   2000  862404.35
?           9   ?     418769.36
?           10  ?     443634.99
1000        ?   ?     331204.72
2000        ?   ?     306611.81
3000        ?   ?     224587.82

Each Grouping Set totals 862404.35 in this example.

This query does not work in this release of the Azure SQL Data Warehouse.

The example above shows the Group By Grouping Sets. This will show the figures from the Sales_Table many different ways. You will see the total sales for all sales combined, total sales per year, total sales per month and total sales per Product_Id. There are three of these commands: Group By Grouping Sets, Group By Rollup and Group By Cube.
Group By Rollup

SELECT S.Product_Id
      ,DATEPART (Month, S.Sale_Date) as "Mo"
      ,DATEPART (Year, S.Sale_Date) as "Year"
      ,SUM (Daily_Sales) as Total
FROM Sales_Table as S
GROUP BY ROLLUP (S.Product_Id
                ,DATEPART(Month, S.Sale_Date)
                ,DATEPART(Year, S.Sale_Date))
ORDER BY 1, 2, 3

This query does not work in this release of the Azure SQL Data Warehouse.

The example above shows the Group By Rollup. This will show the figures from the Sales_Table many different ways. The Answer set is on the following page. There are three of these commands: Group By Grouping Sets, Group By Rollup and Group By Cube. Grouping Sets shows a few different views. Group By Rollup takes it further and Group By Cube even more. Turn the page and see what Rollup does.
Answer Set for Group By Rollup Query

Product_Id  Mo  Year  Total
----------  --  ----  ---------
?           ?   ?     862404.35
1000        ?   ?     331204.72
1000        9   ?     139350.69
1000        9   2000  139350.69
1000        10  ?     191854.03
1000        10  2000  191854.03
2000        ?   ?     306611.81
2000        9   ?     139738.91
2000        9   2000  139738.91
2000        10  ?     166872.90
2000        10  2000  166872.90
3000        ?   ?     224587.82
3000        9   ?     139679.76
3000        9   2000  139679.76
3000        10  ?      84908.06
3000        10  2000   84908.06

The answer set for the Rollup query is above. Each level of the Rollup (shown in a different color on the original slide) adds up to 862404.35.
Creating a Cube

SELECT S.Product_Id
      ,DATEPART (Month, S.Sale_Date) as "Mo"
      ,DATEPART (Year, S.Sale_Date) as "Year"
      ,SUM (Daily_Sales) as Total
FROM Sales_Table as S
GROUP BY CUBE (S.Product_Id
              ,DATEPART(Month, S.Sale_Date)
              ,DATEPART(Year, S.Sale_Date))
ORDER BY 1, 2, 3

This query does not work in this release of the Azure SQL Data Warehouse.

The example above shows how to create a cube. This will show the figures from the Sales_Table many different ways. You will see the total sales for all sales combined, total sales per year, total sales per month and more. The following page will show the answer set.
Answer Set for Cube Query

Product_Id  Mo  Year  Total
----------  --  ----  ---------
?           ?   ?     862404.35
?           ?   2000  862404.35
?           9   ?     418769.36
?           9   2000  418769.36
?           10  ?     443634.99
?           10  2000  443634.99
1000        ?   ?     331204.72
1000        ?   2000  331204.72
1000        9   ?     139350.69
1000        9   2000  139350.69
1000        10  ?     191854.03
1000        10  2000  191854.03
2000        ?   ?     306611.81
2000        ?   2000  306611.81
2000        9   ?     139738.91
2000        9   2000  139738.91
2000        10  ?     166872.90
2000        10  2000  166872.90
3000        ?   ?     224587.82
3000        ?   2000  224587.82
3000        9   ?     139679.76
3000        9   2000  139679.76
3000        10  ?      84908.06
3000        10  2000   84908.06

The answer set for the Cube query is above. Each grouping (shown in a different color on the original slide) adds up to 862404.35.
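The row counts of the three answer sets (6, 16 and 24) follow from which column combinations each command groups by. The sketch below enumerates those combinations in plain Python; the function names are hypothetical and this models only the standard meaning of the three clauses, not any engine's implementation.

```python
from itertools import combinations

def grouping_sets(cols):
    """GROUPING SETS (a, b, c): one single-column grouping per listed column."""
    return [(c,) for c in cols]

def rollup_sets(cols):
    """ROLLUP (a, b, c): every prefix, from all columns down to the grand total."""
    return [tuple(cols[:n]) for n in range(len(cols), -1, -1)]

def cube_sets(cols):
    """CUBE (a, b, c): every possible subset of the columns."""
    return [combo
            for n in range(len(cols), -1, -1)
            for combo in combinations(cols, n)]

cols = ["Product_Id", "Mo", "Year"]
for name, fn in [("GROUPING SETS", grouping_sets),
                 ("ROLLUP", rollup_sets),
                 ("CUBE", cube_sets)]:
    print(name, len(fn(cols)), fn(cols))
```

With three products, two months and one year, the four Rollup groupings yield 6 + 6 + 3 + 1 = 16 rows and the eight Cube groupings yield 24 rows, which matches the answer sets above.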
An Easy Example of Creating a Cube
This is not a great cube example because there is only one customer who placed one order; however, it is done to show you the concept of a cube. At the top of the answer set is what was made in total. Then it is further broken down.
Quiz - GROUP BY GROUPING SETS Challenge

Course_Table

Course_ID  Course_Name               Credits  Seats
---------  ------------------------  -------  -----
100        Database Concepts         3        50
200        Introduction to SQL       3        20
210        Advanced SQL              3        22
220        V2R3 SQL Features         2        25
300        Physical Database Design  4        20
400        Database Administration   4        16

Student_Course_Table

Student_ID  Course_ID
----------  ---------
280023      210
231222      210
125634      100
231222      220
125634      200
322133      220
125634      220
322133      300
324652      200
333450      500
260000      400
333450      400
234121      100
123250      100

Student_Table

Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
----------  ---------  ----------  ----------  --------
423400      Larkins    Michael     FR          0.00
231222      Wilson     Susie       SO          3.80
280023      McRoberts  Richard     JR          1.90
322133      Bond       Jimmy       JR          3.95
125634      Hanson     Henry       FR          2.88
333450      Smith      Andy        SO          2.00
324652      Delaney    Danny       SR          3.35
260000      Johnson    Stanley     ?           ?
234121      Thomas     Wendy       FR          4.00
123250      Phillips   Martin      SR          3.00

Write SQL that will perform a Group by Grouping Sets. Your mission is to build a report that will show the Average Grade_Pt for three different sets. Those sets are by Class_Code, by Credits and by Course_ID. Sort the final report first by Class_Code (FR, SO, JR, SR) and then by Credits DESC and then by Course_ID DESC. Good Luck!

The answer is on the next page.
Answer To Quiz - GROUP BY GROUPING SETS Challenge

SELECT Class_Code
      ,AVG(Grade_Pt)
      ,Credits
      ,"c".Course_Id
FROM Student_Table s,
     Student_Course_Table sc,
     Course_Table "c"
WHERE s.Student_Id = sc.Student_Id
AND   sc.Course_Id = "c".Course_Id
GROUP BY GROUPING SETS (Class_Code, "c".Course_ID, Credits)
ORDER BY CASE Class_Code
            WHEN 'FR' Then 1
            WHEN 'SO' Then 2
            WHEN 'JR' Then 3
            WHEN 'SR' Then 4
            ELSE 5
         END,
         Credits DESC, Course_ID DESC ;

Above is something to enjoy and learn from.
Getting the Average Values Per Column
The first query retrieved the average rows per value for the column Product_ID. The example below did the same, but for the column Sale_Date.
Average Values per Column for all Columns in a Table
The query above retrieved the average rows per value for both columns in the table.
Chapter 10 - Join Functions
“When spider webs unite they can tie up a lion.” - African Proverb
The Azure SQL Data Warehouse Join Quiz

Which Statement is NOT true!

1. Each Table in the Azure SQL Data Warehouse has a Distribution Key, unless it is a replicated table.
2. The Distribution Key is the mechanism that allows the Azure SQL Data Warehouse to physically distribute the rows of a table across the Nodes.
3. For two rows to be Joined together the Azure SQL Data Warehouse insists that both rows are physically in the same memory.
4. The Azure SQL Data Warehouse will either Redistribute one or both of the tables or Duplicate the smaller table across all nodes to ensure matching rows are in the same memory, even if it is only for the life of the Join.

Do you know which statement above is False?
The Azure SQL Data Warehouse Join Quiz Answer

Which Statement is NOT true! All four statements are true.

Figure: in the memory of one Distribution, the Customer_Table row (ACE Consult, 555-1212, 31323134) and the Order_Table row (31323134, 123552, 10/01/1999, 5111.47) join on Customer_Number.

All statements above are TRUE! Two joining rows have to be in the same memory of a single node.
Redistribution

Figure: in the memory of one Distribution, the Customer_Table row (ACE Consult, 555-1212, 31323134) and the Order_Table row (31323134, 123552, 10/01/1999, 5111.47) join on Customer_Number.

Customer_Table row: The Distribution Key for this table is Customer_Number, so it is naturally on this Distribution.
Order_Table row: Customer_Number is NOT the Distribution Key so this row was re-hashed by Customer_Number.

SELECT C.Customer_Number, C.Phone_Number, C.Customer_Name
      ,O.Customer_Number, O.Order_Number, O.Order_Date, Order_Total
FROM Customer_Table as C
INNER JOIN Order_Table as O
ON C.Customer_Number = O.Customer_Number ;

The Azure SQL Data Warehouse can redistribute data (temporarily) by re-hashing Customer_Number from the Order_Table. Now, all joining rows will be on the same node's memory. That is one of two ways to get matching rows together.
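Re-hashing works because one hash function always sends equal key values to the same place. The sketch below illustrates that idea only: the CRC32 hash, the four-distribution layout and the function names are assumptions made for the example, not the hash formula the Azure SQL Data Warehouse actually uses.

```python
import zlib

NUM_DISTRIBUTIONS = 4  # hypothetical; kept small so the output is easy to read

def distribution_for(key):
    """Map a join-key value to a distribution with a deterministic stand-in hash."""
    return zlib.crc32(str(key).encode()) % NUM_DISTRIBUTIONS

def redistribute(rows, key_index):
    """Place every row on the distribution its join key hashes to."""
    placed = {d: [] for d in range(NUM_DISTRIBUTIONS)}
    for row in rows:
        placed[distribution_for(row[key_index])].append(row)
    return placed

# (Customer_Number, Customer_Name)
customers = [
    (11111111, "Billy's Best Choice"), (31313131, "Acme Products"),
    (31323134, "ACE Consulting"), (57896883, "XYZ Plumbing"),
    (87323456, "Databases N-U"),
]
# (Order_Number, Customer_Number, Order_Total)
orders = [
    (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
    (123552, 31323134, 5111.47), (123585, 87323456, 15231.62),
    (123777, 57896883, 23454.84),
]

cust_by_dist = redistribute(customers, key_index=0)
orders_by_dist = redistribute(orders, key_index=1)  # re-hash by Customer_Number

for dist in range(NUM_DISTRIBUTIONS):
    print(dist, cust_by_dist[dist], orders_by_dist[dist])
```

Whichever distribution a customer lands on, every order re-hashed by the same Customer_Number lands there too, so the join can run locally.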
Big Table Small Table Join Strategy

Figure: two Distributions, each holding part of the Employee_Table (1 million rows) and part of the Department_Table. One Distribution holds 100 Marketing and 400 Customer Support; the other holds 200 Research and Dev and 300 Sales.

The Department_Table is small. It only has four rows.

The Azure SQL Data Warehouse has a special way of dealing with big table and small table joins. Turn the page and be prepared to be amazed!
Duplication of the Smaller Table across All Distributions

Figure: the smaller table is duplicated in memory. Each Distribution still stores its own part of the Department_Table (100 Marketing and 400 Customer Support on one; 200 Research and Dev and 300 Sales on the other), and now also holds in memory a copy of all four rows (100 Marketing, 200 Research and Dev, 300 Sales, 400 Customer Support) next to its part of the Employee_Table (1 million rows).

The Department_Table is small. It only has four rows.

The Azure SQL Data Warehouse took the Department_Table and gathered up all 4 rows (temporarily) and in memory Duplicated the entire 4-row Table across all Nodes. Now the joins can happen! This is the second way to get rows together. If one table is much bigger than the other, the Azure SQL Data Warehouse will duplicate the smaller table on all Nodes, just for the life of the query.
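Duplication can be pictured as handing every distribution its own temporary copy of the small table, so the big table never moves. The following is a rough sketch under that assumption; the function broadcast_join and the two-distribution sample data are hypothetical.

```python
def broadcast_join(big_by_dist, small_table, big_key, small_key):
    """Copy the whole small table to every distribution, then join locally."""
    result = []
    for dist, big_rows in big_by_dist.items():
        local_copy = list(small_table)  # temporary copy, for this join only
        lookup = {row[small_key]: row for row in local_copy}
        for row in big_rows:
            match = lookup.get(row[big_key])
            if match is not None:  # inner join: unmatched rows are dropped
                result.append(row + match)
    return result

# Employee rows (Last_Name, Dept_No) as they happen to sit on two distributions.
employees_by_dist = {
    0: [("Chambers", 100), ("Strickling", 400)],
    1: [("Coffing", 200), ("Larkins", 300), ("Jones", None)],
}
# The small table (Dept_No, Department_Name).
departments = [
    (100, "Marketing"), (200, "Research and Dev"),
    (300, "Sales"), (400, "Customer Support"),
]

joined = broadcast_join(employees_by_dist, departments, big_key=1, small_key=0)
for row in joined:
    print(row)
```

The employee rows stay where they were loaded; only the four department rows are copied, which is why this strategy suits a big table joined to a small one.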
If the Join Condition is the Distribution Key, no Movement

SELECT Last_Name, Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

Is Dept_No the Distribution Key of Employee_Table? YES
Is Dept_No the Distribution Key of Department_Table? YES

If the above tables (being joined by Dept_No) also had Dept_No as their Distribution Key then matching rows would naturally be on the same Node together. See this visually by turning the page!

The Azure SQL Data Warehouse knows that it can only JOIN two rows together if they are physically on the same node. This can occur naturally if the join condition columns are the Distribution Keys of their respective tables, but most likely the Azure SQL Data Warehouse will have to move data to get the matching rows on the same Node. What will the Optimizer decide to do next?
Matching Rows That Are On The Same Node Naturally

Employee_Table

Employee_No  Dept_No  Last_Name   First_Name  Salary
-----------  -------  ----------  ----------  --------
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Department_Table

Dept_No  Department_Name
-------  ----------------
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

If Dept_No was the Distribution Key of both tables the matching rows would already be on the same Node.

Distribution 1   Distribution 2        Distribution 3  Distribution 4
---------------  --------------------  --------------  --------------------
100 Marketing    200 Research and Dev  300 Sales       400 Customer Support
100 Chambers     200 Coffing           300 Larkins     400 Harrison
10  Smythe       200 Smith                             400 Reilly
                                                       400 Strickling

If both the Employee_Table and the Department_Table (being joined by Dept_No) have Dept_No as their respective Distribution Keys they are considered co-located. Anytime these two tables are joined, there is no need to redistribute or duplicate because the matching rows are naturally on the same Node. That is the brilliance of the Hash Formula.
What if the Join Condition Columns are Not Distribution Keys

SELECT Last_Name, E.Dept_No, Department_Name
FROM Employee_Table as E, Department_Table as D
WHERE E.Dept_No = D.Dept_No
Order BY 1 ;

Is Dept_No the Distribution Key of Employee_Table? NO
Is Dept_No the Distribution Key of Department_Table? YES

Redistribute the Employee_Table by Dept_No for this join only.

The Optimizer knows that the Dept_No column is the Distribution Key for the Department_Table. It also knows that the Dept_No column is NOT the Distribution Key for the Employee_Table, so the Optimizer commands the Nodes to Redistribute the entire Employee_Table by Dept_No temporarily. This is equivalent to loading the Employee_Table with a Distribution Key of Dept_No. Now all matching rows can join.
Strategy 1 of 4 – The Merge Join

1) The rows to be joined have to be located on a common Distribution's memory
2) Both spools have to be sorted by the ROWID calculated over the join column(s)

Figure: two spools, each sorted on the join column, being matched against each other.

To get there, the system may need:
Re-Distribution of one or both spools by ROWHASH or Duplication of the smaller spool to all Nodes
Sorting of one or both spools by the ROWID

Relocation of rows to the common node can be done by redistribution of the rows by the join column(s) ROWHASH or by copying the smaller table as a whole to all nodes.
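Once both spools sit on the same distribution and are sorted on the join column, the matching itself is a single forward pass over both lists. Here is a minimal sketch of that pass; it sorts on the join value directly rather than on a ROWID, and the function name merge_join is hypothetical.

```python
def merge_join(left, right, key=lambda row: row[0]):
    """Inner join two row lists by sorting both on the key and walking them together."""
    left = sorted(left, key=key)
    right = sorted(right, key=key)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1          # left key too small: advance the left spool
        elif lk > rk:
            j += 1          # right key too small: advance the right spool
        else:
            # Emit every right row sharing this key, then advance the left spool.
            k = j
            while k < len(right) and key(right[k]) == lk:
                out.append(left[i] + right[k])
                k += 1
            i += 1
    return out

# (Dept_No, Department_Name) and (Dept_No, Last_Name)
departments = [(400, "Customer Support"), (100, "Marketing"), (300, "Sales"),
               (200, "Research and Dev"), (500, "Human Resources")]
employees = [(400, "Strickling"), (10, "Smythe"), (200, "Coffing"), (100, "Chambers"),
             (400, "Reilly"), (300, "Larkins"), (200, "Smith"), (400, "Harrison")]

for row in merge_join(departments, employees):
    print(row)
```

Neither list is ever rescanned from the start, which is the payoff for the sorting cost.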
Quiz – Redistribute the Employees by their Dept_No

Employee_Table

Employee_No  Dept_No  Last_Name   First_Name  Salary
-----------  -------  ----------  ----------  --------
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Department_Table

Dept_No  Department_Name
-------  ----------------
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

Distribution 1   Distribution 2        Distribution 3  Distribution 4
---------------  --------------------  --------------  --------------------
100 Marketing    200 Research and Dev  300 Sales       400 Customer Support

If the Azure SQL Data Warehouse decided to Redistribute the Employee_Table by Dept_No, which nodes will hold which employee rows? Try and place them yourself.

Fill in the quiz above. This is a great opportunity to understand the Azure SQL Data Warehouse engine.
Quiz – Dept_No landed on Distribution with Matches

The hashing formula is consistent. Notice that all of the Dept_No 400 rows landed on Node 4. That is because one hash formula ensures matches reside together.

Distribution 1   Distribution 2        Distribution 3  Distribution 4
---------------  --------------------  --------------  --------------------
100 Marketing    200 Research and Dev  300 Sales       400 Customer Support
100 Chambers     200 Coffing           300 Larkins     400 Harrison
10  Smythe       200 Smith                             400 Reilly
                                                       400 Strickling

Each redistributed row landed on the same Node as its matching row. Notice that Squiggy Jones has a NULL department so the Azure SQL Data Warehouse will not redistribute that row on an Inner Join. Smythe in Dept_No 10 hashes to Distribution 1 but has no match. Turn the page.
Quiz – Redistribute the Orders to the Proper Distribution

Customer_Table

Customer_Number  Customer_Name
---------------  -------------------
11111111         Billy's Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table

Order_Number  Customer_Number  Order_Total
------------  ---------------  -----------
123456        11111111         12347.53
123512        11111111          8005.91
123552        31323134          5111.47
123585        87323456         15231.62
123777        57896883         23454.84

Distribution 1                Distribution 2          Distribution 3
----------------------------  ----------------------  -----------------------
11111111 Billy's Best Choice  31313131 Acme Products  31323134 ACE Consulting
                              57896883 XYZ Plumbing   87323456 Databases N-U

If the Azure SQL Data Warehouse decides to Redistribute the Order_Table by Customer_Number, which nodes will hold which Orders? Place their Customer_Number and Order_Total on the node after Redistribution.

Fill in the quiz above. This is a great opportunity to understand the Azure SQL Data Warehouse engine.
Answer to Redistribute the Orders to the Proper Distribution Quiz

Distribution 1                Distribution 2          Distribution 3
----------------------------  ----------------------  -----------------------
11111111 Billy's Best Choice  31313131 Acme Products  31323134 ACE Consulting
                              57896883 XYZ Plumbing   87323456 Databases N-U
11111111 12347.53             57896883 23454.84       31323134 5111.47
11111111 8005.91                                      87323456 15231.62

It is no coincidence that when Customer_Number 11111111 was hashed every 11111111 row went to Distribution 1.

Each row redistributed to the same Distribution as its matching row.
Strategy 2 of 4 – The Hash Join

1) The rows to be joined must be located on a common Distribution
2) The smaller spool is sorted by the ROWHASH calculated over the join column(s) and is kept in the cache (memory)
3) The bigger spool stays unsorted

Figure: the smaller spool, sorted by Row Hash, next to the bigger spool, which is unsorted.

The bigger spool is scanned row by row, and each ROWID from the bigger spool is searched for in the smaller spool (by means of a binary search).

The Hash Join takes advantage of memory and loads the entire smaller spool into Cache memory. Then, each row from the bigger spool is joined one at a time by doing a binary search (on the sorted smaller spool).
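The three steps above can be sketched as follows. This is an illustration under stated assumptions: CRC32 stands in for the row hash, Python's bisect module performs the binary search, and the names row_hash and hash_join are hypothetical.

```python
import bisect
import zlib

def row_hash(value):
    """Stand-in row hash for a join-column value (not the engine's real formula)."""
    return zlib.crc32(str(value).encode())

def hash_join(small, big, small_key=0, big_key=0):
    """Sort the small spool by row hash, then binary-search it once per big-spool row."""
    small_sorted = sorted(small, key=lambda r: row_hash(r[small_key]))
    hashes = [row_hash(r[small_key]) for r in small_sorted]  # kept in memory
    out = []
    for row in big:  # the bigger spool is scanned in whatever order it arrives
        h = row_hash(row[big_key])
        pos = bisect.bisect_left(hashes, h)
        while pos < len(hashes) and hashes[pos] == h:
            candidate = small_sorted[pos]
            if candidate[small_key] == row[big_key]:  # guard against hash collisions
                out.append(row + candidate)
            pos += 1
    return out

# (Dept_No, Department_Name) is the smaller spool; (Dept_No, Last_Name) the bigger.
departments = [(100, "Marketing"), (200, "Research and Dev"), (300, "Sales"),
               (400, "Customer Support"), (500, "Human Resources")]
employees = [(None, "Jones"), (10, "Smythe"), (100, "Chambers"), (200, "Coffing"),
             (200, "Smith"), (300, "Larkins"), (400, "Strickling"),
             (400, "Reilly"), (400, "Harrison")]

for row in hash_join(departments, employees):
    print(row)
```

Only the small spool pays for a sort; the big spool is read once, in place.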
Strategy 4 of 4 – The Product Join

1) The rows to be joined have to be located on the same Distribution
2) No spool needs to be sorted!

Figure: two unsorted spools, with every row of one compared against every row of the other.

A full table scan is done on the smaller spool and each qualifying row of spool 1 is compared against each row of spool 2.

The Product Join is not well received. Keep an eye on it. It could be a bad sign.
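The reason a Product Join deserves watching is its cost: every row of one spool is compared with every row of the other. A short sketch of that comparison loop, with the hypothetical name product_join and a small sample of rows:

```python
def product_join(spool1, spool2, condition):
    """Compare every row of spool1 with every row of spool2; keep pairs that qualify."""
    return [r1 + r2 for r1 in spool1 for r2 in spool2 if condition(r1, r2)]

# (Dept_No, Department_Name) and (Dept_No, Last_Name)
departments = [(100, "Marketing"), (200, "Research and Dev"), (300, "Sales"),
               (400, "Customer Support"), (500, "Human Resources")]
employees = [(100, "Chambers"), (200, "Coffing"), (400, "Reilly")]

# With a real join condition the result is small, but 5 x 3 comparisons were still made.
matched = product_join(departments, employees, lambda d, e: d[0] == e[0])

# With no join condition at all, every pairing is returned.
everything = product_join(departments, employees, lambda d, e: True)

print(len(matched), len(everything))
```

The number of comparisons is the product of the two row counts, which is where the strategy gets its name.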
A Two-Table Join Using Traditional Syntax

Customer_Table

Customer_Number  Customer_Name
---------------  -------------------
11111111         Billy's Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table

Order_Number  Customer_Number  Order_Total
------------  ---------------  -----------
123456        11111111         12347.53
123512        11111111          8005.91
123552        31323134          5111.47
123585        87323456         15231.62
123777        57896883         23454.84

SELECT Customer_Table.Customer_Number
      ,Customer_Name
      ,Order_Number
      ,Order_Total
FROM Customer_Table,
     Order_Table
WHERE Customer_Table.Customer_Number = Order_Table.Customer_Number ;

The column Customer_Number is in both tables. It must be fully qualified with the table name or it errors.

Customer_Number is the column that has matching data in both tables. This is called the "Join Condition".

A Join combines columns on the report from more than one table. The example above joins the Customer_Table and the Order_Table together. The most complicated part of any join is the JOIN CONDITION. The JOIN CONDITION is which Column from each table is a match. In this case, Customer_Number is a match that establishes the relationship, so this join will happen on matching Customer_Number columns.
A two-table join using Non-ANSI Syntax with Table Alias

SELECT Cust.Customer_Number
      ,Customer_Name
      ,Order_Number
      ,Order_Total
FROM Customer_Table as Cust,
     Order_Table as ORD
WHERE Cust.Customer_Number = Ord.Customer_Number ;

The column Customer_Number is in both tables. It must be fully qualified or it errors.

We alias the table names to shorten the typing when fully qualifying a column.

A Join combines columns on the report from more than one table. The example above joins the Customer_Table and the Order_Table together. The most complicated part of any join is the JOIN CONDITION. The JOIN CONDITION means which Column from each table is a match. In this case, Customer_Number is a match that establishes the relationship.
You Can Fully Qualify All Columns

SELECT Cust.Customer_Number
      ,Cust.Customer_Name
      ,Ord.Order_Number
      ,Ord.Order_Total
FROM Customer_Table as Cust,
     Order_Table as ORD
WHERE Cust.Customer_Number = Ord.Customer_Number ;

The column Customer_Number is in both tables. It must be fully qualified or it errors.

A good practice is to fully qualify all columns in the SELECT list for clarity to other users.

Whenever a column is in both tables, you must fully qualify it when doing a join. You don't have to fully qualify columns that are only in one of the tables because the system knows which table that particular column is in. You can choose to fully qualify every column if you like. This is a good practice because it is more apparent which columns belong to which tables for anyone else looking at your SQL.
A two-table join using ANSI Syntax

SELECT Cust.Customer_Number, Customer_Name, Order_Number, Order_Total
FROM Customer_Table as Cust
INNER JOIN Order_Table as ORD
ON Cust.Customer_Number = Ord.Customer_Number ;

INNER JOIN Keyword replaces the comma.
ON Keyword is used instead of WHERE.

This is the same join as the previous slide except it is using ANSI syntax. Both will return the same rows with the same performance. Rows are joined when the Customer_Number matches on both tables, but non-matches won't return.
Both Queries have the same Results and Performance

Customer_Table

Customer_Number  Customer_Name
---------------  -------------------
11111111         Billy's Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table

Order_Number  Customer_Number  Order_Total
------------  ---------------  -----------
123456        11111111         12347.53
123512        11111111          8005.91
123552        31323134          5111.47
123585        87323456         15231.62
123777        57896883         23454.84

Traditional Syntax

SELECT Cust.Customer_Number, Customer_Name, Order_Number, Order_Total
FROM Customer_Table as Cust, Order_Table as ORD
WHERE Cust.Customer_Number = Ord.Customer_Number ;

ANSI Syntax

SELECT Cust.Customer_Number, Customer_Name, Order_Number, Order_Total
FROM Customer_Table as Cust
INNER JOIN Order_Table as ORD
ON Cust.Customer_Number = Ord.Customer_Number ;

Both of these syntax techniques bring back the same result set and have the same performance. The INNER JOIN is considered ANSI. Which one does Outer Joins?
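That the two syntaxes return the same rows is easy to confirm by running both. The sketch below assumes Python's sqlite3 module as a stand-in engine and loads only the columns shown in the tables above; it says nothing about performance on the Azure SQL Data Warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer_Table (Customer_Number INTEGER, Customer_Name TEXT);
    CREATE TABLE Order_Table (Order_Number INTEGER, Customer_Number INTEGER,
                              Order_Total REAL);
""")
conn.executemany("INSERT INTO Customer_Table VALUES (?, ?)", [
    (11111111, "Billy's Best Choice"), (31313131, "Acme Products"),
    (31323134, "ACE Consulting"), (57896883, "XYZ Plumbing"),
    (87323456, "Databases N-U"),
])
conn.executemany("INSERT INTO Order_Table VALUES (?, ?, ?)", [
    (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
    (123552, 31323134, 5111.47), (123585, 87323456, 15231.62),
    (123777, 57896883, 23454.84),
])

# Traditional syntax: comma between tables, join condition in WHERE.
traditional = conn.execute("""
    SELECT Cust.Customer_Number, Customer_Name, Order_Number, Order_Total
    FROM Customer_Table AS Cust, Order_Table AS Ord
    WHERE Cust.Customer_Number = Ord.Customer_Number
    ORDER BY Order_Number
""").fetchall()

# ANSI syntax: INNER JOIN between tables, join condition in ON.
ansi = conn.execute("""
    SELECT Cust.Customer_Number, Customer_Name, Order_Number, Order_Total
    FROM Customer_Table AS Cust
    INNER JOIN Order_Table AS Ord
    ON Cust.Customer_Number = Ord.Customer_Number
    ORDER BY Order_Number
""").fetchall()

print(traditional == ansi, len(ansi))
```

Acme Products has no orders, so it appears in neither result: an inner join returns matches only.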
Quiz – Can You Finish the Join Syntax?

Employee_Table

Employee_No  Dept_No  Last_Name   First_Name  Salary
-----------  -------  ----------  ----------  --------
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Department_Table

Dept_No  Department_Name
-------  ----------------
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

SELECT First_Name, Last_Name, Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON ______________ (Finish the Join)

Finish this join by placing the missing SQL in the proper place!
Answer to Quiz – Can You Finish the Join Syntax?

SELECT First_Name, Last_Name, Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

Dept_No is the column that both tables have in common. This is called a Primary Key/Foreign Key relationship.

This query is ready to run.
Quiz – Can You Find the Error?

Employee_Table

Employee_No  Dept_No  Last_Name   First_Name  Salary
-----------  -------  ----------  ----------  --------
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Department_Table

Dept_No  Department_Name
-------  ----------------
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

SELECT First_Name
      ,Last_Name
      ,Dept_No
      ,Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

This query has an error! Can you find it?
Answer to Quiz – Can You Find the Error?

SELECT First_Name
      ,Last_Name
      ,E.Dept_No
      ,Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

The column Dept_No is in both tables. It needs to be fully qualified as E.Dept_No or D.Dept_No.

If a column in the SELECT list is in both tables, you must fully qualify it.
Page 267
100 200 300 400 500
Marketing Research and Dev Sales Customer Support Human Resources
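The qualification rule can be tried out with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in engine (the book targets Azure SQL Data Warehouse, whose error text will differ) and a trimmed two-row copy of each table.

```python
import sqlite3

# In-memory SQLite database standing in for the warehouse (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee_Table (First_Name TEXT, Last_Name TEXT, Dept_No INTEGER);
CREATE TABLE Department_Table (Dept_No INTEGER, Department_Name TEXT);
INSERT INTO Employee_Table VALUES ('Mandee', 'Chambers', 100), ('Billy', 'Coffing', 200);
INSERT INTO Department_Table VALUES (100, 'Marketing'), (200, 'Research and Dev');
""")

join = """FROM Employee_Table AS E
          INNER JOIN Department_Table AS D
          ON E.Dept_No = D.Dept_No"""

# Unqualified Dept_No exists in both tables, so the engine cannot pick one.
try:
    conn.execute("SELECT First_Name, Dept_No, Department_Name " + join)
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)

# Qualifying the column with an alias resolves the ambiguity.
qualified_rows = conn.execute(
    "SELECT First_Name, E.Dept_No, Department_Name " + join + " ORDER BY E.Dept_No"
).fetchall()

print(ambiguous_error)
print(qualified_rows)
```

The unqualified query is rejected before any rows are read, while the qualified one returns one row per matching employee.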
Super Quiz – Can You Find the Difficult Error?

SELECT First_Name ,Last_Name ,E.Dept_No ,Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON Employee_Table.Dept_No = D.Dept_No ;

This query has an error! Can you find it?
Answer to Super Quiz – Can You Find the Difficult Error?

SELECT First_Name, Last_Name, E.Dept_No ,Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON Employee_Table.Dept_No = D.Dept_No ;

Once you alias a table (as E), you must fully qualify with E.Dept_No, not Employee_Table.Dept_No. This query thinks there are three tables: E, D, and Employee_Table.

If a column in the SELECT list is in both tables, you must fully qualify it. Once you create an alias, you must use the alias.
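A minimal sketch of the alias rule, again assuming Python's sqlite3 module as a stand-in engine. The helper `run` is hypothetical and exists only for this illustration; other engines word the failure differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine (an assumption)
conn.executescript("""
CREATE TABLE Employee_Table (First_Name TEXT, Dept_No INTEGER);
CREATE TABLE Department_Table (Dept_No INTEGER, Department_Name TEXT);
INSERT INTO Employee_Table VALUES ('Mandee', 100);
INSERT INTO Department_Table VALUES (100, 'Marketing');
""")

def run(on_clause):
    """Run the join with the given ON clause; return (rows, error text)."""
    sql = ("SELECT First_Name, E.Dept_No, Department_Name "
           "FROM Employee_Table AS E "
           "INNER JOIN Department_Table AS D ON " + on_clause)
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.OperationalError as exc:
        return None, str(exc)

# The table name is hidden behind its alias, so this reference fails.
bad_rows, bad_error = run("Employee_Table.Dept_No = D.Dept_No")

# Using the alias works.
good_rows, good_error = run("E.Dept_No = D.Dept_No")

print(bad_error)
print(good_rows)
```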
Quiz – Which rows from both tables won’t Return?

SELECT E.First_Name ,E.Last_Name ,D.Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

This inner join will return all rows that have a matching Dept_No in both tables. Which rows won't return?

An Inner Join returns matching rows, but did you know an Outer Join returns both matching rows and nonmatching rows? You will understand soon!
Answer to Quiz – Which rows from both tables Won’t Return?

SELECT E.First_Name ,E.Last_Name ,D.Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

1. Squiggy Jones has a NULL Dept_No.
2. Richard Smythe has an invalid Dept_No of 10.
3. No employees work in Department 500.

The bottom line is that the three rows excluded did not have a matching Dept_No.
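The three excluded rows can be found mechanically. The sketch below assumes Python's sqlite3 module as a stand-in engine and loads only the columns the query needs; the set arithmetic at the end is an illustration, not part of the book's method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine (an assumption)
conn.executescript("""
CREATE TABLE Employee_Table (First_Name TEXT, Last_Name TEXT, Dept_No INTEGER);
CREATE TABLE Department_Table (Dept_No INTEGER, Department_Name TEXT);
INSERT INTO Employee_Table VALUES
  ('Squiggy', 'Jones', NULL), ('Richard', 'Smythe', 10),
  ('Mandee', 'Chambers', 100), ('Billy', 'Coffing', 200),
  ('John', 'Smith', 200), ('Loraine', 'Larkins', 300),
  ('Cletus', 'Strickling', 400), ('William', 'Reilly', 400),
  ('Herbert', 'Harrison', 400);
INSERT INTO Department_Table VALUES
  (100, 'Marketing'), (200, 'Research and Dev'), (300, 'Sales'),
  (400, 'Customer Support'), (500, 'Human Resources');
""")

matched = conn.execute("""
    SELECT E.First_Name, E.Last_Name, D.Department_Name
    FROM Employee_Table AS E
    INNER JOIN Department_Table AS D
    ON E.Dept_No = D.Dept_No
""").fetchall()

all_employees = {row[0] for row in conn.execute("SELECT First_Name FROM Employee_Table")}
all_departments = {row[0] for row in conn.execute("SELECT Department_Name FROM Department_Table")}

# Rows that never found a partner are simply absent from an inner join.
missing_employees = sorted(all_employees - {first for first, _, _ in matched})
missing_departments = sorted(all_departments - {dept for _, _, dept in matched})

print(len(matched), missing_employees, missing_departments)
```

Note that the NULL Dept_No never matches anything, because a comparison with NULL is not true.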
LEFT OUTER JOIN

SELECT E.First_Name ,D.Department_Name
FROM Employee_Table as E
LEFT OUTER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

The 1st table after FROM is always the LEFT table. Since we are doing a Left Outer Join, the Employee_Table is referred to as the outer table.

This is a LEFT OUTER JOIN. That means that all rows from the LEFT table will appear in the report regardless of whether they find a match in the right table.
LEFT OUTER JOIN Results

SELECT E.First_Name ,D.Department_Name
FROM Employee_Table as E
LEFT OUTER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

First_Name  Department_Name
__________  ________________
Mandee      Marketing
Herbert     Customer Support
William     Customer Support
Loraine     Sales
Squiggy     ?
Richard     ?
Cletus      Customer Support
Billy       Research and Dev
John        Research and Dev

Nulls show mismatches. The matching rows return just like an inner join, but orphaned rows from the Left table also return.

A LEFT Outer Join returns all rows from the LEFT table, including all matches. If a LEFT row can’t find a match, a NULL is placed in the right table's columns!
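As a rough check of the answer set, the sketch below runs the same LEFT OUTER JOIN in Python's sqlite3 module (a stand-in engine, assumed here for convenience). SQL NULL arrives in Python as `None`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine (an assumption)
conn.executescript("""
CREATE TABLE Employee_Table (First_Name TEXT, Dept_No INTEGER);
CREATE TABLE Department_Table (Dept_No INTEGER, Department_Name TEXT);
INSERT INTO Employee_Table VALUES
  ('Squiggy', NULL), ('Richard', 10), ('Mandee', 100), ('Billy', 200),
  ('John', 200), ('Loraine', 300), ('Cletus', 400), ('William', 400),
  ('Herbert', 400);
INSERT INTO Department_Table VALUES
  (100, 'Marketing'), (200, 'Research and Dev'), (300, 'Sales'),
  (400, 'Customer Support'), (500, 'Human Resources');
""")

rows = conn.execute("""
    SELECT E.First_Name, D.Department_Name
    FROM Employee_Table AS E
    LEFT OUTER JOIN Department_Table AS D
    ON E.Dept_No = D.Dept_No
""").fetchall()

# Every employee appears once; unmatched ones carry None for the department.
department_of = dict(rows)
unmatched = sorted(name for name, dept in rows if dept is None)

print(len(rows), unmatched)
```

Department 500 (Human Resources) does not appear, because only the left table's orphans are preserved.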
RIGHT OUTER JOIN

SELECT E.First_Name ,D.Department_Name
FROM Employee_Table as E
RIGHT OUTER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

The 2nd table after FROM is always the RIGHT table. Since we are doing a Right Outer Join, the Department_Table is referred to as the outer table.

This is a RIGHT OUTER JOIN. That means that all rows from the RIGHT table will appear in the report regardless of whether they find a match in the LEFT table.
RIGHT OUTER JOIN Example and Results

SELECT E.First_Name ,D.Department_Name
FROM Employee_Table as E
RIGHT OUTER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

First_Name  Department_Name
__________  ________________
Mandee      Marketing
Herbert     Customer Support
William     Customer Support
Loraine     Sales
Cletus      Customer Support
Billy       Research and Dev
John        Research and Dev
?           Human Resources

Nulls show mismatches. The matching rows return just like an inner join, but orphaned rows from the Right table also return.

All rows from the Right table were returned with matches, but since Dept_No 500 didn’t have a match, the system put a NULL value in the Left table's columns.
FULL OUTER JOIN

SELECT E.First_Name ,D.Department_Name
FROM Employee_Table as E
FULL OUTER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

Since we are doing a Full Outer Join, both tables are referred to as outer tables.

This is a FULL OUTER JOIN. That means that all rows from both the RIGHT and LEFT tables will appear in the report regardless of whether they find a match.
FULL OUTER JOIN Results

SELECT E.First_Name ,D.Department_Name
FROM Employee_Table as E
FULL OUTER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No ;

First_Name  Department_Name
__________  ________________
Mandee      Marketing
Herbert     Customer Support
William     Customer Support
Loraine     Sales
Squiggy     ?
Richard     ?
Cletus      Customer Support
Billy       Research and Dev
John        Research and Dev
?           Human Resources

All rows return from both tables on a Full Outer Join. NULLs show the flaws!
Which Tables are the Left and which Tables are Right?

SELECT Cla.Claim_Id, Cla.Claim_Date,
SUB.Last_Name, SUB.First_Name,
"ADD".Phone,
SER.Service_Pay,
PRO.Provider_Code, PRO.Provider_Name
FROM CLAIMS Cla
LEFT OUTER JOIN PROVIDERS PRO
ON Cla.Provider_No = PRO.Provider_Code
LEFT OUTER JOIN SERVICES SER
ON Cla.Claim_Service = SER.Service_Code
LEFT OUTER JOIN SUBSCRIBERS SUB
ON Cla.Subscriber_No = SUB.Subscriber_No
AND Cla.Member_No = SUB.Member_No
LEFT OUTER JOIN ADDRESSES "ADD"
ON SUB.Subscriber_No = "ADD".Subscriber_No;

Fill in the blank. Is the table a Left Table or a Right Table?

Claims       __________
Providers    __________
Services     __________
Subscribers  __________
Addresses    __________

Can you list which tables above are left tables and which tables are right tables?
Answer - Which Tables are the Left and Which are the Right?

SELECT Cla.Claim_Id, Cla.Claim_Date,
SUB.Last_Name, SUB.First_Name,
"ADD".Phone,
SER.Service_Pay,
PRO.Provider_Code, PRO.Provider_Name
FROM CLAIMS Cla
LEFT OUTER JOIN PROVIDERS PRO
ON Cla.Provider_No = PRO.Provider_Code
LEFT OUTER JOIN SERVICES SER
ON Cla.Claim_Service = SER.Service_Code
LEFT OUTER JOIN SUBSCRIBERS SUB
ON Cla.Subscriber_No = SUB.Subscriber_No
AND Cla.Member_No = SUB.Member_No
LEFT OUTER JOIN ADDRESSES "ADD"
ON SUB.Subscriber_No = "ADD".Subscriber_No;

Fill in the blank. Is the table a Left Table or a Right Table?

Claims       Left
Providers    Right
Services     Right
Subscribers  Right
Addresses    Right

There is always only one Left table (the first table after the FROM clause). All tables after the first table are each Right tables.

Tables are joined two at a time. The result of the first two tables being joined becomes the Left table for the next join, and the result of each later join remains the Left table.
INNER JOIN with Additional AND Clause

SELECT First_Name ,Last_Name ,Department_Name
FROM Employee_Table as E, Department_Table as D
WHERE E.Dept_No = D.Dept_No
AND Department_Name like 'Marke%' ;

The additional AND is performed first in order to eliminate unwanted data, so the join is less intensive than joining everything first and then eliminating rows that don't qualify.
ANSI INNER JOIN with Additional AND Clause

SELECT First_Name ,Last_Name ,Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No
AND Department_Name like 'Marke%' ;

The additional AND is performed first in order to eliminate unwanted data, so the join is less intensive than joining everything first and then eliminating rows afterward.
ANSI INNER JOIN with Additional WHERE Clause

SELECT First_Name ,Last_Name ,Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No
WHERE Department_Name like 'Marke%' ;

The additional WHERE is performed first in order to eliminate unwanted data, so the join is less intensive than joining everything first and then eliminating rows afterward.
OUTER JOIN with Additional WHERE Clause

SELECT First_Name, Last_Name, Department_Name
FROM Employee_Table as E
LEFT OUTER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No
WHERE E.Dept_No = 100 ;

First_Name  Department_Name
__________  _______________
Mandee      Marketing

Only Mandee Chambers is in Dept_No 100.

The additional WHERE is performed last on Outer Joins. All rows will be joined first, and then the additional WHERE clause filters after the join takes place.
OUTER JOIN with Additional AND Clause

SELECT First_Name ,Department_Name AS Dname
FROM Employee_Table as E
LEFT OUTER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No
AND E.Dept_No = 100 ;

The additional AND is performed in conjunction with the ON statement on Outer Joins. All rows will be evaluated with the ON clause and the AND combined.
OUTER JOIN with Additional AND Clause Results

SELECT First_Name ,Department_Name AS Dname
FROM Employee_Table as E
LEFT OUTER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No
AND E.Dept_No = 100 ;

First_Name  Dname
__________  _________
Mandee      Marketing
Herbert     ?
William     ?
Loraine     ?
Squiggy     ?
Richard     ?
Cletus      ?
Billy       ?
John        ?

The additional AND is performed in conjunction with the ON statement on Outer Joins. This can surprise you. Only Mandee is in Dept_No 100, so she showed up as expected, but an outer join returns non-matches also. Ouch!!!
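The difference between the two placements is easy to see side by side. A minimal sketch, assuming Python's sqlite3 module as a stand-in engine and only the columns the queries touch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine (an assumption)
conn.executescript("""
CREATE TABLE Employee_Table (First_Name TEXT, Dept_No INTEGER);
CREATE TABLE Department_Table (Dept_No INTEGER, Department_Name TEXT);
INSERT INTO Employee_Table VALUES
  ('Squiggy', NULL), ('Richard', 10), ('Mandee', 100), ('Billy', 200),
  ('John', 200), ('Loraine', 300), ('Cletus', 400), ('William', 400),
  ('Herbert', 400);
INSERT INTO Department_Table VALUES
  (100, 'Marketing'), (200, 'Research and Dev'), (300, 'Sales'),
  (400, 'Customer Support'), (500, 'Human Resources');
""")

base = """SELECT First_Name, Department_Name AS Dname
          FROM Employee_Table AS E
          LEFT OUTER JOIN Department_Table AS D
          ON E.Dept_No = D.Dept_No"""

# WHERE filters the joined result, so only Dept_No 100 survives.
where_rows = conn.execute(base + " WHERE E.Dept_No = 100").fetchall()

# AND becomes part of the match test; unmatched left rows still return.
and_rows = conn.execute(base + " AND E.Dept_No = 100").fetchall()
matched_in_and = [name for name, dname in and_rows if dname is not None]

print(len(where_rows), len(and_rows), matched_in_and)
```

With WHERE the report has one row; with AND it has all nine employees, and only Mandee carries a department name.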
Quiz – Why is this considered an INNER JOIN?

SELECT First_Name, Department_Name
FROM Employee_Table as E
LEFT OUTER JOIN Department_Table as D
ON E.Dept_No = D.Dept_No
WHERE D.Dept_No = 400 ;

This is considered an INNER JOIN because we are doing a LEFT OUTER JOIN on the Employee_Table and then filtering with the WHERE on a column in the right table! Unmatched employees carry a NULL in D.Dept_No, so the filter throws them out and only matching rows remain. If the same test were written as an additional AND in the ON clause, every employee would still return, with NULLs for everyone outside Dept_No 400.
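Where a right-table filter sits decides whether the unmatched rows survive. The sketch below, assuming Python's sqlite3 module as a stand-in engine, compares the test `D.Dept_No = 400` in a WHERE clause, inside the ON clause, and in a plain inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine (an assumption)
conn.executescript("""
CREATE TABLE Employee_Table (First_Name TEXT, Dept_No INTEGER);
CREATE TABLE Department_Table (Dept_No INTEGER, Department_Name TEXT);
INSERT INTO Employee_Table VALUES
  ('Squiggy', NULL), ('Richard', 10), ('Mandee', 100), ('Billy', 200),
  ('John', 200), ('Loraine', 300), ('Cletus', 400), ('William', 400),
  ('Herbert', 400);
INSERT INTO Department_Table VALUES
  (100, 'Marketing'), (200, 'Research and Dev'), (300, 'Sales'),
  (400, 'Customer Support'), (500, 'Human Resources');
""")

def query(join_type, tail):
    """Run the join with the filter appended as either AND or WHERE."""
    sql = ("SELECT First_Name, Department_Name "
           "FROM Employee_Table AS E " + join_type +
           " Department_Table AS D ON E.Dept_No = D.Dept_No " + tail)
    return conn.execute(sql).fetchall()

where_rows = query("LEFT OUTER JOIN", "WHERE D.Dept_No = 400")
on_rows = query("LEFT OUTER JOIN", "AND D.Dept_No = 400")
inner_rows = query("INNER JOIN", "WHERE D.Dept_No = 400")

# The WHERE version discards NULL-extended rows, matching the inner join.
acts_like_inner = sorted(where_rows) == sorted(inner_rows)

print(len(where_rows), len(on_rows), acts_like_inner)
```

The WHERE form returns only the three Customer Support employees, exactly like an inner join, while the ON form keeps all nine employees.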
Evaluation Order for Outer Queries

SELECT Cou.*, STU1.*
FROM COURSE_TABLE Cou
LEFT OUTER JOIN STUDENT_COURSE_TABLE STU
ON Cou.Course_Id = STU.Course_Id
LEFT OUTER JOIN STUDENT_TABLE STU1
ON STU.Student_Id = STU1.Student_Id;

The order in which the server evaluates Outer Queries:

1. The first ON clause in the query (reading from left to right).
2. Any ON clause applies to its immediately preceding join operation.
3. Parentheses can be used to override the natural left-to-right order.

When you perform an inner join, Azure SQL Data Warehouse considers it to be both commutative and associative. That means the tables can be inner joined in any order and the end result will be the same, which allows the optimizer to select the best join order between tables. Outer Joins are different. The optimizer follows the three rules above for their evaluation order.
The DREADED Product Join

SELECT First_Name ,Last_Name ,Department_Name
FROM Employee_Table as E, Department_Table as D
WHERE Department_Name like '%m%'
Order by 1, 2, 3;

No Join Condition Linking the Two Tables!

This query becomes a Product Join because it does not possess any JOIN conditions (Join Keys). Every row from one table is compared to every row of the other table, and quite often, the data is not what you intended to get back.
The DREADED Product Join Results

SELECT First_Name ,Last_Name ,Department_Name
FROM Employee_Table as E, Department_Table as D
WHERE Department_Name like '%m%'
Order by 1, 2, 3;

No Join Condition Linking the Two Tables!

First_Name  Last_Name   Department_Name
__________  __________  ________________________
Billy       Coffing     Customer Support
Billy       Coffing     Human Resources
Billy       Coffing     Marketing
Billy       Coffing     Research and Development
Cletus      Strickling  Customer Support
Cletus      Strickling  Human Resources
Cletus      Strickling  Marketing
Cletus      Strickling  Research and Development
Herbert     Harrison    Marketing

Not all rows are displayed.

36 rows came back: nine employees, each shown working in four different departments. This data is WRONG! How can Billy Coffing work in 4 different departments?

A Product Join is often a mistake! 4 Department rows had an ‘m’ in their name, so these were joined to every employee, and the information is worthless.
The Horrifying Cartesian Product Join

SELECT First_Name ,Last_Name ,Department_Name
FROM Employee_Table as E, Department_Table as D

No WHERE Clause in the join!

A Cartesian Product Join is usually a big mistake. This joins every row from one table to every row of another table. 9 rows multiplied by 5 rows = 45 rows of complete nonsense!
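The 9 x 5 arithmetic can be confirmed with a short sketch. It assumes Python's sqlite3 module as a stand-in engine and loads only the columns needed to count rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine (an assumption)
conn.executescript("""
CREATE TABLE Employee_Table (First_Name TEXT, Dept_No INTEGER);
CREATE TABLE Department_Table (Dept_No INTEGER, Department_Name TEXT);
INSERT INTO Employee_Table VALUES
  ('Squiggy', NULL), ('Richard', 10), ('Mandee', 100), ('Billy', 200),
  ('John', 200), ('Loraine', 300), ('Cletus', 400), ('William', 400),
  ('Herbert', 400);
INSERT INTO Department_Table VALUES
  (100, 'Marketing'), (200, 'Research and Dev'), (300, 'Sales'),
  (400, 'Customer Support'), (500, 'Human Resources');
""")

# No join condition at all: every employee pairs with every department.
cartesian = conn.execute("""
    SELECT First_Name, Department_Name
    FROM Employee_Table AS E, Department_Table AS D
""").fetchall()

# The same FROM list with a join condition keeps only real matches.
joined = conn.execute("""
    SELECT First_Name, Department_Name
    FROM Employee_Table AS E, Department_Table AS D
    WHERE E.Dept_No = D.Dept_No
""").fetchall()

billy_departments = [dept for name, dept in cartesian if name == "Billy"]

print(len(cartesian), len(joined), len(billy_departments))
```

Without the join condition Billy is listed in all five departments; with it he appears once.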
The ANSI Cartesian Join will ERROR

SELECT First_Name ,Last_Name ,Department_Name
FROM Employee_Table as E
INNER JOIN Department_Table as D

No ON Clause in the join!

This query errors because ANSI syntax requires an ON clause with an INNER JOIN. ANSI won’t let this run unless a join condition is present.
Quiz – Do these Joins Return the Same Answer Set?

Query 1
SELECT First_Name, Department_Name
FROM Employee_Table
INNER JOIN Department_Table ;

Query 2
SELECT First_Name, Department_Name
FROM Employee_Table, Department_Table ;

Do these two queries produce the same result?
Answer – Do these Joins Return the Same Answer Set?

Query 1
SELECT First_Name, Department_Name
FROM Employee_Table
INNER JOIN Department_Table ;

This query errors.

Query 2
SELECT First_Name, Department_Name
FROM Employee_Table, Department_Table ;

A Cartesian product join occurs.

Do these two queries produce the same result? No, Query 1 errors due to ANSI syntax and no ON clause, but Query 2 Product Joins to bring back junk!
The CROSS JOIN

Customer_Table

Customer_Number  Customer_Name
_______________  ___________________
11111111         Billy’s Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table

Order_Number  Customer_Number  Order_Total
____________  _______________  ___________
123456        11111111         12347.53
123512        11111111         8005.91
123552        31323134         5111.47
123585        87323456         15231.62
123777        57896883         23454.84

SELECT Customer_Name, Order_Number
FROM Customer_Table
CROSS JOIN Order_Table
WHERE Order_Number = 123456
ORDER BY 1 ;

A Cross Join is the ANSI equivalent of a Product Join. Only a WHERE will work. ON will NOT!

This query becomes a Product Join because a Cross Join is an ANSI Product Join. It will compare every row from the Customer_Table to Order_Number 123456 in the Order_Table. Check out the Answer Set on the next page.
The CROSS JOIN Answer Set

SELECT Customer_Name, Order_Number
FROM Customer_Table
CROSS JOIN Order_Table
WHERE Order_Number = 123456
ORDER BY 1 ;

Answer Set

Customer_Name        Order_Number
___________________  ____________
Billy’s Best Choice  123456
Acme Products        123456
ACE Consulting       123456
XYZ Plumbing         123456
Databases N-U        123456

Quite often, a Cross Join produces information that just isn’t worth anything!
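To see why the answer set is misleading, the sketch below runs the CROSS JOIN next to an inner join on Customer_Number. It assumes Python's sqlite3 module as a stand-in engine; the inner join is added here for comparison and is not part of the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine (an assumption)
conn.execute("CREATE TABLE Customer_Table (Customer_Number INTEGER, Customer_Name TEXT)")
conn.execute("CREATE TABLE Order_Table (Order_Number INTEGER, Customer_Number INTEGER, Order_Total REAL)")
conn.executemany("INSERT INTO Customer_Table VALUES (?, ?)", [
    (11111111, "Billy's Best Choice"), (31313131, "Acme Products"),
    (31323134, "ACE Consulting"), (57896883, "XYZ Plumbing"),
    (87323456, "Databases N-U"),
])
conn.executemany("INSERT INTO Order_Table VALUES (?, ?, ?)", [
    (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
    (123552, 31323134, 5111.47), (123585, 87323456, 15231.62),
    (123777, 57896883, 23454.84),
])

# CROSS JOIN: every customer is paired with order 123456.
cross_rows = conn.execute("""
    SELECT Customer_Name, Order_Number
    FROM Customer_Table
    CROSS JOIN Order_Table
    WHERE Order_Number = 123456
    ORDER BY 1
""").fetchall()

# Inner join on the shared key: only the customer who placed the order.
inner_rows = conn.execute("""
    SELECT Customer_Name, Order_Number
    FROM Customer_Table AS C
    INNER JOIN Order_Table AS O
    ON C.Customer_Number = O.Customer_Number
    WHERE Order_Number = 123456
""").fetchall()

print(len(cross_rows), inner_rows)
```

All five customers appear to own order 123456 in the cross join, although only one of them placed it.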
The Self Join

Employee_Table2

Employee_No  Dept_No  Last_Name   First_Name  Salary    Mgr
___________  _______  __________  __________  ________  ___
1232578      100      Chambers    Mandee      48850.00  Y
1256349      400      Harrison    Herbert     54500.00  N
2341218      400      Reilly      William     36000.00  Y
1121334      400      Strickling  Cletus      54500.00  N
2312225      300      Larkins     Loraine     40200.00  Y
2000000      ?        Jones       Squiggy     32800.50  N
1000234      10       Smythe      Richard     32800.00  N
1324657      200      Coffing     Billy       41888.88  N
1333454      200      Smith       John        48000.00  Y

SELECT Mgrs.Dept_No
, Mgrs.Last_Name as MgrName
, Mgrs.Salary as MgrSal
, Emps.Last_Name as EmpName
, Emps.Salary as Empsal
FROM Employee_Table2 as Emps, Employee_Table2 as Mgrs
WHERE Emps.Dept_No = Mgrs.Dept_No
AND Mgrs.Mgr = 'Y'
AND Emps.Salary > Mgrs.Salary ;

Which Workers make a bigger Salary than their Manager?

A Self Join gives the same table 2 different aliases, which are then seen as two different tables.
The Self Join with ANSI Syntax

SELECT Mgrs.Dept_No
, Mgrs.Last_Name as MgrName
, Mgrs.Salary as MgrSal
, Emps.Last_Name as EmpName
, Emps.Salary as Empsal
FROM Employee_Table2 as Emps
INNER JOIN Employee_Table2 as Mgrs
ON Emps.Dept_No = Mgrs.Dept_No
WHERE Mgrs.Mgr = 'Y'
AND Emps.Salary > Mgrs.Salary ;

Which Workers make a bigger Salary than their Manager?

A Self Join gives the same table 2 different aliases, which are then seen as two different tables.
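A sketch of the self join in action, assuming Python's sqlite3 module as a stand-in engine. The Employee_Table2 rows below, in particular which salary and Mgr flag belong to which employee, are read off the slide in row order and should be treated as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine (an assumption)
conn.executescript("""
CREATE TABLE Employee_Table2 (Employee_No INTEGER, Dept_No INTEGER,
                              Last_Name TEXT, First_Name TEXT,
                              Salary REAL, Mgr TEXT);
-- Row values as read from the slide (an assumption).
INSERT INTO Employee_Table2 VALUES
  (1232578, 100,  'Chambers',   'Mandee',  48850.00, 'Y'),
  (1256349, 400,  'Harrison',   'Herbert', 54500.00, 'N'),
  (2341218, 400,  'Reilly',     'William', 36000.00, 'Y'),
  (1121334, 400,  'Strickling', 'Cletus',  54500.00, 'N'),
  (2312225, 300,  'Larkins',    'Loraine', 40200.00, 'Y'),
  (2000000, NULL, 'Jones',      'Squiggy', 32800.50, 'N'),
  (1000234, 10,   'Smythe',     'Richard', 32800.00, 'N'),
  (1324657, 200,  'Coffing',    'Billy',   41888.88, 'N'),
  (1333454, 200,  'Smith',      'John',    48000.00, 'Y');
""")

# The same table plays two roles: Emps (workers) and Mgrs (managers).
rows = conn.execute("""
    SELECT Mgrs.Dept_No,
           Mgrs.Last_Name AS MgrName, Mgrs.Salary AS MgrSal,
           Emps.Last_Name AS EmpName, Emps.Salary AS EmpSal
    FROM Employee_Table2 AS Emps
    INNER JOIN Employee_Table2 AS Mgrs
    ON Emps.Dept_No = Mgrs.Dept_No
    WHERE Mgrs.Mgr = 'Y'
    AND Emps.Salary > Mgrs.Salary
    ORDER BY EmpName
""").fetchall()

for row in rows:
    print(row)
```

Under these assumed rows, only Dept_No 400 has workers who out-earn their manager.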
Quiz – Will both queries bring back the same Answer Set?

SELECT *
FROM Customer_Table as Cust
INNER JOIN Order_Table as ORD
ON Cust.Customer_Number = Ord.Customer_Number
WHERE Customer_Name like 'Billy%'
ORDER BY 1;

SELECT *
FROM Customer_Table as Cust
INNER JOIN Order_Table as ORD
ON Cust.Customer_Number = Ord.Customer_Number
AND Customer_Name like 'Billy%'
ORDER BY 1;

Will both queries bring back the same result set?
Answer – Will both queries bring back the same Answer Set?

SELECT *
FROM Customer_Table as Cust
INNER JOIN Order_Table as ORD
ON Cust.Customer_Number = Ord.Customer_Number
WHERE Customer_Name like 'Billy%'
ORDER BY 1;

SELECT *
FROM Customer_Table as Cust
INNER JOIN Order_Table as ORD
ON Cust.Customer_Number = Ord.Customer_Number
AND Customer_Name like 'Billy%'
ORDER BY 1;

Will both queries bring back the same result set? Yes! Because they’re both inner joins.
Quiz – Will both queries bring back the same Answer Set?

Customer_Table

Customer_Number  Customer_Name
_______________  ___________________
11111111         Billy’s Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table

Order_Number  Customer_Number  Order_Total
____________  _______________  ___________
123456        11111111         12347.53
123512        11111111         8005.91
123552        31323134         5111.47
123585        87323456         15231.62
123777        57896883         23454.84

SELECT *
FROM Customer_Table as Cust
LEFT OUTER JOIN Order_Table as ORD
ON Cust.Customer_Number = Ord.Customer_Number
WHERE Customer_Name like 'Billy%'
ORDER BY 1;

SELECT *
FROM Customer_Table as Cust
LEFT OUTER JOIN Order_Table as ORD
ON Cust.Customer_Number = Ord.Customer_Number
AND Customer_Name like 'Billy%'
ORDER BY 1;

Will both queries bring back the same result set?
Answer – Will both queries bring back the same Answer Set?

Customer_Table

Customer_Number  Customer_Name
_______________  ___________________
11111111         Billy’s Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table

Order_Number  Customer_Number  Order_Total
____________  _______________  ___________
123456        11111111         12347.53
123512        11111111         8005.91
123552        31323134         5111.47
123585        87323456         15231.62
123777        57896883         23454.84

SELECT *
FROM Customer_Table as Cust
LEFT OUTER JOIN Order_Table as ORD
ON Cust.Customer_Number = Ord.Customer_Number
WHERE Customer_Name like 'Billy%'
ORDER BY 1;

SELECT *
FROM Customer_Table as Cust
LEFT OUTER JOIN Order_Table as ORD
ON Cust.Customer_Number = Ord.Customer_Number
AND Customer_Name like 'Billy%'
ORDER BY 1;

Will both queries bring back the same result set? NO! The WHERE is performed last. In the first query, the WHERE clause filters the joined result, so only the Billy rows survive. In the second query, the extra condition is part of the join itself, so every customer is still returned, and only the Billy rows carry order data.
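The difference between filtering in the ON clause and filtering in the WHERE clause can be checked with a small sketch. It loads the Customer_Table and Order_Table rows shown above into an in-memory SQLite database via Python's sqlite3 module (a stand-in engine used only so the example is runnable, not Azure SQL Data Warehouse) and runs all four quiz queries. The helper name `run` and the query template are my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customer_Table (Customer_Number INTEGER, Customer_Name TEXT);
    CREATE TABLE Order_Table (Order_Number INTEGER, Customer_Number INTEGER,
                              Order_Total REAL);
    INSERT INTO Customer_Table VALUES
        (11111111, 'Billy''s Best Choice'),
        (31313131, 'Acme Products'),
        (31323134, 'ACE Consulting'),
        (57896883, 'XYZ Plumbing'),
        (87323456, 'Databases N-U');
    INSERT INTO Order_Table VALUES
        (123456, 11111111, 12347.53),
        (123512, 11111111, 8005.91),
        (123552, 31323134, 5111.47),
        (123585, 87323456, 15231.62),
        (123777, 57896883, 23454.84);
    """
)

# {keyword} is WHERE (filter after the join) or AND (filter inside the ON).
JOIN_TEMPLATE = """
    SELECT *
    FROM Customer_Table AS Cust
    {kind} JOIN Order_Table AS Ord
        ON Cust.Customer_Number = Ord.Customer_Number
    {keyword} Customer_Name LIKE 'Billy%'
    ORDER BY 1
"""


def run(kind, keyword):
    """Run one variant of the quiz query and return all rows."""
    sql = JOIN_TEMPLATE.format(kind=kind, keyword=keyword)
    return conn.execute(sql).fetchall()


inner_where = run("INNER", "WHERE")
inner_on = run("INNER", "AND")
left_where = run("LEFT OUTER", "WHERE")
left_on = run("LEFT OUTER", "AND")

print(len(inner_where), len(inner_on))  # inner joins: same row count
print(len(left_where), len(left_on))    # outer joins: different row counts
```

In this sketch both inner joins return the two Billy orders. The outer join with WHERE also returns two rows, while the outer join with the condition in the ON clause returns every customer, with NULL order columns for everyone except Billy.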
How would you Join these two tables?

Course_Table

Course_ID  Course_Name               Credits  Seats
_________  ________________________  _______  _____
100        Database Concepts         3        50
200        Introduction to SQL       3        20
210        Advanced SQL              3        22
220        V2R3 SQL Features         2        25
300        Physical Database Design  4        20
400        Database Administration   4        16

Student_Table

Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
__________  _________  __________  __________  ________
423400      Larkins    Michael     FR          0.00
231222      Wilson     Susie       SO          3.80
280023      McRoberts  Richard     JR          1.90
322133      Bond       Jimmy       JR          3.95
125634      Hanson     Henry       FR          2.88
333450      Smith      Andy        SO          2.00
324652      Delaney    Danny       SR          3.35
260000      Johnson    Stanley     ?           ?
234121      Thomas     Wendy       FR          4.00
123250      Phillips   Martin      SR          3.00

How would you join these two tables together? You can't do it. There is no matching column with like data. There is no Primary Key/Foreign Key relationship between these two tables. That is why you are about to be introduced to a bridge table. It is formally called an Associative table or a Lookup table.
An Associative Table is a Bridge that Joins Two Tables

Course_Table

Course_ID  Course_Name               Credits  Seats
_________  ________________________  _______  _____
100        Database Concepts         3        50
200        Introduction to SQL       3        20
210        Advanced SQL              3        22
220        V2R3 SQL Features         2        25
300        Physical Database Design  4        20
400        Database Administration   4        16

Student_Course_Table (Associative Table)

Student_ID  Course_ID
__________  _________
280023      210
231222      210
125634      100
231222      220
125634      200
322133      220
125634      220
322133      300
324652      200
333450      500
260000      400
333450      400
234121      100
123250      100

Student_Table

Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
__________  _________  __________  __________  ________
423400      Larkins    Michael     FR          0.00
231222      Wilson     Susie       SO          3.80
280023      McRoberts  Richard     JR          1.90
322133      Bond       Jimmy       JR          3.95
125634      Hanson     Henry       FR          2.88
333450      Smith      Andy        SO          2.00
324652      Delaney    Danny       SR          3.35
260000      Johnson    Stanley     ?           ?
234121      Thomas     Wendy       FR          4.00
123250      Phillips   Martin      SR          3.00

The Associative Table is a bridge between the Course_Table and Student_Table.
Quiz – Can you write the 3-Table Join?

Course_Table

Course_ID  Course_Name               Credits  Seats
_________  ________________________  _______  _____
100        Database Concepts         3        50
200        Introduction to SQL       3        20
210        Advanced SQL              3        22
220        V2R3 SQL Features         2        25
300        Physical Database Design  4        20
400        Database Administration   4        16

Student_Course_Table (Associative Table)

Student_ID  Course_ID
__________  _________
280023      210
231222      210
125634      100
231222      220
125634      200
322133      220
125634      220
322133      300
324652      200
333450      500
260000      400
333450      400
234121      100
123250      100

Student_Table

Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
__________  _________  __________  __________  ________
423400      Larkins    Michael     FR          0.00
231222      Wilson     Susie       SO          3.80
280023      McRoberts  Richard     JR          1.90
322133      Bond       Jimmy       JR          3.95
125634      Hanson     Henry       FR          2.88
333450      Smith      Andy        SO          2.00
324652      Delaney    Danny       SR          3.35
260000      Johnson    Stanley     ?           ?
234121      Thomas     Wendy       FR          4.00
123250      Phillips   Martin      SR          3.00

SELECT ALL Columns from the Course_Table and Student_Table and Join them.
Answer to Quiz – Can you Write the 3-Table Join?

Student_Table:        Student_ID, Last_Name, First_Name, Class_Code, Grade_Pt
Student_Course_Table: Student_ID, Course_ID
Course_Table:         Course_ID, Course_Name, Credits, Seats

SELECT S.*, C.*
FROM Student_Table as S,
     Course_Table as C,
     Student_Course_Table as SC
WHERE S.Student_ID = SC.Student_ID
AND   C.Course_ID = SC.Course_ID ;

Notice the * technique of getting ALL columns from both tables!

The Associative Table is a bridge between the Course_Table and Student_Table, and its sole purpose is to join these two tables together.
Quiz – Can you write the 3-Table Join to ANSI Syntax?

Student_Table:        Student_ID, Last_Name, First_Name, Class_Code, Grade_Pt
Student_Course_Table: Student_ID, Course_ID
Course_Table:         Course_ID, Course_Name, Credits, Seats

SELECT S.*, C.*
FROM Student_Table as S,
     Course_Table as C,
     Student_Course_Table as SC
WHERE S.Student_ID = SC.Student_ID
AND   C.Course_ID = SC.Course_ID ;

Please re-write the above query using ANSI Syntax.
Answer – Can you Write the 3-Table Join to ANSI Syntax?

Student_Table:        Student_ID, Last_Name, First_Name, Class_Code, Grade_Pt
Student_Course_Table: Student_ID, Course_ID
Course_Table:         Course_ID, Course_Name, Credits, Seats

Traditional Syntax

SELECT S.*, C.*
FROM Student_Table as S,
     Course_Table as C,
     Student_Course_Table as SC
WHERE S.Student_ID = SC.Student_ID
AND   C.Course_ID = SC.Course_ID ;

ANSI Syntax

SELECT S.*, C.*
FROM Student_Table as S
INNER JOIN Student_Course_Table as SC
ON S.Student_ID = SC.Student_ID
INNER JOIN Course_Table as C
ON C.Course_ID = SC.Course_ID;

The above queries show both traditional and ANSI form for this three table join.
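As a quick check that the traditional and ANSI forms are interchangeable, the sketch below runs both against a small in-memory SQLite database through Python's sqlite3 module. SQLite stands in for the real system only so the example can be run anywhere. The data is a subset of the three tables (three students, three courses, four bridge rows), and the tables are trimmed to two columns each for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Student_Table (Student_ID INTEGER, Last_Name TEXT);
    CREATE TABLE Course_Table (Course_ID INTEGER, Course_Name TEXT);
    CREATE TABLE Student_Course_Table (Student_ID INTEGER, Course_ID INTEGER);
    INSERT INTO Student_Table VALUES
        (423400, 'Larkins'), (231222, 'Wilson'), (125634, 'Hanson');
    INSERT INTO Course_Table VALUES
        (100, 'Database Concepts'), (210, 'Advanced SQL'),
        (220, 'V2R3 SQL Features');
    INSERT INTO Student_Course_Table VALUES
        (231222, 210), (125634, 100), (231222, 220), (125634, 220);
    """
)

# Traditional syntax: comma-separated tables, join conditions in WHERE.
traditional = conn.execute(
    """
    SELECT S.*, C.*
    FROM Student_Table AS S, Course_Table AS C, Student_Course_Table AS SC
    WHERE S.Student_ID = SC.Student_ID
      AND C.Course_ID = SC.Course_ID
    ORDER BY S.Student_ID, C.Course_ID
    """
).fetchall()

# ANSI syntax: each INNER JOIN carries its own ON clause.
ansi = conn.execute(
    """
    SELECT S.*, C.*
    FROM Student_Table AS S
    INNER JOIN Student_Course_Table AS SC ON S.Student_ID = SC.Student_ID
    INNER JOIN Course_Table AS C ON C.Course_ID = SC.Course_ID
    ORDER BY S.Student_ID, C.Course_ID
    """
).fetchall()

print(traditional == ansi)
for row in ansi:
    print(row)
```

Student 423400 (Larkins) has no row in the bridge table in this subset, so the inner joins leave that student out of both result sets.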
Quiz – Can you Place the ON Clauses at the End?

Student_Table:        Student_ID, Last_Name, First_Name, Class_Code, Grade_Pt
Student_Course_Table: Student_ID, Course_ID
Course_Table:         Course_ID, Course_Name, Credits, Seats

ANSI Syntax

SELECT S.*, C.*
FROM Student_Table as S
INNER JOIN Student_Course_Table as SC
ON S.Student_ID = SC.Student_ID
INNER JOIN Course_Table as C
ON C.Course_ID = SC.Course_ID;

Please re-write the above query and place both ON Clauses at the end.
Answer – Can you Place the ON Clauses at the End?

Student_Table:        Student_ID, Last_Name, First_Name, Class_Code, Grade_Pt
Student_Course_Table: Student_ID, Course_ID
Course_Table:         Course_ID, Course_Name, Credits, Seats

SELECT S.*, C.*
FROM Student_Table as S
INNER JOIN Student_Course_Table as SC
INNER JOIN Course_Table as C
ON C.Course_ID = SC.Course_ID
ON SC.Student_ID = S.Student_ID;

The trick is to put the first ON clause for the last join and go backwards.

This is tricky. The only way it works is to place the ON clauses backwards. The first ON Clause represents the last INNER JOIN and then moves backwards.
The 5-Table Join – Logical Insurance Model

[Diagram: logical model of the insurance tables and their key columns]
Addresses:   Subscriber_No
Subscribers: Subscriber_No, Member_No
Claims:      Subscriber_No, Member_No, Claim_Service, Provider_No
Services:    Service_Code
Providers:   Provider_Code

Above is the logical model for the insurance tables showing the Primary Key and Foreign Key relationships (PK/FK).
Quiz - Write a Five Table Join Using ANSI Syntax

[Diagram: logical model of the insurance tables and their key columns]
Addresses:   Subscriber_No
Subscribers: Subscriber_No, Member_No
Claims:      Subscriber_No, Member_No, Claim_Service, Provider_No
Services:    Service_Code
Providers:   Provider_Code

Your mission is to write a five table join selecting all columns using ANSI syntax.
Answer - Write a Five Table Join Using ANSI Syntax

SELECT cla1.*, sub1.*, add1.*, pro1.*, ser1.*
FROM CLAIMS AS cla1
INNER JOIN SUBSCRIBERS AS sub1
ON cla1.Subscriber_No = sub1.Subscriber_No
AND cla1.Member_No = sub1.Member_No
INNER JOIN ADDRESSES AS add1
ON sub1.Subscriber_No = add1.Subscriber_No
INNER JOIN PROVIDERS AS pro1
ON cla1.Provider_No = pro1.Provider_Code
INNER JOIN SERVICES AS ser1
ON cla1.Claim_Service = ser1.Service_Code ;

Above is the five table join written using ANSI syntax.
Quiz - Write a Five Table Join Using Non-ANSI Syntax

[Diagram: logical model of the insurance tables and their key columns]
Addresses:   Subscriber_No
Subscribers: Subscriber_No, Member_No
Claims:      Subscriber_No, Member_No, Claim_Service, Provider_No
Services:    Service_Code
Providers:   Provider_Code

Your mission is to write a five table join selecting all columns using Non-ANSI syntax.
Answer - Write a Five Table Join Using Non-ANSI Syntax

SELECT cla1.*, sub1.*, add1.*, pro1.*, ser1.*
FROM   CLAIMS AS cla1, SUBSCRIBERS AS sub1, ADDRESSES AS add1,
       PROVIDERS AS pro1, SERVICES AS ser1
WHERE  cla1.Subscriber_No = sub1.Subscriber_No
AND    cla1.Member_No = sub1.Member_No
AND    sub1.Subscriber_No = add1.Subscriber_No
AND    cla1.Provider_No = pro1.Provider_Code
AND    cla1.Claim_Service = ser1.Service_Code ;

Above is the five table join written using Non-ANSI syntax.
Quiz – Re-Write this putting the ON clauses at the END

SELECT cla1.*, sub1.*, add1.*, pro1.*, ser1.*
FROM CLAIMS AS cla1
INNER JOIN SUBSCRIBERS AS sub1
ON cla1.Subscriber_No = sub1.Subscriber_No
AND cla1.Member_No = sub1.Member_No
INNER JOIN ADDRESSES AS add1
ON sub1.Subscriber_No = add1.Subscriber_No
INNER JOIN PROVIDERS AS pro1
ON cla1.Provider_No = pro1.Provider_Code
INNER JOIN SERVICES AS ser1
ON cla1.Claim_Service = ser1.Service_Code ;

Above is the five table join written using ANSI syntax. Re-write it so that all of the ON clauses come at the end.
Answer – Re-Write this putting the ON clauses at the END

SELECT cla1.*, sub1.*, add1.*, pro1.*, ser1.*
FROM PROVIDERS AS pro1
INNER JOIN ADDRESSES AS add1
INNER JOIN SUBSCRIBERS AS sub1
INNER JOIN SERVICES AS ser1
INNER JOIN CLAIMS as cla1
ON cla1.Claim_Service = ser1.Service_Code
ON cla1.Subscriber_No = sub1.Subscriber_No
AND cla1.Member_No = sub1.Member_No
ON sub1.Subscriber_No = add1.Subscriber_No
ON cla1.Provider_No = pro1.Provider_Code ;

Above is the example writing this five table join using ANSI syntax with the ON clauses at the end. We had to move the tables around also to make this happen. Notice that the first ON clause represents the last two tables being joined, and then it works backwards.
Chapter 11
Date Functions
Chapter 11 – Date Functions
"An inch of time cannot be bought with an inch of gold." - Chinese Proverb
Current_Timestamp
Current_Timestamp is a keyword that allows a user to get the current timestamp. It is a reserved word, so the system delivers the timestamp whenever it is requested.
Getdate

This example uses the Getdate() function to return the timestamp.

SELECT Getdate() as "The Date";

The Date
----------------------
03/30/2015 8:46:04.567
“Not all who wander are lost.” – J. R. R. Tolkien
The Getdate function returns today's date and time, just like the Current_Timestamp keyword. Getdate is not ANSI standard.
Date and Time Keywords

SELECT GETDATE() AS [GETDATE]
      ,CURRENT_TIMESTAMP AS [CURRENT_TIMESTAMP]
      ,GETUTCDATE() AS [GETUTCDATE]

GETDATE              03/30/2015 8:42:04.833    Date and Time
CURRENT_TIMESTAMP    03/30/2015 8:42:04.833    Date and Time ANSI
GETUTCDATE           03/30/2015 1:42:04.833    Date and Time UTC

SELECT SYSDATETIME() AS [SYSDATETIME]
      ,SYSUTCDATETIME() AS [SYSUTCDATETIME]

SYSDATETIME          2015-03-30 08:42:04.8355769    Date and Time
SYSUTCDATETIME       2015-03-30 13:42:04.8355769    Date and Time UTC

The above examples show how to get the date and time. The GETDATE and CURRENT_TIMESTAMP are equivalent, but CURRENT_TIMESTAMP is ANSI compliant. The differences between the top and bottom examples are that the top has a data type of DATETIME and the bottom DATETIME2, which is an expanded form of DATETIME.
SYSDATETIMEOFFSET Provides the Timezone Offset

SELECT SYSUTCDATETIME() AS [SYSUTCDATETIME]
      ,SYSDATETIMEOFFSET() AS [SYSDATETIMEOFFSET];

SYSUTCDATETIME       2015-03-30 13:42:04.8355769           Date and Time UTC
SYSDATETIMEOFFSET    2015-03-30 08:42:04.8355769 -05:00    Date and Time with a Timezone offset

The SYSUTCDATETIME function provides the current timestamp in Coordinated Universal Time (UTC). The SYSDATETIMEOFFSET function returns the local timestamp along with the timezone difference between UTC and local time.
SYSDATETIMEOFFSET Provides the Timezone Offset
This is how you can get just the current date and the current time.
Using both CAST and CONVERT in Literal Values

SELECT CAST('20150216' AS DATE) as "Date YMD";

Date YMD
----------
2015-02-16

SELECT CONVERT(CHAR(8), CURRENT_TIMESTAMP, 112) AS "Converted" ;

Converted
---------
20150330

The first SQL example uses the CAST function to convert the character string literal ‘20150216’ to a DATE data type. The second SQL example converts the current date and time to a CHAR(8) data type using style 112, which is a 'YYYYMMDD' format.
Using Both CAST and CONVERT in Literal Values
This example converts the current date and time value to CHAR (12) by using style 114 ('hh:mm:ss.nnn').
Using both CAST and CONVERT in Literal Values

SELECT SYSDATETIME() as "Local Time Eastern"
      ,SWITCHOFFSET(SYSDATETIMEOFFSET(), '-06:00') as "Timestamp Central"
      ,SWITCHOFFSET(SYSDATETIMEOFFSET(), '-07:00') as "Timestamp Mountain"
      ,SWITCHOFFSET(SYSDATETIMEOFFSET(), '-08:00') as "Timestamp Pacific" ;

Local Time Eastern    2015-03-30 11:03:38.9877064
Timestamp Central     2015-03-30 10:03:38.9877064 -06:00
Timestamp Mountain    2015-03-30 09:03:38.9877064 -07:00
Timestamp Pacific     2015-03-30 08:03:38.9877064 -08:00

The times above are the converted times, but they are displayed vertically to save space on the screen.

The SWITCHOFFSET function can be used to adjust an input DATETIMEOFFSET value to a specified time zone. The example SQL above shows how to convert to Central, Mountain and Pacific time.
The DATEADD Function
The syntax for the DATEADD function is DATEADD (part, n, date_value). Valid values for the part input include year, quarter, month, dayofyear, day, week, weekday, hour, minute, second, millisecond, microsecond, and nanosecond. You can also specify the part in abbreviated form, such as yy instead of year.
The DATEDIFF Function
The syntax for the DATEDIFF function is DATEDIFF (part, dt_val1, dt_val2). Above, we have used the literal dates of '2014-01-30' (January 30, 2014) and '2015-06-30' (June 30, 2015). We then can see the differences in the number of years, months, days, hours, minutes and seconds.
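The arithmetic behind those differences can be sketched in Python. The hypothetical `datediff` helper below imitates, as I understand the documented behavior, the way DATEDIFF counts the number of part boundaries crossed between the two values rather than the elapsed time. It is an approximation for date-only inputs, not the actual T-SQL implementation.

```python
from datetime import date


def datediff(part, start, end):
    """Count the `part` boundaries crossed going from start to end."""
    if part == "year":
        return end.year - start.year
    if part == "month":
        return (end.year - start.year) * 12 + (end.month - start.month)
    # For date-only values, smaller parts scale directly from whole days.
    days = (end - start).days
    scale = {"day": 1, "hour": 24, "minute": 24 * 60, "second": 24 * 60 * 60}
    if part not in scale:
        raise ValueError(f"unsupported part: {part}")
    return days * scale[part]


start, end = date(2014, 1, 30), date(2015, 6, 30)
parts = ("year", "month", "day", "hour", "minute", "second")
differences = {part: datediff(part, start, end) for part in parts}

for part, value in differences.items():
    print(part, value)
```

Because boundaries are counted, the year difference here is 1 even though 17 months separate the two dates, and December 31 to January 1 of the next year also counts as one year.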
DATEADD Function

SELECT Order_Date
      ,DateAdd (Day, 60, Order_Date) as "Due Date"
      ,Order_Total
      ,DateAdd (Day, 50, Order_Date) as Discount
      ,Cast(Order_Total * .98 as Decimal(8,2)) as Discount_Total
FROM Order_Table
ORDER BY 1 ;

Order_Date  Due Date    Order_Total  Discount    Discount_Total
__________  __________  ___________  __________  ______________
05/04/1998  07/03/1998  12347.53     06/23/1998  12,100.58
01/01/1999  03/02/1999  8005.91      02/20/1999  7,845.79
09/09/1999  11/08/1999  23454.84     10/29/1999  22,985.74
10/01/1999  11/30/1999  5111.47      11/20/1999  5,009.24
10/10/1999  12/09/1999  15231.62     11/29/1999  14,926.99

Valid values for the DATEADD part argument include year, quarter, month, dayofyear, day, week, weekday, hour, minute, second, millisecond, microsecond, and nanosecond. TZoffset and ISO_WEEK are accepted by DATEPART but not by DATEADD.
A Real World Example for DateAdd Using the Order Table

SELECT Order_Date
      ,DateAdd (Day, 60, Order_Date) as "Due Date"
      ,Order_Total
      ,DateAdd (Day, 50, Order_Date) as Discount
      ,Cast(Order_Total * .98 as Decimal(8,2)) as Discount_Total
FROM Order_Table
ORDER BY 1 ;

Order_Date  Due Date    Order_Total  Discount    Discount_Total
__________  __________  ___________  __________  ______________
05/04/1998  07/03/1998  12347.53     06/23/1998  12,100.58
01/01/1999  03/02/1999  8005.91      02/20/1999  7,845.79
09/09/1999  11/08/1999  23454.84     10/29/1999  22,985.74
10/01/1999  11/30/1999  5111.47      11/20/1999  5,009.24
10/10/1999  12/09/1999  15231.62     11/29/1999  14,926.99

The example above uses a real world example from the Order_Table.
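The date and discount arithmetic in this report can be reproduced outside the database. The sketch below redoes it in Python with the standard datetime and decimal modules: a 60-day due date, a 50-day discount date, and a 2% discount rounded to two decimal places. The function name `due_and_discount` is my own, and half-up rounding is an assumption about how the cast to Decimal(8,2) behaves.

```python
from datetime import date, timedelta
from decimal import Decimal, ROUND_HALF_UP

# Order_Date and Order_Total pairs taken from the answer set above.
orders = [
    (date(1998, 5, 4), Decimal("12347.53")),
    (date(1999, 1, 1), Decimal("8005.91")),
    (date(1999, 9, 9), Decimal("23454.84")),
    (date(1999, 10, 1), Decimal("5111.47")),
    (date(1999, 10, 10), Decimal("15231.62")),
]


def due_and_discount(order_date, order_total):
    """Return (due date, discount date, discounted total) for one order."""
    due_date = order_date + timedelta(days=60)
    discount_date = order_date + timedelta(days=50)
    discount_total = (order_total * Decimal("0.98")).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return due_date, discount_date, discount_total


# Each report row: order date, order total, due date, discount date, total.
report = [(d, t) + due_and_discount(d, t) for d, t in orders]

for row in report:
    print(*row)
```

The computed values line up with the answer set, for example a due date of 07/03/1998 and a discounted total of 12,100.58 for the first order.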
DATEPART Function

SELECT Order_Date
      ,DateAdd (Day, 60, Order_Date) as "Due Date"
      ,Order_Total
      ,DateAdd (Day, 50, Order_Date) as Discount
      ,Cast(Order_Total * .98 as Decimal(8,2)) as Discount_Total
FROM Order_Table
WHERE DATEPART(Month, Order_Date) = 10
ORDER BY 1 ;

Order_Date  Due Date    Order_Total  Discount    Discount_Total
__________  __________  ___________  __________  ______________
10/01/1999  11/30/1999  5111.47      11/20/1999  5,009.24
10/10/1999  12/09/1999  15231.62     11/29/1999  14,926.99

This example only looks for orders that happened in October. This is done by using the DATEPART function in the WHERE clause. Valid values for the part argument include year, quarter, month, dayofyear, day, week, weekday, hour, minute, second, millisecond, microsecond, nanosecond, TZoffset, and ISO_WEEK.
DATEPART Function Examples

Year = 1998
SELECT * FROM Order_Table WHERE DATEPART(Year, Order_Date) = 1998 ;

Quarter = 4th
SELECT * FROM Order_Table WHERE DATEPART(Quarter, Order_Date) = 4 ;

Month = October
SELECT * FROM Order_Table WHERE DATEPART(Month, Order_Date) = 10 ;

Day = 4th day of the month
SELECT * FROM Order_Table WHERE DATEPART(Day, Order_Date) = 4 ;

Day of year = January 1st
SELECT * FROM Order_Table WHERE DATEPART(DayofYear, Order_Date) = 1 ;

Week = 1st week of year
SELECT * FROM Order_Table WHERE DATEPART(Week, Order_Date) = 1 ;

Week Day = Sunday (when the session's first day of the week is Sunday, the U.S. English default)
SELECT * FROM Order_Table WHERE DATEPART(WeekDay, Order_Date) = 1 ;

Above are some excellent examples to pull from using the DATEPART function.
YEAR, MONTH, and DAY Functions

SELECT Order_Date
      ,Year(Order_Date) as "Yr"
      ,Month(Order_Date) as "Mo"
      ,Day(Order_Date) as "Day"
FROM Order_Table
ORDER BY 1 ;

Order_Date  Yr    Mo  Day
__________  ____  __  ___
1998-05-04  1998  5   4
1999-01-01  1999  1   1
1999-09-09  1999  9   9
1999-10-01  1999  10  1
1999-10-10  1999  10  10

The YEAR, MONTH, and DAY functions are shorthand for the DATEPART function with the year, month, and day parts.
A Better Technique for YEAR, MONTH, and DAY Functions

SELECT Order_Number, Customer_Number, Order_Date, Order_Total
FROM Order_Table
WHERE YEAR(order_date) = 1999
AND MONTH(order_date) = 10;

SELECT Order_Number, Customer_Number, Order_Date, Order_Total
FROM Order_Table
WHERE order_date >= '19991001'
AND order_date < '19991101';

This approach is more efficient for SQL Server and Azure SQL Data Warehouse. Indexes can take advantage of this technique!

Both queries above do the same thing and deliver the same result set, but the bottom query could be much faster.

Order_Number  Customer_Number  Order_Date  Order_Total
____________  _______________  __________  ___________
123552        31323134         1999-10-01  5111.47
123585        87323456         1999-10-10  15231.62

Above is the tale of two queries. The top query applies manipulation on the filtered column. In most cases the Azure SQL Data Warehouse can’t use an index efficiently when using this technique. The bottom query uses a range filter instead.
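The same effect can be observed on a small scale with SQLite, used here through Python's sqlite3 module purely as a runnable stand-in; its optimizer is not the Azure SQL Data Warehouse optimizer, and SQLite's strftime replaces YEAR and MONTH, so this only illustrates the general principle. The index name `idx_order_date` and the helper `plan` are my own. The sketch asks the engine for its query plan for both styles of filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Order_Table (Order_Number INTEGER, Customer_Number INTEGER,
                              Order_Date TEXT, Order_Total REAL);
    CREATE INDEX idx_order_date ON Order_Table (Order_Date);
    INSERT INTO Order_Table VALUES
        (123456, 11111111, '1998-05-04', 12347.53),
        (123512, 11111111, '1999-01-01', 8005.91),
        (123552, 31323134, '1999-10-01', 5111.47),
        (123585, 87323456, '1999-10-10', 15231.62),
        (123777, 57896883, '1999-09-09', 23454.84);
    """
)

SELECT = "SELECT Order_Number, Order_Date, Order_Total FROM Order_Table "

# Filter that wraps the column in a function (SQLite's strftime here).
function_sql = SELECT + (
    "WHERE strftime('%Y', Order_Date) = '1999' "
    "AND strftime('%m', Order_Date) = '10'"
)
# Filter that compares the bare column against a date range.
range_sql = SELECT + (
    "WHERE Order_Date >= '1999-10-01' AND Order_Date < '1999-11-01'"
)


def plan(sql):
    """Return the engine's query plan description as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)


function_plan = plan(function_sql)
range_plan = plan(range_sql)
function_rows = sorted(conn.execute(function_sql).fetchall())
range_rows = sorted(conn.execute(range_sql).fetchall())

print(function_plan)
print(range_plan)
print(function_rows == range_rows)
```

Both filters return the same two October 1999 orders, but the plan for the function-wrapped filter is a full scan of the table, while the range filter is answered by a search on the Order_Date index.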
DATENAME Function

SELECT Order_Date
      ,DATENAME(Year, Order_Date) as "Yr"
      ,DATENAME(Month, Order_Date) as "Mo"
      ,DATENAME(Day, Order_Date) as "Day"
FROM Order_Table
ORDER BY 1 ;

Order_Date  Yr    Mo         Day
__________  ____  _________  ___
1998-05-04  1998  May        4
1999-01-01  1999  January    1
1999-09-09  1999  September  9
1999-10-01  1999  October    1
1999-10-10  1999  October    10

The DATENAME function returns the name of the requested part rather than the number. Notice above that only the Month returns an actual name; the Year and the Day still come back as numbers, although DATENAME returns them as character strings.
ISDATE Function
The ISDATE function accepts a character string as input and returns 1 or 0. ISDATE returns 1 if the string is convertible to a date and time data type, and 0 if it is not. Above, we have used the date of February 29th. This is only a valid date during a leap year, so ISDATE returns 1 only when the date is valid.
Chapter 12
Temporary Tables
Chapter 12 - Temporary Tables
“Graffiti’s always been a temporary art form. You make your mark and then they scrub it off.” - Banksy
Temporary Tables

Derived Tables
• Is a SELECT Statement within a SELECT Statement
• Is purely logical as opposed to physical
• Exists only within a query
• Has its execution optimized at run time

Temporary Table
• Is always created as #tablename
• Space comes from tempdb
• Can only be used by the connection that created the table
• Can be created by the User, then populated with an INSERT/SELECT
• Table and Data are deleted after the connection that created the table is closed

Derived tables exist for the life of a single query, but the database tempdb is used by the Azure SQL Data Warehouse system for local temporary tables. A local temporary table is created using a pound sign (#) prefix before the table name. Each temporary table that is created can only be accessed by the user who created it and only in the session that created it.
CREATING A Derived Table

• Is a SELECT Statement within a SELECT Statement
• Is purely logical as opposed to physical
• Exists only within a query
• Has its execution optimized at run time along with the rest of the query

SELECT *
FROM (SELECT AVG(salary) FROM Employee_Table) AS TeraTom(AVGSAL) ;

A query within a query.

Answer Set

AVGSAL
________
46782.15

The SELECT Statement that creates and populates the Derived table is always inside Parentheses.
Naming the Derived Table

SELECT *
FROM (SELECT AVG(salary) FROM Employee_Table) AS TeraTom(AVGSAL) ;

The name of the Derived Table is TeraTom

Answer Set

AVGSAL
________
46782.15

In the example above, TeraTom is the name we gave the Derived Table. It is mandatory that you always name the derived table, or the query fails with an error.
Aliasing the Column Names in the Derived Table

SELECT *
FROM (SELECT AVG(salary) FROM Employee_Table) AS TeraTom(AVGSAL) ;

AVGSAL is the Column Name in the derived table named TeraTom

Answer Set

AVGSAL
________
46782.15

AVGSAL is the name we gave to the column in our Derived Table that we call TeraTom. Our SELECT (which builds the columns) shows we are only going to have one column in our derived table, and we have named that column AVGSAL.
Multiple Ways to Alias the Columns in a Derived Table

1
SELECT *
FROM (SELECT AVG(salary) FROM Employee_Table) AS TeraTom(AVGSAL) ;

The derived table must always be named. Alias CAN be done here, after the derived table name.

2
SELECT *
FROM (SELECT AVG(salary) AS AVGSAL FROM Employee_Table) AS TeraTom ;

The derived table must always be named. Alias CAN be done inside the derived SELECT statement.
CREATING a Derived Table using the WITH Command

Create the Derived Table while we run the query!

WITH TeraTom(AVGSAL) AS
 (SELECT AVG(salary) FROM Employee_Table)
SELECT * FROM TeraTom ;

Answer Set

AVGSAL
________
46782.15

When using the WITH Command, we can CREATE our Derived table while running the main query.
The Same Derived Query shown Three Different Ways

1
SELECT *
FROM (SELECT AVG(salary) FROM Employee_Table) TeraTom (AVGSAL) ;

2
SELECT *
FROM (SELECT AVG(salary) as AVGSAL FROM Employee_Table) TeraTom ;

3
WITH TeraTom(AVGSAL) AS
 (SELECT AVG(salary) FROM Employee_Table)
SELECT * FROM TeraTom ;

Alias CAN be done after the derived table name (1) or inside the derived SELECT (2).
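A small runnable sketch can confirm that the forms return the same answer. It uses an in-memory SQLite database through Python's sqlite3 module as a stand-in engine, loaded with the nine Employee_Table salaries that appear in this chapter's listings. As far as I know, SQLite does not accept a column list after a derived table alias (form 1), so only the alias-inside-the-SELECT form and the WITH form are run here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee_Table (Last_Name TEXT, Dept_No INTEGER, Salary REAL)"
)
conn.executemany(
    "INSERT INTO Employee_Table VALUES (?, ?, ?)",
    [
        ("Smythe", 10, 64300.00),
        ("Chambers", 100, 48850.00),
        ("Coffing", 200, 41888.88),
        ("Smith", 200, 48000.00),
        ("Larkins", 300, 40200.00),
        ("Strickling", 400, 54500.00),
        ("Harrison", 400, 54500.00),
        ("Reilly", 400, 36000.00),
        ("Jones", None, 32800.50),
    ],
)

# Form 2: the column is aliased inside the derived SELECT.
inline_alias = conn.execute(
    "SELECT * FROM (SELECT AVG(Salary) AS AVGSAL FROM Employee_Table) AS TeraTom"
).fetchone()[0]

# Form 3: the derived table is declared up front with WITH.
with_form = conn.execute(
    "WITH TeraTom(AVGSAL) AS (SELECT AVG(Salary) FROM Employee_Table) "
    "SELECT * FROM TeraTom"
).fetchone()[0]

print(round(inline_alias, 2), round(with_form, 2))
```

Both forms produce the same average, which rounds to the 46782.15 shown in the answer sets.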
MULTIPLE Derived Tables using the WITH Command

1st Derived Table

WITH WellPaid(Employee_No, Last_Name) AS
 (SELECT Employee_No, Last_Name
  FROM Employee_Table
  WHERE Salary > (SELECT AVG(Salary) FROM Employee_Table))

2nd Derived Table

,DeptMgr(Mgr_No, Department_Name) AS
 (SELECT Mgr_No, Department_Name
  FROM Department_Table
  INNER JOIN WellPaid
  ON (Employee_No = Mgr_No))

SELECT Last_Name AS WellPaidMgr
      ,Department_Name
FROM WellPaid
INNER JOIN DeptMgr
ON (Employee_No = Mgr_No) ;

Using the WITH Command, we can CREATE multiple Derived tables that can be referenced elsewhere in the query.
Column Alias Can Default For Normal Columns

SELECT E.*, AVGSAL
FROM Employee_Table as E
INNER JOIN
 (SELECT Dept_No, AVG(salary) as AVGSAL
  FROM Employee_Table
  GROUP BY Dept_No) AS TeraTom
ON E.Dept_No = TeraTom.Dept_No
ORDER BY E.Dept_No ;

I don't need to alias Dept_No because it can default to its current name.

TeraTom (the derived table is built first)

Dept_No  AVGSAL
_______  ________
?        32800.50
10       64300.00
100      48850.00
200      44944.44
300      40200.00
400      48333.33

In a derived table, you will always have a SELECT query in parentheses, and you will always name the table. You have options when aliasing the columns. As in the example above, you can let normal columns default to their current name.
Most Derived Tables Are Used To Join To Other Tables

SELECT E.*, AVGSAL
FROM Employee_Table as E
INNER JOIN
 (SELECT Dept_No, AVG(salary)
  FROM Employee_Table
  GROUP BY Dept_No) AS TeraTom (Dept_No, AVGSAL)
ON E.Dept_No = TeraTom.Dept_No
ORDER BY E.Dept_No ;

The SELECT in parentheses is the Derived Table. The derived table name is TeraTom. The columns are aliased as Dept_No and AVGSAL.

Employee_No  Dept_No  Last_Name   First_Name  Salary    AVGSAL
___________  _______  __________  __________  ________  ________
1000234      10       Smythe      Richard     64300.00  64300.00
1232578      100      Chambers    Mandee      48850.00  48850.00
1324657      200      Coffing     Billy       41888.88  44944.44
1333454      200      Smith       John        48000.00  44944.44
2312225      300      Larkins     Loraine     40200.00  40200.00
1121334      400      Strickling  Cletus      54500.00  48333.33
1256349      400      Harrison    Herbert     54500.00  48333.33
2341218      400      Reilly      William     36000.00  48333.33

The first five columns in the Answer Set came from the Employee_Table. AVGSAL came from the derived table named TeraTom.
A Join Example Showing Different Column Alias Styles

SELECT E.*, AVGSAL
FROM Employee_Table as E
INNER JOIN
 (SELECT Dept_No as Dept_No, AVG(salary) as AVGSAL
  FROM Employee_Table
  GROUP BY Dept_No) AS TeraTom
ON E.Dept_No = TeraTom.Dept_No
ORDER BY E.Dept_No ;

Dept_No does not need an alias because it can default to its current name. AVG(salary) must have an alias because it is an aggregate.

Employee_No  Dept_No  Last_Name   First_Name  Salary    AVGSAL
___________  _______  __________  __________  ________  ________
1000234      10       Smythe      Richard     64300.00  64300.00
1232578      100      Chambers    Mandee      48850.00  48850.00
1324657      200      Coffing     Billy       41888.88  44944.44
1333454      200      Smith       John        48000.00  44944.44
2312225      300      Larkins     Loraine     40200.00  40200.00
1121334      400      Strickling  Cletus      54500.00  48333.33
1256349      400      Harrison    Herbert     54500.00  48333.33
2341218      400      Reilly      William     36000.00  48333.33
The Three Components of a Derived Table

SELECT E.*, Salary, AVGSAL
FROM Employee_Table as E
INNER JOIN
 (SELECT Dept_No, AVG(salary) as AVGSAL
  FROM Employee_Table
  GROUP BY Dept_No) AS TeraTom
ON E.Dept_No = TeraTom.Dept_No
ORDER BY E.Dept_No ;

TeraTom (the derived table is optimized with the rest of the query)

Dept_No  AVGSAL
_______  ________
?        32800.50
10       64300.00
100      48850.00
200      44944.44
300      40200.00
400      48333.33

1  A derived table is a SELECT query. The SELECT query always starts with an open parenthesis and ends with a close parenthesis.

2  The derived table must be given a name. Above we called our derived table TeraTom.

3  You will need to define (alias) the columns in the derived table. Above we could allow Dept_No to default to Dept_No, but we had to specifically alias AVG(Salary) as AVGSAL.

Every derived table must have the three components listed above.
Visualize This Derived Table

SELECT E.*, (Salary - AVGSAL) as PlusMinAvg
FROM Employee_Table as E
INNER JOIN
 (SELECT Dept_No, AVG(salary) as AVGSAL
  FROM Employee_Table
  GROUP BY Dept_No) AS TeraTom
ON E.Dept_No = TeraTom.Dept_No
ORDER BY E.Dept_No ;

TeraTom (the derived table is built first)

Dept_No  AVGSAL
_______  ________
?        32800.50
10       64300.00
100      48850.00
200      44944.44
300      40200.00
400      48333.33

Employee_No  Dept_No  Last_Name   First_Name  Salary    PlusMinAvg
___________  _______  __________  __________  ________  __________
1000234      10       Smythe      Richard     64300.00  0.00
1232578      100      Chambers    Mandee      48850.00  0.00
1324657      200      Coffing     Billy       41888.88  -3055.56
1333454      200      Smith       John        48000.00  3055.56
2312225      300      Larkins     Loraine     40200.00  0.00
1121334      400      Strickling  Cletus      54500.00  6166.67
1256349      400      Harrison    Herbert     54500.00  6166.67
2341218      400      Reilly      William     36000.00  -12333.33

Our example above shows the data in the derived table named TeraTom. This query allows us to see each employee and how far their salary is above or below the average salary of the workers in their department.
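The PlusMinAvg figures can be reproduced with a short sketch. It runs the same derived-table join in an in-memory SQLite database via Python's sqlite3 module (a stand-in engine chosen only so the example is runnable) using the Employee_Table salaries listed in this chapter. The table is trimmed to three columns for brevity, and the rounding to two decimals is done in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee_Table (Last_Name TEXT, Dept_No INTEGER, Salary REAL)"
)
conn.executemany(
    "INSERT INTO Employee_Table VALUES (?, ?, ?)",
    [
        ("Smythe", 10, 64300.00),
        ("Chambers", 100, 48850.00),
        ("Coffing", 200, 41888.88),
        ("Smith", 200, 48000.00),
        ("Larkins", 300, 40200.00),
        ("Strickling", 400, 54500.00),
        ("Harrison", 400, 54500.00),
        ("Reilly", 400, 36000.00),
        ("Jones", None, 32800.50),
    ],
)

# The derived table TeraTom holds one average salary per department.
rows = conn.execute(
    """
    SELECT E.Last_Name, (E.Salary - AVGSAL) AS PlusMinAvg
    FROM Employee_Table AS E
    INNER JOIN
        (SELECT Dept_No, AVG(Salary) AS AVGSAL
         FROM Employee_Table
         GROUP BY Dept_No) AS TeraTom
        ON E.Dept_No = TeraTom.Dept_No
    ORDER BY E.Dept_No
    """
).fetchall()

plus_min_avg = {name: round(diff, 2) for name, diff in rows}

for name, diff in plus_min_avg.items():
    print(name, diff)
```

Jones, whose Dept_No is NULL, has a row in the derived table but never matches in the join, because NULL does not equal NULL. That is why the answer set has eight rows rather than nine.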
Our Join Example With The WITH Syntax

WITH TeraTom (Dept_No, AVGSAL) AS
 (SELECT Dept_No, AVG(Salary)
  FROM Employee_Table
  GROUP BY Dept_No)
SELECT E.*, AVGSAL
FROM Employee_Table as E
INNER JOIN TeraTom
ON E.Dept_No = TeraTom.Dept_No
ORDER BY E.Dept_No ;

TeraTom

Dept_No  AVGSAL
_______  ________
?        32800.50
10       64300.00
100      48850.00
200      44944.44
300      40200.00
400      48333.33

Now, the lower portion of the query refers to TeraTom almost like it is a permanent table, but it is not!
Quiz - Answer the Questions

SELECT Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table
INNER JOIN
 (SELECT Dept_No, AVG(Salary)
  FROM Employee_Table
  GROUP BY Dept_No) as TeraTom (Depty, AVGSAL)
ON Dept_No = Depty ;

1) What is the name of the derived table?
2) How many columns are in the derived table?
3) What is the name of the derived table columns?
4) Is there more than one row in the derived table?
5) What common keys join the Employee and Derived?
6) Why were the join keys named differently?
Answer to Quiz - Answer the Questions

SELECT Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table
INNER JOIN
   (SELECT Dept_No, AVG(Salary)
    FROM Employee_Table
    GROUP BY Dept_No) AS TeraTom (Depty, AVGSAL)
ON Dept_No = Depty ;

1) What is the name of the derived table? TeraTom
2) How many columns are in the derived table? 2
3) What are the names of the derived table's columns? Depty and AVGSAL
4) Is there more than one row in the derived table? Yes
5) What keys join the tables? Dept_No and Depty
6) Why were the join keys named differently? If both were named Dept_No, the unqualified references to Dept_No would be ambiguous and the query would fail unless we fully qualified them.
Clever Tricks on Aliasing Columns in a Derived Table

1) Alias the columns inside the derived table (Dept_No as Depty):

SELECT Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table
INNER JOIN
   (SELECT Dept_No AS Depty, AVG(Salary) AS AVGSAL
    FROM Employee_Table
    GROUP BY Dept_No) AS TeraTom
ON Dept_No = Depty ;

2) Alias the table (Employee_Table as E) and fully qualify Dept_No:

SELECT E.Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table AS E
INNER JOIN
   (SELECT Dept_No, AVG(Salary) AS AVGSAL
    FROM Employee_Table
    GROUP BY Dept_No) AS TeraTom
ON E.Dept_No = TeraTom.Dept_No ;
A Derived Table lives only for the lifetime of a single query

1) First query:

WITH T (Dept_No, AVGSAL) AS
   (SELECT Dept_No, AVG(Salary)
    FROM Employee_Table
    GROUP BY Dept_No)
SELECT T.Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table AS E
INNER JOIN T
ON E.Dept_No = T.Dept_No ;

The semi-colon (;) indicates the end of the query.

2) Second query:

SELECT * FROM T ;

Error: the query fails. T does not exist.
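If the intermediate result is needed by more than one query, one option is to materialize it as a temporary table instead of a derived table. The sketch below borrows the CTAS syntax shown later in this chapter; the table name #Dept_Avg is made up for illustration, and the WITH options are assumed to match what your platform expects.

```sql
-- Hypothetical temp table holding what the derived table T held.
CREATE TABLE #Dept_Avg
WITH ( LOCATION = USER_DB, DISTRIBUTION = HASH (Dept_No) )
AS
SELECT Dept_No, AVG(Salary) AS AVGSAL
FROM Employee_Table
GROUP BY Dept_No ;

-- First query: join to the temp table.
SELECT T.Dept_No, First_Name, Last_Name, AVGSAL
FROM Employee_Table AS E
INNER JOIN #Dept_Avg AS T
ON E.Dept_No = T.Dept_No ;

-- Second query: still works, because #Dept_Avg lasts for the session.
SELECT * FROM #Dept_Avg ;
```

The trade-off is that the temp table is a snapshot. It does not change if Employee_Table changes after it is built.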
An Example of Two Derived Tables in a Single Query

WITH T (Dept_No, AVGSAL) AS
   (SELECT Dept_No, AVG(Salary)
    FROM Employee_Table
    GROUP BY Dept_No)
SELECT T.Dept_No, E.First_Name, E.Last_Name, T.AVGSAL, S.Counter
FROM Employee_Table AS E
INNER JOIN T
ON E.Dept_No = T.Dept_No
INNER JOIN
   (SELECT Employee_No,
           ROW_NUMBER() OVER (PARTITION BY Dept_No ORDER BY Dept_No, Last_Name)
    FROM Employee_Table) AS S (Employee_No, Counter)
ON E.Employee_No = S.Employee_No
ORDER BY T.Dept_No ;
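The example above mixes the two styles: T is declared with WITH, while S is written inline. A variation that may read more evenly declares both in one WITH clause, separated by a comma. This is a sketch of that alternative, not the book's query; the ORDER BY inside the window is reduced to Last_Name because Dept_No is constant within each partition.

```sql
-- Sketch: both derived tables declared up front in one WITH clause.
WITH T (Dept_No, AVGSAL) AS
   (SELECT Dept_No, AVG(Salary)
    FROM Employee_Table
    GROUP BY Dept_No),
S (Employee_No, Counter) AS
   (SELECT Employee_No,
           ROW_NUMBER() OVER (PARTITION BY Dept_No ORDER BY Last_Name)
    FROM Employee_Table)
SELECT T.Dept_No, E.First_Name, E.Last_Name, T.AVGSAL, S.Counter
FROM Employee_Table AS E
INNER JOIN T
ON E.Dept_No = T.Dept_No
INNER JOIN S
ON E.Employee_No = S.Employee_No
ORDER BY T.Dept_No ;
```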
RECURSIVE Derived Table Hierarchy

TeraTom Coffing (CEO)
   Jane Stevens (VP North)
      Hitesh Patel (North Manager)
         North Analysts: Robert Pantelle, Ming Zao, Constantine Mikas
   Ricardo Gonzales (VP South)
      Inquayee Mumba (South Manager)
         South Analysts: Betty Boston, Kelly Roberts, Brett Valens

Above is a company hierarchy. This is what we will use to perform our WITH Recursive query.
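To follow the recursive slides by hand, it can help to picture the rows behind the chart. The sketch below rebuilds them from the values shown in the looping slides. The column names come from the recursive query itself, but the data types are assumptions, and any platform-specific table options are left out.

```sql
-- Assumed layout of Hierarchy_Table; data types are guesses for illustration.
CREATE TABLE Hierarchy_Table
( Employee_No      INTEGER
 ,Mgr_Employee_No  INTEGER
 ,Last_Name        VARCHAR(20)
 ,Position_Name    VARCHAR(20)
) ;

-- One row per person; the CEO has no manager, so Mgr_Employee_No is NULL.
INSERT INTO Hierarchy_Table VALUES (1,    NULL, 'Coffing',  'CEO') ;
INSERT INTO Hierarchy_Table VALUES (10,   1,    'Stevens',  'VP NORTH') ;
INSERT INTO Hierarchy_Table VALUES (20,   1,    'Gonzales', 'VP SOUTH') ;
INSERT INTO Hierarchy_Table VALUES (100,  10,   'Patel',    'North Manager') ;
INSERT INTO Hierarchy_Table VALUES (200,  20,   'Mumba',    'South Manager') ;
INSERT INTO Hierarchy_Table VALUES (1000, 100,  'Mikas',    'Analyst North') ;
INSERT INTO Hierarchy_Table VALUES (3000, 100,  'Zao',      'Analyst North') ;
INSERT INTO Hierarchy_Table VALUES (5000, 100,  'Pantelle', 'Analyst North') ;
INSERT INTO Hierarchy_Table VALUES (2000, 200,  'Valens',   'Analyst South') ;
INSERT INTO Hierarchy_Table VALUES (4000, 200,  'Roberts',  'Analyst South') ;
INSERT INTO Hierarchy_Table VALUES (6000, 200,  'Boston',   'Analyst South') ;
```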
RECURSIVE Derived Table Query

WITH TeraTom (Emp, Mgr, LastN, Pos_Name, DEPTH) AS
(SELECT Employee_No, Mgr_Employee_No, Last_Name, Position_Name, 0
 FROM Hierarchy_Table
 WHERE Mgr_Employee_No IS NULL
 UNION ALL
 SELECT Employee_No, Mgr_Employee_No, Last_Name, Position_Name, DEPTH+1
 FROM TeraTom
 INNER JOIN Hierarchy_Table
 ON Emp = Mgr_Employee_No
)
SELECT *
FROM TeraTom
ORDER BY 5,2,1 ;

Note: Recursive queries are not supported in the first release of the Azure SQL Data Warehouse.

Above is the WITH Recursive query.
RECURSIVE Derived Table Definition

WITH TeraTom (Emp, Mgr, LastN, Pos_Name, DEPTH) AS
(SELECT Employee_No, Mgr_Employee_No, Last_Name, Position_Name, 0
 FROM Hierarchy_Table
 WHERE Mgr_Employee_No IS NULL
 UNION ALL
 SELECT Employee_No, Mgr_Employee_No, Last_Name, Position_Name, DEPTH+1
 FROM TeraTom
 INNER JOIN Hierarchy_Table
 ON Emp = Mgr_Employee_No
)
SELECT *
FROM TeraTom
ORDER BY 5,2,1 ;

This is a recursive query.
TeraTom is the recursive derived table's name.
The recursive derived table is defined with 5 columns. They are Emp, Mgr, LastN, Pos_Name, and DEPTH.

Above is the WITH Recursive query. Its first line, WITH TeraTom (Emp, Mgr, LastN, Pos_Name, DEPTH) AS, is the recursive derived table definition itself.
WITH RECURSIVE Derived Table Seeding

WITH TeraTom (Emp, Mgr, LastN, Pos_Name, DEPTH) AS
(SELECT Employee_No, Mgr_Employee_No, Last_Name, Position_Name, 0
 FROM Hierarchy_Table
 WHERE Mgr_Employee_No IS NULL
 UNION ALL
 SELECT Employee_No, Mgr_Employee_No, Last_Name, Position_Name, DEPTH+1
 FROM TeraTom
 INNER JOIN Hierarchy_Table
 ON Emp = Mgr_Employee_No
)
SELECT *
FROM TeraTom
ORDER BY 5,2,1 ;

The first SELECT (everything before UNION ALL) produces only a single row in our derived table. One row is placed in our derived table. That is called "seeding the table".

TeraTom
Emp   Mgr   LastN     Pos_Name  Depth
____  ____  ________  ________  _____
1     ?     Coffing   CEO       0

Above is the WITH Recursive query. The first SELECT places the first row inside the derived table. The only employee with no manager is the CEO, Tom Coffing. His Mgr_Employee_No is NULL. The table is now seeded!
WITH RECURSIVE Derived Table Looping

WITH TeraTom (Emp, Mgr, LastN, Pos_Name, DEPTH) AS
(SELECT Employee_No, Mgr_Employee_No, Last_Name, Position_Name, 0
 FROM Hierarchy_Table
 WHERE Mgr_Employee_No IS NULL
 UNION ALL
 SELECT Employee_No, Mgr_Employee_No, Last_Name, Position_Name, DEPTH+1
 FROM TeraTom
 INNER JOIN Hierarchy_Table
 ON Emp = Mgr_Employee_No
)
SELECT *
FROM TeraTom
ORDER BY 5,2,1 ;

The second SELECT (after UNION ALL) joins the derived table to the Hierarchy_Table and loops until finished.

TeraTom
Emp   Mgr   LastN     Pos_Name  Depth
____  ____  ________  ________  _____
1     ?     Coffing   CEO       0

Above is the WITH Recursive query. The SELECT after UNION ALL joins the derived table to the Hierarchy_Table in a looping fashion. It keeps looping and adding rows until a loop adds no rows. Then, it is done.
RECURSIVE Derived Table Looping in Slow Motion

UNION ALL
SELECT Employee_No, Mgr_Employee_No, Last_Name, Position_Name, DEPTH+1
FROM TeraTom
INNER JOIN Hierarchy_Table
ON Emp = Mgr_Employee_No

The first loop places two more rows inside the derived table.

TeraTom
Emp   Mgr   LastN     Pos_Name  Depth
____  ____  ________  ________  _____
1     ?     Coffing   CEO       0
10    1     Stevens   VP NORTH  1
20    1     Gonzales  VP SOUTH  1

The recursive SELECT keeps looping and adding rows until a loop adds no rows. This is the first loop, and as you can see, two rows were added. That is because our join condition is Emp = Mgr_Employee_No. Both Stevens and Gonzales report to a manager with Emp = 1.
RECURSIVE Derived Table Looping Continued

UNION ALL
SELECT Employee_No, Mgr_Employee_No, Last_Name, Position_Name, DEPTH+1
FROM TeraTom
INNER JOIN Hierarchy_Table
ON Emp = Mgr_Employee_No

The second loop places two more rows inside the derived table.

TeraTom
Emp   Mgr   LastN     Pos_Name       Depth
____  ____  ________  _____________  _____
1     ?     Coffing   CEO            0
10    1     Stevens   VP NORTH       1
20    1     Gonzales  VP SOUTH       1
100   10    Patel     North Manager  2
200   20    Mumba     South Manager  2

This is the second loop, and as you can see, two rows were added. That is because our join condition is Emp = Mgr_Employee_No. Both Patel and Mumba report to a manager who is already inside our recursive derived table.
RECURSIVE Derived Table Looping Continued

UNION ALL
SELECT Employee_No, Mgr_Employee_No, Last_Name, Position_Name, DEPTH+1
FROM TeraTom
INNER JOIN Hierarchy_Table
ON Emp = Mgr_Employee_No

The third loop places six more rows inside the derived table.

TeraTom
Emp   Mgr   LastN     Pos_Name       Depth
____  ____  ________  _____________  _____
1     ?     Coffing   CEO            0
10    1     Stevens   VP NORTH       1
20    1     Gonzales  VP SOUTH       1
100   10    Patel     North Manager  2
200   20    Mumba     South Manager  2
1000  100   Mikas     Analyst North  3
3000  100   Zao       Analyst North  3
5000  100   Pantelle  Analyst North  3
2000  200   Valens    Analyst South  3
4000  200   Roberts   Analyst South  3
6000  200   Boston    Analyst South  3

Six rows are added in the third loop.
RECURSIVE Derived Table Ends the Looping

UNION ALL
SELECT Employee_No, Mgr_Employee_No, Last_Name, Position_Name, DEPTH+1
FROM TeraTom
INNER JOIN Hierarchy_Table
ON Emp = Mgr_Employee_No

The fourth loop added no rows!

TeraTom
Emp   Mgr   LastN     Pos_Name       Depth
____  ____  ________  _____________  _____
1     ?     Coffing   CEO            0
10    1     Stevens   VP NORTH       1
20    1     Gonzales  VP SOUTH       1
100   10    Patel     North Manager  2
200   20    Mumba     South Manager  2
1000  100   Mikas     Analyst North  3
3000  100   Zao       Analyst North  3
5000  100   Pantelle  Analyst North  3
2000  200   Valens    Analyst South  3
4000  200   Roberts   Analyst South  3
6000  200   Boston    Analyst South  3

No rows were added in the fourth loop. The loop is finished.
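Where recursive queries are unavailable, a hierarchy of known, fixed depth can be unrolled by hand, with one UNION ALL branch per loop. The sketch below does that for the four levels of the sample hierarchy. It is an illustration of what each loop contributes, not a general replacement for recursion, because a deeper hierarchy would need more branches. The aliases H0 through H3 are made up for the example.

```sql
-- Depth 0: the seed row (the employee with no manager).
SELECT H0.Employee_No, H0.Mgr_Employee_No, H0.Last_Name, H0.Position_Name, 0 AS Depth
FROM Hierarchy_Table AS H0
WHERE H0.Mgr_Employee_No IS NULL
UNION ALL
-- Depth 1: what the first loop adds (people reporting to the seed row).
SELECT H1.Employee_No, H1.Mgr_Employee_No, H1.Last_Name, H1.Position_Name, 1
FROM Hierarchy_Table AS H0
INNER JOIN Hierarchy_Table AS H1 ON H0.Employee_No = H1.Mgr_Employee_No
WHERE H0.Mgr_Employee_No IS NULL
UNION ALL
-- Depth 2: what the second loop adds.
SELECT H2.Employee_No, H2.Mgr_Employee_No, H2.Last_Name, H2.Position_Name, 2
FROM Hierarchy_Table AS H0
INNER JOIN Hierarchy_Table AS H1 ON H0.Employee_No = H1.Mgr_Employee_No
INNER JOIN Hierarchy_Table AS H2 ON H1.Employee_No = H2.Mgr_Employee_No
WHERE H0.Mgr_Employee_No IS NULL
UNION ALL
-- Depth 3: what the third loop adds.
SELECT H3.Employee_No, H3.Mgr_Employee_No, H3.Last_Name, H3.Position_Name, 3
FROM Hierarchy_Table AS H0
INNER JOIN Hierarchy_Table AS H1 ON H0.Employee_No = H1.Mgr_Employee_No
INNER JOIN Hierarchy_Table AS H2 ON H1.Employee_No = H2.Mgr_Employee_No
INNER JOIN Hierarchy_Table AS H3 ON H2.Employee_No = H3.Mgr_Employee_No
WHERE H0.Mgr_Employee_No IS NULL
ORDER BY 5, 2, 1 ;
```

A fifth branch would correspond to the fourth loop. For this data it would return no rows, which is exactly the condition that stops the recursive version.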
RECURSIVE Derived Table Definition

WITH TeraTom (Emp, Mgr, LastN, Pos_Name, DEPTH) AS
(SELECT Employee_No, Mgr_Employee_No, Last_Name, Position_Name, 0
 FROM Hierarchy_Table
 WHERE Mgr_Employee_No IS NULL
 UNION ALL
 SELECT Employee_No, Mgr_Employee_No, Last_Name, Position_Name, DEPTH+1
 FROM TeraTom
 INNER JOIN Hierarchy_Table
 ON Emp = Mgr_Employee_No
)
SELECT *
FROM TeraTom
ORDER BY 5,2,1 ;

When the loop fails to add a row, the system knows it is done looping. Now it runs the final SELECT to get the answer set.

Above is the WITH Recursive query. The final SELECT * FROM TeraTom is now run so the answer set can be delivered.
RECURSIVE Derived Table Answer Set
The answer set is delivered.
What is TEMPDB?

TEMPDB is a database similar to all other SQL Server databases
It is recreated every time SQL Server is started
Allows for transactions to be rolled back, but does not allow for database recovery
Because of limited logging, operations in TEMPDB can be much faster than in other databases
Is the storage location for private, global, and direct temporary tables; as well as table variables
Like most things in life, TEMPDB is temporary. It is wonderful for temporary data storage. Just do not count on it having data that will be there for you in the future.
Creating a Temporary Table

CREATE TABLE #Emp_Temp
( Employee_No  INTEGER
 ,Dept_No      SMALLINT
 ,First_Name   VARCHAR(12)
 ,Last_Name    CHAR(20)
 ,Salary       DECIMAL(8,2)
) WITH (LOCATION = USER_DB, DISTRIBUTION = HASH (Employee_No)) ;

It is mandatory that you include LOCATION = USER_DB.

Populate the temp table with an INSERT/SELECT statement:

INSERT INTO #Emp_Temp
SELECT * FROM Employee_Table ;

SELECT AVG(Salary) AS AVGSAL
FROM #Emp_Temp ;

AVGSAL
____________
46782.153333

You create a local temporary table by using the # prefix before the table name. The temporary table can only be accessed from its own session. You cannot create partitions, views, or non-clustered indexes on a temporary table, nor can you have two temporary tables with the same name in the same session.
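Because a session cannot hold two temporary tables with the same name, a script that is run more than once in one session usually drops the table first. The guard below is a common T-SQL idiom; treat it as a sketch and confirm that OBJECT_ID on tempdb names behaves the same way on your platform.

```sql
-- Drop the temp table only if this session already has one with that name.
IF OBJECT_ID('tempdb..#Emp_Temp') IS NOT NULL
    DROP TABLE #Emp_Temp ;
```

After the guard runs, the CREATE TABLE #Emp_Temp statement can be executed again without a name conflict.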
The Three Steps to Use a Private Temporary Table

1) Create the table:

CREATE TABLE #Dept_Agg_Vol
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) WITH (Location=User_DB) ;

2) Populate it with an INSERT/SELECT:

INSERT INTO #Dept_Agg_Vol
SELECT Dept_no, SUM(Salary)
FROM Employee_Table
GROUP BY Dept_no ;

3) Query it:

SELECT * FROM #Dept_Agg_Vol
ORDER BY 1 ;

Only you can see this data because your session number is associated with your Private Temporary Tables. You can't even see this table if you log in and query it from another session!

A user creates a Private Temporary Table, populates it with an INSERT/SELECT statement, and then queries it until logging off.
Creating a Temporary Table With a Clustered Index

CREATE TABLE TEMPDB.#Dept_Agg_Vol3
( Dept_no     Integer
 ,Sum_Salary  Decimal(10,2)
) WITH (Location=User_DB) ;

CREATE CLUSTERED INDEX IDX_Dept_Agg_Vol_Dept_no
ON TEMPDB.#Dept_Agg_Vol3 (Dept_No) ;

Temporary Tables can have clustered and non-clustered indexes just like "regular" tables. Both the tables and their indexes are stored in tempdb.

You can have clustered indexes on a temporary table.
Creating a Columnstore Temporary Table From a CTAS

CREATE TABLE #Order_Columnar
WITH ( LOCATION = USER_DB,
       CLUSTERED COLUMNSTORE INDEX,
       DISTRIBUTION = HASH (Order_Number) )
AS
SELECT * FROM Order_Table ;

This Temporary Table has been created from the Order_Table, but the temporary table is a columnstore. We used a CREATE TABLE AS (CTAS) statement.

You can use a Create Table As (CTAS) statement to create a temporary table that is a columnstore.
Chapter 13 – Sub-query Functions
“An invasion of Armies can be resisted, but not an idea whose time has come.” - Victor Hugo
An IN List is much like a Subquery

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

SELECT *
FROM Employee_Table
WHERE Dept_No IN (100, 200) ;

Employee_No  Dept_No  Last_Name  First_Name  Salary
___________  _______  _________  __________  ________
1232578      100      Chambers   Mandee      48850.00
1324657      200      Coffing    Billy       41888.88
1333454      200      Smith      John        48000.00

This query is very simple and easy to understand. It uses an IN List to find all Employees who are in Dept_No 100 or Dept_No 200.
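An IN List is shorthand for a chain of equality tests joined by OR. The sketch below spells that out for the query above and returns the same three employees.

```sql
-- Same filter as Dept_No IN (100, 200), written as two equality tests.
SELECT *
FROM Employee_Table
WHERE Dept_No = 100
   OR Dept_No = 200 ;
```

Seen this way, it is also clear why the employee with a NULL Dept_No can never qualify: a NULL is not equal to any value in the list.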
An IN List Never has Duplicates, Just like a Subquery

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

SELECT *
FROM Employee_Table
WHERE Dept_No IN (100, 100, 200, 200) ;

What is going on with this IN List? Why in the world are there duplicates in there? Will this query even work? What will the result set look like? Turn the page!
An IN List Ignores Duplicates

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

SELECT *
FROM Employee_Table
WHERE Dept_No IN (100, 100, 200, 200) ;

Duplicate values in a list are irrelevant. The answer set still produced only 3 rows.

Employee_No  Dept_No  Last_Name  First_Name  Salary
___________  _______  _________  __________  ________
1232578      100      Chambers   Mandee      48850.00
1324657      200      Coffing    Billy       41888.88
1333454      200      Smith      John        48000.00

Duplicate values are ignored here. We got the same rows back as before, and it is as if the system ignored the duplicate values in the IN List. That is exactly what happened.
The Subquery

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Department_Table
Dept_No  Department_Name
_______  ________________
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

There is a Top Query and a Bottom Query!

SELECT *
FROM Employee_Table
WHERE Dept_No IN (
   SELECT Dept_No
   FROM Department_Table) ;

Which Query Runs First?

The query above is a Subquery, which means there are multiple queries in the same SQL. The bottom query runs first, and its purpose in life is to build a distinct list of values that it passes to the top query. The top query then returns the result set. This query solves the problem: Show all Employees in Valid Departments!
The Three Steps of How a Basic Subquery Works

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Department_Table
Dept_No  Department_Name
_______  ________________
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

1) The Bottom Query runs first!

SELECT *
FROM Employee_Table
WHERE Dept_No IN (
   SELECT Dept_No
   FROM Department_Table) ;

2) The result (100, 200, 300, 400, 500) is passed to the top query!

3) The top query runs using the bottom query answer set:

SELECT *
FROM Employee_Table
WHERE Dept_No IN (100, 200, 300, 400, 500) ;

The bottom query runs first and builds a distinct IN list. Then the top query runs using the list.
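One way to see the first step for yourself is to run the bottom query on its own. The sketch below adds DISTINCT to make the "distinct list" explicit. In this sample data, Department_Table has no repeated Dept_No values, so the keyword changes nothing here.

```sql
-- Step 1 in isolation: the list the bottom query hands to the top query.
SELECT DISTINCT Dept_No
FROM Department_Table ;
-- With the sample Department_Table this returns 100, 200, 300, 400 and 500.
```

Pasting those values into an IN List by hand gives the step 3 query shown above.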
These are Equivalent Queries

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Department_Table
Dept_No  Department_Name
_______  ________________
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

Query 1:

SELECT *
FROM Employee_Table
WHERE Dept_No IN (
   SELECT Dept_No
   FROM Department_Table) ;

Query 2:

SELECT *
FROM Employee_Table
WHERE Dept_No IN (100, 200, 300, 400, 500) ;

Both queries above return the same result. Query 2 has values in an IN list. Query 1 runs a subquery to build the values in the IN list.
The Final Answer Set from the Subquery

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Department_Table
Dept_No  Department_Name
_______  ________________
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

Notice that no employees are in Dept_No 500.

SELECT *
FROM Employee_Table
WHERE Dept_No IN (
   SELECT Dept_No
   FROM Department_Table) ;

Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1256349      400      Harrison    Herbert     54500.00
2341218      400      Reilly      William     36000.00
1121334      400      Strickling  Cletus      54500.00

Remember that columns from the subquery are never returned in the final answer set.
Quiz - Answer the Difficult Question

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Department_Table
Dept_No  Department_Name
_______  ________________
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

How are Subqueries similar to Joins between two tables?

A great question was asked above. Do you know the key to answering? Turn the page!
Answer to Quiz - Answer the Difficult Question

Employee_Table (Dept_No is the Foreign Key)
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Department_Table (Dept_No is the Primary Key)
Dept_No  Department_Name
_______  ________________
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

How are Subqueries similar to Joins between two tables?

A Subquery between two tables or a Join between two tables will each need a common key that represents the relationship. This is called a Primary Key/Foreign Key relationship.

A Subquery uses a common key to link the two tables together, very much like a join! When subquerying between two tables, look for the common link between them. Most of the time both tables have a column with the same name, but not always.
Should you use a Subquery or a Join?

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Department_Table
Dept_No  Department_Name
_______  ________________
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

When do I Subquery?

SELECT *
FROM Employee_Table
WHERE Dept_No IN (
   SELECT Dept_No
   FROM Department_Table) ;

When do I perform a Join?

SELECT E.*, Department_Name
FROM Employee_Table AS E
INNER JOIN Department_Table AS D
ON E.Dept_No = D.Dept_No ;

If the final result set needs columns from only one table, use a Subquery. If the final result set needs columns from both tables, you have to do a Join.
Quiz - Write the Subquery

Customer_Table
Customer_Number  Customer_Name
_______________  ___________________
11111111         Billy's Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table
Order_Number  Customer_Number  Order_Total
____________  _______________  ___________
123456        11111111            12347.53
123512        11111111             8005.91
123552        31323134             5111.47
123585        87323456            15231.62
123777        57896883            23454.84

Write the Subquery

Select all columns in the Customer_Table if the customer has placed an order!

Here is your opportunity to show how smart you are. Write a Subquery that will bring back everything from the Customer_Table if the customer has placed an order in the Order_Table. Good luck! Advice: Look for the common key among both tables!
Answer to Quiz - Write the Subquery

Customer_Table
Customer_Number  Customer_Name
_______________  ___________________
11111111         Billy's Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table
Order_Number  Customer_Number  Order_Total
____________  _______________  ___________
123456        11111111            12347.53
123512        11111111             8005.91
123552        31323134             5111.47
123585        87323456            15231.62
123777        57896883            23454.84

Write the Subquery

Select all columns in the Customer_Table if the customer has placed an order!

SELECT *
FROM Customer_Table
WHERE Customer_Number IN (
   SELECT Customer_Number
   FROM Order_Table) ;

Customer_Number  Customer_Name
_______________  ___________________
31323134         ACE Consulting
57896883         XYZ Plumbing
11111111         Billy's Best Choice
87323456         Databases N-U

The common key among both tables is Customer_Number. The bottom query runs first and delivers a distinct list of Customer_Numbers which the top query uses in the IN List!
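The "distinct list" matters here. Customer 11111111 has two rows in the Order_Table, so a plain inner join would return Billy's Best Choice twice. The sketch below shows a join that gives the same four customers as the subquery, with DISTINCT doing the work the IN List does automatically. The aliases C and O are made up for the example.

```sql
-- Join version of the same request; DISTINCT removes the repeat
-- caused by customer 11111111 having two orders.
SELECT DISTINCT C.*
FROM Customer_Table AS C
INNER JOIN Order_Table AS O
ON C.Customer_Number = O.Customer_Number ;
```

When only columns from the Customer_Table are wanted, the subquery states the intent more directly and needs no DISTINCT.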
Quiz - Write the More Difficult Subquery

Customer_Table
Customer_Number  Customer_Name
_______________  ___________________
11111111         Billy's Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table
Order_Number  Customer_Number  Order_Total
____________  _______________  ___________
123456        11111111            12347.53
123512        11111111             8005.91
123552        31323134             5111.47
123585        87323456            15231.62
123777        57896883            23454.84

Write the Subquery

Select all columns in the Customer_Table if the customer has placed an order over $10,000.00!

Here is your opportunity to show how smart you are. Write a Subquery that will bring back everything from the Customer_Table if the customer has placed an order in the Order_Table that is greater than $10,000.00.
Answer to Quiz - Write the More Difficult Subquery

Customer_Table
Customer_Number  Customer_Name
_______________  ___________________
11111111         Billy's Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table
Order_Number  Customer_Number  Order_Total
____________  _______________  ___________
123456        11111111            12347.53
123512        11111111             8005.91
123552        31323134             5111.47
123585        87323456            15231.62
123777        57896883            23454.84

Write the Subquery

Select all columns in the Customer_Table if the customer has placed an order over $10,000.00!

SELECT *
FROM Customer_Table
WHERE Customer_Number IN (
   SELECT Customer_Number
   FROM Order_Table
   WHERE Order_Total > 10000.00) ;

Here is your answer!

Customer_Number  Customer_Name
_______________  ___________________
11111111         Billy's Best Choice
57896883         XYZ Plumbing
87323456         Databases N-U
Quiz - Write the Extreme Subquery

Course_Table
Course_ID  Course_Name               Credits  Seats
_________  ________________________  _______  _____
100        Database Concepts         3        50
200        Introduction to SQL       3        20
210        Advanced SQL              3        22
220        V2R3 SQL Features         2        25
300        Physical Database Design  4        20
400        Database Administration   4        16

Student_Course_Table
Student_ID  Course_ID
__________  _________
280023      210
231222      210
125634      100
231222      220
125634      200
322133      220
125634      220
322133      300
324652      200
333450      500
260000      400
333450      400
234121      100
123250      100

Student_Table
Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
__________  _________  __________  __________  ________
423400      Larkins    Michael     FR          0.00
231222      Wilson     Susie       SO          3.80
280023      McRoberts  Richard     JR          1.90
322133      Bond       Jimmy       JR          3.95
125634      Hanson     Henry       FR          2.88
333450      Smith      Andy        SO          2.00
324652      Delaney    Danny       SR          3.35
260000      Johnson    Stanley     ?           ?
234121      Thomas     Wendy       FR          4.00
123250      Phillips   Martin      SR          3.00

Write SQL that will bring back an answer set that selects all columns from the Student_Table if that student is taking a course that has four (4) credits.

Use a subquery to get the answer set requested above. The answer is on the next page.
Answer to Quiz - Write the Extreme Subquery

SELECT S.*
FROM Student_Table AS S
WHERE Student_ID IN
   (SELECT Student_ID
    FROM Student_Course_Table
    WHERE Course_ID IN
       (SELECT Course_ID
        FROM Course_Table
        WHERE Credits = 4)) ;

Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
__________  _________  __________  __________  ________
260000      Johnson    Stanley     ?           ?
322133      Bond       Jimmy       JR          3.95
333450      Smith      Andy        SO          2.00

Above is a three-level subquery. The Student_Course_Table acts as the bridge between the Student_Table and the Course_Table.
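A nested subquery like this is easiest to check from the bottom up. The sketch below runs each level on its own against the sample tables, substituting the previous level's result as a literal IN List. The values in the comments are read off the sample data shown in the quiz.

```sql
-- Level 3 (innermost): which courses carry four credits?
SELECT Course_ID
FROM Course_Table
WHERE Credits = 4 ;
-- Sample data gives 300 and 400.

-- Level 2: which students are taking one of those courses?
SELECT Student_ID
FROM Student_Course_Table
WHERE Course_ID IN (300, 400) ;
-- Sample data gives 322133, 260000 and 333450.

-- Level 1 (top): bring back those students.
SELECT S.*
FROM Student_Table AS S
WHERE Student_ID IN (322133, 260000, 333450) ;
```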
Quiz - Write the Subquery with an Aggregate

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Write the Subquery

Select all columns in the Employee_Table if the employee makes a greater Salary than the AVERAGE Salary.

Another opportunity knocking! Would someone please answer the query door?
Answer to Quiz - Write the Subquery with an Aggregate

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Write the Subquery

Select all columns in the Employee_Table if the employee makes a greater Salary than the AVERAGE Salary.

SELECT *
FROM Employee_Table
WHERE Salary > (
   SELECT AVG(Salary)
   FROM Employee_Table) ;
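Here the bottom query returns a single value rather than a list, which is why the comparison can use > instead of IN. The sketch below runs the two steps separately. The literal in the second query is the average of the nine sample salaries, the same AVGSAL figure reported for the copy of this table in the temporary table chapter.

```sql
-- Step 1: the bottom query alone returns one row with one value.
SELECT AVG(Salary) AS AVGSAL
FROM Employee_Table ;

-- Step 2: the top query compares each Salary with that single value.
SELECT *
FROM Employee_Table
WHERE Salary > 46782.153333 ;
```

With the sample data, five employees earn more than that figure: Smythe, Chambers, Smith, Strickling and Harrison.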
Quiz - Write the Correlated Subquery

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Write the Correlated Subquery

Select all columns in the Employee_Table if the employee makes a greater Salary than the AVERAGE Salary (within their own Department).

Another opportunity knocking! This is a tough one, and only the best get this written correctly.
Answer to Quiz - Write the Correlated Subquery

Employee_Table
Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Write the Correlated Subquery

Select all columns in the Employee_Table if the employee makes a greater Salary than the AVERAGE Salary (within their own Department).

SELECT *
FROM Employee_Table AS EE
WHERE Salary > (
   SELECT AVG(Salary)
   FROM Employee_Table AS EEEE
   WHERE EE.Dept_No = EEEE.Dept_No) ;
The Basics of a Correlated Subquery

The Top Query is Co-Related (Correlated) with the Bottom Query. Both queries read the same table, so the table in the top query and the table in the bottom query are each given a different alias (EE and EEEE). The bottom query's WHERE clause co-relates Dept_No from the Top and the Bottom.

SELECT *
FROM Employee_Table AS EE
WHERE Salary > (SELECT AVG(Salary)
                FROM Employee_Table AS EEEE
                WHERE EE.Dept_No = EEEE.Dept_No) ;

A correlated subquery breaks all the rules. Conceptually, it is the top query that runs first. Then the bottom query is run one time for each distinct value of the correlated column delivered by the top query. In our example that column is Dept_No, because the bottom WHERE clause compares EE.Dept_No with EEEE.Dept_No. After the top query runs and brings back its rows, the bottom query runs one time for each distinct Dept_No. If this is confusing, it is not you. These take a little time to understand, but I have a plan to make you an expert. Keep reading!
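To try the correlated subquery without a warehouse at hand, here is a minimal sketch that loads the sample Employee_Table rows into an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in engine chosen for this illustration (an assumption, not the platform this book targets), and the variable names are hypothetical.

```python
import sqlite3

# Sample rows copied from the Employee_Table shown in this chapter.
# None stands for the Null Dept_No of employee 2000000.
employees = [
    (2000000, None, "Jones", "Squiggy", 32800.50),
    (1000234, 10, "Smythe", "Richard", 64300.00),
    (1232578, 100, "Chambers", "Mandee", 48850.00),
    (1324657, 200, "Coffing", "Billy", 41888.88),
    (1333454, 200, "Smith", "John", 48000.00),
    (2312225, 300, "Larkins", "Loraine", 40200.00),
    (1121334, 400, "Strickling", "Cletus", 54500.00),
    (2341218, 400, "Reilly", "William", 36000.00),
    (1256349, 400, "Harrison", "Herbert", 54500.00),
]

conn = sqlite3.connect(":memory:")  # stand-in engine (assumption)
conn.execute(
    "CREATE TABLE Employee_Table (Employee_No INTEGER, Dept_No INTEGER, "
    "Last_Name TEXT, First_Name TEXT, Salary REAL)"
)
conn.executemany("INSERT INTO Employee_Table VALUES (?, ?, ?, ?, ?)", employees)

# The bottom query is correlated to the top query on Dept_No.
above_avg = conn.execute("""
    SELECT Last_Name, Dept_No, Salary
    FROM Employee_Table AS EE
    WHERE Salary > (SELECT AVG(Salary)
                    FROM Employee_Table AS EEEE
                    WHERE EE.Dept_No = EEEE.Dept_No)
    ORDER BY Last_Name
""").fetchall()

for row in above_avg:
    print(row)
```

The row with the Null Dept_No drops out because a comparison with Null is never true, so its bottom query finds no rows to average.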
The Top Query always runs first in a Correlated Subquery

SELECT *
FROM Employee_Table AS EE
WHERE Salary > (SELECT AVG(Salary)
                FROM Employee_Table AS EEEE
                WHERE EE.Dept_No = EEEE.Dept_No)

The Top Query runs first:

SELECT * FROM Employee_Table AS EE

Employee_No  Dept_No  Last_Name   First_Name  Salary
-----------  -------  ----------  ----------  --------
2000000      ?        Jones       Squiggy     32800.50   (Null is skipped)
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

The bottom Query runs 1 time for each distinct Dept_No (EE.Dept_No = EEEE.Dept_No):

Dept_No  AVGSAL
-------  --------
10       64300.00
100      48850.00
200      44944.44
300      40200.00
400      48333.33

Answer Set:

Employee_No  Dept_No  Last_Name   First_Name  Salary
-----------  -------  ----------  ----------  --------
1333454      200      Smith       John        48000.00
1256349      400      Harrison    Herbert     54500.00
1121334      400      Strickling  Cletus      54500.00

Only these three employees make more than the AVG salary within their own department.
Correlated Subquery Example vs. a Join with a Derived Table

Correlated Subquery

SELECT Last_Name, Dept_No, Salary
FROM Employee_Table AS EE
WHERE Salary > (SELECT AVG(Salary)
                FROM Employee_Table AS EEEE
                WHERE EE.Dept_No = EEEE.Dept_No) ;

Last_Name   Dept_No  Salary
----------  -------  --------
Smith       200      48000.00
Harrison    400      54500.00
Strickling  400      54500.00

Join with a Derived Table

SELECT E.Last_Name, E.Dept_No, E.Salary, AVGSAL
FROM Employee_Table AS E
INNER JOIN (SELECT Dept_No, AVG(Salary)
            FROM Employee_Table
            GROUP BY Dept_No) AS TeraTom (Depty, AVGSAL)
ON E.Dept_No = Depty
AND E.Salary > AVGSAL ;

Last_Name   Dept_No  Salary    AVGSAL
----------  -------  --------  --------
Smith       200      48000.00  44944.44
Harrison    400      54500.00  48333.33
Strickling  400      54500.00  48333.33

Both queries above bring back all employees making a salary that is greater than the average salary in their department. The biggest difference is that the Join with the Derived Table also shows the Average Salary in the result set.
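The equivalence of the two approaches can be checked with a small sketch. It again uses Python's sqlite3 module as a stand-in engine (an assumption made for illustration). SQLite does not accept a column list after the derived table alias, so the derived columns are named inside the subquery instead, and the alias T is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine (assumption)
conn.execute("CREATE TABLE Employee_Table (Last_Name TEXT, Dept_No INTEGER, Salary REAL)")
conn.executemany(
    "INSERT INTO Employee_Table VALUES (?, ?, ?)",
    [("Jones", None, 32800.50), ("Smythe", 10, 64300.00), ("Chambers", 100, 48850.00),
     ("Coffing", 200, 41888.88), ("Smith", 200, 48000.00), ("Larkins", 300, 40200.00),
     ("Strickling", 400, 54500.00), ("Reilly", 400, 36000.00), ("Harrison", 400, 54500.00)],
)

# Version 1: correlated subquery.
correlated = conn.execute("""
    SELECT Last_Name, Dept_No, Salary
    FROM Employee_Table AS EE
    WHERE Salary > (SELECT AVG(Salary)
                    FROM Employee_Table AS EEEE
                    WHERE EE.Dept_No = EEEE.Dept_No)
    ORDER BY Last_Name
""").fetchall()

# Version 2: join to a derived table holding one average per department.
joined = conn.execute("""
    SELECT E.Last_Name, E.Dept_No, E.Salary, T.AVGSAL
    FROM Employee_Table AS E
    INNER JOIN (SELECT Dept_No AS Depty, AVG(Salary) AS AVGSAL
                FROM Employee_Table
                GROUP BY Dept_No) AS T
      ON E.Dept_No = T.Depty
     AND E.Salary > T.AVGSAL
    ORDER BY E.Last_Name
""").fetchall()

# Same employees either way; the join also carries the department average.
same_rows = [row[:3] for row in joined] == correlated
print(same_rows)
for last_name, dept_no, salary, avg_salary in joined:
    print(last_name, dept_no, salary, round(avg_salary, 2))
```

The join computes each department average once, which is one reason it is often worth comparing against the correlated form.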
Quiz - A Second Chance to Write a Correlated Subquery

Sales_Table (All Rows are NOT Displayed)

Product_ID  Sale_Date   Daily_Sales
----------  ----------  -----------
1000        10/02/2000  32800.50
1000        09/30/2000  36000.07
1000        10/01/2000  40200.43
2000        10/04/2000  32800.50
2000        10/02/2000  36021.93
2000        09/28/2000  41888.88
3000        10/04/2000  15675.33
3000        10/02/2000  19678.94
3000        10/03/2000  21553.79

Write the Correlated Subquery
Select all columns in the Sales_Table if the Daily_Sales column is greater than the Average Daily_Sales within its own Product_ID.

Another opportunity knocking! This is your second chance. I will even give you a third chance.
Answer - A Second Chance to Write a Correlated Subquery

Select all columns in the Sales_Table if the Daily_Sales column is greater than the Average Daily_Sales within its own Product_ID.

SELECT *
FROM Sales_Table AS TopS
WHERE Daily_Sales > (SELECT AVG(Daily_Sales)
                     FROM Sales_Table AS BotS
                     WHERE TopS.Product_ID = BotS.Product_ID)
ORDER BY Product_ID, Sale_Date ;

Answer Set

Product_ID  Sale_Date   Daily_Sales
----------  ----------  -----------
1000        09/28/2000  48850.40
1000        09/29/2000  54500.22
1000        10/03/2000  64300.00
1000        10/04/2000  54553.10
2000        09/29/2000  48000.00
2000        09/30/2000  49850.03
2000        10/01/2000  54850.29
3000        09/28/2000  61301.77
3000        09/29/2000  34509.13
3000        09/30/2000  43868.86
Quiz - A Third Chance to Write a Correlated Subquery

This quiz uses the same Sales_Table sample rows as the second-chance quiz.

Write the Correlated Subquery
Select all columns in the Sales_Table if the Daily_Sales column is greater than the Average Daily_Sales within its own Sale_Date.

Another opportunity knocking! There is just one minor adjustment and you are home free.
Answer - A Third Chance to Write a Correlated Subquery

Select all columns in the Sales_Table if the Daily_Sales column is greater than the Average Daily_Sales within its own Sale_Date.

SELECT *
FROM Sales_Table AS TopS
WHERE Daily_Sales > (SELECT AVG(Daily_Sales)
                     FROM Sales_Table AS BotS
                     WHERE TopS.Sale_Date = BotS.Sale_Date)
ORDER BY Sale_Date ;

Answer Set

Product_ID  Sale_Date   Daily_Sales
----------  ----------  -----------
3000        09/28/2000  61301.77
2000        09/29/2000  48000.00
1000        09/29/2000  54500.22
3000        09/30/2000  43868.86
2000        09/30/2000  49850.03
2000        10/01/2000  54850.29
2000        10/02/2000  36021.93
1000        10/02/2000  32800.50
2000        10/03/2000  43200.18
1000        10/03/2000  64300.00
1000        10/04/2000  54553.10
Quiz - Last Chance To Write a Correlated Subquery

Student_Table

Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
----------  ---------  ----------  ----------  --------
423400      Larkins    Michael     FR          0.00
231222      Wilson     Susie       SO          3.80
280023      McRoberts  Richard     JR          1.90
322133      Bond       Jimmy       JR          3.95
125634      Hanson     Henry       FR          2.88
333450      Smith      Andy        SO          2.00
324652      Delaney    Danny       SR          3.35
260000      Johnson    Stanley     ?           ?
234121      Thomas     Wendy       FR          4.00
123250      Phillips   Martin      SR          3.00

Write the Correlated Subquery
Select all columns in the Student_Table if the Grade_Pt column is greater than the Average Grade_Pt within its own Class_Code.

Another opportunity knocking! There is just one minor adjustment and you are home free.
Answer – Last Chance to Write a Correlated Subquery

Select all columns in the Student_Table if the Grade_Pt column is greater than the Average Grade_Pt within its own Class_Code.

SELECT *
FROM Student_Table AS TopS
WHERE Grade_Pt > (SELECT AVG(Grade_Pt)
                  FROM Student_Table AS BotS
                  WHERE TopS.Class_Code = BotS.Class_Code)
ORDER BY Class_Code ;

Answer Set

Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
----------  ---------  ----------  ----------  --------
234121      Thomas     Wendy       FR          4.00
125634      Hanson     Henry       FR          2.88
322133      Bond       Jimmy       JR          3.95
231222      Wilson     Susie       SO          3.80
324652      Delaney    Danny       SR          3.35
Quiz – Write the Extreme Correlated Subquery

Course_Table

Course_ID  Course_Name               Credits  Seats
---------  ------------------------  -------  -----
100        Database Concepts         3        50
200        Introduction to SQL       3        20
210        Advanced SQL              3        22
220        V2R3 SQL Features         2        25
300        Physical Database Design  4        20
400        Database Administration   4        16

Student_Course_Table

Student_ID  Course_ID
----------  ---------
280023      210
231222      210
125634      100
231222      220
125634      200
322133      220
125634      220
322133      300
324652      200
333450      500
260000      400
333450      400
234121      100
123250      100

Student_Table

Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
----------  ---------  ----------  ----------  --------
423400      Larkins    Michael     FR          0.00
231222      Wilson     Susie       SO          3.80
280023      McRoberts  Richard     JR          1.90
322133      Bond       Jimmy       JR          3.95
125634      Hanson     Henry       FR          2.88
333450      Smith      Andy        SO          2.00
324652      Delaney    Danny       SR          3.35
260000      Johnson    Stanley     ?           ?
234121      Thomas     Wendy       FR          4.00
123250      Phillips   Martin      SR          3.00

Write a correlated subquery that will bring back an answer set that returns all columns from the Course_Table if that course is being taken by a student who has a greater than average grade point within their own class code.

Use a subquery to get the answer set requested above. The answer is on the next page.
Answer To Quiz – Write the Extreme Correlated Subquery

SELECT *
FROM Course_Table
WHERE Course_ID IN
   (SELECT Course_ID
    FROM Student_Course_Table
    WHERE Student_ID IN
       (SELECT Student_ID
        FROM Student_Table AS s1
        WHERE Grade_Pt > (SELECT AVG(Grade_Pt)
                          FROM Student_Table AS s2
                          WHERE s1.Class_Code = s2.Class_Code))) ;

Course_ID  Course_Name               Credits  Seats
---------  ------------------------  -------  -----
200        Introduction to SQL       3        20
100        Database Concepts         3        50
220        V2R3 SQL Features         2        25
300        Physical Database Design  4        20
210        Advanced SQL              3        22

Above is something to enjoy and learn from.
Quiz - Write the NOT Subquery

Customer_Table

Customer_Number  Customer_Name
---------------  -------------------
11111111         Billy’s Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table

Order_Number  Customer_Number  Order_Total
------------  ---------------  -----------
123456        11111111         12347.53
123512        11111111         8005.91
123552        31323134         5111.47
123585        87323456         15231.62
123777        57896883         23454.84

Write the Subquery
Select all columns in the Customer_Table if the Customer has NOT placed an order.

Another opportunity knocking! Write the above query.
Answer to Quiz - Write the NOT Subquery

Select all columns in the Customer_Table if the Customer has NOT placed an order.

SELECT *
FROM Customer_Table
WHERE Customer_Number NOT IN
   (SELECT Customer_Number
    FROM Order_Table
    WHERE Customer_Number IS NOT NULL) ;

Nulls are a NOT IN nightmare. Notice how I account for them!
Quiz - Write the Subquery using a WHERE Clause

This quiz uses the same Customer_Table and Order_Table.

Write the Subquery
Select all columns in the Order_Table that were placed by a customer with ‘Bill’ anywhere in their name.

Another opportunity to show your brilliance is ready for you to make it happen.
Answer - Write the Subquery using a WHERE Clause

Select all columns in the Order_Table that were placed by a customer with ‘Bill’ anywhere in their name.

SELECT *
FROM Order_Table
WHERE Customer_Number IN
   (SELECT Customer_Number
    FROM Customer_Table
    WHERE Customer_Name LIKE '%Bill%') ;
Quiz – Write the Triple Subquery

This quiz uses the same Customer_Table and Order_Table.

Write the Subquery
Which Customer_Name has the highest dollar order among all customers? This query will have multiple Subqueries!

Good luck in writing this. Remember that this will involve multiple Subqueries.
Answer to Quiz – Write the Triple Subquery

Which Customer_Name has the highest dollar order among all customers? This query will have multiple Subqueries!

SELECT Customer_Name
FROM Customer_Table
WHERE Customer_Number IN
   (SELECT Customer_Number
    FROM Order_Table
    WHERE Order_Total IN
       (SELECT MAX(Order_Total)
        FROM Order_Table)) ;

This runs first: the innermost query returns the highest Order_Total, 23454.84.
This runs second: the middle query returns the Customer_Number that placed that order, 57896883.
This runs third: the top query returns the Customer_Name, XYZ Plumbing.

The query is above and, of course, the answer is XYZ Plumbing.
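The inside-out order of evaluation can be traced one level at a time. The sketch below runs each level separately and then the full nested query, using Python's sqlite3 module as a stand-in engine (an assumption for illustration only). The variable names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine (assumption)
conn.executescript("""
    CREATE TABLE Customer_Table (Customer_Number INTEGER, Customer_Name TEXT);
    CREATE TABLE Order_Table (Order_Number INTEGER, Customer_Number INTEGER,
                              Order_Total REAL);
    INSERT INTO Customer_Table VALUES
        (11111111, 'Billy''s Best Choice'), (31313131, 'Acme Products'),
        (31323134, 'ACE Consulting'), (57896883, 'XYZ Plumbing'),
        (87323456, 'Databases N-U');
    INSERT INTO Order_Table VALUES
        (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
        (123552, 31323134, 5111.47), (123585, 87323456, 15231.62),
        (123777, 57896883, 23454.84);
""")

# Level 1 (runs first): the highest order total.
max_total = conn.execute("SELECT MAX(Order_Total) FROM Order_Table").fetchone()[0]

# Level 2 (runs second): the customer number on that order.
top_customer = conn.execute(
    "SELECT Customer_Number FROM Order_Table WHERE Order_Total = ?", (max_total,)
).fetchone()[0]

# Level 3 (runs third): the full nested query returns the customer name.
names = conn.execute("""
    SELECT Customer_Name
    FROM Customer_Table
    WHERE Customer_Number IN
        (SELECT Customer_Number
         FROM Order_Table
         WHERE Order_Total IN
             (SELECT MAX(Order_Total) FROM Order_Table))
""").fetchall()

print(max_total, top_customer, names)
```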
Quiz – How many rows return on a NOT IN with a NULL?

Customer_Table

Customer_Number  Customer_Name
---------------  -------------------
11111111         Billy’s Best Choice
31313131         Acme Products
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Order_Table (We added a Null Value to the Order_Table)

Order_Number  Customer_Number  Order_Total
------------  ---------------  -----------
123456        11111111         12347.53
123512        11111111         8005.91
123552        31323134         5111.47
123585        87323456         15231.62
123777        57896883         23454.84
000099        NULL             9999.99

SELECT Customer_Name
FROM Customer_Table
WHERE Customer_Number NOT IN
   (SELECT Customer_Number
    FROM Order_Table) ;

How many rows return from the query now that a NULL value is in a Customer_Number? We really didn’t place a new row inside the Order_Table with a NULL value for the Customer_Number column, but in theory, if we had, how many rows would return?
Answer – How many rows return on a NOT IN with a NULL?

SELECT Customer_Name
FROM Customer_Table
WHERE Customer_Number NOT IN
   (SELECT Customer_Number
    FROM Order_Table) ;

How many rows return from the query now that a NULL value is in a Customer_Number? ZERO rows will return.

The answer is that no rows come back. When the NOT IN list contains a NULL, the system cannot tell whether a Customer_Number differs from that unknown value, so the NOT IN test is never true for any row and nothing is returned.
How to handle a NOT IN with Potential NULL Values

SELECT Customer_Name
FROM Customer_Table
WHERE Customer_Number NOT IN
   (SELECT Customer_Number
    FROM Order_Table
    WHERE Customer_Number IS NOT NULL) ;

How many rows return NOW from the query? One row: Acme Products.

You can utilize a WHERE clause that tests to make sure Customer_Number IS NOT NULL. This should be used when a NOT IN could encounter a NULL.
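The NULL trap is easy to reproduce. The sketch below adds the hypothetical NULL order row to the sample tables and runs the NOT IN query with and without the IS NOT NULL filter. Python's sqlite3 module is used only as a stand-in engine (an assumption for illustration), and it follows the same three-valued NULL logic described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine (assumption)
conn.executescript("""
    CREATE TABLE Customer_Table (Customer_Number INTEGER, Customer_Name TEXT);
    CREATE TABLE Order_Table (Order_Number INTEGER, Customer_Number INTEGER,
                              Order_Total REAL);
    INSERT INTO Customer_Table VALUES
        (11111111, 'Billy''s Best Choice'), (31313131, 'Acme Products'),
        (31323134, 'ACE Consulting'), (57896883, 'XYZ Plumbing'),
        (87323456, 'Databases N-U');
    INSERT INTO Order_Table VALUES
        (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
        (123552, 31323134, 5111.47), (123585, 87323456, 15231.62),
        (123777, 57896883, 23454.84),
        (99, NULL, 9999.99);  -- the added row with a NULL Customer_Number
""")

# Without the filter, the NULL in the list keeps NOT IN from ever being true.
unfiltered = conn.execute("""
    SELECT Customer_Name
    FROM Customer_Table
    WHERE Customer_Number NOT IN (SELECT Customer_Number FROM Order_Table)
""").fetchall()

# With the filter, the NULL never reaches the NOT IN list.
filtered = conn.execute("""
    SELECT Customer_Name
    FROM Customer_Table
    WHERE Customer_Number NOT IN (SELECT Customer_Number
                                  FROM Order_Table
                                  WHERE Customer_Number IS NOT NULL)
""").fetchall()

print(unfiltered)
print(filtered)
```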
Using a Correlated Exists

Use EXISTS to find which Customers have placed an Order.

SELECT Customer_Number, Customer_Name
FROM Customer_Table AS Top1
WHERE EXISTS
   (SELECT *
    FROM Order_Table AS Bot1
    WHERE Top1.Customer_Number = Bot1.Customer_Number) ;

The EXISTS command returns a Boolean: True if the subquery finds at least one row, and False if it finds none. If a customer placed an order, a matching row EXISTS, so with the Correlated EXISTS statement only customers who have placed an order return in the answer set. EXISTS is different from IN because it is less restrictive, as you will soon understand.
How a Correlated Exists matches up

SELECT Customer_Number, Customer_Name
FROM Customer_Table AS Top1
WHERE EXISTS
   (SELECT *
    FROM Order_Table AS Bot1
    WHERE Top1.Customer_Number = Bot1.Customer_Number) ;

Customer_Number  Customer_Name
---------------  -------------------
11111111         Billy’s Best Choice
31323134         ACE Consulting
57896883         XYZ Plumbing
87323456         Databases N-U

Acme Products (31313131) does not exist in the Order_Table, so it is not returned. Only customers who placed an order return with the above Correlated EXISTS.
The Correlated NOT Exists

Use NOT EXISTS to find which Customers have NOT placed an Order.

SELECT Customer_Number, Customer_Name
FROM Customer_Table AS Top1
WHERE NOT EXISTS
   (SELECT *
    FROM Order_Table AS Bot1
    WHERE Top1.Customer_Number = Bot1.Customer_Number) ;

The EXISTS command returns a Boolean: True if the subquery finds at least one row, and False if it finds none. NOT EXISTS reverses that test. If a customer placed an order, a matching row EXISTS and the customer is rejected, so with the Correlated NOT EXISTS statement only customers who have NOT placed an order return in the answer set. NOT EXISTS is different from NOT IN because it is less restrictive, as you will soon understand.
The Correlated NOT Exists Answer Set

Use NOT EXISTS to find which Customers have NOT placed an Order.

Customer_Number  Customer_Name
---------------  -------------
31313131         Acme Products

The only customer who did NOT place an order was Acme Products.
Quiz – How many rows come back from this NOT Exists?

Order_Table (We added a Null Value to the Order_Table)

Order_Number  Customer_Number  Order_Total
------------  ---------------  -----------
123456        11111111         12347.53
123512        11111111         8005.91
123552        31323134         5111.47
123585        87323456         15231.62
123777        57896883         23454.84
000099        NULL             9999.99

SELECT Customer_Number, Customer_Name
FROM Customer_Table AS Top1
WHERE NOT EXISTS
   (SELECT *
    FROM Order_Table AS Bot1
    WHERE Top1.Customer_Number = Bot1.Customer_Number) ;

How many rows return from the query?

A NULL value in a list for queries with NOT IN returned nothing, but you must now decide if that is also true for the NOT EXISTS. How many rows will return?
Answer – How many rows come back from this NOT Exists?

SELECT Customer_Number, Customer_Name
FROM Customer_Table AS Top1
WHERE NOT EXISTS
   (SELECT *
    FROM Order_Table AS Bot1
    WHERE Top1.Customer_Number = Bot1.Customer_Number) ;

How many rows return from the query? One row: Acme Products.

NOT EXISTS is unaffected by a NULL in the Order_Table. The correlated comparison simply never matches the NULL row, and that is why NOT EXISTS is more flexible than NOT IN.
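A short sketch makes the contrast with NOT IN concrete. It keeps the hypothetical NULL order row in place and runs both the EXISTS and the NOT EXISTS form. As before, Python's sqlite3 module is only a stand-in engine (an assumption for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine (assumption)
conn.executescript("""
    CREATE TABLE Customer_Table (Customer_Number INTEGER, Customer_Name TEXT);
    CREATE TABLE Order_Table (Order_Number INTEGER, Customer_Number INTEGER,
                              Order_Total REAL);
    INSERT INTO Customer_Table VALUES
        (11111111, 'Billy''s Best Choice'), (31313131, 'Acme Products'),
        (31323134, 'ACE Consulting'), (57896883, 'XYZ Plumbing'),
        (87323456, 'Databases N-U');
    INSERT INTO Order_Table VALUES
        (123456, 11111111, 12347.53), (123512, 11111111, 8005.91),
        (123552, 31323134, 5111.47), (123585, 87323456, 15231.62),
        (123777, 57896883, 23454.84),
        (99, NULL, 9999.99);  -- the added row with a NULL Customer_Number
""")

# Customers with at least one matching order row.
with_orders = conn.execute("""
    SELECT Customer_Name
    FROM Customer_Table AS Top1
    WHERE EXISTS (SELECT *
                  FROM Order_Table AS Bot1
                  WHERE Top1.Customer_Number = Bot1.Customer_Number)
    ORDER BY Customer_Number
""").fetchall()

# Customers with no matching order row; the NULL row never matches anyone.
without_orders = conn.execute("""
    SELECT Customer_Name
    FROM Customer_Table AS Top1
    WHERE NOT EXISTS (SELECT *
                      FROM Order_Table AS Bot1
                      WHERE Top1.Customer_Number = Bot1.Customer_Number)
""").fetchall()

print(with_orders)
print(without_orders)
```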
Chapter 14 – Window Functions OLAP

“Don’t count the days, make the days count.” - Muhammad Ali
The Row_Number Command

SELECT Product_ID, Sale_Date, Daily_Sales,
       ROW_NUMBER() OVER (ORDER BY Product_ID, Sale_Date) AS Seq_Number
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  Seq_Number
----------  ----------  -----------  ----------
1000        2000-09-28  48850.40     1
1000        2000-09-29  54500.22     2
1000        2000-09-30  36000.07     3
1000        2000-10-01  40200.43     4
1000        2000-10-02  32800.50     5
1000        2000-10-03  64300.00     6
1000        2000-10-04  54553.10     7
2000        2000-09-28  41888.88     8
2000        2000-09-29  48000.00     9
2000        2000-09-30  49850.03     10
2000        2000-10-01  54850.29     11

Not all rows are displayed.

The ROW_NUMBER() keyword caused Seq_Number to increase sequentially. Notice that this does NOT have a Rows Unbounded Preceding, and it still works!
Quiz – How did the Row_Number Reset?

SELECT Product_ID, Sale_Date, Daily_Sales,
       ROW_NUMBER() OVER (PARTITION BY Product_ID
                          ORDER BY Product_ID, Sale_Date) AS StartOver
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  StartOver
----------  ----------  -----------  ---------
1000        2000-09-28  48850.40     1
1000        2000-09-29  54500.22     2
1000        2000-09-30  36000.07     3
1000        2000-10-01  40200.43     4
1000        2000-10-02  32800.50     5
1000        2000-10-03  64300.00     6
1000        2000-10-04  54553.10     7
2000        2000-09-28  41888.88     1
2000        2000-09-29  48000.00     2
2000        2000-09-30  49850.03     3
2000        2000-10-01  54850.29     4
2000        2000-10-02  36021.93     5
2000        2000-10-03  43200.18     6
2000        2000-10-04  32800.50     7

What Keyword(s) caused StartOver to reset?
Answer – How did the Row_Number Reset?

What Keyword(s) caused StartOver to reset? It is the PARTITION BY statement. Because the rows are partitioned by Product_ID, ROW_NUMBER() starts again at 1 for each new Product_ID.
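The two behaviours can be put side by side in one query. The sketch below uses a six-row subset of the Sales_Table sample data and Python's sqlite3 module as a stand-in engine (an assumption for illustration, and it needs SQLite 3.25 or later for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine (assumption)
conn.execute("CREATE TABLE Sales_Table (Product_ID INTEGER, Sale_Date TEXT, Daily_Sales REAL)")
conn.executemany(
    "INSERT INTO Sales_Table VALUES (?, ?, ?)",
    [(1000, "2000-09-28", 48850.40), (1000, "2000-09-29", 54500.22),
     (1000, "2000-09-30", 36000.07), (2000, "2000-09-28", 41888.88),
     (2000, "2000-09-29", 48000.00), (2000, "2000-09-30", 49850.03)],
)

numbered = conn.execute("""
    SELECT Product_ID, Sale_Date,
           -- no PARTITION BY: one sequence across every row
           ROW_NUMBER() OVER (ORDER BY Product_ID, Sale_Date) AS Seq_Number,
           -- PARTITION BY: the sequence restarts for each Product_ID
           ROW_NUMBER() OVER (PARTITION BY Product_ID
                              ORDER BY Sale_Date) AS StartOver
    FROM Sales_Table
    ORDER BY Product_ID, Sale_Date
""").fetchall()

for row in numbered:
    print(row)
```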
Using a Derived Table and Row_Number

WITH Results AS
   (SELECT ROW_NUMBER() OVER (ORDER BY Product_ID, Sale_Date) AS RowNumber,
           Product_ID, Sale_Date
    FROM Sales_Table)
SELECT *
FROM Results
WHERE RowNumber BETWEEN 8 AND 14

RowNumber  Product_ID  Sale_Date
---------  ----------  ----------
8          2000        2000-09-28
9          2000        2000-09-29
10         2000        2000-09-30
11         2000        2000-10-01
12         2000        2000-10-02
13         2000        2000-10-03
14         2000        2000-10-04

In the example above, the WITH clause builds a derived table called Results (a common table expression), and then the outer query uses a WHERE clause to take only certain Row Numbers.
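The reason for wrapping the query is that a window function cannot be filtered in the WHERE clause of the same SELECT that computes it, so the row number has to be materialized first. A minimal sketch of the same pattern on a six-row subset of the sample data follows, with Python's sqlite3 module as a stand-in engine (an assumption for illustration, SQLite 3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine (assumption)
conn.execute("CREATE TABLE Sales_Table (Product_ID INTEGER, Sale_Date TEXT, Daily_Sales REAL)")
conn.executemany(
    "INSERT INTO Sales_Table VALUES (?, ?, ?)",
    [(1000, "2000-09-28", 48850.40), (1000, "2000-09-29", 54500.22),
     (1000, "2000-09-30", 36000.07), (2000, "2000-09-28", 41888.88),
     (2000, "2000-09-29", 48000.00), (2000, "2000-09-30", 49850.03)],
)

# Number the rows inside the WITH clause, then filter on the number outside it.
page = conn.execute("""
    WITH Results AS (
        SELECT ROW_NUMBER() OVER (ORDER BY Product_ID, Sale_Date) AS RowNumber,
               Product_ID, Sale_Date
        FROM Sales_Table
    )
    SELECT *
    FROM Results
    WHERE RowNumber BETWEEN 4 AND 6
    ORDER BY RowNumber
""").fetchall()

for row in page:
    print(row)
```

With this subset, rows 4 through 6 are the three rows for Product_ID 2000, which mirrors how rows 8 through 14 isolate Product_ID 2000 in the full table.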
Ordered Analytics OVER

SELECT TOP (9) Product_ID AS Prod
      ,Sale_Date
      ,Daily_Sales
      ,SUM(Daily_Sales)   OVER (PARTITION BY Sale_Date) AS Total
      ,AVG(Daily_Sales)   OVER (PARTITION BY Sale_Date) AS Avg
      ,COUNT(Daily_Sales) OVER (PARTITION BY Sale_Date) AS Cnt
      ,MIN(Daily_Sales)   OVER (PARTITION BY Sale_Date) AS Min
      ,MAX(Daily_Sales)   OVER (PARTITION BY Sale_Date) AS Max
FROM Sales_Table

Prod  Sale_Date   Daily_Sales  Total      Avg       Cnt  Min       Max
----  ----------  -----------  ---------  --------  ---  --------  --------
1000  2000-09-28  48850.40     152041.05  50680.35  3    41888.88  61301.77
2000  2000-09-28  41888.88     152041.05  50680.35  3    41888.88  61301.77
3000  2000-09-28  61301.77     152041.05  50680.35  3    41888.88  61301.77
3000  2000-09-29  34509.13     137009.35  45669.78  3    34509.13  54500.22
2000  2000-09-29  48000.00     137009.35  45669.78  3    34509.13  54500.22
1000  2000-09-29  54500.22     137009.35  45669.78  3    34509.13  54500.22
1000  2000-09-30  36000.07     129718.96  43239.65  3    36000.07  49850.03
2000  2000-09-30  49850.03     129718.96  43239.65  3    36000.07  49850.03
3000  2000-09-30  43868.86     129718.96  43239.65  3    36000.07  49850.03

Above is an example of the Ordered Analytics using the keyword OVER.
RANK and DENSE RANK

SELECT TOP (5) Product_ID, Daily_Sales,
       RANK() OVER (ORDER BY Daily_Sales ASC) AS [Rank],
       DENSE_RANK() OVER (ORDER BY Daily_Sales ASC) AS [DenseRank]
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)

Product_ID  Daily_Sales  Rank  DenseRank
----------  -----------  ----  ---------
1000        32800.50     1     1
2000        32800.50     1     1
1000        36000.07     3     2
2000        36021.93     4     3
1000        40200.43     5     4

Above is an example of the RANK and DENSE_RANK commands. Notice the difference in the ties and the next ranking: after the two rows tied at 1, RANK skips to 3, while DENSE_RANK continues with 2.
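The tie handling can be reproduced with just the five rows from the answer set above. This sketch uses Python's sqlite3 module as a stand-in engine (an assumption for illustration, SQLite 3.25 or later). Since SQLite has no TOP clause, the five rows are simply loaded on their own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine (assumption)
conn.execute("CREATE TABLE Sales_Table (Product_ID INTEGER, Daily_Sales REAL)")
conn.executemany(
    "INSERT INTO Sales_Table VALUES (?, ?)",
    [(1000, 32800.50), (2000, 32800.50), (1000, 36000.07),
     (2000, 36021.93), (1000, 40200.43)],
)

ranked = conn.execute("""
    SELECT Product_ID, Daily_Sales,
           RANK()       OVER (ORDER BY Daily_Sales ASC) AS SalesRank,
           DENSE_RANK() OVER (ORDER BY Daily_Sales ASC) AS SalesDenseRank
    FROM Sales_Table
    ORDER BY Daily_Sales, Product_ID
""").fetchall()

# RANK leaves a gap after a tie; DENSE_RANK does not.
ranks = [row[2] for row in ranked]
dense_ranks = [row[3] for row in ranked]
print(ranks)
print(dense_ranks)
```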
RANK Defaults to Ascending Order

SELECT Product_ID, Sale_Date, Daily_Sales,
       RANK() OVER (ORDER BY Daily_Sales) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  Rank1
----------  ----------  -----------  -----
1000        10/02/2000  32800.50     1
2000        10/04/2000  32800.50     1
1000        09/30/2000  36000.07     3
2000        10/02/2000  36021.93     4
1000        10/01/2000  40200.43     5
2000        09/28/2000  41888.88     6
2000        10/03/2000  43200.18     7
2000        09/29/2000  48000.00     8
1000        09/28/2000  48850.40     9
2000        09/30/2000  49850.03     10
1000        09/29/2000  54500.22     11
1000        10/04/2000  54553.10     12
2000        10/01/2000  54850.29     13

Not all rows are displayed.

The RANK() OVER command defaults the Sort to ASC.
Getting RANK to Sort in DESC Order

SELECT Product_ID, Sale_Date, Daily_Sales,
       RANK() OVER (ORDER BY Daily_Sales DESC) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  Rank1
----------  ----------  -----------  -----
1000        10/03/2000  64300.00     1
2000        10/01/2000  54850.29     2
1000        10/04/2000  54553.10     3
1000        09/29/2000  54500.22     4
2000        09/30/2000  49850.03     5
1000        09/28/2000  48850.40     6
2000        09/29/2000  48000.00     7
2000        10/03/2000  43200.18     8
2000        09/28/2000  41888.88     9
1000        10/01/2000  40200.43     10
2000        10/02/2000  36021.93     11
1000        09/30/2000  36000.07     12
2000        10/04/2000  32800.50     13
1000        10/02/2000  32800.50     13

Utilize the DESC keyword in the ORDER BY statement to rank in descending order.
RANK() OVER and PARTITION BY

SELECT Product_ID, Sale_Date, Daily_Sales,
       RANK() OVER (PARTITION BY Product_ID
                    ORDER BY Daily_Sales DESC) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  Rank1
----------  ----------  -----------  -----
1000        10/03/2000  64300.00     1
1000        10/04/2000  54553.10     2
1000        09/29/2000  54500.22     3
1000        09/28/2000  48850.40     4
1000        10/01/2000  40200.43     5
1000        09/30/2000  36000.07     6
1000        10/02/2000  32800.50     7
2000        10/01/2000  54850.29     1
2000        09/30/2000  49850.03     2
2000        09/29/2000  48000.00     3
2000        10/03/2000  43200.18     4
2000        09/28/2000  41888.88     5
2000        10/02/2000  36021.93     6
2000        10/04/2000  32800.50     7

What does the PARTITION Statement in the RANK() OVER do? It resets the rank, so the ranking starts again at 1 for each new Product_ID.
Cumulative Sum

SELECT Product_ID, Sale_Date, Daily_Sales,
       SUM(Daily_Sales) OVER (ORDER BY Sale_Date
                              ROWS UNBOUNDED PRECEDING) AS CsumAnsi
FROM Sales_Table
WHERE Product_ID BETWEEN 1000 AND 2000 ;

Product_ID  Sale_Date   Daily_Sales  CsumAnsi
----------  ----------  -----------  ---------
2000        2000-09-28  41888.88     41888.88
1000        2000-09-28  48850.40     90739.28
2000        2000-09-29  48000.00     138739.28
1000        2000-09-29  54500.22     193239.50
1000        2000-09-30  36000.07     229239.57
2000        2000-09-30  49850.03     279089.60
1000        2000-10-01  40200.43     319290.03
2000        2000-10-01  54850.29     374140.32
1000        2000-10-02  32800.50     406940.82
2000        2000-10-02  36021.93     442962.75

Not all rows are displayed in this answer set.

The keywords ROWS UNBOUNDED PRECEDING determine that this is a cumulative sum (CSUM). There are only a few different frame statements, and ROWS UNBOUNDED PRECEDING is the main one. It means start calculating at the first row and keep accumulating, row by row, through the current row until the last row is reached. Rows that share the same Sale_Date can come back in either order.
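The frame matters most when the ORDER BY column has ties, as Sale_Date does here. The sketch below compares the default frame with ROWS UNBOUNDED PRECEDING on four sample rows, using Python's sqlite3 module as a stand-in engine (an assumption for illustration, SQLite 3.25 or later). The column aliases DefaultFrame and RowsFrame are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in engine (assumption)
conn.execute("CREATE TABLE Sales_Table (Product_ID INTEGER, Sale_Date TEXT, Daily_Sales REAL)")
conn.executemany(
    "INSERT INTO Sales_Table VALUES (?, ?, ?)",
    [(1000, "2000-09-28", 48850.40), (2000, "2000-09-28", 41888.88),
     (1000, "2000-09-29", 54500.22), (2000, "2000-09-29", 48000.00)],
)

totals = conn.execute("""
    SELECT Sale_Date, Daily_Sales,
           -- default frame: rows tied on Sale_Date are summed together
           SUM(Daily_Sales) OVER (ORDER BY Sale_Date) AS DefaultFrame,
           -- explicit frame: a true row-by-row running total
           SUM(Daily_Sales) OVER (ORDER BY Sale_Date
                                  ROWS UNBOUNDED PRECEDING) AS RowsFrame
    FROM Sales_Table
""").fetchall()

default_totals = sorted({round(row[2], 2) for row in totals})
rows_totals = [round(row[3], 2) for row in totals]
print(default_totals)  # one total per Sale_Date
print(rows_totals)     # one total per row
```

With the default frame both rows of a date show the same total, while the ROWS frame gives every row its own running total.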
The ANSI CSUM – Getting a Sequential Number

SELECT Product_ID, Sale_Date, Daily_Sales,
       SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date) AS SUMOVER,
       SUM(1) OVER (ORDER BY Product_ID, Sale_Date) AS Seq_Number
FROM Sales_Table ;

Product_ID  Sale_Date   Daily_Sales  SUMOVER    Seq_Number
----------  ----------  -----------  ---------  ----------
1000        2000-09-28  48850.40     48850.40   1
1000        2000-09-29  54500.22     103350.62  2
1000        2000-09-30  36000.07     139350.69  3
1000        2000-10-01  40200.43     179551.12  4
1000        2000-10-02  32800.50     212351.62  5
1000        2000-10-03  64300.00     276651.62  6
1000        2000-10-04  54553.10     331204.72  7
2000        2000-09-28  41888.88     373093.60  8
2000        2000-09-29  48000.00     421093.60  9
2000        2000-09-30  49850.03     470943.63  10

Not all rows are displayed in this answer set.

With “Seq_Number”, because you placed the number 1 in the expression that is being cumulatively summed, the running total grows by 1 on each row, which produces a sequential number.
Troubleshooting The ANSI OLAP on a GROUP BY

SELECT Product_ID, Sale_Date, Daily_Sales,
       SUM(Daily_Sales) OVER (ORDER BY Sale_Date) AS AnsiCsum
FROM Sales_Table
GROUP BY Product_ID ;

Error! Why?

Sale_Date and Daily_Sales are neither in the GROUP BY nor inside an aggregate, so grouping by Product_ID makes the statement fail. Never use a GROUP BY to reset a SUM() OVER or any other ANSI Syntax OLAP command. If you want to reset, you use a PARTITION BY Statement, but never a GROUP BY.
Reset with a PARTITION BY Statement

SELECT Product_ID, Sale_Date, Daily_Sales,
       SUM(Daily_Sales) OVER (PARTITION BY Product_ID
                              ORDER BY Product_ID, Sale_Date) AS SumANSI
FROM Sales_Table ;

Product_ID  Sale_Date   Daily_Sales  SumANSI
----------  ----------  -----------  ---------
1000        2000-09-28  48850.40     48850.40
1000        2000-09-29  54500.22     103350.62
1000        2000-09-30  36000.07     139350.69
1000        2000-10-01  40200.43     179551.12
1000        2000-10-02  32800.50     212351.62
1000        2000-10-03  64300.00     276651.62
1000        2000-10-04  54553.10     331204.72
2000        2000-09-28  41888.88     41888.88    (CSUM resets on Product_ID break)
2000        2000-09-29  48000.00     89888.88
2000        2000-09-30  49850.03     139738.91

Not all rows are displayed in this answer set.

The PARTITION Statement is how you reset in ANSI. This will cause the SumANSI to start over (reset) on its calculating for each NEW Product_ID.
Page 436
PARTITION BY only Resets a Single OLAP not ALL of them

SELECT Product_ID, Sale_Date, Daily_Sales,
       SUM(Daily_Sales) OVER (PARTITION BY Product_ID
                              ORDER BY Product_ID, Sale_Date) Subtotal,
       SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date) GrandTotal
FROM Sales_Table ;

Product_ID  Sale_Date   Daily_Sales  Subtotal   GrandTotal
__________  __________  ___________  _________  __________
1000        2000-09-28  48850.40     48850.40   48850.40
1000        2000-09-29  54500.22     103350.62  103350.62
1000        2000-09-30  36000.07     139350.69  139350.69
1000        2000-10-01  40200.43     179551.12  179551.12
1000        2000-10-02  32800.50     212351.62  212351.62
1000        2000-10-03  64300.00     276651.62  276651.62
1000        2000-10-04  54553.10     331204.72  331204.72
2000        2000-09-28  41888.88     41888.88   373093.60
2000        2000-09-29  48000.00     89888.88   421093.60
2000        2000-09-30  49850.03     139738.91  470943.63

Not all rows are displayed in this answer set.

Above are two OLAP statements. Only one has a PARTITION BY, so only it resets. The other continues its cumulative sum across every row.
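A minimal sketch of the same idea, kept small enough to check by hand. It assumes Python's sqlite3 module (SQLite 3.25 or later) in place of SQL Server, and the four rows with whole-number amounts are made up for the illustration.

```python
import sqlite3

# In-memory SQLite table standing in for Sales_Table; the rows and their
# integer amounts are hypothetical, chosen to keep the arithmetic easy.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales_Table (Product_ID INT, Sale_Date TEXT, Daily_Sales INT)"
)
conn.executemany(
    "INSERT INTO Sales_Table VALUES (?, ?, ?)",
    [(1000, "2000-09-28", 10), (1000, "2000-09-29", 20),
     (2000, "2000-09-28", 5), (2000, "2000-09-29", 7)],
)

totals = conn.execute("""
    SELECT Product_ID, Sale_Date, Daily_Sales,
           SUM(Daily_Sales) OVER (PARTITION BY Product_ID
                                  ORDER BY Sale_Date) AS Subtotal,
           SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date) AS GrandTotal
    FROM Sales_Table
    ORDER BY Product_ID, Sale_Date
""").fetchall()

# Subtotal restarts when Product_ID changes; GrandTotal keeps accumulating.
for row in totals:
    print(row)
```

Each OVER clause is evaluated independently, which is why one column can reset while the other does not.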
Sorting in DESC Order

SELECT Product_ID, Sale_Date
      ,Daily_Sales
      ,SUM(Daily_Sales) OVER (ORDER BY Product_ID DESC, Sale_Date) AS CumulativeTotal
FROM Sales_Table

Product_ID  Sale_Date   Daily_Sales  CumulativeTotal
__________  __________  ___________  _______________
3000        10/04/2000  15675.33     15675.33
3000        10/03/2000  21553.79     37229.12
3000        10/02/2000  19678.94     56908.06
3000        10/01/2000  28000.00     84908.06
3000        09/30/2000  43868.86     128776.92
3000        09/29/2000  34509.13     163286.05
3000        09/28/2000  61301.77     224587.82
2000        10/04/2000  32800.50     257388.32
2000        10/03/2000  43200.18     300588.50
2000        10/02/2000  36021.93     336610.43
2000        10/01/2000  54850.29     391460.72

Not all rows are displayed in this answer set.

Above, we used the DESC keyword in the ORDER BY statement for the Product_ID. Notice that the Product_ID order is reversed: we see the Product_ID of 3000 first.
Moving Average

SELECT Product_ID, Sale_Date, Daily_Sales,
       AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date) AS MovAvg
FROM Sales_Table ;

Product_ID  Sale_Date   Daily_Sales  MovAvg
__________  __________  ___________  ____________
1000        2000-09-28  48850.40     48850.400000
1000        2000-09-29  54500.22     51675.310000
1000        2000-09-30  36000.07     46450.230000
1000        2000-10-01  40200.43     44887.780000
1000        2000-10-02  32800.50     42470.324000
1000        2000-10-03  64300.00     46108.603333
1000        2000-10-04  54553.10     47314.960000
2000        2000-09-28  41888.88     46636.700000
2000        2000-09-29  48000.00     46788.177777
2000        2000-09-30  49850.03     47094.363000

Not all rows are displayed.

The AVG() OVER allows you to get the moving average of a certain column. Here each MovAvg value is the average of every row from the first row through the current row.
Casting a Moving Average

SELECT Product_ID, Sale_Date, Daily_Sales,
       CAST(AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
                                   ROWS 2 PRECEDING) AS Decimal(8,2)) AS CastAvg
FROM Sales_Table ;

Product_ID  Sale_Date   Daily_Sales  CastAvg
__________  __________  ___________  ________
1000        2000-09-28  48850.40     48850.40
1000        2000-09-29  54500.22     51675.31
1000        2000-09-30  36000.07     46450.23
1000        2000-10-01  40200.43     43566.91
1000        2000-10-02  32800.50     36333.67
1000        2000-10-03  64300.00     45766.98
1000        2000-10-04  54553.10     50551.20
2000        2000-09-28  41888.88     53580.66
2000        2000-09-29  48000.00     48147.33
2000        2000-09-30  49850.03     46579.64

Not all rows are displayed.

Above, we have used a CAST statement to change the data type of the moving average to a Decimal(8,2) data type. The CastAvg values average the current row and the two rows before it, which is what ROWS 2 PRECEDING requests.
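The window frame decides how many rows feed each average. The sketch below contrasts a running average (ROWS UNBOUNDED PRECEDING) with a three-row moving average (ROWS 2 PRECEDING). It is only an illustration: it assumes Python's sqlite3 module (SQLite 3.25 or later) as the engine, and the four Daily_Sales values are made up so the results can be verified by hand.

```python
import sqlite3

# Hypothetical four-row table; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales_Table (Sale_Date TEXT, Daily_Sales REAL)")
conn.executemany(
    "INSERT INTO Sales_Table VALUES (?, ?)",
    [("2000-09-28", 3.0), ("2000-09-29", 6.0),
     ("2000-09-30", 9.0), ("2000-10-01", 12.0)],
)

averages = conn.execute("""
    SELECT Sale_Date,
           AVG(Daily_Sales) OVER (ORDER BY Sale_Date
                                  ROWS UNBOUNDED PRECEDING) AS RunningAvg,
           AVG(Daily_Sales) OVER (ORDER BY Sale_Date
                                  ROWS 2 PRECEDING) AS Avg3
    FROM Sales_Table
    ORDER BY Sale_Date
""").fetchall()

# The two columns agree until a fourth row exists; then Avg3 drops the
# oldest row while RunningAvg keeps it.
for row in averages:
    print(row)
```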
Partition By Resets an ANSI OLAP

SELECT Product_ID, Sale_Date, Daily_Sales,
       AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
                              ROWS 2 PRECEDING) AS AVG3,
       AVG(Daily_Sales) OVER (PARTITION BY Product_ID
                              ORDER BY Product_ID, Sale_Date) AS Continuous
FROM Sales_Table ;

The ANSI reset works much like a GROUP BY.

Product_ID  Sale_Date   Daily_Sales  AVG3      Continuous
__________  __________  ___________  ________  ____________
1000        2000-09-28  48850.40     48850.40  48850.400000
1000        2000-09-29  54500.22     51675.31  51675.310000
1000        2000-09-30  36000.07     46450.23  46450.230000
1000        2000-10-01  40200.43     43566.91  44887.780000
1000        2000-10-02  32800.50     36333.67  42470.324000
1000        2000-10-03  64300.00     45766.98  46108.603333
1000        2000-10-04  54553.10     50551.20  47314.960000
2000        2000-09-28  41888.88     53580.66  41888.880000
2000        2000-09-29  48000.00     48147.33  44944.440000

Not all rows are displayed.

Use a PARTITION BY statement to reset the ANSI OLAP. The PARTITION BY only resets the column whose OVER clause contains it. Notice that only Continuous resets.
COUNT OVER for a Sequential Number

SELECT Product_ID, Sale_Date, Daily_Sales,
       COUNT(*) OVER (ORDER BY Product_ID, Sale_Date
                      ROWS UNBOUNDED PRECEDING) Seq_Number
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  Seq_Number
__________  __________  ___________  __________
1000        2000-09-28  48850.40     1
1000        2000-09-29  54500.22     2
1000        2000-09-30  36000.07     3
1000        2000-10-01  40200.43     4
1000        2000-10-02  32800.50     5
1000        2000-10-03  64300.00     6
1000        2000-10-04  54553.10     7
2000        2000-09-28  41888.88     8
2000        2000-09-29  48000.00     9
2000        2000-09-30  49850.03     10
2000        2000-10-01  54850.29     11

Not all rows are displayed.

This is the COUNT OVER. It provides a sequential number starting at 1. The keywords ROWS UNBOUNDED PRECEDING cause Seq_Number to start at the beginning and increase sequentially to the end.
Quiz - What caused the COUNT OVER to Reset?

SELECT Product_ID, Sale_Date, Daily_Sales,
       COUNT(*) OVER (PARTITION BY Product_ID
                      ORDER BY Product_ID, Sale_Date) AS StartOver
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  StartOver
__________  __________  ___________  _________
1000        2000-09-28  48850.40     1
1000        2000-09-29  54500.22     2
1000        2000-09-30  36000.07     3
1000        2000-10-01  40200.43     4
1000        2000-10-02  32800.50     5
1000        2000-10-03  64300.00     6
1000        2000-10-04  54553.10     7
2000        2000-09-28  41888.88     1
2000        2000-09-29  48000.00     2
2000        2000-09-30  49850.03     3
2000        2000-10-01  54850.29     4
2000        2000-10-02  36021.93     5
2000        2000-10-03  43200.18     6
2000        2000-10-04  32800.50     7

What keyword(s) caused StartOver to reset?
Answer to Quiz - What caused the COUNT OVER to Reset?

SELECT Product_ID, Sale_Date, Daily_Sales,
       COUNT(*) OVER (PARTITION BY Product_ID
                      ORDER BY Product_ID, Sale_Date) AS StartOver
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  StartOver
__________  __________  ___________  _________
1000        2000-09-28  48850.40     1
1000        2000-09-29  54500.22     2
1000        2000-09-30  36000.07     3
1000        2000-10-01  40200.43     4
1000        2000-10-02  32800.50     5
1000        2000-10-03  64300.00     6
1000        2000-10-04  54553.10     7
2000        2000-09-28  41888.88     1
2000        2000-09-29  48000.00     2
2000        2000-09-30  49850.03     3
2000        2000-10-01  54850.29     4
2000        2000-10-02  36021.93     5
2000        2000-10-03  43200.18     6
2000        2000-10-04  32800.50     7

What keyword(s) caused StartOver to reset? It is the PARTITION BY statement.
The MAX OVER Command

SELECT Product_ID, Sale_Date, Daily_Sales,
       MAX(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date) MaxOver
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  MaxOver
__________  __________  ___________  ________
1000        2000-09-28  48850.40     48850.40
1000        2000-09-29  54500.22     54500.22
1000        2000-09-30  36000.07     54500.22
1000        2000-10-01  40200.43     54500.22
1000        2000-10-02  32800.50     54500.22
1000        2000-10-03  64300.00     64300.00
1000        2000-10-04  54553.10     64300.00
2000        2000-09-28  41888.88     64300.00
2000        2000-09-29  48000.00     64300.00
2000        2000-09-30  49850.03     64300.00
2000        2000-10-01  54850.29     64300.00
2000        2000-10-02  36021.93     64300.00
2000        2000-10-03  43200.18     64300.00
2000        2000-10-04  32800.50     64300.00

After the sort, the MAX() OVER shows the maximum value seen up to that point.
MAX OVER with PARTITION BY Reset

SELECT Product_ID, Sale_Date, Daily_Sales,
       MAX(Daily_Sales) OVER (PARTITION BY Product_ID
                              ORDER BY Product_ID, Sale_Date) AS MaxOver
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  MaxOver
__________  __________  ___________  ________
1000        2000-09-28  48850.40     48850.40
1000        2000-09-29  54500.22     54500.22
1000        2000-09-30  36000.07     54500.22
1000        2000-10-01  40200.43     54500.22
1000        2000-10-02  32800.50     54500.22
1000        2000-10-03  64300.00     64300.00
1000        2000-10-04  54553.10     64300.00
2000        2000-09-28  41888.88     41888.88
2000        2000-09-29  48000.00     48000.00
2000        2000-09-30  49850.03     49850.03
2000        2000-10-01  54850.29     54850.29

Not all rows are displayed.

The largest value in the column MaxOver is 64300.00. Once it was reached, it did not carry through to the end because the PARTITION BY reset the calculation when the Product_ID changed to 2000.
MAX OVER Without Rows Unbounded Preceding

SELECT Product_ID, Sale_Date, Daily_Sales,
       MAX(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date) AS MaxOver
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  MaxOver
__________  __________  ___________  ________
1000        2000-09-28  48850.40     48850.40
1000        2000-09-29  54500.22     54500.22
1000        2000-09-30  36000.07     54500.22
1000        2000-10-01  40200.43     54500.22
1000        2000-10-02  32800.50     54500.22
1000        2000-10-03  64300.00     64300.00
1000        2000-10-04  54553.10     64300.00
2000        2000-09-28  41888.88     64300.00
2000        2000-09-29  48000.00     64300.00
2000        2000-09-30  49850.03     64300.00

Not all rows are displayed.

You don't need the ROWS UNBOUNDED PRECEDING with the MAX OVER. When an ORDER BY is present and no frame is given, the window already runs from the first row through the current row.
The MIN OVER Command

SELECT Product_ID, Sale_Date
      ,Daily_Sales
      ,MIN(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date) AS MinOver
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  MinOver
__________  __________  ___________  ________
1000        2000-09-28  48850.40     48850.40
1000        2000-09-29  54500.22     48850.40
1000        2000-09-30  36000.07     36000.07
1000        2000-10-01  40200.43     36000.07
1000        2000-10-02  32800.50     32800.50
1000        2000-10-03  64300.00     32800.50
1000        2000-10-04  54553.10     32800.50
2000        2000-09-28  41888.88     32800.50
2000        2000-09-29  48000.00     32800.50
2000        2000-09-30  49850.03     32800.50
2000        2000-10-01  54850.29     32800.50
2000        2000-10-02  36021.93     32800.50
2000        2000-10-03  43200.18     32800.50
2000        2000-10-04  32800.50     32800.50

After the sort, the MIN() OVER shows the minimum value seen up to that point.
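The running MAX and MIN are simply "best value so far" calculations. As a rough model outside the database, the Python sketch below uses itertools.accumulate on the seven Daily_Sales values for Product_ID 1000 taken from the answer sets above; the variable names are only illustrative.

```python
from itertools import accumulate

# Daily_Sales for Product_ID 1000 in Sale_Date order (from the answer sets above).
daily_sales = [48850.40, 54500.22, 36000.07, 40200.43, 32800.50, 64300.00, 54553.10]

# accumulate() carries the previous result forward, just as the window
# grows from the first row through the current row.
running_max = list(accumulate(daily_sales, max))
running_min = list(accumulate(daily_sales, min))

for sale, high, low in zip(daily_sales, running_max, running_min):
    print(sale, high, low)
```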
Quiz - Fill in the Blank

SELECT Product_ID, Sale_Date, Daily_Sales,
       MIN(Daily_Sales) OVER (PARTITION BY Product_ID
                              ORDER BY Product_ID, Sale_Date) AS MinOver
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  MinOver
__________  __________  ___________  ________
1000        2000-09-28  48850.40     48850.40
1000        2000-09-29  54500.22     48850.40
1000        2000-09-30  36000.07     36000.07
1000        2000-10-01  40200.43     36000.07
1000        2000-10-02  32800.50     32800.50
1000        2000-10-03  64300.00     32800.50
1000        2000-10-04  54553.10     32800.50
2000        2000-09-28  41888.88     41888.88
2000        2000-09-29  48000.00     41888.88
2000        2000-09-30  49850.03     41888.88
2000        2000-10-01  54850.29     41888.88
2000        2000-10-02  36021.93     36021.93
2000        2000-10-03  43200.18
2000        2000-10-04  32800.50

The last two answers (MinOver) are blank, so you can fill in the blanks.
Answer - Fill in the Blank

SELECT Product_ID, Sale_Date, Daily_Sales,
       MIN(Daily_Sales) OVER (PARTITION BY Product_ID
                              ORDER BY Product_ID, Sale_Date) AS MinOver
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  MinOver
__________  __________  ___________  ________
1000        2000-09-28  48850.40     48850.40
1000        2000-09-29  54500.22     48850.40
1000        2000-09-30  36000.07     36000.07
1000        2000-10-01  40200.43     36000.07
1000        2000-10-02  32800.50     32800.50
1000        2000-10-03  64300.00     32800.50
1000        2000-10-04  54553.10     32800.50
2000        2000-09-28  41888.88     41888.88
2000        2000-09-29  48000.00     41888.88
2000        2000-09-30  49850.03     41888.88
2000        2000-10-01  54850.29     41888.88
2000        2000-10-02  36021.93     36021.93
2000        2000-10-03  43200.18     36021.93
2000        2000-10-04  32800.50     32800.50

The last two answers (MinOver) are filled in.
How Ntile Works

SELECT Product_ID, Sale_Date, Daily_Sales
      ,NTILE(4) OVER (ORDER BY Daily_Sales, Sale_Date) AS "Quartiles"
FROM Sales_Table
WHERE Product_ID = 1000;

Product_ID  Sale_Date   Daily_Sales  Quartiles
__________  __________  ___________  _________
1000        10/02/2000  32800.50     1
1000        09/30/2000  36000.07     1
1000        10/01/2000  40200.43     2
1000        09/28/2000  48850.40     2
1000        09/29/2000  54500.22     3
1000        10/04/2000  54553.10     3
1000        10/03/2000  64300.00     4

Changing the number passed to the Ntile function changes the number of partitions (tiles) established. The tiles are numbered starting at 1 and increasing up to the number specified, so with an Ntile of 4 the tiles are 1 through 4. The rows are sorted by the ORDER BY in the OVER clause and then distributed as evenly as possible, so with an ascending sort the lowest values land in tile 1. When the rows do not divide evenly, the extra rows go to the lowest-numbered tiles, one extra row per tile.
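The distribution rule can be sketched in a few lines of Python. This is a simplified model of how NTILE hands out tile numbers to rows that are already sorted, not the database's implementation; the function name ntile and its parameters are hypothetical.

```python
def ntile(row_count, tile_count):
    """Return the tile number for each of row_count already-sorted rows."""
    base, extra = divmod(row_count, tile_count)
    tiles = []
    for tile in range(1, tile_count + 1):
        # The first `extra` tiles each take one additional row.
        size = base + (1 if tile <= extra else 0)
        tiles.extend([tile] * size)
    return tiles


print(ntile(7, 4))    # seven rows split into quartiles
print(ntile(14, 10))  # fourteen rows split into ten tiles
```

When there are fewer rows than tiles, each row simply gets its own tile and the higher tile numbers go unused.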
Ntile

SELECT Last_Name, Grade_Pt,
       NTILE(5) OVER (ORDER BY Grade_Pt) as "Tile"
FROM Student_Table
ORDER BY "Tile" DESC;

Last_Name  Grade_Pt  Tile
_________  ________  ____
Bond       3.95      5
Thomas     4.00      5
Delaney    3.35      4
Wilson     3.80      4
Hanson     2.88      3
Phillips   3.00      3
McRoberts  1.90      2
Smith      2.00      2
Johnson    ?         1
Larkins    0.00      1

The Ntile function organizes rows into n number of groups. These groups are referred to as tiles, and the tile number is returned. The example above has 10 rows, so NTILE(5) splits the 10 rows into five equally sized tiles. There are 2 rows in each tile, assigned in the order of the OVER() clause's ORDER BY.
Ntile Continued

SELECT Dept_No, EmployeeCount,
       NTILE(2) OVER (ORDER BY EmployeeCount) as "Tile"
FROM (SELECT Dept_No, COUNT(*) as EmployeeCount
      FROM Employee_Table
      GROUP BY Dept_No) AS Q
ORDER BY "Tile" DESC;

Dept_No  EmployeeCount  Tile
_______  _____________  ____
300      1              2
200      2              2
400      3              2
?        1              1
10       1              1
100      1              1

The Ntile function organizes rows into n number of groups. These groups are referred to as tiles, and the tile number is returned. The example above has 6 rows, so NTILE(2) splits the 6 rows into 2 equally sized tiles. There are 3 rows in each tile, assigned in the order of the OVER() clause's ORDER BY.
Ntile Percentile

SELECT Claim_ID, Claim_Date, ClaimCount,
       NTILE(100) OVER (ORDER BY ClaimCount) as Percentile
FROM (SELECT Claim_ID, Claim_Date, COUNT(*) as ClaimCount
      FROM Claims
      GROUP BY Claim_ID, Claim_Date) AS Q
ORDER BY Percentile DESC

Claim_ID  Claim_Date  ClaimCount  Percentile
________  __________  __________  __________
1302111   2003-03-01  4           26
4307444   2003-07-05  3           25
3306333   2003-06-28  3           24
1304111   2003-04-28  2           23
2303222   2003-03-12  2           22
4305444   2003-05-12  2           21
4303555   2004-03-01  2           20
3402222   2004-02-28  2           19
3308333   2003-08-01  2           18

Not all rows are displayed.

The Ntile function organizes rows into n number of groups. These groups are referred to as tiles, and the tile number is returned. Above is a way to get the percentile. Keep in mind that NTILE(100) only produces true percentiles when there are at least 100 rows; with fewer rows, each row gets its own tile.
Another Ntile Example

This example determines the percentile for every row in the Sales table based on the daily sales amount and sorts it into sequence by the value being categorized, which here is daily sales.

SELECT Product_ID, Sale_Date, Daily_Sales
      ,NTILE(100) OVER (ORDER BY Daily_Sales) AS "Quantile"
FROM Sales_Table
WHERE Product_ID < 2000 ;

Product_ID  Sale_Date   Daily_Sales  Quantile
__________  __________  ___________  ________
1000        10/02/2000  32800.50     1
1000        09/30/2000  36000.07     2
1000        10/01/2000  40200.43     3
1000        09/28/2000  48850.40     4
1000        09/29/2000  54500.22     5
1000        10/04/2000  54553.10     6
1000        10/03/2000  64300.00     7

Above is another Ntile example.
Using Quartiles (Partitions of Four)

SELECT Product_ID, Sale_Date, Daily_Sales
      ,NTILE(4) OVER (ORDER BY Daily_Sales, Sale_Date) AS "Quartiles"
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  Quartiles
__________  __________  ___________  _________
1000        10/02/2000  32800.50     1
2000        10/04/2000  32800.50     1
1000        09/30/2000  36000.07     1
2000        10/02/2000  36021.93     1
1000        10/01/2000  40200.43     2
2000        09/28/2000  41888.88     2
2000        10/03/2000  43200.18     2
2000        09/29/2000  48000.00     2
1000        09/28/2000  48850.40     3
2000        09/30/2000  49850.03     3
1000        09/29/2000  54500.22     3
1000        10/04/2000  54553.10     4
2000        10/01/2000  54850.29     4
1000        10/03/2000  64300.00     4

Instead of 100, the example above uses a quartile (an NTILE based on 4 partitions).
NTILE Buckets

SELECT Product_ID, Sale_Date, Daily_Sales,
       NTILE(4) OVER (ORDER BY Daily_Sales DESC) AS Bucket
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  Bucket
__________  __________  ___________  ______
1000        10/03/2000  64300.00     1
2000        10/01/2000  54850.29     1
1000        10/04/2000  54553.10     1
1000        09/29/2000  54500.22     1
2000        09/30/2000  49850.03     2
1000        09/28/2000  48850.40     2
2000        09/29/2000  48000.00     2
2000        10/03/2000  43200.18     2
2000        09/28/2000  41888.88     3
1000        10/01/2000  40200.43     3
2000        10/02/2000  36021.93     3
1000        09/30/2000  36000.07     4
1000        10/02/2000  32800.50     4
2000        10/04/2000  32800.50     4

The NTILE() function divides the rows into buckets as evenly as possible. In this example, because PARTITION BY is omitted, the entire input is sorted using the ORDER BY clause and then divided into the number of buckets specified. The highest Daily_Sales values fall in bucket 1 because the sort is descending.
NTILE Using a Value of 10

SELECT Product_ID, Sale_Date, Daily_Sales,
       NTILE(10) OVER (ORDER BY Daily_Sales DESC) AS Bucket
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  Bucket
__________  __________  ___________  ______
1000        10/03/2000  64300.00     1
2000        10/01/2000  54850.29     1
1000        10/04/2000  54553.10     2
1000        09/29/2000  54500.22     2
2000        09/30/2000  49850.03     3
1000        09/28/2000  48850.40     3
2000        09/29/2000  48000.00     4
2000        10/03/2000  43200.18     4
2000        09/28/2000  41888.88     5
1000        10/01/2000  40200.43     6
2000        10/02/2000  36021.93     7
1000        09/30/2000  36000.07     8
1000        10/02/2000  32800.50     9
2000        10/04/2000  32800.50     10

The NTILE() function divides the rows into buckets as evenly as possible. In this example, because PARTITION BY is omitted, the entire input is sorted using the ORDER BY clause and then divided into the number of buckets specified. This example uses a value of 10 in the NTILE, so the 14 rows give the first four buckets two rows each and the remaining six buckets one row each.
NTILE With a Partition

SELECT Product_ID, Sale_Date, Daily_Sales,
       NTILE(3) OVER (PARTITION BY Product_ID ORDER BY Daily_Sales) AS Bucket
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  Bucket
__________  __________  ___________  ______
1000        10/02/2000  32800.50     1
1000        09/30/2000  36000.07     1
1000        10/01/2000  40200.43     1
1000        09/28/2000  48850.40     2
1000        09/29/2000  54500.22     2
1000        10/04/2000  54553.10     3
1000        10/03/2000  64300.00     3
2000        10/04/2000  32800.50     1
2000        10/02/2000  36021.93     1
2000        09/28/2000  41888.88     1
2000        10/03/2000  43200.18     2
2000        09/29/2000  48000.00     2
2000        09/30/2000  49850.03     3
2000        10/01/2000  54850.29     3

The NTILE() function divides the rows into buckets as evenly as possible. In this example, because PARTITION BY is listed, the data is first sorted by Product_ID, then sorted using the ORDER BY clause (within Product_ID), and then divided into the number of buckets specified. This example uses a value of 3 in the NTILE. Notice that the PARTITION BY statement causes the bucket numbering to reset when the Product_ID goes from 1000 to 2000.
Using LAG and LEAD

Compatibility: SQL Server and Azure SQL Data Warehouse Extension

The LAG and LEAD functions allow you to compare different rows of a table by specifying an offset from the current row. You can use these functions to analyze change and variation.

Syntax for LAG and LEAD:

{LAG | LEAD} (<expression> [, <offset> [, <default>]])
OVER ([PARTITION BY <column> [,...]]
      ORDER BY <column> [ASC | DESC] [,...] ) ;

The above provides information and the syntax for LAG and LEAD.
Using LEAD

SELECT Last_Name, Dept_No
      ,LEAD(Dept_No) OVER (ORDER BY Dept_No, Last_Name) as "Lead All"
      ,LEAD(Dept_No) OVER (PARTITION BY Dept_No
                           ORDER BY Dept_No, Last_Name) as "Lead Partition"
FROM Employee_Table;

LAST_NAME   DEPT_NO  Lead All  Lead Partition
__________  _______  ________  ______________
Jones       ?        10        ?
Smythe      10       100       ?
Chambers    100      200       ?
Coffing     200      200       200
Smith       200      300       ?
Larkins     300      400       ?
Harrison    400      400       400
Reilly      400      400       400
Strickling  400      ?         ?

As you can see, the first LEAD brings back the value from the next row, except for the last row, which has no row following it. The offset value was not specified in this example, so it defaulted to a value of 1 row.
Using LEAD With an Offset of 2

SELECT Last_Name, Dept_No
      ,LEAD(Dept_No,2) OVER (ORDER BY Dept_No, Last_Name) as "Lead All"
      ,LEAD(Dept_No,2) OVER (PARTITION BY Dept_No
                             ORDER BY Dept_No, Last_Name) as "Lead Partition"
FROM Employee_Table;

LAST_NAME   DEPT_NO  Lead All  Lead Partition
__________  _______  ________  ______________
Jones       ?        100       ?
Smythe      10       200       ?
Chambers    100      200       ?
Coffing     200      300       ?
Smith       200      400       ?
Larkins     300      400       ?
Harrison    400      400       400
Reilly      400      ?         ?
Strickling  400      ?         ?

Above, each value in the first LEAD comes from 2 rows ahead. The partitioned LEAD only returns a value when a Dept_No group holds at least one more row than the offset value (three or more rows here), which is why only Harrison in department 400 gets a value.
LEAD

SELECT Product_ID, Sale_Date, Daily_Sales,
       Daily_Sales - LEAD(Daily_Sales, 1, 0)
                     OVER (ORDER BY Product_ID, Sale_Date) AS Lead1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  Lead1
__________  __________  ___________  _________
1000        09/28/2000  48850.40     -5649.82
1000        09/29/2000  54500.22     18500.15
1000        09/30/2000  36000.07     -4200.36
1000        10/01/2000  40200.43     7399.93
1000        10/02/2000  32800.50     -31499.50
1000        10/03/2000  64300.00     9746.90
1000        10/04/2000  54553.10     12664.22
2000        09/28/2000  41888.88     -6111.12
2000        09/29/2000  48000.00     -1850.03
2000        09/30/2000  49850.03     -5000.26
2000        10/01/2000  54850.29     18828.36
2000        10/02/2000  36021.93     -7178.25
2000        10/03/2000  43200.18     10399.68
2000        10/04/2000  32800.50     32800.50

Above, we compute the difference between a row's Daily_Sales and the Daily_Sales of the next row in the sort order (Product_ID, then Sale_Date). The expression LEAD(Daily_Sales, 1, 0) tells LEAD() to evaluate the expression Daily_Sales on the row that is positioned one row following the current row. If there is no such row (as is the case on the last row of the partition or relation), then the default value of 0 is used.
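To see the default value at work, here is a small sketch with and without a PARTITION BY. It assumes Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for SQL Server, and the three rows with whole-number amounts are invented for the illustration.

```python
import sqlite3

# Hypothetical three-row Sales_Table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales_Table (Product_ID INT, Sale_Date TEXT, Daily_Sales INT)"
)
conn.executemany(
    "INSERT INTO Sales_Table VALUES (?, ?, ?)",
    [(1000, "2000-09-28", 10), (1000, "2000-09-29", 30),
     (2000, "2000-09-28", 25)],
)

lead_rows = conn.execute("""
    SELECT Product_ID, Sale_Date, Daily_Sales,
           Daily_Sales - LEAD(Daily_Sales, 1, 0)
                         OVER (ORDER BY Product_ID, Sale_Date) AS Lead1,
           Daily_Sales - LEAD(Daily_Sales, 1, 0)
                         OVER (PARTITION BY Product_ID
                               ORDER BY Sale_Date) AS LeadPart
    FROM Sales_Table
    ORDER BY Product_ID, Sale_Date
""").fetchall()

# With no next row, LEAD returns the default 0, so the difference
# equals the row's own Daily_Sales.
for row in lead_rows:
    print(row)
```

The second row shows the effect of partitioning: without PARTITION BY it is compared with the first row of product 2000, and with PARTITION BY it is the last row of its partition and falls back to the default.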
LEAD With Partitioning

SELECT Product_ID, Sale_Date, Daily_Sales,
       Daily_Sales - LEAD(Daily_Sales, 1, 0)
                     OVER (PARTITION BY Product_ID ORDER BY Sale_Date) AS Lead1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  Lead1
__________  __________  ___________  _________
1000        09/28/2000  48850.40     -5649.82
1000        09/29/2000  54500.22     18500.15
1000        09/30/2000  36000.07     -4200.36
1000        10/01/2000  40200.43     7399.93
1000        10/02/2000  32800.50     -31499.50
1000        10/03/2000  64300.00     9746.90
1000        10/04/2000  54553.10     54553.10
2000        09/28/2000  41888.88     -6111.12
2000        09/29/2000  48000.00     -1850.03
2000        09/30/2000  49850.03     -5000.26
2000        10/01/2000  54850.29     18828.36
2000        10/02/2000  36021.93     -7178.25
2000        10/03/2000  43200.18     10399.68
2000        10/04/2000  32800.50     32800.50

Above, we compute the difference between a row's Daily_Sales and the Daily_Sales of the next row in the sort order (Sale_Date). We also partitioned the data by Product_ID, so the last row of each product has no following row and uses the default value of 0.
Using LAG

SELECT Last_Name, Dept_No
      ,LAG(Dept_No) OVER (ORDER BY Dept_No, Last_Name) as "Lag All"
      ,LAG(Dept_No) OVER (PARTITION BY Dept_No
                          ORDER BY Dept_No, Last_Name) as "Lag Partition"
FROM Employee_Table;

LAST_NAME   DEPT_NO  Lag All  Lag Partition
__________  _______  _______  _____________
Jones       ?        ?        ?
Smythe      10       ?        ?
Chambers    100      10       ?
Coffing     200      100      ?
Smith       200      200      200
Larkins     300      200      ?
Harrison    400      300      ?
Reilly      400      400      400
Strickling  400      400      400

From the example above, you can see that LAG takes the value from a previous row and makes it available in the current row. For LAG, the first row(s) will contain a null based on the offset value, which here defaulted to 1. The first null comes from the function itself (there is no previous row), whereas the second row gets its null from the first row's Dept_No.
Using LAG With an Offset of 2

SELECT Last_Name, Dept_No
      ,LAG(Dept_No,2) OVER (ORDER BY Dept_No, Last_Name) as "Lag All"
      ,LAG(Dept_No,2) OVER (PARTITION BY Dept_No
                            ORDER BY Dept_No, Last_Name) as "Lag Partition"
FROM Employee_Table;

LAST_NAME   DEPT_NO  Lag All  Lag Partition
__________  _______  _______  _____________
Jones       ?        ?        ?
Smythe      10       ?        ?
Chambers    100      ?        ?
Coffing     200      10       ?
Smith       200      100      ?
Larkins     300      200      ?
Harrison    400      200      ?
Reilly      400      300      ?
Strickling  400      400      400

For this example, the first two rows have a null because there is no row two rows before them. The number of nulls produced by the function will always be the same as the offset value. There is a third null because Jones's Dept_No is null.
LAG

SELECT Product_ID, Sale_Date, Daily_Sales,
       Daily_Sales - LAG(Daily_Sales, 1, 0)
                     OVER (ORDER BY Product_ID, Sale_Date) AS Lag1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  Lag1
__________  __________  ___________  _________
1000        09/28/2000  48850.40     48850.40
1000        09/29/2000  54500.22     5649.82
1000        09/30/2000  36000.07     -18500.15
1000        10/01/2000  40200.43     4200.36
1000        10/02/2000  32800.50     -7399.93
1000        10/03/2000  64300.00     31499.50
1000        10/04/2000  54553.10     -9746.90
2000        09/28/2000  41888.88     -12664.22
2000        09/29/2000  48000.00     6111.12
2000        09/30/2000  49850.03     1850.03
2000        10/01/2000  54850.29     5000.26
2000        10/02/2000  36021.93     -18828.36
2000        10/03/2000  43200.18     7178.25
2000        10/04/2000  32800.50     -10399.68

Above, we compute the difference between a row's Daily_Sales and the Daily_Sales of the previous row in the sort order (Product_ID, then Sale_Date). The expression LAG(Daily_Sales, 1, 0) tells LAG() to evaluate the expression Daily_Sales on the row that is positioned one row before the current row. If there is no such row (as is the case on the first row of the partition or relation), then the default value of 0 is used.
LAG with Partitioning

SELECT Product_ID, Sale_Date, Daily_Sales,
       Daily_Sales - LAG(Daily_Sales, 1, 0)
                     OVER (PARTITION BY Product_ID ORDER BY Sale_Date) AS Lag1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Product_ID  Sale_Date   Daily_Sales  Lag1
__________  __________  ___________  _________
1000        09/28/2000  48850.40     48850.40
1000        09/29/2000  54500.22     5649.82
1000        09/30/2000  36000.07     -18500.15
1000        10/01/2000  40200.43     4200.36
1000        10/02/2000  32800.50     -7399.93
1000        10/03/2000  64300.00     31499.50
1000        10/04/2000  54553.10     -9746.90
2000        09/28/2000  41888.88     41888.88
2000        09/29/2000  48000.00     6111.12
2000        09/30/2000  49850.03     1850.03
2000        10/01/2000  54850.29     5000.26
2000        10/02/2000  36021.93     -18828.36
2000        10/03/2000  43200.18     7178.25
2000        10/04/2000  32800.50     -10399.68

Above, we compute the difference between a row's Daily_Sales and the Daily_Sales of the previous row in the sort order (Sale_Date within each Product_ID). The expression LAG(Daily_Sales, 1, 0) tells LAG() to evaluate the expression Daily_Sales on the row that is positioned one row before the current row. If there is no such row (as is the case on the first row of each partition), then the default value of 0 is used.
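The LAG behavior can be checked the same way on a tiny data set. The sketch below assumes Python's sqlite3 module (SQLite 3.25 or later) in place of SQL Server; the three rows and their whole-number amounts are hypothetical.

```python
import sqlite3

# Hypothetical three-row Sales_Table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales_Table (Product_ID INT, Sale_Date TEXT, Daily_Sales INT)"
)
conn.executemany(
    "INSERT INTO Sales_Table VALUES (?, ?, ?)",
    [(1000, "2000-09-28", 10), (1000, "2000-09-29", 30),
     (2000, "2000-09-28", 25)],
)

lag_rows = conn.execute("""
    SELECT Product_ID, Sale_Date, Daily_Sales,
           Daily_Sales - LAG(Daily_Sales, 1, 0)
                         OVER (ORDER BY Product_ID, Sale_Date) AS Lag1,
           Daily_Sales - LAG(Daily_Sales, 1, 0)
                         OVER (PARTITION BY Product_ID
                               ORDER BY Sale_Date) AS LagPart
    FROM Sales_Table
    ORDER BY Product_ID, Sale_Date
""").fetchall()

# The first row of the result set (or of each partition) has no previous
# row, so LAG supplies the default 0.
for row in lag_rows:
    print(row)
```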
SUM(SUM(n))

SELECT Product_ID, SUM(Daily_Sales) as Summy,
       SUM(SUM(Daily_Sales)) OVER (ORDER BY SUM(Daily_Sales)) AS Prod_Sales_Running_Sum
FROM Sales_Table
GROUP BY Product_ID ;

Product_ID  Summy      Prod_Sales_Running_Sum
__________  _________  ______________________
3000        224587.82  224587.82
2000        306611.81  531199.63
1000        331204.72  862404.35

Window functions can compute aggregates of aggregates, as in the example above.
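The aggregate-of-aggregate pattern works in two steps: the GROUP BY collapses each product to one total, and the window function then runs across those totals. A minimal sketch, assuming Python's sqlite3 module (SQLite 3.25 or later) as the engine and six made-up rows:

```python
import sqlite3

# Hypothetical Sales_Table: two rows per product with whole-number amounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales_Table (Product_ID INT, Daily_Sales INT)")
conn.executemany(
    "INSERT INTO Sales_Table VALUES (?, ?)",
    [(1000, 10), (1000, 20), (2000, 5), (2000, 5), (3000, 7), (3000, 8)],
)

running = conn.execute("""
    SELECT Product_ID,
           SUM(Daily_Sales) AS Summy,
           SUM(SUM(Daily_Sales)) OVER (ORDER BY SUM(Daily_Sales)) AS Running
    FROM Sales_Table
    GROUP BY Product_ID
    ORDER BY Summy
""").fetchall()

# Inner SUM: one total per product. Outer SUM ... OVER: running total
# across those per-product totals, smallest first.
for row in running:
    print(row)
```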
Chapter 15
Working with Strings
Chapter 15 - Working with Strings
“It’s always been and always will be the same in the world: the horse does the work and the coachman is tipped.” - Anonymous
The ASCII Function

The example below shows you how to convert characters into the integer ASCII value.

Syntax: ASCII (string)

SELECT ASCII('H') as AsciiH
      ,ASCII('o') as AsciiO
      ,ASCII('w') as AsciiW
      ,ASCII('d') as AsciiD
      ,ASCII('y') as AsciiY

AsciiH  AsciiO  AsciiW  AsciiD  AsciiY
______  ______  ______  ______  ______
72      111     119     100     121
The CHAR Function

The example below shows you how to convert the integer ASCII value into characters.

Syntax: CHAR (integer)

SELECT CHAR(72)  As CharH
      ,CHAR(111) As CharO
      ,CHAR(119) As CharW
      ,CHAR(100) As CharD
      ,CHAR(121) As CharY ;

CharH  CharO  CharW  CharD  CharY
_____  _____  _____  _____  _____
H      o      w      d      y
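ASCII and CHAR are inverses of each other. Outside the database, the same round trip can be sketched with Python's built-in ord() and chr(), which behave the same way for these plain ASCII characters; the variable names are only illustrative.

```python
word = "Howdy"

# ord() plays the role of ASCII(): character -> integer code.
codes = [ord(ch) for ch in word]

# chr() plays the role of CHAR(): integer code -> character.
restored = "".join(chr(code) for code in codes)

print(codes)
print(restored)
```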
The UNICODE Function

The UNICODE function returns the Unicode integer value for the first character of the input expression.

Syntax: UNICODE (string)

SELECT UNICODE('H') AS UniH
      ,UNICODE('o') AS UniO
      ,UNICODE('w') AS UniW
      ,UNICODE('d') AS UniD
      ,UNICODE('y') AS UniY ;

UniH  UniO  UniW  UniD  UniY
____  ____  ____  ____  ____
72    111   119   100   121

The example above shows you how to convert characters into the UNICODE value.
The NCHAR Function

The NCHAR function takes the integer values and converts them back into characters.

Syntax: NCHAR (integer)

SELECT NCHAR(72)  AS NcaH
      ,NCHAR(111) AS NcaO
      ,NCHAR(119) AS NcaW
      ,NCHAR(100) AS NcaD
      ,NCHAR(121) AS NcaY ;

NcaH  NcaO  NcaW  NcaD  NcaY
____  ____  ____  ____  ____
H     o     w     d     y

The example above shows you how to convert integers back to characters.
The LEN Function

The LEN function returns the number of characters in an input string. (Ending spaces are automatically excluded for CHAR data types.)

Syntax: LEN (string)

SELECT First_Name
      ,LEN(First_Name) AS Lnth
      ,Last_Name
      ,LEN(Last_Name) AS Lnth
FROM Employee_Table

First_Name  Lnth  Last_Name   Lnth
__________  ____  __________  ____
Richard     7     Smythe      6
Cletus      6     Strickling  10
Mandee      6     Chambers    8
Herbert     7     Harrison    8
Billy       5     Coffing     7
John        4     Smith       5
Squiggy     7     Jones       5
Loraine     7     Larkins     7
William     7     Reilly      6

The LEN function returns the number of characters in the input string and not necessarily the number of bytes.
The DATALENGTH Function

The DATALENGTH function returns the number of bytes used to store an input string. (Ending spaces are automatically included for CHAR data types.)

Syntax: DATALENGTH (string)

SELECT First_Name
      ,DATALENGTH(First_Name) AS Lnth
      ,Last_Name
      ,DATALENGTH(Last_Name) AS Lnth
FROM Employee_Table

First_Name  Lnth  Last_Name   Lnth
__________  ____  __________  ____
Richard     7     Smythe      20
Cletus      6     Strickling  20
Mandee      6     Chambers    20
Herbert     7     Harrison    20
Billy       5     Coffing     20
John        4     Smith       20
Squiggy     7     Jones       20
Loraine     7     Larkins     20
William     7     Reilly      20

The DATALENGTH function returns the number of bytes in the input string, which for single-byte character data is the same as the number of characters. The difference between the LEN and the DATALENGTH functions is that the LEN function excludes trailing spaces, while the DATALENGTH function counts them. Notice that every Last_Name length is 20, which indicates that Last_Name is stored as a fixed-length column padded with spaces to 20 characters.
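The difference between the two functions can be modeled in plain Python. This is only a rough sketch: tsql_len and tsql_datalength are hypothetical helper names, the padded value imitates a CHAR(20) column, and the encodings are assumptions chosen to mimic one byte per character (CHAR/VARCHAR) and two bytes per character (NCHAR/NVARCHAR).

```python
def tsql_len(value):
    """Model of LEN: character count with trailing spaces ignored."""
    return len(value.rstrip(" "))


def tsql_datalength(value, encoding="latin-1"):
    """Model of DATALENGTH: bytes needed to store the value as given."""
    return len(value.encode(encoding))


last_name = "Smythe".ljust(20)  # hypothetical CHAR(20) value, space padded

print(tsql_len(last_name))                     # characters, padding ignored
print(tsql_datalength(last_name))              # bytes, padding included
print(tsql_datalength("Smythe", "utf-16-le"))  # two bytes per character
```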
Concatenation

The + sign means concatenate.

SELECT First_Name
      ,Last_Name
      ,First_Name + ' ' + Last_Name as Full_Name
FROM Employee_Table
WHERE First_Name = 'Squiggy'

First_Name  Last_Name  Full_Name
__________  _________  _____________
Squiggy     Jones      Squiggy Jones

See those + signs? Those represent concatenation, which allows you to combine multiple columns into one column. The + in this example has combined the first name, then a single space (the ' ' literal), and then the last name to get a new column called Full_Name. We brought back the full name of Squiggy Jones.
The RTRIM and LTRIM Commands Trim Spaces

RTRIM Query

SELECT Last_Name
      ,RTRIM(Last_Name) AS Trim_Trailing_Spaces
FROM Employee_Table ;

LTRIM Query

SELECT Last_Name
      ,LTRIM(Last_Name) AS Trim_Leading_Spaces
FROM Employee_Table ;

Trimming Both Leading and Trailing Spaces Query

SELECT Last_Name
      ,LTRIM(RTRIM(Last_Name)) AS Trim_Spaces_Leading_Trailing
FROM Employee_Table ;

The RTRIM command trims trailing spaces from a character string. The LTRIM command trims leading spaces from a character string. The LTRIM(RTRIM()) combination trims both leading and trailing spaces from a character string.
The SUBSTRING Command

SELECT First_Name
      ,SUBSTRING (First_Name, 2, 3) AS Quiz
FROM Employee_Table ;

The 2 means start in position 2. The 3 means go for 3 positions.

First_Name  Quiz
__________  ____
Squiggy     qui
John        ohn
Richard     ich
Herbert     erb
Mandee      and
Cletus      let
William     ill
Billy       ill
Loraine     ora

This is a SUBSTRING. After the string itself, the substring is passed two parameters: the starting position in the string and the number of positions to return (from the starting position). The above example will start in position 2 and go for 3 positions!
Using SUBSTRING to move Backwards

SELECT First_Name
      ,SUBSTRING (First_Name, 0, 6) AS Before1
FROM Employee_Table ;

Start in position 0 (one space before the first character).

First_Name  Before1
__________  _______
Squiggy     Squig
John        John
Richard     Richa
Herbert     Herbe
Mandee      Mande
Cletus      Cletu
William     Willi
Billy       Billy
Loraine     Lorai

A starting position of zero moves one space in front of the beginning. Position 0 holds no character, but it still counts toward the length. Notice that our length is 6, so 'Squiggy' turns into 'Squig': one empty position plus the first five characters. The point being made here is that the starting position can sit in front of the beginning of the string, which will come in handy as you see other examples.
How SUBSTRING Works with a Starting Position of -1

SELECT First_Name
      ,SUBSTRING (First_Name, -1, 3) AS Before2
FROM Employee_Table ;

Start in position -1. This is two spaces before the first character.

First_Name  Before2
__________  _______
Squiggy     S
John        J
Richard     R
Herbert     H
Mandee      M
Cletus      C
William     W
Billy       B
Loraine     L

A starting position of -1 moves two spaces in front of the beginning. Positions -1 and 0 hold no characters, and our length is 3, so only position 1 is left and each name delivers only the first initial. Again, the starting position can sit in front of the beginning of the string, which will come in handy as you see other examples.
How SUBSTRING Works with a Length of 0

SELECT First_Name
      ,SUBSTRING (First_Name, 3, 0) AS WhatsUp
FROM Employee_Table ;

Go for 0 positions.

First_Name  WhatsUp
__________  _______
Squiggy
John
Richard
Herbert
Mandee
Cletus
William
Billy
Loraine

In our example above, we start in position 3, but we go for zero positions, so nothing (an empty string) is delivered in the column. That is what’s up!
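The four SUBSTRING cases from the last few pages can be checked side by side with a literal instead of the Employee_Table. This is only a recap sketch of the examples above. The way to read it is that SUBSTRING covers the positions from the start through start + length - 1, and only positions 1 and higher hold characters.

```sql
SELECT SUBSTRING('Squiggy',  2, 3) AS Quiz     -- 'qui'   (positions 2 to 4)
      ,SUBSTRING('Squiggy',  0, 6) AS Before1  -- 'Squig' (positions 0 to 5)
      ,SUBSTRING('Squiggy', -1, 3) AS Before2  -- 'S'     (positions -1 to 1)
      ,SUBSTRING('Squiggy',  3, 0) AS WhatsUp; -- ''      (no positions at all)
```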
Concatenation and SUBSTRING

SELECT First_Name
      ,Last_Name
      ,SUBSTRING(First_Name, 1, 1) + '. ' + Last_Name AS Full_Name
FROM Employee_Table

The '. ' literal is a period (.) and a space.

First_Name  Last_Name   Full_Name
__________  __________  _____________
Richard     Smythe      R. Smythe
Cletus      Strickling  C. Strickling
Mandee      Chambers    M. Chambers
Herbert     Harrison    H. Harrison
Billy       Coffing     B. Coffing
John        Smith       J. Smith
Squiggy     Jones       S. Jones
Loraine     Larkins     L. Larkins
William     Reilly      W. Reilly

Of the three items being concatenated together, what is the first item of concatenation in the example above? The first initial of the First_Name. Then, we concatenated a literal period and a space. Next, we concatenated the Last_Name.
SUBSTRING and Different Aliasing

SELECT Phone_Number
      ,First3digits = SUBSTRING(Phone_Number, 1, 3)
      ,Exchange     = SUBSTRING(Phone_Number, 5, 4)
FROM Customer_Table
WHERE Phone_Number LIKE '[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'

Phone_Number  First3digits  Exchange
____________  ____________  ________
555-1234      555           1234
555-1111      555           1111
555-1212      555           1212
347-8954      347           8954
322-1012      322           1012

Above, we are using the SUBSTRING command to extract certain portions of the Phone_Number. Notice that the column names are placed at the beginning of the line, in front of an equal sign. This is almost like a reverse alias.
The LEFT and RIGHT Functions

The LEFT and RIGHT functions are abbreviations of the SUBSTRING function. They return a requested number of characters from the left or right end of the input string.

Syntax: LEFT(string, n), RIGHT(string, n)

SELECT First_Name
      ,LEFT (First_Name, 1) AS First_Initial
      ,Last_Name
      ,RIGHT (RTRIM(Last_Name), 2) AS "Last Two Letters"
FROM Employee_Table
WHERE Dept_No IN (400) ;

First_Name  First_Initial  Last_Name   Last Two Letters
__________  _____________  __________  ________________
Cletus      C              Strickling  ng
Herbert     H              Harrison    on
William     W              Reilly      ly

In our example above, our result set will have the First_Name and Last_Name coming back, but we also use the LEFT and RIGHT functions to produce the first letter of the First_Name and the last two letters of the Last_Name. We filtered the rows with an additional WHERE clause to only bring back three rows. Notice the RTRIM of Last_Name. This is necessary because the Last_Name column has a data type of Character 20. This is padded with spaces.
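To make the "abbreviation of SUBSTRING" idea concrete, here is a small sketch with a made-up variable in place of the Last_Name column. It assumes the value has no trailing spaces, which is why a VARCHAR is used.

```sql
DECLARE @name VARCHAR(20) = 'Strickling';  -- hypothetical sample value

SELECT LEFT(@name, 1)                      AS Left_1       -- 'S'
      ,SUBSTRING(@name, 1, 1)              AS Left_1_Sub   -- 'S', same as LEFT(@name, 1)
      ,RIGHT(@name, 2)                     AS Right_2      -- 'ng'
      ,SUBSTRING(@name, LEN(@name) - 1, 2) AS Right_2_Sub; -- 'ng', same as RIGHT(@name, 2)
```

LEFT and RIGHT are simply easier to read, because you do not have to work out the starting position yourself.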
Four Concatenations Together

Last_Name is a CHAR(20) column. First_Name is a VARCHAR(12) column.

SELECT First_Name
      ,Last_Name
      ,RTRIM(Last_Name) + ' ' + SUBSTRING(First_Name, 1, 1) + '.' AS Last_Name_1st
FROM Employee_Table ;

First_Name  Last_Name   Last_Name_1st
__________  __________  _____________
Richard     Smythe      Smythe R.
Cletus      Strickling  Strickling C.
Mandee      Chambers    Chambers M.
Herbert     Harrison    Harrison H.
Billy       Coffing     Coffing B.
John        Smith       Smith J.
Squiggy     Jones       Jones S.
Loraine     Larkins     Larkins L.
William     Reilly      Reilly W.

Why did we TRIM the Last_Name? To get rid of the spaces, or the output would have looked odd. How many items are being concatenated in the example above? There are 4 items concatenated. We start with the Last_Name (after we trim it), then we have a single space, then we have the first initial of the First_Name, and then we have a period.
The DATALENGTH Function and RTRIM

The DATALENGTH function returns the number of bytes used to store an input string. (Ending spaces are automatically included for CHAR data types.)

Syntax: DATALENGTH (string)

SELECT First_Name
      ,DATALENGTH(First_Name) AS Lnth
      ,Last_Name
      ,DATALENGTH(RTRIM(Last_Name)) AS Lnth
FROM Employee_Table

First_Name  Lnth  Last_Name   Lnth
__________  ____  __________  ____
Richard        7  Smythe         6
Cletus         6  Strickling    10
Mandee         6  Chambers       8
Herbert        7  Harrison       8
Billy          5  Coffing        7
John           4  Smith          5
Squiggy        7  Jones          5
Loraine        7  Larkins        7
William        7  Reilly         6

The DATALENGTH function returns the number of bytes in the input string, which is not necessarily the number of characters. The difference between the LEN and the DATALENGTH functions is that the LEN function excludes trailing spaces. However, the DATALENGTH function counts them. To leave the padding out of the count, use either the LEN function or RTRIM with DATALENGTH.
A Visual of the TRIM Command Using Concatenation

Concatenation without Trim and with Trim

SELECT Last_Name
      ,First_Name
      ,Last_Name + First_Name AS NameBackwards
      ,RTRIM(Last_Name) + First_Name AS TrimNameBackwards
FROM Employee_Table

Last_Name   First_Name  NameBackwards                TrimNameBackwards
__________  __________  ___________________________  _________________
Jones       Squiggy     Jones               Squiggy  JonesSquiggy
Smith       John        Smith               John     SmithJohn
Smythe      Richard     Smythe              Richard  SmytheRichard
Harrison    Herbert     Harrison            Herbert  HarrisonHerbert
Chambers    Mandee      Chambers            Mandee   ChambersMandee
Strickling  Cletus      Strickling          Cletus   StricklingCletus
Reilly      William     Reilly              William  ReillyWilliam
Coffing     Billy       Coffing             Billy    CoffingBilly
Larkins     Loraine     Larkins             Loraine  LarkinsLoraine

When you use the RTRIM command on a column, that column will have trailing spaces removed. Without RTRIM, the CHAR(20) Last_Name keeps its padding, so a gap of spaces appears between the two names.
CHARINDEX Function Finds a Letter(s) Position in a String

Tell this function what character(s) to look for in a string, and optionally, what starting position to first start looking. If it does not find the character(s) in the string, it returns a 0. It also only reports the first occurrence.

Syntax: CHARINDEX(substring, string[, start_pos])

SELECT Last_Name
      ,CHARINDEX ('e', Last_Name) AS Find_E
      ,CHARINDEX ('f', Last_Name) AS Find_F
      ,CHARINDEX ('th', Last_Name) AS Find_TH
      ,CHARINDEX ('in', Last_Name, 6) AS Find_in_after_6
FROM Employee_Table
WHERE Last_Name IN ('Smith', 'Smythe', 'Strickling', 'Coffing')
ORDER BY 1 DESC;

Last_Name   Find_E  Find_F  Find_TH  Find_in_after_6
__________  ______  ______  _______  _______________
Strickling       0       0        0                8
Smythe           6       0        4                0
Smith            0       0        4                0
Coffing          0       3        0                0

Strickling does not have an 'e', 'f' or 'th' in it, but it does have an 'in' starting in position 8. Coffing shows only the first 'f' in position 3, but notice that Coffing also has an 'in'. However, we stated to start looking in position 6, and the 'in' in Coffing starts in position 5, thus a zero was returned to indicate it didn't find an occurrence. Smith and Smythe both have a 'th' starting in position 4.
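Because CHARINDEX only reports the first occurrence, the optional starting position is the usual way to reach a later one. The sketch below uses the literal 'Coffing' rather than the table. It finds the first 'f', then starts a second search one position after it.

```sql
SELECT CHARINDEX('f', 'Coffing')                                AS First_F   -- 3
      ,CHARINDEX('f', 'Coffing', CHARINDEX('f', 'Coffing') + 1) AS Second_F; -- 4
```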
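The CHARINDEX Command is brilliant with SUBSTRING

The starting position is a nested function call. Find the first space and subtract two.

SELECT Last_Name
      ,SUBSTRING (Last_Name, CHARINDEX(' ', Last_Name) - 2, 2) AS Last_Two_Letters
FROM Employee_Table;

Last_Name   Last_Two_Letters
__________  ________________
Smythe      he
Strickling  ng
Chambers    rs
Harrison    on
Coffing     ng
Smith       th
Jones       es
Larkins     ns
Reilly      ly

What was the starting position of the SUBSTRING in the above query? It is not a fixed number. It is calculated by the nested CHARINDEX function, which finds the position of the first space in Last_Name, and then 2 is subtracted. Because Last_Name is a CHAR(20) column padded with spaces, the first space comes right after the last letter of the name.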
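The query above depends on the padding of the CHAR(20) column. As a hedged sketch of what happens without it, the derived table below (a made-up one-row stand-in for Employee_Table) stores the name as a VARCHAR with no trailing space. CHARINDEX then returns 0, the starting position becomes -2, and nothing comes back. RIGHT with RTRIM is one alternative that works for both data types.

```sql
SELECT Last_Name
      ,SUBSTRING(Last_Name, CHARINDEX(' ', Last_Name) - 2, 2) AS Needs_Padding    -- '' (no space found)
      ,RIGHT(RTRIM(Last_Name), 2)                             AS Works_Either_Way -- 'es'
FROM (VALUES (CAST('Jones' AS VARCHAR(20)))) AS t (Last_Name);  -- hypothetical row
```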
The CHARINDEX Command Using a Literal

SELECT CHARINDEX('May flowers', 'April showers bring May flowers') ;

The first argument is the phrase we are seeking to find. The second argument is the string being searched, and the 1st character of the phrase starts at the 'M' of 'May'.

(No column name)
________________
21

We are looking for the phrase May flowers. This starts in position 21 of the string being searched.
PATINDEX Function

The PATINDEX, better named "Pattern Index", will find patterns in an argument somewhat similar to the LIKE command. The following example will show how to find the first occurrence of a digit within a string.

Syntax: PATINDEX(pattern, string)

SELECT PATINDEX('%[0-9]%', 'July 4th Holiday') AS Number_Position;

Give me the position of any number between 0-9 in the string.

Number_Position
_______________
6

The "Pattern Index", referred to as PATINDEX, will look for a pattern in a string and give you the position of the first character in the pattern. Above, we are using the literal 'July 4th Holiday', but we could have used a column value. The number 4 is in the 6th position of the value.
PATINDEX Function to Find a Character Pattern

The PATINDEX, better named "Pattern Index", will find patterns in an argument somewhat similar to the LIKE command. The example below will find any occurrence where the column Street has a 3 before the St.

Syntax: PATINDEX(pattern, string)

SELECT Subscriber_No
      ,Street
      ,PATINDEX('%[3]%St%', Street) AS "Street_3"
FROM Addresses

Subscriber_No  Street                Street_3
_____________  ____________________  ________
5555555        121 Jump St.                 0
2222222        123 Some St.                 3
4444444        12 Jump St.                  0
1111111        123 Any St.                  3
3333333        2468 Appreciate Ave.         0

The "Pattern Index", referred to as PATINDEX, will look for a pattern in a string and give you the position of the first character in the pattern. Above, we are using the column Street to see if there is a 3 before the St. Notice that we have two hits, and they are both in the 3rd position of the column Street.
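PATINDEX is often paired with LEFT or SUBSTRING, in the same way CHARINDEX was earlier. As a sketch under assumptions, the query below pulls the house number off the front of a street by finding the first character that is not a digit. The derived table is a made-up stand-in for the Addresses table, and the query assumes every street contains at least one non-digit character. If PATINDEX returned 0, the LEFT would be handed a length of -1 and fail.

```sql
SELECT Street
      ,PATINDEX('%[^0-9]%', Street)                   AS First_Non_Digit  -- 4 and 5
      ,LEFT(Street, PATINDEX('%[^0-9]%', Street) - 1) AS House_Number     -- '123' and '2468'
FROM (VALUES ('123 Some St.')
            ,('2468 Appreciate Ave.')) AS a (Street);  -- hypothetical rows
```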
SOUNDEX Function to Find a Sound

The SOUNDEX, better named "Sound", will display similar sounding items. The example below will find any Last_Name that sounds like 'Smith'.

Syntax: SOUNDEX(String)

SELECT DISTINCT
       SOUNDEX(Last_Name) SoundsLike1
      ,SOUNDEX('Smith') SoundsLike2
      ,Last_Name
FROM Employee_Table
WHERE SOUNDEX(Last_Name) = SOUNDEX('Smith')

SoundsLike1  SoundsLike2  Last_Name
___________  ___________  _________
S530         S530         Smith
S530         S530         Smythe

Call center employees often look up customers by last name while speaking with the customer on the phone. The employees would like to guess at the spelling of the name to narrow the search results and then work with the customer to determine the appropriate spelling. This is what the SOUNDEX function does. Above, we are looking at anyone who has a name that sounds like 'Smith'. We got two results back in 'Smith' and 'Smythe'.
DIFFERENCE Function to Quantile a Sound

The DIFFERENCE function compares how two items sound and gives the pair a score from 4 (high similarity) down to 0 (low similarity).

SELECT DISTINCT
       SOUNDEX(Last_Name) AS Sound1
      ,SOUNDEX('smith') AS Sound_Smith
      ,DIFFERENCE(Last_Name, 'Smith') AS High4Low0
      ,Last_Name
FROM Employee_Table
ORDER BY 3 DESC ;

Sound1  Sound_Smith  High4Low0  Last_Name
______  ___________  _________  __________
S530    S530                 4  Smith
S530    S530                 4  Smythe
J520    S530                 2  Jones
R400    S530                 2  Reilly
S362    S530                 2  Strickling
C152    S530                 1  Coffing
C516    S530                 1  Chambers
H625    S530                 1  Harrison
L625    S530                 1  Larkins

The rows with a 4 sound a lot like 'Smith'. The rows with a 1 sound nothing like 'Smith'.

Call center employees often look up customers by last name while speaking with the customer on the phone. The employees would like to guess at the spelling of the name to narrow the search results and then work with the customer to determine the appropriate spelling. The SOUNDEX and DIFFERENCE functions can both be used. Above, we are using the DIFFERENCE function to show how close the name 'Smith' is to other Last_Name values.
The REPLACE Function

The REPLACE function replaces all occurrences of substring1 in the string with substring2.

Syntax: REPLACE(string, substring1, substring2)

SELECT Customer_Name
      ,REPLACE (Customer_Name, ' ', '_') AS Under_Score
      ,Phone_Number
      ,REPLACE (Phone_Number, '-', ' ') AS No_Dash
FROM Customer_Table

Customer_Name        Under_Score          Phone_Number  No_Dash
___________________  ___________________  ____________  ________
Billy's Best Choice  Billy's_Best_Choice  555-1234      555 1234
Acme Products        Acme_Products        555-1111      555 1111
ACE Consulting       ACE_Consulting       555-1212      555 1212
XYZ Plumbing         XYZ_Plumbing         347-8954      347 8954
Databases N-U        Databases_N-U        322-1012      322 1012

Under_Score replaces spaces with underscores. No_Dash replaces dashes with spaces.

The REPLACE function replaces one value with another in a string. Above, we have replaced the spaces in a Customer_Name with underscores. In the Phone_Number, we have replaced the dashes (-) with a space.
LEN and REPLACE Functions for Number of Occurrences

SELECT Last_Name
      ,LEN(Last_Name) - LEN(REPLACE(Last_Name, 'r', '')) AS Num_of_Occur
FROM Employee_Table
WHERE Last_Name LIKE '%r%'

The third argument of REPLACE is two single quotes (an empty string).

Last_Name   Num_of_Occur
__________  ____________
Strickling             1
Chambers               1
Harrison               2
Larkins                1
Reilly                 1

The LEN function returns the number of characters in an input string.

Syntax: LEN (string)

The REPLACE function replaces all occurrences of substring1 in the string with substring2.

Syntax: REPLACE(string, substring1, substring2)

The REPLACE function and the LEN function can be combined to count the number of occurrences of a character within a string. To do this, you replace all occurrences of the character with an empty string (zero characters) and calculate the original length of the string minus the new length.
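The same trick stretches to a search string of more than one character if you divide by the length of the search string. The sketch below is an extension of the idea rather than something from the example above, and the variables are made up. One caution: LEN ignores trailing spaces, so this approach is not reliable when the search string is, or ends in, a space.

```sql
DECLARE @text   VARCHAR(50) = 'Mississippi';  -- hypothetical sample value
DECLARE @search VARCHAR(10) = 'ss';           -- hypothetical search string

-- 11 characters originally, 7 after removing both 'ss' pairs: (11 - 7) / 2 = 2
SELECT (LEN(@text) - LEN(REPLACE(@text, @search, ''))) / LEN(@search) AS Num_of_Occur;
```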
REPLICATE Function

The REPLICATE function replicates a string a requested number of times.

Syntax: REPLICATE(string, n)

SELECT Last_Name
      ,Class_Code
      ,REPLICATE(Class_Code, 3) AS Repeat_3_Times
      ,REPLICATE('Go Wildcats! ', 2) AS UofA
FROM Student_Table

Last_Name   Class_Code  Repeat_3_Times  UofA
__________  __________  ______________  _________________________
Phillips    SR          SRSRSR          Go Wildcats! Go Wildcats!
Hanson      FR          FRFRFR          Go Wildcats! Go Wildcats!
Wilson      SO          SOSOSO          Go Wildcats! Go Wildcats!
Thomas      FR          FRFRFR          Go Wildcats! Go Wildcats!
Johnson     ?           ?               Go Wildcats! Go Wildcats!
McRoberts   JR          JRJRJR          Go Wildcats! Go Wildcats!
Bond        JR          JRJRJR          Go Wildcats! Go Wildcats!
Delaney     SR          SRSRSR          Go Wildcats! Go Wildcats!
Smith       SO          SOSOSO          Go Wildcats! Go Wildcats!
Larkins     FR          FRFRFR          Go Wildcats! Go Wildcats!

The REPLICATE function replicates a string a number of times. Above, notice we replicated the Class_Code column 3 times. Also, notice that we replicated a literal value of 'Go Wildcats! ' 2 times. Did you notice that Johnson had a null value for his Class_Code? The Null value did not replicate.
STUFF Function

The STUFF function works on a character string and will put STUFF where you want STUFF after deleting STUFF.

Syntax: STUFF(string, pos, delete_length, insertstring)

SELECT First_Name
      ,Class_Code
      ,STUFF (Class_Code, 2, 1, 'enior') AS Full_Class_Code
FROM Student_Table
WHERE Class_Code = 'SR'

Start in the 2nd position, delete 1 character, and put in 'enior'.

First_Name  Class_Code  Full_Class_Code
__________  __________  _______________
Martin      SR          Senior
Danny       SR          Senior

The STUFF function operates on an input parameter string. It deletes as many characters as the number specified in the delete_length parameter, starting at the character position specified in the pos input parameter. The function inserts the string specified in the insertstring parameter in position pos. If you decide to insert a string and not delete anything, you can specify a length of 0 as the third argument.
STUFF Function without Deleting

The STUFF function works on a character string and will put STUFF where you want STUFF after deleting STUFF.

Syntax: STUFF(string, pos, delete_length, insertstring)

SELECT Course_Name
      ,STUFF (Course_Name, 1, 0, 'Course: ') AS Course_Added
FROM Course_Table

Start in the 1st position, delete 0 characters, and put in 'Course: '.

Course_Name               Course_Added
________________________  ________________________________
Advanced SQL              Course: Advanced SQL
Database Administration   Course: Database Administration
Introduction to SQL       Course: Introduction to SQL
Physical Database Design  Course: Physical Database Design
SQL Server Concepts       Course: SQL Server Concepts
V2R3 SQL Features         Course: V2R3 SQL Features

Above, we decided not to delete anything, but to insert a string called 'Course: ', so we specified a length of 0 as the third argument. The STUFF function operates on an input parameter string. It deletes as many characters as the number specified in the delete_length parameter, starting at the character position specified in the pos input parameter. The function inserts the string specified in the insertstring parameter in position pos.
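Both STUFF examples can be tried with literals, without the Student_Table or Course_Table. The third column is an extra illustration that is not from the examples above: it inserts a dash into a seven-digit phone number without deleting anything.

```sql
SELECT STUFF('SR', 2, 1, 'enior')              AS Replace_One  -- 'Senior'
      ,STUFF('Advanced SQL', 1, 0, 'Course: ') AS Insert_Only  -- 'Course: Advanced SQL'
      ,STUFF('5551234', 4, 0, '-')             AS Add_Dash;    -- '555-1234'
```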
UPPER and lower Functions

The UPPER and LOWER functions convert the input string to either all uppercase or all lowercase characters.

Syntax: UPPER(string), LOWER(string)

SELECT First_Name
      ,UPPER (First_Name) AS "Upper Case"
      ,lower (First_Name) AS "Lower Case"
FROM Student_Table

First_Name  Upper Case  Lower Case
__________  __________  __________
Martin      MARTIN      martin
Henry       HENRY       henry
Susie       SUSIE       susie
Wendy       WENDY       wendy
Stanley     STANLEY     stanley
Richard     RICHARD     richard
Jimmy       JIMMY       jimmy
Danny       DANNY       danny
Andy        ANDY        andy
Michael     MICHAEL     michael
Chapter 16 - Interrogating the Data
"The difference between genius and stupidity is that genius has its limits" - Albert Einstein
Quiz – What would the Answer be?

Student_Table

Student_ID  Last_Name   First_Name  Class_Code  Grade_Pt
__________  __________  __________  __________  ________
423400      Larkins     Michael     FR              0.00
231222      Wilson      Susie       SO              3.80
280023      McRoberts   Richard     JR              1.90
322133      Bond        Jimmy       JR              3.95
125634      Hanson      Henry       FR              2.88
333450      Smith       Andy        SO              2.00
324652      Delaney     Danny       SR              3.35
260000      Johnson     Stanley     ?               ?
234121      Thomas      Wendy       FR              4.00
123250      Phillips    Martin      SR              3.00

SELECT Class_Code
      ,Grade_Pt / (Grade_Pt * 2) AS Math1
FROM Student_Table
ORDER BY 1,2 ;

Can you guess what would return in the Answer Set? Using the Student_Table above, try to predict what the answer would be if this query were run on the system.
Answer to Quiz – What would the Answer be?

Student_Table is the same table shown in the quiz above.

SELECT Class_Code
      ,Grade_Pt / (Grade_Pt * 2) AS Math1
FROM Student_Table
ORDER BY 1,2 ;

Error – Division by zero

You get an error when you DIVIDE by ZERO! Michael Larkins has a Grade_Pt of 0.00, so Grade_Pt * 2 is zero for his row. Let’s fix it with the NULLIF command!
The NULLIF Command

Student_Table is the same ten-row table shown in the quiz above.

SELECT Class_Code
      ,Grade_Pt / ( NULLIF (Grade_Pt, 0) * 2 ) AS Math1
FROM Student_Table;

SELECT Class_Code
      ,Grade_Pt / ( NULLIF ( (Grade_Pt) * 2, 0 ) ) AS Math1
FROM Student_Table;

If you have a calculation where a ZERO could kill the operation, and you don’t want that, you can use the NULLIF command to convert any zero value to a null value. Both queries above bring back the same result.
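As a rough sketch of what NULLIF is doing, the query below runs the first calculation next to a CASE expression that should behave the same way. The derived table is a made-up three-row stand-in for the Student_Table, so the query can be run on its own.

```sql
SELECT Class_Code
      ,Grade_Pt
      ,Grade_Pt / (NULLIF(Grade_Pt, 0) * 2) AS Math1
      ,Grade_Pt / (CASE WHEN Grade_Pt = 0 THEN NULL ELSE Grade_Pt END * 2) AS Math1_Case
FROM (VALUES ('FR', CAST(0.00 AS DECIMAL(5,2)))   -- zero becomes Null, so no error
            ,('SO', CAST(3.80 AS DECIMAL(5,2)))   -- 3.80 / 7.60 comes back as one half
            ,(NULL, CAST(NULL AS DECIMAL(5,2)))   -- Null in, Null out
     ) AS s (Class_Code, Grade_Pt);               -- hypothetical rows
```

The row with a Grade_Pt of 0.00 now returns a Null for both Math1 columns instead of stopping the whole query with a division by zero error.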
Quiz – Fill in the Answers for the NULLIF Command

Student_Table

Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
__________  _________  __________  __________  ________
423400      Larkins    Michael     FR              0.00
123250      Phillips   Martin      SR              3.00
234121      Thomas     Wendy       FR              4.00

SELECT Last_Name
      ,NULLIF(Grade_Pt, 0)   AS GP1
      ,NULLIF(Grade_Pt, 3.0) AS GP2
      ,NULLIF(Grade_Pt, 4.0) AS GP3
FROM Student_Table
WHERE Student_ID IN (423400, 123250, 234121)
ORDER BY Last_Name ;

Fill in the Answer Set below after looking at the table and the query.

Last_Name   GP1   GP2   GP3
__________  ____  ____  ____
Larkins
Phillips
Thomas

What would the above Answer Set produce from your analysis?
Answer – Fill in the Answers for the NULLIF Command

Student_Table is the same three-row table shown in the quiz above.

SELECT Last_Name
      ,NULLIF(Grade_Pt, 0)   AS GP1
      ,NULLIF(Grade_Pt, 3.0) AS GP2
      ,NULLIF(Grade_Pt, 4.0) AS GP3
FROM Student_Table
WHERE Student_ID IN (423400, 123250, 234121)
ORDER BY Last_Name ;

Last_Name   GP1   GP2   GP3
__________  ____  ____  ____
Larkins     ?     0.00  0.00
Phillips    3.00  ?     3.00
Thomas      4.00  4.00  ?

Look at the answers above, and if it doesn’t make sense, go over it again until it does.
The COALESCE Command – Fill In the Answers

Student_Table

Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
__________  _________  __________  __________  ________
423400      Larkins    Michael     FR              0.00
260000      Johnson    Stanley     ?               ?
234121      Thomas     Wendy       FR              4.00

SELECT Last_Name
      ,Grade_Pt
      ,Class_Code
      ,COALESCE (Grade_Pt, Class_Code) AS ValidStudents
FROM Student_Table
WHERE Last_Name IN ('Johnson', 'Larkins', 'Thomas')
ORDER BY 1 ;

Fill in the Answer Set below after looking at the table and the query.

Last_Name   Grade_Pt  Class_Code  ValidStudents
__________  ________  __________  _____________
Johnson     ?         ?
Larkins     0.00      FR
Thomas      4.00      FR

Coalesce returns the first non-Null value in a list, and if all values are Null, returns Null.
The COALESCE Answer Set

Student_Table is the same three-row table shown on the fill-in page above.

SELECT Last_Name
      ,Grade_Pt
      ,Class_Code
      ,COALESCE (Grade_Pt, Class_Code) AS ValidStudents
FROM Student_Table
WHERE Last_Name IN ('Johnson', 'Larkins', 'Thomas')
ORDER BY 1 ;

Last_Name   Grade_Pt  Class_Code  ValidStudents
__________  ________  __________  _____________
Johnson     ?         ?           ?
Larkins     0.00      FR          0.00
Thomas      4.00      FR          4.00

Coalesce returns the first non-Null value in a list, and if all values are Null, returns Null.
COALESCE is Equivalent to This CASE Statement

SELECT Last_Name
      ,Grade_Pt
      ,Class_Code
      ,COALESCE (Grade_Pt, Class_Code) AS ValidStudents
FROM Student_Table ;

SELECT Last_Name
      ,Grade_Pt
      ,Class_Code
      ,CASE WHEN Grade_Pt IS NOT NULL THEN Grade_Pt
            WHEN Class_Code IS NOT NULL THEN Class_Code
            ELSE NULL
       END AS ValidStudents
FROM Student_Table ;

Coalesce returns the first non-Null value in a list, and if all values are Null, returns Null. Above are two queries that return the exact same answer set. These examples are designed to give you a better idea of how Coalesce works.
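One thing to watch for when the list mixes data types: COALESCE hands back a single data type for the whole list, and a numeric type outranks a character type. In the sample data, the only student with a Null Grade_Pt also has a Null Class_Code, so a Class_Code is never converted to a number. A student with a Null Grade_Pt and a Class_Code of 'FR' would most likely raise a conversion error. Here is a minimal sketch of one way around it, casting the number to character first. The three rows, including the student named Newbie, and the 'Unknown' default are made up for illustration.

```sql
SELECT Last_Name
      ,COALESCE(CAST(Grade_Pt AS VARCHAR(10)), Class_Code, 'Unknown') AS ValidStudents
FROM (VALUES ('Larkins', CAST(0.00 AS DECIMAL(5,2)), CAST('FR' AS CHAR(2)))  -- '0.00'
            ,('Johnson', NULL, NULL)                                         -- 'Unknown'
            ,('Newbie',  NULL, 'FR')                                         -- 'FR'
     ) AS s (Last_Name, Grade_Pt, Class_Code);  -- hypothetical rows
```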
The Basics of CAST (Convert and Store)

CAST will convert a column or value’s data type temporarily into another data type. Below is the syntax:

SELECT CAST(<column-name> AS <data-type>[(<length>)])
FROM <table-name> ;

Examples using CAST:

CAST ( <smallint-value> AS CHAR(5) )      Converts smallint to character
CAST ( <decimal-value>  AS INTEGER )      Truncates decimals
CAST ( <value>          AS VARCHAR(5) )
CAST ( <value>          AS FLOAT )

Data can be converted from one type to another by using the CAST function. As long as the data involved does not break any data rules (i.e. placing alphabetic or special characters into a numeric data type), the conversion works. The name of the CAST function comes from the Convert And STore operation that it performs.
Some Great CAST (Convert and Store) Examples

SELECT CAST('ABCDE' AS CHAR(1)) AS Trunc
      ,CAST(128 AS CHAR(3))     AS This_Is_OK
      ,CAST(127 AS INTEGER)     AS Bigger ;

Trunc  This_Is_OK  Bigger
_____  __________  ______
A      128            127

The first CAST truncates the five characters (left to right) to form the single character ‘A’. In the second CAST, the integer 128 is converted to three characters and left justified in the output. In the third CAST, the value 127 is converted to an INTEGER. In SQL Server, a whole-number literal such as 127 is already an INTEGER, so the value is unchanged, and it is right justified in the output as a numeric.
Some Great CAST (Convert and Store) Examples

SELECT CAST(121.53 AS SMALLINT)     AS Whole
      ,CAST(121.53 AS DECIMAL(3,0)) AS Rounder ;

Whole  Rounder
_____  _______
  121      122

The value of 121.53 was initially stored as a DECIMAL with 5 total digits, 2 of them to the right of the decimal point. Then, it is converted to a SMALLINT using CAST to remove the decimal positions. Therefore, it truncates data by stripping off the decimal portion. It does not round data using this data type. On the other hand, the CAST in the second column, called Rounder, converts the value to a DECIMAL with 3 digits and no digits (3,0) to the right of the decimal point, so it will round data values instead of truncating. Since .53 is greater than .5, it is rounded up to 122.
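Negative numbers are a good way to see the difference between chopping and rounding. In the sketch below, which extends the example above with a negative literal, the integer CAST drops the decimals and moves toward zero, while the DECIMAL(3,0) CAST rounds to the nearest whole number.

```sql
SELECT CAST( 121.53 AS INTEGER)      AS Chopped_Pos   -- 121
      ,CAST(-121.53 AS INTEGER)      AS Chopped_Neg   -- -121 (toward zero, not -122)
      ,CAST( 121.53 AS DECIMAL(3,0)) AS Rounded_Pos   -- 122
      ,CAST(-121.53 AS DECIMAL(3,0)) AS Rounded_Neg;  -- -122
```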
A Rounding Example

SELECT CAST(.014  AS Decimal(3,2)) AS ".014"
      ,CAST(.016  AS Decimal(3,2)) AS ".016"
      ,CAST(.015  AS Decimal(3,2)) AS ".015"
      ,CAST(.0150 AS Decimal(3,2)) AS ".0150"
      ,CAST(.0250 AS Decimal(3,2)) AS ".0250"
      ,CAST(.0159 AS Decimal(3,2)) AS ".0159"

.014  .016  .015  .0150  .0250  .0159
____  ____  ____  _____  _____  _____
0.01  0.02  0.02   0.02   0.03   0.02

For .014, the digit to the right of the rounding digit is less than 5 (no change). For .016, the digit to the right of the rounding digit is greater than 5 (increase by 1). The .015, .0150 and .0250 columns show that a digit of exactly 5 also rounds up.

Above is an example of what you might expect to see in similar rounding examples.
Quiz - CAST Examples

SELECT Order_Number AS OrdNo
      ,Customer_Number AS CustNo
      ,Order_Date
      ,Order_Total
      ,CAST(Order_Total AS integer) AS Chopped
      ,CAST(Order_Total AS Decimal(5,0)) AS Rounded
FROM Order_Table
ORDER BY 1 ;

Fill in the Answer Set below after looking at the data and the query.

OrdNo   CustNo    Order_Date  Order_Total  Chopped  Rounded
______  ________  __________  ___________  _______  _______
123456  11111111  1998-05-04     12347.53
123512  11111111  1999-01-01      8005.91
123552  31323134  1999-10-01      5111.47
123585  87323456  1999-10-10     15231.62
123777  57896883  1999-09-09     23454.84

The column Chopped takes Order_Total (a Decimal (10,2)) and CASTs it as an integer, which chops off the decimals. Rounded CASTs Order_Total as a Decimal (5,0), which takes the decimals and rounds up if the decimal is .50 or above.
Answer to Quiz - CAST Examples

SELECT Order_Number AS OrdNo
      ,Customer_Number AS CustNo
      ,Order_Date
      ,Order_Total
      ,CAST(Order_Total AS integer) AS Chopped
      ,CAST(Order_Total AS Decimal(5,0)) AS Rounded
FROM Order_Table
ORDER BY 1 ;

OrdNo   CustNo    Order_Date  Order_Total  Chopped  Rounded
______  ________  __________  ___________  _______  _______
123456  11111111  1998-05-04     12347.53    12347    12348
123512  11111111  1999-01-01      8005.91     8005     8006
123552  31323134  1999-10-01      5111.47     5111     5111
123585  87323456  1999-10-10     15231.62    15231    15232
123777  57896883  1999-09-09     23454.84    23454    23455

The column Chopped takes Order_Total (a Decimal (10,2)) and CASTs it as an integer, which chops off the decimals. Rounded CASTs Order_Total as a Decimal (5,0), which takes the decimals and rounds up if the decimal is .50 or above.
Quiz - The Basics of the CASE Statements

Course_Table

Course_ID  Course_Name               Credits  Seats
_________  ________________________  _______  _____
100        Database Concepts               3     50
200        Introduction to SQL             3     20
210        Advanced SQL                    3     22
220        SQL Features                    2     25
300        Physical Database Design        4     20
400        Database Administration         4     16

SELECT Course_Name
      ,CASE Credits
            WHEN 1 THEN 'One Credit'
            WHEN 2 THEN 'Two Credits'
            WHEN 3 THEN 'Three Credits'
       END AS CreditAlias
FROM Course_Table
WHERE Course_ID IN (220, 300) ;

Course_Name               CreditAlias
________________________  ___________
Physical Database Design
SQL Features

This is a CASE statement, which allows you to evaluate a column in your table, and from that, come up with a new answer for your report. Every CASE begins with a CASE, and they all must end with a corresponding END. What would the answer be?
Answer to Quiz - The Basics of the CASE Statements

Course_Table is the same table shown in the quiz above.

SELECT Course_Name
      ,CASE Credits
            WHEN 1 THEN 'One Credit'
            WHEN 2 THEN 'Two Credits'
            WHEN 3 THEN 'Three Credits'
       END AS CreditAlias
FROM Course_Table
WHERE Course_ID IN (220, 300) ;

Course_Name               CreditAlias
________________________  ___________
Physical Database Design  ?
SQL Features              Two Credits

The answer for the Physical Database Design class is null. This is because it has 4 credits, so it fell through the case statement. The answer for the SQL Features course is Two Credits. Once a case statement gets a match, it leaves the statement and gets the next row.
Using an ELSE in the Case Statement

SELECT Course_Name
      ,CASE Credits
          WHEN 1 THEN 'One Credit'
          WHEN 2 THEN 'Two Credits'
          WHEN 3 THEN 'Three Credits'
          ELSE 'Four Credits'
       END AS CreditAlias
FROM Course_Table
WHERE Course_ID IN (220, 300) ;

Course_Name               CreditAlias
________________________  ____________
Physical Database Design  Four Credits
SQL Features              Two Credits

Now that we have an ELSE in our CASE statement, we are guaranteed that nothing will fall through.
Using an ELSE as a Safety Net

SELECT Course_Name
      ,CASE Credits
          WHEN 1 THEN 'One Credit'
          WHEN 2 THEN 'Two Credits'
          WHEN 3 THEN 'Three Credits'
          WHEN 4 THEN 'Four Credits'
          ELSE 'Do not know'
       END AS CreditAlias
FROM Course_Table ;

Now that we have an ELSE in our CASE statement, we are guaranteed that nothing will fall through. An ELSE should be used in case you forgot a possibility and there was no match.
Rules For a Valued Case Statement

SELECT Course_Name
      ,CASE Credits
          WHEN 1 THEN 'One Credit'
          WHEN 2 THEN 'Two Credits'
          WHEN 3 THEN 'Three Credits'
          ELSE 'Credits not found'
       END AS CreditAlias
FROM Course_Table ;

The column Credits follows the word CASE. This is a Valued CASE statement. The value is the column Credits.

Rules for a Valued CASE:
1. You can only check for equality
2. You can only check the value of the column Credits

There are two types of CASE statements: the Valued CASE and the Searched CASE. Above are the rules for the Valued CASE statement.
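To make the difference between the two types concrete, here is a minimal sketch that puts a Valued CASE next to a Searched CASE on the same Course_Table. The Seats ranges and the aliases CreditAlias2 and SizeAlias are illustrative assumptions, not taken from the book.

```sql
-- Valued CASE: the column Credits follows the word CASE,
-- so every WHEN is an equality test against Credits
SELECT Course_Name
      ,CASE Credits
          WHEN 3 THEN 'Three Credits'
          WHEN 4 THEN 'Four Credits'
          ELSE 'Other'
       END AS CreditAlias2      -- hypothetical alias
FROM Course_Table ;

-- Searched CASE: nothing follows the word CASE,
-- so each WHEN carries its own complete condition
SELECT Course_Name
      ,CASE
          WHEN Seats < 20 THEN 'Small'
          WHEN Seats BETWEEN 20 AND 25 THEN 'Medium'
          ELSE 'Large'
       END AS SizeAlias         -- hypothetical alias
FROM Course_Table ;
```

Because the Searched form is not tied to one column or to equality, it can test ranges, and it could test a different column in each WHEN if needed.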
Rules for a Searched Case Statement SELECT Course_Name No Value follows the ,CASE word CASE. This is WHEN Credits = 1030 is in page 3.
Chapter 17
Table Create and Data Types
The Building of a B-Tree for a Clustered Index (3 of 3)

[Figure: B-Tree for the clustered index. The Root Node (1001, 3000, 6000) sits above Intermediate Node 1001 (1001, 1030, 2000), Intermediate Node 3000 (3000, 4000, 5000) and Intermediate Node 6000 (6000, 7000, 8000). Each Intermediate Node entry points to the header of a Leaf Page containing the actual data rows.]
Let's look at this B-Tree starting at the leaf level. Each leaf is an 8 K page that contains data rows. Each data row has a RowID containing the FileID:PageNo:RowNum, which takes up 8 bytes. The rows are sorted in each page by Employee_No. Each Intermediate node has a pointer to the first RowID and Employee_No for every leaf it is responsible for. The Root node has a pointer to the first RowID and Employee_No for each Intermediate node. As a leaf adds rows and expands past 8 K, it splits. As an Intermediate node adds leaves and expands past 8 K, it splits into two Intermediate nodes. As a Root node continues to add more Intermediate node pointers and expands past 8 K, it also splits, and a new Root node is created above the two halves. The reason they call it a B-Tree (Balanced Tree) is that every leaf is the same number of levels from the Root, so every row can be retrieved in the same number of steps.
The Row Offset Array is the Guidance System For Every Row

PAGE 2 (Previous Page# - 1, Next Page# - 3)

RowID       Employee_No  Dept_No  First_Name  Last_Name  Salary
__________  ___________  _______  __________  _________  ______
1000 2 1 1  1001         100      Rafael      Minal      90000
1000 2 2 1  1004         400      Kyle        Stover     60000
1000 2 3 1  1007         200      Sushma      Davis      50000
1000 2 4 1  1020         200      May         Jones      60000
1000 2 5 1  1030         500      Dawn        Wilson     50000
1000 2 6 1  1040         300      Red         Saylor     40000
1000 2 7 1  1050         300      Rex         Mason      60000
1000 2 8 1  1060         400      Kit         Wagner     50000

Row Offset Array (ROA), 2 bytes for each ROA slot:
798 698 598 498 398 298 198 98

The Row Offset Array guides every search. It holds the starting position of every row within the page. It is always in perfect descending order. The rightmost slot (yellow) holds the starting position of the first row on the page (also in yellow).
When a page of data is moved from disk into memory, it is ready to be searched to produce an answer set. Every read of every row will first go through the Row Offset Array. The page above holds eight rows. In each case, the Row Offset Array will be read and that will guide the Azure SQL Data Warehouse directly to the offset of the row. For example, to read the first row on the page, the Azure SQL Data Warehouse will go to the last slot in the Row Offset Array (in yellow) and it will know that the first row starts at byte 98. It will then go to byte 98 and read the row.
The Row Offset Array Provides Two Search Options (1 of 2)

[Figure: PAGE 2 (Previous Page# - 1, Next Page# - 3) with its eight rows, Employee_No 1001 through 1060, and the Row Offset Array 798 698 598 498 398 298 198 98, 2 bytes for each ROA slot.]

1. The first search option, which is the slowest, is a sequential search. Each row will be read starting from the first row to the last. This is done when a query does not use an index. All Full Table Scans are sequential searches.
When a page of data is moved from disk into memory, it is ready to be searched to produce an answer set. Every read of every row will first go through the Row Offset Array. The slowest search happens when there is no index being used. This is often called a Full Table Scan. There are eight rows in the above example. The Row Offset Array will be used with each read. The Azure SQL Data Warehouse will read the last offset first (yellow color) and then read the first row in the page starting at byte 98. The Azure SQL Data Warehouse will then read the second offset (pink color) and then read the second row at offset 198, and so on. Stay with me because there are two more reasons this design is always used.
The Row Offset Array Provides Two Search Options (2 of 2)

[Figure: PAGE 2 (Previous Page# - 1, Next Page# - 3) with its eight rows, Employee_No 1001 through 1060, and the Row Offset Array 798 698 598 498 398 298 198 98, 2 bytes for each ROA slot.]

2. The second search option, which is the fastest, is a binary search. This search uses an index and it is like using a phone book. The first row read will be in the middle of the page. The system will then know whether to move up or down because the rows are sorted.
When a page of data is moved from disk into memory, it is ready to be searched to produce an answer set. Every read of every row will first go through the Row Offset Array. When the data on the page is sorted using a clustered index, a binary search is fast. The Azure SQL Data Warehouse reads the Row Offset Array to find the row in the middle. It can then move up or down depending on whether it is too high or too low. It always cuts the remaining search in half. Imagine the query was searching for Employee 1050. The search would first use the Row Offset Array to go to the middle and read the row for Employee 1020 (red arrow). It would then realize it was too low. It would then read the Row Offset Array to move to Employee 1040. It would still be too low, so it would use the Row Offset Array to continue cutting the remaining rows in half, and then choose the row for Employee 1050. Found it! A binary search can be used on queries that take advantage of an index on a sorted page.
The Row Offset Array Helps With Inserts

PAGE 2 (Previous Page# - 1, Next Page# - 3)

RowID       Employee_No  Dept_No  First_Name  Last_Name  Salary
__________  ___________  _______  __________  _________  ______
1000 2 1 1  1001         100      Rafael      Minal      90000
1000 2 2 1  1004         400      Kyle        Stover     60000
1000 2 3 1  1007         200      Sushma      Davis      50000
1000 2 4 1  1020         200      May         Jones      60000
1000 2 5 1  1030         500      Dawn        Wilson     50000
1000 2 6 1  1040         300      Red         Saylor     40000
1000 2 7 1  1050         300      Rex         Mason      60000
1000 2 8 1  1060         400      Kit         Wagner     50000
1000 2 9 1  1002         100      Bill        Mason      75000

Row Offset Array (ROA), 2 bytes for each ROA slot:
798 698 598 498 398 298 198 898 98

The row for Employee_No 1002 has just been inserted. The new row logically sorts as row 2. The Azure SQL Data Warehouse physically places it at the end of the page (offset 898), but logically places it second in the Row Offset Array.
When a page is sorted using a clustered index, the rows are sorted physically and logically. Let me explain. The Row Offset Array is always in perfect descending order. Above, you can see that each row is sorted physically by Employee_No. In a perfect world, the Row Offset Array logically lists the rows on the page in perfect descending order and the rows are physically in perfect order within the page. However, when SQL is used for an insert statement, it will often write the row physically as the last row on the page (for speed), but it will still list the row in the Row Offset Array in perfect logical order. Notice that we added a new row (in black) at the end of the page. Since this table is sorted with a clustered index on Employee_No, you should notice that the row in black has an Employee_No of 1002. It should physically be the second row on the page. It is the second row according to the Row Offset Array. Any sequential search will read the black row second.
What is a Uniquefier?

PAGE 2 (Previous Page# - 1, Next Page# - 3)

RowID       Employee_No  Dept_No  First_Name  Last_Name  Salary
__________  ___________  _______  __________  _________  ______
1000 2 1 1  1001         100      Rafael      Minal      90000
1000 2 2 1  1004         400      Kyle        Stover     60000
1000 2 3 1  1007         200      Sushma      Davis      50000
1000 2 4 1  1020         200      May         Jones      60000
1000 2 5 1  1030         500      Dawn        Wilson     50000
1000 2 6 1  1040         300      Red         Saylor     40000
1000 2 7 1  1050         300      Rex         Mason      60000
1000 2 8 1  1060         400      Kit         Wagner     50000
1000 2 9 2  1060         300      Will        Day        75000

Row Offset Array (ROA), 2 bytes for each ROA slot:
898 798 698 598 498 398 298 198 98

The Uniquefier (the last number in the RowID) identifies duplicate values in a clustered index.
When a page is sorted using a clustered index, the rows are sorted physically and logically. Since the Azure SQL Data Warehouse does not allow for Unique Clustered Indexes, a Uniquefier is added to the RowID. Above, we have two individuals who have an Employee_No of 1060. The first employee gets the Uniquefier of 1 and the second gets 2.
Adding an Index

CREATE TABLE Emp_Intl
( Employee_No  INTEGER
 ,Dept_No      SMALLINT
 ,First_Name   VARCHAR(12)
 ,Last_Name    CHAR(20)
 ,Salary       DECIMAL(8,2)
) ;

1  CREATE UNIQUE CLUSTERED INDEX Idx1 ON Emp_Intl (Employee_No);

2  CREATE INDEX Idx2 ON Emp_Intl (Dept_No);

Above, we have created a table called Emp_Intl. Each row in the table will contain a RowID. The RowID is 8 bytes in size and contains the FileID, PageID and SlotNo. A table can only have one clustered index. A clustered index sorts the rows of the table by the clustered key column value. In this example, the rows will be sorted in ascending order by Employee_No. A table can have numerous non clustered indexes. Think of them as pointers to data. They are implemented as B-Trees. More about B-Trees on the next slide.
When Do I Create a Non Clustered Index?

1. On columns that contain a large number of distinct values. This might be a combination of last name and first name or a social security number. If there are many duplicates, then SQL Server will perform a sequential scan instead.
2. When queries do not return large result sets. This goes back to having many distinct values.
3. On columns that are frequently involved in the WHERE clause with equality searches.
4. Not on OLTP applications, but on large Decision Support System (DSS) applications, where joins and grouping are frequently required. A best practice is to create multiple non clustered indexes on columns involved in join and grouping operations, and a clustered index on the foreign keys.
5. In Cover query situations. A Cover query uses only the non clustered index to retrieve the data to satisfy the query instead of utilizing the table. The answer set is said to be covered by the index.
Following the do's and don'ts on this page can enhance performance and prevent difficulties.
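As a rough illustration of the equality-search and Cover query points above, the sketch below builds a non clustered index on the name columns of the Emp_Intl table from earlier in this chapter. The index name Idx_Name and the search value are assumptions made for the example, and whether the optimizer actually chooses the index depends on the system and the data.

```sql
-- Hypothetical index name; many distinct values are expected
-- in the Last_Name / First_Name combination
CREATE INDEX Idx_Name ON Emp_Intl (Last_Name, First_Name);

-- Equality search on the leading index column, and every column
-- the query references is present in the index, so the index
-- alone can satisfy (cover) the query
SELECT Last_Name
      ,First_Name
FROM Emp_Intl
WHERE Last_Name = 'Jones' ;
```

Adding a column such as Salary to the SELECT list would mean the index no longer covers the query, and the system would have to visit the base table rows as well.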
B-Tree for Non Clustered Index on a Clustered Table (1 of 2)

Root (Non Clustered Index): Previous Page# - null, Next Page# - 2,3

Dept_No  Employee_No (Clustered Index Values)
_______  ____________________________________
100      1001
200      1007, 1020
300      1040, 1050
400      1004, 1060
500      1030

Leaf Page PAGE 2: Previous Page# - 1, Next Page# - 3

RowID     Employee_No  Dept_No  First_Name  Last_Name  Salary
________  ___________  _______  __________  _________  ______
1000 2 1  1001         100      Rafael      Minal      90000
1000 2 2  1004         400      Kyle        Stover     60000
1000 2 3  1007         200      Sushma      Davis      50000
1000 2 4  1020         200      May         Jones      60000

Row Offset Array: 398 298 198 98

Leaf Page PAGE 3: Previous Page# - 2, Next Page# - Null

RowID     Employee_No  Dept_No  First_Name  Last_Name  Salary
________  ___________  _______  __________  _________  ______
1000 3 1  1030         500      Dawn        Wilson     50000
1000 3 2  1040         300      Red         Saylor     40000
1000 3 3  1050         300      Rex         Mason      60000
1000 3 4  1060         400      Kit         Wagner     50000

Row Offset Array: 398 298 198 98
A non clustered index will utilize a B-Tree and it will always have a root node. A non clustered index will store the index value in order within the index node. When a non clustered index is created on a table that has a clustered index, the index node will contain two values: the index value and the clustered index value(s). Above, we created a non clustered index on the column Dept_No. Since the base table also had a clustered index on Employee_No, the Employee_No values are also included. So, if your query wanted to retrieve all rows WHERE the Dept_No was equal to 400, the Azure SQL Data Warehouse would look in the non clustered index and see that there were two rows, Employee_No 1004 and 1060. Then, the system would use the clustered index to find them.
B-Tree for Non Clustered Index on a Clustered Table (2 of 2)

[Figure: B-Tree for the non clustered index on Dept_No. A Root Node sits above Intermediate Nodes holding the Dept_No values 100 through 500. Each Intermediate Node entry points to the header of a Leaf Page containing the actual data rows.]
We created a non clustered index on the column Dept_No on a table with a clustered index on Employee_No. A non clustered index in this example will sort by Dept_No and point to the row(s) Employee_No. A page always allocates 8 K for both disk and memory use. If a leaf, intermediate or even root node reaches the 8 K limit, it will split into two nodes.
Adding a Non Clustered Index To A Heap

CREATE TABLE Emp_Intl
( Employee_No  INTEGER
 ,Dept_No      SMALLINT
 ,First_Name   VARCHAR(12)
 ,Last_Name    CHAR(20)
 ,Salary       DECIMAL(8,2)
) ;

1  CREATE INDEX Idxlast ON Emp_Intl (Last_Name);

Above, we have created a table called Emp_Intl and it does not have a clustered index. This means that the rows are unordered and stored in a heap. Each row in the table will contain a RowID. The RowID is 8 bytes in size and contains the FileID, PageID and SlotNo. This table could have many non clustered indexes, but we only created one on Last_Name. The next pages will show the B-Tree for the newly created non clustered index.
B-Tree for Non Clustered Index on a Heap Table (1 of 2)

Root (Non Clustered Index): Previous Page# - null, Next Page# - 2,3

Last_Name  RowID
_________  ________
Davis      1000:1:2
Jones      1000:1:3
Minal      1000:1:1
Stover     1000:1:4

Leaf Page (Physical Rows). In a heap, rows are not sorted.

FileID  PageNum  SlotNum  Employee_No  Dept_No  First_Name  Last_Name  Salary
______  _______  _______  ___________  _______  __________  _________  ______
1000    1        1        1001         100      Rafael      Minal      90000
1000    1        2        1007         200      Sushma      Davis      50000
1000    1        3        1020         200      May         Jones      60000
1000    1        4        1004         400      Kyle        Stover     60000

The FileID, PageNum and SlotNum together form the Row Identifier (RowID). The rest of the page is free space.
A non clustered index will utilize a B-Tree and it will always have a root node. A non clustered index will store the index value in order within the index node. When a non clustered index is created on a table that is a heap, the index node will contain two values: the index value and the RowID(s). Above, we created a non clustered index on Last_Name. The index will contain every Last_Name, sorted, and the RowID.
B-Tree for a Non Clustered Index on a Heap Table (2 of 2)

[Figure: B-Tree for the non clustered index on Last_Name. A Root Node sits above Intermediate Nodes holding Last_Name values from Adams through Zin (Adams, Indy, Jones, Sims, Tan, Zin). Each Intermediate Node entry points to the header of a Leaf Page containing the actual data rows.]

We created a non clustered index on the column Last_Name on a table that had no clustered index, which is considered an unordered heap of rows. A non clustered index in this example will sort by Last_Name and point to the RowID(s) on the leaf page. A page always allocates 8 K for both disk and memory use. If a leaf page, intermediate node or even root node reaches the 8 K limit, it will split into two leaves or nodes.
Default Values

CREATE TABLE Emp_Intl
(Employee_No  INTEGER NOT NULL DEFAULT ('')
,Dept_No      SMALLINT
,First_Name   VARCHAR(12)
,Last_Name    CHAR(20)
,Salary       DECIMAL(8,2)
) ;

Above, we have directed the Azure SQL Data Warehouse to put in an empty string (two single quotation marks with no space between them) as the default value.
If you don’t desire a NULL in a particular column, you can alternatively utilize a default value to indicate that the column has not yet been populated. All you need to do is specify a DEFAULT constraint by adding the DEFAULT clause right after saying NOT NULL.
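To see a DEFAULT in action, the following minimal sketch uses a small table made up for this example (Emp_Default is a hypothetical name, and the default of 0 is an assumed choice). It only shows the general pattern: a column left out of the INSERT column list receives its default instead of a NULL.

```sql
-- Hypothetical table for illustration only
CREATE TABLE Emp_Default
(Employee_No  INTEGER   NOT NULL
,Dept_No      SMALLINT  NOT NULL DEFAULT 0
,Last_Name    CHAR(20)
) ;

-- Dept_No is omitted from the column list,
-- so the DEFAULT value of 0 is stored for it
INSERT INTO Emp_Default (Employee_No, Last_Name)
VALUES (1001, 'Minal') ;

SELECT Employee_No
      ,Dept_No
      ,Last_Name
FROM Emp_Default ;
```

Without the DEFAULT clause, the same INSERT would fail, because Dept_No is declared NOT NULL and no value was supplied.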
Chapter 18
View Functions
Chapter 18 – View Functions
"Be the change that you want to see in the world." - Mahatma Gandhi
The Fundamentals of Views

View Fundamentals
  A view is a virtual table.
  A view may define a subset of columns.
  A view can even define a subset of rows if it has a WHERE clause.
  A view never duplicates data or stores the data separately.
  Views provide security.

View Advantages
  An additional level of security is provided.
  Helps the business user not miss join conditions.
  Helps control read and update privileges.
  Unaffected when new columns are added to a table.
  Unaffected when a column is dropped unless it's referenced in the view.

View Recommendations

The above is designed to introduce View fundamentals and View advantages.
Creating a Simple View to Restrict Sensitive Columns

Employee_Table

Employee_No  Dept_No  Last_Name   First_Name  Salary
___________  _______  __________  __________  ________
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

CREATE VIEW Employee_V AS
SELECT Employee_No
      ,First_Name
      ,Last_Name
      ,Dept_No
FROM Employee_Table ;

The purposes of views are to restrict access to certain columns, derive columns or join tables, and to restrict access to certain rows (if a WHERE clause is used). This view does not allow the user to see the column Salary.
Creating a Simple View to Restrict Rows

CREATE VIEW Employee_View AS
SELECT First_Name
      ,Last_Name
      ,Dept_No
      ,Salary
FROM Employee_Table
WHERE Dept_No IN (300, 400) ;

The purposes of views are to restrict access to certain columns, derive columns or join tables, and to restrict access to certain rows (if a WHERE clause is used). This view does not allow the user to see information about rows unless the rows have a Dept_No of either 300 or 400.
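A short sketch of how a user would see this row restriction: the query below selects from Employee_View without any WHERE clause of its own, yet only rows that pass the view's filter can come back. With the nine Employee_Table rows listed in this chapter, that would be the one employee in department 300 and the three in department 400.

```sql
-- The user never mentions Dept_No in a WHERE clause;
-- the view's own WHERE Dept_No IN (300, 400) still applies
SELECT First_Name
      ,Last_Name
      ,Dept_No
      ,Salary
FROM Employee_View
ORDER BY Dept_No ;
```

Note that the ORDER BY belongs to the query against the view, not to the view definition itself.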
Basic Rules for Views

1. No ORDER BY inside the View CREATE (exceptions exist)
2. All Aggregation needs to have an ALIAS
3. Any Derived columns (such as Math) need an ALIAS

CREATE VIEW DeptSal_V AS
SELECT Dept_No
      ,SUM(Salary) AS SumSal
      ,SUM(Salary) / 12 AS MonthSal
FROM Employee_Table
GROUP BY Dept_No ;

You don't put an ORDER BY in the view creation.

Why do these two columns need aliases? So we can bring them back in the SELECT query.

SELECT Dept_No
      ,SumSal
FROM DeptSal_V
ORDER BY 1 ;

Users put the ORDER BY when selecting from the view.

Above are the basic rules of Views, with examples.
Two Exceptions to the ORDER BY Rule inside a View

CREATE VIEW Top_Sal_V AS
SELECT TOP 3 *
FROM Employee_Table
ORDER BY Salary DESC ;

The TOP command goes with ORDER BY like bread goes with butter.

CREATE VIEW Sales_Olap_V AS
SELECT Product_ID, Sale_Date, Daily_Sales
      ,SUM(Daily_Sales) OVER (ORDER BY Daily_Sales) AS "CSUM"
FROM Sales_Table ;

Every ANSI Ordered Analytic has an ORDER BY in it naturally.

There are EXCEPTIONS to the ORDER BY rule. The TOP command allows a view to work with an ORDER BY inside. ANSI OLAP statements also work inside a View.
Views sometimes CREATED for Row Security

CREATE VIEW empl_200_v AS
SELECT Employee_No AS Emp_No
      ,Last_Name   AS Last
      ,Salary/12   AS Mnth_Sal
FROM Employee_Table
WHERE Dept_No = 200 ;

Only Dept_No 200 employees return.

SELECT *
FROM Empl_200_v
ORDER BY Mnth_Sal ;

Emp_No   Last     Mnth_Sal
_______  _______  ____________
1324657  Coffing  3,490.740000
1333454  Smith    4,000.000000

Views are designed to do many things. In the example above, the view derives data, limits columns, and also limits the rows coming back with a WHERE.
Creating a View to Join Tables Together
This view is designed to join two tables together. By creating a view, we have now made it easier for the user community to join these tables by merely selecting the columns you want from the view. The view exists now in the database sql_views and accesses the tables in sql_class.
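The view itself is not reproduced in the text, so the following is only a sketch of what a join view of this kind might look like. The view name Emp_Dept_V and the Department_Table column Department_Name are assumptions, and the sql_views and sql_class qualifiers mentioned above are left out.

```sql
-- Batch 1: CREATE VIEW must be the only statement in its batch.
-- Hypothetical view name and Department_Table column.
CREATE VIEW Emp_Dept_V AS
SELECT E.Employee_No
      ,E.Last_Name
      ,E.Dept_No
      ,D.Department_Name
FROM Employee_Table AS E
INNER JOIN Department_Table AS D
ON E.Dept_No = D.Dept_No ;

-- Batch 2: users select from the view and never write the join
SELECT *
FROM Emp_Dept_V ;
```

The join condition lives in one place, so business users cannot forget it or get it wrong.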
You Select From a View
Once the view is created, users can query it with a SELECT statement. Above, we have queried the view we created to join the employee_table to the department_table (created on the previous page). Users can select all columns with an asterisk, or they can choose individual columns (separated by commas). Above, we selected all columns from the view.
Another Way to Alias Columns in a View CREATE

CREATE VIEW E_View (Emp_Nbr, Last, Mnth_Sal) AS
SELECT Employee_No
      ,Last_Name
      ,Salary/12
FROM Employee_Table
WHERE Dept_No = 200 ;

Option 1: Aliases can be here, in the column list that follows the view name.

SELECT *
FROM E_View
ORDER BY Mnth_Sal ;

Emp_Nbr  Last     Mnth_Sal
_______  _______  ________
1324657  Coffing  3490.74
1333454  Smith    4000.00

Will this View CREATE work or will it error? It works fine because it’s aliased above!
The Standard Way Most Aliasing is done

CREATE VIEW emp_v2 AS
SELECT Employee_No
      ,Last_Name
      ,Salary/12 AS Sal_Monthly
FROM Employee_Table
WHERE Dept_No = 200 ;

Option 2: The most popular form of aliasing.

SELECT *
FROM Emp_v2
ORDER BY Sal_Monthly ;

Employee_No  Last_Name  Sal_Monthly
___________  _________  ___________
1324657      Coffing    3490.74
1333454      Smith      4000.00

The ALIAS for Salary / 12 that’ll be used in this example is Sal_Monthly, and this form of aliasing is most often used.
What Happens When Both Aliasing Options Are Present

CREATE VIEW emp_v3 (Emp_Nbr, Last, Mnth_Sal) AS
SELECT Employee_No
      ,Last_Name
      ,Salary/12 AS Sal_Mnth
FROM Employee_Table
WHERE Dept_No = 200 ;

Once you alias in the column list after the view name, that is the alias. The alias Sal_Mnth in the SELECT list is not recognized.

SELECT *
FROM Emp_v3
ORDER BY 3 ;

Emp_Nbr  Last     Mnth_Sal
_______  _______  _________
1324657  Coffing  $3,490.74
1333454  Smith    $4,000.00

The ALIAS for Salary / 12 that’ll be used in this example is Mnth_Sal. It came first at the top, even though the column is aliased in the SELECT list also.
Resolving Aliasing Problems in a View CREATE

CREATE VIEW emp_v3 (Emp_Nbr, Last, Mnth_Sal) AS
SELECT Employee_No
      ,Last_Name
      ,Salary/12 AS Sal_Mnth
FROM Employee_Table
WHERE Dept_No = 200 ;

Once you alias in the column list after the view name, that is the alias. The alias Sal_Mnth in the SELECT list is not recognized.

SELECT *
FROM Emp_v3
ORDER BY Sal_Mnth ;

What happens when this query runs?
Answer to Resolving Aliasing Problems in a View CREATE

SELECT *
FROM Emp_v3
ORDER BY Sal_Mnth ;

Error: Sal_Mnth is unrecognized.

The query above errors because Sal_Mnth is an unrecognized alias. That is because we did our aliasing at the top, in the column list after the view name, so the alias right after Salary/12 is not valid for use when querying the view.
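Following the explanation above, a version of the query that should run is sketched below. It refers to the column by the alias from the view's column list, Mnth_Sal, rather than the alias written inside the view's SELECT.

```sql
-- Mnth_Sal is the alias defined in the column list of emp_v3,
-- so it is the only name the view exposes for Salary/12
SELECT *
FROM Emp_v3
ORDER BY Mnth_Sal ;
```

Ordering by position (ORDER BY 3), as on the earlier page, works as well because it does not depend on either alias.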
Aggregates on View Aggregates

CREATE VIEW Aggreg_Order_v AS
SELECT Customer_Number
      ,COUNT(Order_Total) AS Order_Cnt
      ,SUM(Order_Total)   AS Order_Sum
      ,AVG(Order_Total)   AS Order_Avg
FROM Order_Table
GROUP BY Customer_Number ;

SELECT Customer_Number
      ,Order_Sum
FROM Aggreg_Order_v ;

Customer_Number  Order_Sum
_______________  _________
31323134         5111.47
87323456         15231.62
11111111         8005.91
11111111         12347.53
57896883         23454.84

SELECT SUM(Order_Sum)
FROM Aggreg_Order_v ;

SUM(Order_Sum)
______________
64151.37

The examples above show how we put a SUM on the aggregate Order_Sum.
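The same idea extends to other aggregates. As a small sketch, the query below treats each row of Aggreg_Order_v (one per customer) as input to a second level of aggregation. The output aliases are made up for the example.

```sql
-- Second-level aggregates over the per-customer aggregates in the view
SELECT COUNT(*)       AS Cust_Cnt        -- hypothetical alias
      ,MAX(Order_Sum) AS Max_Cust_Sum    -- hypothetical alias
      ,AVG(Order_Sum) AS Avg_Cust_Sum    -- hypothetical alias
FROM Aggreg_Order_v ;
```

Here AVG(Order_Sum) is the average of the customer totals, which is not the same thing as the average of the individual orders held in Order_Avg.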
Altering a Table

CREATE TABLE Employee_Table2
WITH (DISTRIBUTION = REPLICATE)
AS SELECT * FROM Employee_Table ;

CREATE VIEW Emp_HR_v AS
SELECT Employee_No
      ,Dept_No
      ,Last_Name
      ,First_Name
FROM Employee_Table2 ;

Altering the actual Table

ALTER TABLE Employee_Table2 ADD Mgr_No INTEGER ;

Will the View STILL run?

SELECT * FROM Emp_HR_v ;

YES!

This view will run after the table has added an additional column!
Altering a Table after a View has been Created

CREATE TABLE Employee_Table3
WITH (DISTRIBUTION = REPLICATE)
AS SELECT * FROM Employee_Table ;

CREATE VIEW Emp_HR_v3 AS
SELECT *
FROM Employee_Table3 ;

Altering the actual Table

ALTER TABLE Employee_Table3 ADD Mgr_No INTEGER ;

Will the View STILL run?

SELECT * FROM Emp_HR_v3 ;

YES! Only columns present when the view was created will be visible.

This view runs after the table has added an additional column, but it won’t include Mgr_No in the view results even though there is a SELECT * in the view. The View includes only the columns present when the view was CREATED.
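Given the behavior described above, one way to make the new Mgr_No column visible is to drop the view and create it again, so that the SELECT * is expanded against the table as it exists now. This is a sketch of that approach, not necessarily the only option on a given system.

```sql
-- Batch 1: remove the old view definition
DROP VIEW Emp_HR_v3 ;

-- Batch 2: CREATE VIEW must be the only statement in its batch.
-- The SELECT * is now resolved against the altered table.
CREATE VIEW Emp_HR_v3 AS
SELECT *
FROM Employee_Table3 ;

-- Batch 3: Mgr_No is now among the columns returned
SELECT *
FROM Emp_HR_v3 ;
```

Listing the columns by name in the view, as Emp_HR_v does on the previous page, avoids the surprise altogether.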
A View that Errors After an ALTER

CREATE TABLE Employee_Table5
WITH (DISTRIBUTION = REPLICATE)
AS SELECT * FROM Employee_Table ;

CREATE VIEW Emp_HR_v5 AS
SELECT Employee_No
      ,Dept_No
      ,Last_Name
      ,First_Name
FROM Employee_Table5 ;

Altering the actual Table

ALTER TABLE Employee_Table5 DROP COLUMN Dept_No ;

Will the View STILL run?

SELECT * FROM Emp_HR_v5 ;

ERROR

This view will NOT run after the table has dropped a column referenced in the view.
Troubleshooting a View

CREATE VIEW Emp_HR_v6 AS
SELECT *
FROM Employee_Table6 ;

Altering the actual Table

ALTER TABLE Employee_Table6 DROP COLUMN Dept_No ;

Will the View STILL run?

SELECT * FROM Emp_HR_v6 ;

Error

This view will NOT run after the table has dropped a column referenced in the view, even though the View was CREATED with a SELECT *. At View CREATE time, the columns present were the only ones the view considered itself responsible for, and Dept_No was one of those columns. Once Dept_No was dropped, the view no longer works.
Loading Data through a View
New row Inserted
You can actually utilize a view to load data.
Chapter 19
Data Manipulation Language (DML)
Chapter 19 – Data Manipulation Language (DML)
“I tried to draw people more realistically, but the figure I neglected to update was myself.” - Joe Sacco
INSERT Syntax # 1

The following syntax of the INSERT does not use the column names as part of the command. Therefore, it requires that the VALUES portion of the INSERT match each and every column in the table with a data value or a NULL.

INSERT [ INTO ] table_name VALUES ( value [ ..., value ] ) ;

The INSERT statement is used to put new row(s) into a table. A status is the only returned value from the database; no rows are returned to the user. This INSERT syntax requires either a data value or a NULL for all the columns in a table. When executed, this code places a single new row into a table.
INSERT Example with Syntax 1

INSERT INTO Employee_Table
VALUES ( 20, 5, 'Jones', NULL, 45000 ) ;

Employee_No  Dept_No  Last_Name  First_Name  Salary
___________  _______  _________  __________  ______
20           5        Jones      NULL        45000

The Employee_Table was created with these columns in this order: Employee_No, Dept_No, Last_Name, First_Name, Salary.

After the execution of the above INSERT, there is a new row with the integer value of 20 going into Employee_No, the integer value of 5 going into Dept_No, the character value of Jones going into Last_Name, a NULL value going into First_Name, and the value of 45000 going into Salary. The NULL expressed in the VALUES list is the literal representation for no data.
INSERT Syntax #2

The syntax of the second type of INSERT follows:

INSERT [ INTO ] table_name ( column_name [ ..., column_name ] ) VALUES ( value [ ..., value ] ) ;

This is another form of the INSERT statement that can be used when some of the data is not available. It allows for the missing values (NULL) to be eliminated from the list in the VALUES clause. It is also the best format when the data is arranged in a different sequence than the CREATE TABLE.
INSERT Example with Syntax 2

INSERT INTO Employee_Table8
(Employee_No, Dept_No, First_Name, Last_Name)
VALUES ( 24, 5, 'Joe', 'Smoe' ) ;

Notice that only four columns were inserted and that there are five columns in the row. The system filled the empty column with NULL.

SELECT *
FROM Employee_Table8
WHERE Employee_No = 24 ;

Employee_No  Dept_No  Last_Name  First_Name  Salary
___________  _______  _________  __________  ______
24           5        Smoe       Joe         ?

The above statement incorporates both of the reasons to use this syntax. First, notice that the column names Last_Name and First_Name have been switched to match the data values. Also, notice that Salary does not appear in the column list; therefore, it is assumed to be NULL.
INSERT/SELECT Command

The syntax of the INSERT / SELECT is:

INSERT [ INTO ] table_name SELECT column_name [ ..., column_name ] FROM table_name ;

Although the INSERT is great for adding a single row not currently present in the system, an INSERT/SELECT is even better when the data already exists within the Azure SQL Data Warehouse. In this case, the INSERT is combined with a SELECT. However, no rows are returned to the user. Instead, they go into the table as new rows. The SELECT reads the data values from one or more columns, in one or more tables, and uses them as the values to INSERT into another table. Simply put, the SELECT takes the place of the VALUES portion of the INSERT. This is a common technique for building data marts, interim tables and temporary tables. It is normally a better and much faster alternative than extracting the rows to a data file, then reading the data file and inserting the rows using a utility.
Chapter 19
Data Manipulation Language (DML)
INSERT/SELECT Example using All Columns (*)

CREATE TABLE Employee_Table9
(Employee_No  integer,
 Dept_No      smallint,
 Last_name    char(20),
 First_name   varchar(12),
 Salary       decimal(8,2));

INSERT INTO Employee_Table9
SELECT * FROM Employee_Table;
This is a classic example of an INSERT/SELECT statement. Because both tables have the exact same columns in the exact same order, the SELECT * works just fine.
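SELECT * only works while both tables keep the same columns in the same order. As a hedged sketch (not one of the book's examples), the same load can name the columns on both sides, so it does not depend on column position. The column names come from the CREATE TABLE above; it is an assumption that Employee_Table uses the same five names.

```sql
-- Sketch: the same load as above, written with explicit column lists.
-- Assumption: Employee_Table has the same five column names as Employee_Table9.
INSERT INTO Employee_Table9
       (Employee_No, Dept_No, Last_Name, First_Name, Salary)
SELECT  Employee_No, Dept_No, Last_Name, First_Name, Salary
FROM    Employee_Table;
```

With explicit lists, each selected value is matched to the column named in the same position of the INSERT column list, not to the table's physical column order.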
INSERT/SELECT Example with Fewer Columns

When fewer than all the columns are desired or you want to change certain values, any of the following INSERT/SELECT statements will do the job:

-- Literal value
INSERT INTO Order_Table4 (Order_Number, Customer_Number, Order_Date, Order_Total)
SELECT Order_Number, Customer_Number, '2015-06-30', Order_Total
FROM Order_Table ;

-- System value for current date
INSERT INTO Order_Table5 (Order_Number, Customer_Number, Order_Date, Order_Total)
SELECT Order_Number, Customer_Number, GetDate(), Order_Total
FROM Order_Table ;

-- Order_Date and Order_Total columns have NULL values
INSERT INTO Order_Table6 (Order_Number, Customer_Number)
SELECT Order_Number, Customer_Number
FROM Order_Table ;
In the Order_Table4 and Order_Table5 examples, all four listed columns are populated, but the Order_Date value is supplied by the statement instead of being read from Order_Table. In the first INSERT, the data is a literal date. The second INSERT uses the GETDATE() function. Both are acceptable, depending on what is needed. As with a normal INSERT, when column names are used, data values are needed only for those columns, and they must be in the same sequence as the column list, not the CREATE TABLE. Therefore, as in the final example, any column omitted from the column list receives a NULL value.
The UPDATE Command Basic Syntax

UPDATE table_name SET column_name = { expression | data_value } [..., column_name = expression | data_value ] [ WHERE condition ] [ AND … ] [ OR … ] ;
The UPDATE statement is used to modify data values in one or more columns of one or more existing rows. A status is the only returned value from the database; no rows are returned to the user. When business requirements call for a change to be made in the existing data, then the UPDATE is the SQL statement to use. In order for the UPDATE to work, it must know a few things about the data row(s) involved. Like all SQL, it must know which table to use for making the change, which column or columns to change and the change to make within the data.
Two UPDATE Examples

UPDATE Order_Table6
SET Order_Date = GetDate()
   ,Order_Total = 10500.25
WHERE Order_Number = 123456;

UPDATE Order_Table6
SET Order_Date = '2016/06/30'
   ,Order_Total = 14500.23
WHERE Order_Number = 123512
AND Customer_Number = 11111111;
The first UPDATE command modifies all rows for Order_Number 123456. It changes the values in two columns to the new data values provided after the equal sign (=). The second UPDATE uses the same table. It determines which row(s) to modify with compound conditions in the WHERE clause, based on values stored in other columns.
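Because an UPDATE returns only a status, it can help to look at the affected rows first. The following is a minimal sketch of that habit, not a step the book prescribes: run a SELECT with the same WHERE clause, check the rows, then run the UPDATE unchanged.

```sql
-- Step 1: preview the rows the UPDATE would touch.
SELECT Order_Number, Customer_Number, Order_Date, Order_Total
FROM   Order_Table6
WHERE  Order_Number = 123512
AND    Customer_Number = 11111111;

-- Step 2: run the UPDATE with the identical WHERE clause.
UPDATE Order_Table6
SET    Order_Date  = '2016/06/30'
      ,Order_Total = 14500.23
WHERE  Order_Number = 123512
AND    Customer_Number = 11111111;
```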
Subquery UPDATE Command Syntax

UPDATE table_name SET column_name = { expression | data_value } [..., column_name = expression | data_value ] WHERE column_name IN ( SELECT column_name FROM table_name [ WHERE … ] ) ;
Sometimes it is necessary to update rows in a table when they match rows in another table. To accomplish this, the tables must have one or more columns in the same domain. The matching process then involves either a subquery or join processing.
Example of Subquery UPDATE Command

Order_Table6 can be changed based on Order_Table. The following UPDATE uses a subquery to accomplish the operation:

UPDATE Order_Table6
SET Order_Date = GetDate()
WHERE Order_Number IN
   (SELECT Order_Number
    FROM Order_Table
    WHERE Order_Total > 10000) ;
The subquery returns every Order_Number in Order_Table with an Order_Total greater than 10000. The UPDATE then sets Order_Date to the current date for the rows in Order_Table6 that have one of those order numbers.
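Join UPDATE Command Syntax

UPDATE table_name SET column_name = { expression | data_value } [..., column_name = expression | data_value ] [ FROM table_name [ AS alias_name ] ] WHERE [table_name.]column_name = [table_name.]column_name [ AND condition ] [ OR condition ] ;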
When adding an alias to the UPDATE, the alias becomes the table name and MUST be used in the WHERE clause when qualifying columns.
Example of an UPDATE Join Command

Order_Table6 can be changed based on the original Order_Table. The following UPDATE uses a join to accomplish the operation:

UPDATE Order_Table6
SET Order_Total = 11000
FROM Order_Table AS Orig
WHERE Order_Table6.Customer_Number = Orig.Customer_Number
AND Order_Table6.Customer_Number = 11111111 ;
Above, we join the two tables on Customer_Number, and an additional AND condition restricts the update to customer 11111111.
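A join UPDATE is most useful when the new value itself comes from the other table. The sketch below is an illustration under assumptions, not the book's example: it assumes Order_Number identifies the same order in both tables, and it copies Order_Total from Order_Table into the matching rows of Order_Table6.

```sql
-- Sketch: copy a value from the joined table instead of using a literal.
-- Assumption: Order_Number identifies the same order in both tables.
UPDATE Order_Table6
SET    Order_Total = Orig.Order_Total
FROM   Order_Table AS Orig
WHERE  Order_Table6.Order_Number = Orig.Order_Number;
```

Rows in Order_Table6 with no matching Order_Number in Order_Table are left unchanged.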
The DELETE Command Basic Syntax

DELETE [ FROM ] table_name [ AS alias_name ] [ WHERE condition ] ;
The DELETE statement has one function, and that is to remove rows from a table. A status is the only returned value from the database; no rows are returned to the user. One of the fastest things that SQL Server does is to remove ALL rows from a table. Be very careful with DELETE: without a WHERE clause, it removes every row in the table.
Two DELETE Examples to DELETE ALL Rows in a Table

DELETE FROM Order_Table5 ;

DELETE Order_Table5 ;

SELECT * FROM Order_Table5;

Order_Number  Customer_Number  Order_Date  Order_Total
(no rows)
Both examples will delete all the rows in the table. Since the FROM is optional, the second example removes all rows from the table exactly as the first one does.
To DELETE or to TRUNCATE

TRUNCATE TABLE Order_Table6 ;

The table being truncated can be distributed or replicated, permanent or temporary, and rowstore or columnstore. Both TRUNCATE and DELETE (without a WHERE clause) will remove all rows from the table. TRUNCATE is faster than DELETE because it does no logging of individual rows. Truncating instantly removes all pages in the table.

SQL statements, batch statements and stored procedures can be used to TRUNCATE. If a TRUNCATE TABLE is cancelled or there is a system failure before completion, the operation is rolled back and all rows are still present as before.

When truncating a table, the Azure SQL Data Warehouse keeps and updates all statistics. Indexes on the table are updated as well. After TRUNCATE TABLE completes, all statistics are updated by using the default row count of 1000.

While TRUNCATE TABLE is running, an Exclusive lock is placed on the table, so no other operations on the table are allowed.

Truncating uses fewer system resources, does not require row logging and is faster. So TRUNCATE if you can.
A DELETE Example Deleting only Some of the Rows
DELETE FROM Employee_Table1 WHERE Employee_No = 1121334 ;
The DELETE example above removes only the row with an Employee_No of 1121334 and leaves all other rows in the table.
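Subquery and Join DELETE Command Syntax

The subquery syntax for the DELETE statement follows:

DELETE table_name WHERE column_name IN ( SELECT column_name FROM table_name [ AS alias_name ] [ WHERE condition … ] ) ;

The join syntax for the DELETE statement follows:

DELETE table_name [ FROM table_name [ AS alias_name ] ] WHERE table_name.column_name = table_name.column_name [ AND condition ] [ OR condition ] ;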
Sometimes it is desirable to delete rows from one table based on their existence in, or by matching a value stored in, another table. To access the rows of the other table for comparison, a subquery or a join operation can be used.
Example of Subquery DELETE Command

The DELETE below removes the rows from Order_Table6 for every customer who has at least one order with an Order_Total > 13000 in the Order_Table. This DELETE uses a subquery to accomplish the operation.

DELETE FROM Order_Table6
WHERE Customer_Number IN
   (SELECT Customer_Number
    FROM Order_Table
    WHERE Order_Total > 13000) ;
The above uses a Subquery and the DELETE command.
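The subquery above matches on Customer_Number, so it removes every order belonging to a qualifying customer. If the goal is instead to remove only the individual orders over 13000, a correlated subquery on Order_Number is one way to write it. This is a hedged sketch that assumes Order_Number identifies the same order in both tables; the COUNT(*) preview and the Rows_To_Delete alias are additions for illustration.

```sql
-- Step 1: count the rows the DELETE would remove.
SELECT COUNT(*) AS Rows_To_Delete
FROM   Order_Table6
WHERE  EXISTS (SELECT 1
               FROM   Order_Table AS Orig
               WHERE  Orig.Order_Number = Order_Table6.Order_Number
               AND    Orig.Order_Total > 13000);

-- Step 2: delete using the identical condition.
DELETE FROM Order_Table6
WHERE  EXISTS (SELECT 1
               FROM   Order_Table AS Orig
               WHERE  Orig.Order_Number = Order_Table6.Order_Number
               AND    Orig.Order_Total > 13000);
```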
MERGE INTO

Want to synchronize data in two different tables? MERGE INTO can be used to get your tables in synch.

MERGE INTO involves performing an UPDATE, INSERT, or DELETE on a target table based on data in a source table. The target and source tables are joined on common column(s) or key(s). A target table is modified to reflect the data in a source table.

MERGE merges a source row set into a target table based on whether there is a MATCH or whether there is NOT a MATCH. If there is a MATCH and the data changed, an UPDATE can be made. If there is NOT a MATCH, an INSERT can be made for a source row that is missing from the target, or a DELETE for a target row that is missing from the source.
SELECT Employee_No, Dept_No, Last_Name, First_Name, Salary FROM Employee_Table_Original;
SELECT Employee_No, Dept_No, Last_Name, First_Name, Salary FROM Employee_Table_New;
We are going to update Employee_Table_Original based on data in Employee_Table_New using a MERGE INTO.
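The text describes the MERGE but does not show the statement itself. The following is a minimal sketch in standard T-SQL MERGE syntax, offered as an illustration and not as the book's own statement. It assumes Employee_No is the key that matches rows between the two tables, and the aliases Tgt and Src are hypothetical. MERGE availability and restrictions differ between SQL Server and Azure SQL Data Warehouse releases, so confirm that your platform supports it before relying on this form.

```sql
-- Sketch: synchronize Employee_Table_Original with Employee_Table_New.
-- Assumption: Employee_No uniquely identifies an employee in both tables.
MERGE INTO Employee_Table_Original AS Tgt
USING Employee_Table_New AS Src
   ON Tgt.Employee_No = Src.Employee_No
WHEN MATCHED THEN                    -- row exists in both tables: refresh it
    UPDATE SET Tgt.Dept_No    = Src.Dept_No
              ,Tgt.Last_Name  = Src.Last_Name
              ,Tgt.First_Name = Src.First_Name
              ,Tgt.Salary     = Src.Salary
WHEN NOT MATCHED BY TARGET THEN      -- row only in the source: add it
    INSERT (Employee_No, Dept_No, Last_Name, First_Name, Salary)
    VALUES (Src.Employee_No, Src.Dept_No, Src.Last_Name, Src.First_Name, Src.Salary)
WHEN NOT MATCHED BY SOURCE THEN      -- row only in the target: remove it
    DELETE;
```

The final clause is optional. Leave it out if rows that exist only in Employee_Table_Original should be kept.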
Chapter 20 – Set Operators Functions
"The man who doesn't read good books has no advantage over the man who can't read them." - Mark Twain
Rules of Set Operators

1. Each query will have at least two SELECT statements separated by a set operator.
2. The set operators are UNION, UNION ALL, INTERSECT and EXCEPT.
3. Each SELECT must specify the same number of columns from the same domain (data type/range).
4. If using aggregates, both SELECTs must have their own GROUP BY.
5. Both SELECTs must have a FROM clause.
6. The first SELECT is used for all ALIAS and FORMAT statements.
7. The ORDER BY goes after the last SELECT and refers to a column number, or to a column name or alias from the first SELECT.
8. When multiple operators are used, INTERSECT is evaluated first; UNION and EXCEPT are then evaluated from left to right.
9. Parentheses can change the order of precedence.
10. Duplicate rows are eliminated in the spool unless the ALL keyword is used.
11. Set operators consider two NULLs as equal for the purpose of comparison.
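To try the rules above, the two small tables used in this chapter's logical examples can be built in a scratch database. This is a sketch under assumptions: the book does not show the table definitions, so the single INT column named Num is hypothetical. The last query illustrates rule 11.

```sql
-- Hypothetical layout for the chapter's Table_Red and Table_Blue examples.
-- Assumption: one INT column; the column name Num is not from the book.
CREATE TABLE Table_Red  (Num INT);
CREATE TABLE Table_Blue (Num INT);

INSERT INTO Table_Red  VALUES (1);
INSERT INTO Table_Red  VALUES (2);
INSERT INTO Table_Red  VALUES (3);

INSERT INTO Table_Blue VALUES (3);
INSERT INTO Table_Blue VALUES (4);
INSERT INTO Table_Blue VALUES (5);

-- Rule 11: set operators treat two NULLs as equal,
-- so this INTERSECT returns a single row holding NULL.
SELECT CAST(NULL AS INT) AS Num
INTERSECT
SELECT CAST(NULL AS INT);
```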
INTERSECT Explained Logically

Table_Red:  1, 2, 3
Table_Blue: 3, 4, 5

SELECT * FROM Table_Red
INTERSECT
SELECT * FROM Table_Blue ;

In this example, what numbers in the answer set would come from the query above?

Answer set: 3

In this example, only the number 3 was in both tables, so they INTERSECT.
UNION Explained Logically

Table_Red:  1, 2, 3
Table_Blue: 3, 4, 5

SELECT * FROM Table_Red
UNION
SELECT * FROM Table_Blue ;

In this example, what numbers in the answer set would come from the query above?

Answer set: 1, 2, 3, 4, 5

Both top and bottom queries run simultaneously, then the two different spool files are merged to eliminate duplicates and place the remaining numbers in the answer set.
UNION ALL Explained Logically

Table_Red:  1, 2, 3
Table_Blue: 3, 4, 5

SELECT * FROM Table_Red
UNION ALL
SELECT * FROM Table_Blue ;

In this example, what numbers in the answer set would come from the query above?

Answer set: 1, 2, 3, 3, 4, 5

Both top and bottom queries run simultaneously, then the two different spool files are merged together to build the answer set. The ALL prevents eliminating duplicates.
EXCEPT Explained Logically

Table_Red:  1, 2, 3
Table_Blue: 3, 4, 5

EXCEPT never adds additional rows, but only takes rows away!

SELECT * FROM Table_Red
EXCEPT
SELECT * FROM Table_Blue ;

In this example, what numbers in the answer set would come from the query above?

Answer set: 1, 2

The top query SELECTED 1, 2, 3 from Table_Red. From that point on, only 1, 2, 3 at most could come back. The bottom query is run on Table_Blue, and if there are any matches, they are not ADDED to the 1, 2, 3 but instead take away either the 1, 2, or 3.
Another EXCEPT Example: EXCEPT Explained Logically in Reverse Order

Table_Red:  1, 2, 3
Table_Blue: 3, 4, 5

EXCEPT never adds additional rows, but only takes rows away!

SELECT * FROM Table_Blue
EXCEPT
SELECT * FROM Table_Red ;

In this example, what numbers in the answer set would come from the query above?

Answer set: 4, 5

The top query SELECTED 3, 4, 5 from Table_Blue. From that point on, only 3, 4, 5 at most could come back. The bottom query is run on Table_Red, and if there are any matches, they are not ADDED to the 3, 4, 5, but instead take away either the 3, 4, or 5.
An Equal Number of Columns in Both SELECT Lists

SELECT Dept_No
      ,Employee_No
FROM Employee_Table
INTERSECT
SELECT Dept_No
      ,Mgr_No
FROM Department_Table;

Both queries have the same number of columns in the SELECT list.

Dept_No  Employee_No
400      1256349

Rule 1: You must have an equal number of columns in both SELECT lists. This is because data is compared from the two spool files, and duplicates are eliminated. So, for comparison purposes, there must be an equal number of columns in both queries.
Columns in the SELECT List Should Be from the Same Domain

SELECT First_Name
FROM Employee_Table
INTERSECT
SELECT Department_Name
FROM Department_Table;

Answer set:

First_Name
(No rows returned)

Rule 2: You can't compare First_Name with Department_Name! Different Domains!

The above query works without error, but no data is returned. There are no First Names that are the same as Department Names. This is like comparing apples to oranges. That means they are NOT in the same Domain.
The Top Query Handles All Aliases

SELECT Dept_No AS Depty
      ,Employee_No AS "The Mgr"
FROM Employee_Table
INTERSECT
SELECT Dept_No
      ,Mgr_No
FROM Department_Table;

Answer set:

Depty  The Mgr
400    1256349

Rule 3: The top query is responsible for the column ALIAS, Title and Formatting.
The Bottom Query Does the ORDER BY

SELECT Dept_No AS Depty
      ,Employee_No AS "The Mgr"
FROM Employee_Table
INTERSECT
SELECT Dept_No
      ,Mgr_No
FROM Department_Table
ORDER BY 1 ;

Rule 4: The bottom query is responsible for the sort with an ORDER BY.

SELECT Dept_No AS Depty
      ,Employee_No AS "The Mgr"
FROM Employee_Table
INTERSECT
SELECT Dept_No
      ,Mgr_No
FROM Department_Table
ORDER BY Depty ;

Rule 5: The ORDER BY in the bottom query can use the number, column name or alias.

The Bottom Query is responsible for sorting. Above, we have both examples referencing the ORDER BY column as either the number 1 (column 1) or Depty. We could have also used Dept_No or even "The Mgr", but the ORDER BY statement must come from referencing column names, aliases or the number representing the column from the top query.
Great Trick: Place your Set Operator in a Derived Table

SELECT Employee_No AS MANAGER
      ,RTRIM(Last_Name) + ', ' + First_Name AS "Name"
FROM Employee_Table
INNER JOIN
     (SELECT Employee_No
      FROM Employee_Table
      INTERSECT
      SELECT Mgr_No
      FROM Department_Table) AS TeraTom (empno)
ON Employee_No = empno
ORDER BY "Name"

MANAGER  Name
1256349  Harrison, Herbert
1333454  Smith, John
1000234  Smythe, Richard
1121334  Strickling, Cletus

The Derived Table gave us the empno for all managers, and we were able to join it.
UNION Vs UNION ALL

SELECT Department_Name, Dept_No
FROM Department_Table
UNION ALL
SELECT Department_Name, Dept_No
FROM Department_Table
ORDER BY 1;

UNION Answer Set

Department_Name           Dept_No
Customer Support          400
Human Resources           500
Marketing                 100
Research and Development  200
Sales                     300

UNION ALL Answer Set

Department_Name           Dept_No
Customer Support          400
Customer Support          400
Human Resources           500
Human Resources           500
Marketing                 100
Marketing                 100
Research and Development  200
Research and Development  200
Sales                     300
Sales                     300

UNION eliminates duplicates, but UNION ALL does not. If you know that a set operator query cannot return any duplicates, you are still better off using UNION ALL. It does not check for duplicates, so it performs faster.
Using UNION ALL and Literals

SELECT Dept_No AS Dept
      ,'Employee  ' AS "Title"
      ,First_Name + ' ' + Last_Name AS "Name"
FROM Employee_Table
UNION ALL
SELECT Dept_No
      ,'Department'
      ,Department_Name
FROM Department_Table
ORDER BY 1, 2 ;

Dept  Title       Name
?     Employee    Squiggy Jones
10    Employee    Richard Smythe
100   Department  Marketing
100   Employee    Mandee Chambers
200   Department  Research and Develop
200   Employee    Billy Coffing
200   Employee    John Smith
300   Department  Sales
300   Employee    Loraine Larkins
400   Department  Customer Support
400   Employee    Cletus Strickling
400   Employee    Herbert Harrison
400   Employee    William Reilly
500   Department  Human Resources

Notice that the second column of the first SELECT is the literal 'Employee  ' (with two trailing spaces) and the matching literal in the second SELECT is 'Department'. These literals match up because they are now both exactly 10 characters long. The UNION ALL brings back all Employees and all Departments and shows the employees in each valid department.
A Great Example of how EXCEPT works

Employee_Table

Employee_No  Dept_No  Last_Name   First_Name  Salary
2000000      ?        Jones       Squiggy     32800.50
1000234      10       Smythe      Richard     64300.00
1232578      100      Chambers    Mandee      48850.00
1324657      200      Coffing     Billy       41888.88
1333454      200      Smith       John        48000.00
2312225      300      Larkins     Loraine     40200.00
1121334      400      Strickling  Cletus      54500.00
2341218      400      Reilly      William     36000.00
1256349      400      Harrison    Herbert     54500.00

Department_Table

Dept_No  Department_Name
100      Marketing
200      Research and Dev
300      Sales
400      Customer Support
500      Human Resources

SELECT Dept_No AS Department_Number
FROM Department_Table
EXCEPT
SELECT Dept_No
FROM Employee_Table
ORDER BY 1 ;

Department_Number
500

This query brought back all Departments without any employees.
Using Multiple SET Operators in a Single Request

SELECT Dept_No, Employee_No empno
FROM Employee_Table
UNION ALL
SELECT Dept_No, Employee_No
FROM Employee_Table
INTERSECT
SELECT Dept_No, Mgr_No
FROM Department_Table
EXCEPT
SELECT Dept_No, Mgr_No
FROM Department_Table
WHERE Department_Name LIKE '%Sales%'
ORDER BY 1, 2;

Dept_No  Empno
?        2000000
10       1000234
100      1232578
200      1324657
200      1333454
300      2312225
400      1121334
400      1256349
400      2341218

Above, we use multiple SET Operators. They follow the natural Order of Precedence: INTERSECT is evaluated first, and then the UNION ALL and the EXCEPT are evaluated from left to right. Because the EXCEPT runs last, the duplicates produced by the UNION ALL are eliminated from the answer set.
Changing the Order of Precedence with Parentheses

SELECT Dept_No, Employee_No empno
FROM Employee_Table
UNION ALL
(SELECT Dept_No, Employee_No
 FROM Employee_Table
 INTERSECT
 (SELECT Dept_No, Mgr_No
  FROM Department_Table
  EXCEPT
  SELECT Dept_No, Mgr_No
  FROM Department_Table
  WHERE Department_Name LIKE '%Sales%'))
ORDER BY 1, 2;

Dept_No  Empno
?        2000000
10       1000234
100      1232578
200      1324657
200      1333454
300      2312225
400      1121334
400      1256349
400      1256349
400      2341218

Above, we use multiple SET Operators and Parentheses to change the order of precedence. Here, the EXCEPT runs first, then the INTERSECT and lastly, the UNION ALL. Without parentheses, INTERSECT is evaluated first, and then UNION and EXCEPT are evaluated from left to right.
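The effect of precedence is easy to see with the small Table_Red (1, 2, 3) and Table_Blue (3, 4, 5) tables from earlier in this chapter. The two queries below are an illustrative sketch, not examples from the book. They differ only in their parentheses.

```sql
-- No parentheses: INTERSECT is evaluated before UNION ALL, so this is
-- Table_Red UNION ALL (Table_Red INTERSECT Table_Blue).
-- Rows returned: 1, 2, 3, 3
SELECT * FROM Table_Red
UNION ALL
SELECT * FROM Table_Red
INTERSECT
SELECT * FROM Table_Blue ;

-- Parentheses force the UNION ALL to run first, so this is
-- (Table_Red UNION ALL Table_Red) INTERSECT Table_Blue.
-- Rows returned: 3
(SELECT * FROM Table_Red
 UNION ALL
 SELECT * FROM Table_Red)
INTERSECT
SELECT * FROM Table_Blue ;
```

When a request mixes operators, writing the parentheses explicitly documents the intended order even where the default would give the same result.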
Building Grouping Sets Using UNION

SELECT NULL AS "Year", Product_ID, SUM(Daily_Sales) TotalSales
FROM Sales_Table soh
GROUP BY Product_ID
UNION
SELECT Year(Sale_Date) "Year", NULL Product_ID, SUM(Daily_Sales) TotalSales
FROM Sales_Table soh
GROUP BY Year(Sale_Date)

Year  Product_ID  TotalSales
?     1000        331204.72
?     2000        306611.81
?     3000        224587.82
2000  ?           862404.35

The example above shows us that we made $862404.35 in the year 2000. It also shows us what we made for Product_ID 1000, 2000 and 3000. If you totaled up the TotalSales for 1000, 2000 and 3000, it would equal $862404.35.
Three Grouping Sets Using a UNION

SELECT NULL AS "Yr_Mnth", Product_ID, SUM(Daily_Sales) TotalSales
FROM Sales_Table soh
GROUP BY Product_ID
UNION
SELECT Year(Sale_Date) "Yr_Mnth", NULL Product_ID, SUM(Daily_Sales) TotalSales
FROM Sales_Table soh
GROUP BY Year(Sale_Date)
UNION
SELECT Month(Sale_Date) "Yr_Mnth", NULL Product_ID, SUM(Daily_Sales) TotalSales
FROM Sales_Table soh
GROUP BY Month(Sale_Date)

Yr_Mnth  Product_ID  TotalSales
?        1000        331204.72
?        2000        306611.81
?        3000        224587.82
9        ?           418769.36
10       ?           443634.99
2000     ?           862404.35

The example above shows us that we made $862404.35 in the year 2000. It also shows us what we made for Product_ID 1000, 2000 and 3000. If you totaled up the TotalSales for 1000, 2000 and 3000, it would equal $862404.35. It also shows what we did in the months of September and October. If you total those months up, it would also equal $862404.35.
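In the three-set answer above, years and months share one column, so the reader has to know that 9 and 10 are months and 2000 is a year. One possible variation, shown here as a sketch and not as the book's method, adds a literal label to each SELECT. The column names Grouping_Level and Grouping_Value are hypothetical, and the values are cast to VARCHAR so that all three SELECTs return the same data type.

```sql
-- Sketch: label each grouping set so the rows explain themselves.
-- Grouping_Level and Grouping_Value are illustrative names, not from the book.
SELECT 'Product' AS Grouping_Level
      ,CAST(Product_ID AS VARCHAR(10)) AS Grouping_Value
      ,SUM(Daily_Sales) AS TotalSales
FROM Sales_Table
GROUP BY Product_ID
UNION ALL
SELECT 'Year'
      ,CAST(YEAR(Sale_Date) AS VARCHAR(10))
      ,SUM(Daily_Sales)
FROM Sales_Table
GROUP BY YEAR(Sale_Date)
UNION ALL
SELECT 'Month'
      ,CAST(MONTH(Sale_Date) AS VARCHAR(10))
      ,SUM(Daily_Sales)
FROM Sales_Table
GROUP BY MONTH(Sale_Date)
ORDER BY 1, 2 ;
```

Because each SELECT carries a different label, no two rows can be identical, so UNION ALL is safe here and avoids the duplicate check that UNION performs.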
Chapter 21 – Stored Procedure Functions
“Freedom from effort in the present merely means that there has been effort stored up in the past.” - Theodore Roosevelt
Creating a Stored Procedure

CREATE PROCEDURE dbo.ListStudents
AS
SELECT Student_ID
      ,Last_Name
      ,First_Name
      ,Class_Code
      ,Grade_Pt
FROM Student_Table
ORDER BY Class_Code ;

In dbo.ListStudents, dbo is the schema ("Database Owner") and ListStudents is the name of the Stored Procedure.

The CREATE PROCEDURE command will create a stored procedure. The above procedure will return information about students from the Student_Table. The answer set is sorted alphabetically by Class_Code (FR, JR, SO and then SR), with the NULL class code first. This procedure is created in the dbo schema. The letters dbo stand for "database owner." The dbo schema is one that is always present in every database, and it is an excellent standard repository for stored procedures.
Executing a Stored Procedure

EXEC dbo.ListStudents ;

Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
260000      Johnson    Stanley     ?           ?
125634      Hanson     Henry       FR          2.88
234121      Thomas     Wendy       FR          4.00
423400      Larkins    Michael     FR          0.00
280023      McRoberts  Richard     JR          1.90
322133      Bond       Jimmy       JR          3.95
231222      Wilson     Susie       SO          3.80
333450      Smith      Andy        SO          2.00
123250      Phillips   Martin      SR          3.00
324652      Delaney    Danny       SR          3.35

You SELECT from a table. You SELECT from a view. You EXECUTE a Stored Procedure with an EXEC statement. This stored procedure queries the contents of a single table, returning a result set. A stored procedure works much like a view, but the query plan will actually be cached once it is executed for the first time. After the first time, the execution time will be consistent in consecutive executions. One reason to use Stored Procedures is consistency in executions.
There are Three Ways to Execute a Stored Procedure

1. ListStudents ;
2. EXECUTE ListStudents ;
3. EXEC ListStudents ;

Each form returns the same ten-row answer set of Student_ID, Last_Name, First_Name, Class_Code and Grade_Pt.

You SELECT from a table. You SELECT from a view. You EXECUTE a Stored Procedure with an EXEC statement. Above, you can see the three ways to execute a Stored Procedure. The procedure name is ListStudents. You can merely type in ListStudents or EXECUTE ListStudents or EXEC ListStudents. In T-SQL, the bare procedure name works only when it is the first statement in the batch; otherwise, EXEC or EXECUTE is required.
Creating a Stored Procedure with a CASE Statement

CREATE PROC dbo.Employees
AS
SELECT Employee_No
      ,Dept_No
      ,Department =
         CASE Dept_No
            WHEN 100 THEN 'Marketing'
            WHEN 200 THEN 'Research and Development'
            WHEN 300 THEN 'Sales'
            WHEN 400 THEN 'Customer Support'
            WHEN 500 THEN 'Human Resources'
            ELSE 'Invalid Department'
         END
      ,First_Name + ' ' + Last_Name AS Fullname
      ,Salary
FROM Employee_Table
ORDER BY Department;

Notice the column name Department and the CASE statement that follows it. The answer set follows.
Our Answer Set

EXEC dbo.Employees

Employee_No  Dept_No  Department                Fullname           Salary
1256349      400      Customer Support          Herbert Harrison   54500.00
2341218      400      Customer Support          William Reilly     36000.00
1121334      400      Customer Support          Cletus Strickling  54500.00
1000234      10       Invalid Department        Richard Smythe     64300.00
2000000      ?        Invalid Department        Squiggy Jones      32800.50
1232578      100      Marketing                 Mandee Chambers    48850.00
1324657      200      Research and Development  Billy Coffing      41888.88
1333454      200      Research and Development  John Smith         48000.00
2312225      300      Sales                     Loraine Larkins    40200.00

How about that answer set?
Dropping a Stored Procedure
DROP PROCEDURE SQL_Class.dbo.ListStudents
Dropping a stored procedure is easy. Just use the DROP PROCEDURE command. Above, we have fully qualified the database, schema and procedure name in order to not accidentally drop another procedure named ListStudents in another database or schema. We could have also just ensured our default database was SQL_Class and then just typed, "DROP PROCEDURE ListStudents".
Passing an Input Parameter to a Stored Procedure

CREATE Procedure dbo.Employee_Find
   (@Employee_Num INTEGER)
AS
SELECT *
FROM Employee_Table
WHERE Employee_No = @Employee_Num;

EXEC Employee_Find 2000000;

Employee_No  Dept_No  Last_Name  First_Name  Salary
2000000      ?        Jones      Squiggy     32800.50

Passing parameters is an integral part of why stored procedures are important. Using parameters, you can pass information into the body of a procedure in order to control how the procedure operates, and dramatically reduce complexity. The example above has a single input parameter. It is an Employee_No. When the stored procedure is executed, you merely supply the Employee_No you are trying to find and the result is a single employee. In this case it is Squiggy Jones.
Executing With Positional Parameters vs. Named Parameters

CREATE Procedure dbo.Employee_Find
   (@Employee_Num INTEGER)
AS
SELECT *
FROM Employee_Table
WHERE Employee_No = @Employee_Num;

This is how you execute using a positional parameter (@Employee_Num is equal to 2000000):

EXEC SQL_Class.DBO.Employee_Find 2000000 ;

This is how you execute using a named parameter:

EXEC SQL_Class.DBO.Employee_Find @Employee_Num = 2000000 ;

The example above has a single input parameter. It is an Employee_No. In our examples above we have fully qualified the execute statement with the database.schema.storedprocedurename. In our first example we use a positional parameter; thus, the stored procedure assumes that 2000000 is the @Employee_Num value. In the second example, we specify this by using @Employee_Num = 2000000.
Passing an Output Parameter to a Stored Procedure

CREATE PROCEDURE dbo.Student_Count
   @Class_Code CHAR(2) OUTPUT
AS
SELECT Class_Code
FROM Student_Table
WHERE Class_Code = 'FR';

DECLARE @Class_Code Char(2);
EXEC dbo.Student_Count @Class_Code OUTPUT;
PRINT @Class_Code;

Class_Code
FR
FR
FR

The stored procedure above begins with a defined output parameter called @Class_Code. Notice that it has a data type of Char(2) and also specifies the keyword OUTPUT. The code that invokes the stored procedure declares the variable to pass as the output parameter (DECLARE @Class_Code Char(2)). The EXEC statement must also specify that the parameter is OUTPUT. Note that the three rows shown come from the SELECT inside the procedure; the procedure body never assigns a value to @Class_Code, so the variable is still NULL when it is printed.
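An output parameter only carries a value back if the procedure assigns one. The following is a minimal sketch of that pattern, not the book's procedure: the name dbo.Student_Count_By_Class, the parameter @Student_Cnt and the variable @Cnt are all hypothetical. It takes a class code as input and returns the number of matching students through the output parameter.

```sql
-- Sketch: a procedure that assigns its OUTPUT parameter.
-- The procedure, parameter and variable names are illustrative assumptions.
CREATE PROCEDURE dbo.Student_Count_By_Class
    @Class_Code  CHAR(2),
    @Student_Cnt INT OUTPUT
AS
SET @Student_Cnt = (SELECT COUNT(*)
                    FROM Student_Table
                    WHERE Class_Code = @Class_Code);
GO  -- batch separator used by client tools such as SSMS or sqlcmd, not a T-SQL statement

-- The caller declares a variable, passes it with OUTPUT, then reads it.
DECLARE @Cnt INT;
EXEC dbo.Student_Count_By_Class @Class_Code = 'FR', @Student_Cnt = @Cnt OUTPUT;
SELECT @Cnt AS Freshman_Count;
```

With the Student_Table data shown in this chapter, three students have a Class_Code of FR. The CREATE PROCEDURE must run as its own batch; otherwise, the DECLARE and EXEC lines would become part of the procedure body.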
Changing a Stored Procedure with an ALTER

ALTER PROCEDURE dbo.ListStudents
AS
SELECT Student_ID
      ,Last_Name
      ,First_Name
      ,Class_Code
      ,Grade_Pt
FROM Student_Table
ORDER BY CASE Class_Code
            WHEN 'Fr' THEN 1
            WHEN 'So' THEN 2
            WHEN 'Jr' THEN 3
            WHEN 'Sr' THEN 4
            ELSE 5
         END, Grade_Pt DESC

The ALTER PROCEDURE command changes an existing stored procedure. The altered procedure still returns information about students from the Student_Table, but the answer set is now sorted by Freshman, Sophomore, Junior and then Senior, with any other value (including NULL) last. Then, the minor sort is Grade_Pt DESC.
Answer Set for the Altered Stored Procedure

EXEC dbo.ListStudents

Student_ID  Last_Name  First_Name  Class_Code  Grade_Pt
234121      Thomas     Wendy       FR          4.00
125634      Hanson     Henry       FR          2.88
423400      Larkins    Michael     FR          0.00
231222      Wilson     Susie       SO          3.80
333450      Smith      Andy        SO          2.00
322133      Bond       Jimmy       JR          3.95
280023      McRoberts  Richard     JR          1.90
324652      Delaney    Danny       SR          3.35
123250      Phillips   Martin      SR          3.00
260000      Johnson    Stanley     ?           ?

We have altered the stored procedure to sort by Fr, So, Jr, Sr and then the null value. The minor sort is Grade_Pt DESC. It is very easy to create, execute and alter a stored procedure.
Using a Stored Procedure to Delete a Row SELECT Customer_Number as Cust ,Customer_Name as Name ,Phone_Number as Phone FROM Customer_Table;
Cust _________ 11111111 31313131 31323134 57896883 87323456
Name ________________ Billy's Best Choice Acme Products ACE Consulting XYZ Plumbing Databases N-U
Phone _________ 555-1234 555-1111 555-1212 347-8954 322-1012
CREATE PROCEDURE Del_Cust AS DECLARE @Cust_No INT; SET @Cust_No = 31313131; DELETE FROM SQL_Class.dbo.Customer_Table WHERE Customer_Number = @Cust_No; Exec Del_Cust ; SELECT Customer_Number as Cust ,Customer_Name as Name ,Phone_Number as Phone FROM Customer_Table;
Cust _________ 11111111 31323134 57896883 87323456
Name ________________ Billy's Best Choice ACE Consulting XYZ Plumbing Databases N-U
Phone _________ 555-1234 555-1212 347-8954 322-1012
The example above demonstrates how to delete a row from a table using a stored procedure.
A Different Method to Delete a Row

SELECT Customer_Number as Cust
      ,Customer_Name as Name
      ,Phone_Number as Phone
FROM Customer_Table;

Cust      Name                 Phone
________  ___________________  ________
11111111  Billy's Best Choice  555-1234
31323134  ACE Consulting       555-1212
57896883  XYZ Plumbing         347-8954
87323456  Databases N-U        322-1012

CREATE PROCEDURE Del_Cust2 AS
DECLARE @Cust_No INT = 87323456;
DELETE FROM SQL_Class.dbo.Customer_Table
WHERE Customer_Number = @Cust_No;

Exec Del_Cust2 ;

SELECT Customer_Number as Cust
      ,Customer_Name as Name
      ,Phone_Number as Phone
FROM Customer_Table;

Cust      Name                 Phone
________  ___________________  ________
11111111  Billy's Best Choice  555-1234
31323134  ACE Consulting       555-1212
57896883  XYZ Plumbing         347-8954
The example above demonstrates another way to declare a variable (declaring it and assigning its value in a single DECLARE statement) and to delete a row from a table using a stored procedure.
Deleting a Row Using an Input Parameter

SELECT Customer_Number as Cust
      ,Customer_Name as Name
      ,Phone_Number as Phone
FROM Customer_Table;

Cust      Name                 Phone
________  ___________________  ________
11111111  Billy's Best Choice  555-1234
31323134  ACE Consulting       555-1212
57896883  XYZ Plumbing         347-8954

CREATE PROCEDURE Del_Cust_Parm @Cust_No INT AS
DELETE FROM SQL_Class.dbo.Customer_Table
WHERE Customer_Number = @Cust_No;

Del_Cust_Parm 31323134;

SELECT Customer_Number as Cust
      ,Customer_Name as Name
      ,Phone_Number as Phone
FROM Customer_Table;

Cust      Name                 Phone
________  ___________________  ________
11111111  Billy's Best Choice  555-1234
57896883  XYZ Plumbing         347-8954
The example above demonstrates how to delete a row from a table using a stored procedure with an input parameter. This is the preferred method because you can use the same stored procedure over and over again. You just supply a different Customer_Number each time you execute it.
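The call can also name its parameter explicitly, which reads more clearly once a procedure has several parameters. The sketch below reuses Del_Cust_Parm and a customer number from the answer set above; the follow-up SELECT is just one assumed way to confirm the delete. In SQL Server, the EXEC keyword may be left off only when the procedure call is the first statement in its batch, so spelling it out is the safer habit.

```sql
-- Pass the input parameter by name instead of by position.
EXEC Del_Cust_Parm @Cust_No = 57896883;

-- Confirm the row is gone: this should return no rows for that customer.
SELECT Customer_Number AS Cust
      ,Customer_Name   AS Name
FROM   Customer_Table
WHERE  Customer_Number = 57896883;
```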
Using Loops in Stored Procedures

1. Create the table. Use your initials for the XYZ piece of the table name.

CREATE Table My_Tbl_XYZ
( Cntr Integer
 ,TheDate Date );

2. Create the procedure.

CREATE PROCEDURE Inserter_Five AS
DECLARE @Cntr INTEGER = 0;
WHILE @Cntr < 5
BEGIN;
   SET @Cntr = @Cntr + 1;
   INSERT INTO My_Tbl_XYZ VALUES (@Cntr, '2015-06-30') ;
END;

3. Execute the procedure. There are now five rows in the table.

Inserter_Five ;
We created a table called My_Tbl_XYZ. Then, we inserted five rows into the table using a WHILE loop. The body of the WHILE loop ran five times, once for each value of @Cntr from 1 to 5.
Stored Procedure Workshop

Create the table below, but substitute your initials for the XYZ:

CREATE Table Table_XYZ
( Col1 INTEGER
 ,Col2 INTEGER );

Now, create a stored procedure called Final_XYZ (again substituting your initials for the XYZ) that places 1,000 rows inside the table:

Col1 should have 1,000 unique values
Col2 should have 250 different values

Your mission is to create the table above and then create a stored procedure that will insert 1,000 rows. The tricky part is that Col1 should have 1,000 unique values, but Col2 should have only 250 different values.
Looping with a WHILE Statement

CREATE Table SQL_Class.dbo.Table_XYZ
( Col1 INTEGER
 ,Col2 INTEGER) ;

CREATE Procedure Final_XYZ AS
DECLARE @Cntr INT = 0;
DECLARE @Cntr2 INT = 0;
WHILE @Cntr < 1000
BEGIN;
   SET @Cntr = @Cntr + 1;
   SET @Cntr2 = @Cntr2 + 1;
   If @Cntr2 = 251
   BEGIN;
      SET @Cntr2 = 1;
   END;
   INSERT INTO SQL_Class.dbo.Table_XYZ Values (@Cntr, @Cntr2);
END;

Exec Final_XYZ;
The assignment is to create a table called Table_XYZ. It has two columns (Col1 and Col2), and both are integers. The next part of the assignment is to insert 1,000 rows into the table. The first column (Col1) will have 1,000 unique values. The second column (Col2) will have only 250 different values, because the second counter is reset to 1 each time it reaches 251. Above is how it is done.
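A quick way to check the workshop result is to count the rows and the distinct values in each column. This is a sketch rather than part of the original solution; it assumes the table was created as SQL_Class.dbo.Table_XYZ and that Final_XYZ has been executed exactly once against an empty table.

```sql
-- Expect Total_Rows = 1000, Distinct_Col1 = 1000 and Distinct_Col2 = 250.
SELECT COUNT(*)             AS Total_Rows
      ,COUNT(DISTINCT Col1) AS Distinct_Col1
      ,COUNT(DISTINCT Col2) AS Distinct_Col2
FROM   SQL_Class.dbo.Table_XYZ;
```

The second counter is one design choice among several. The same 1 to 250 cycle could also be computed from the first counter alone with the modulo operator, as ((@Cntr - 1) % 250) + 1.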
Chapter 22 – Statistical Aggregate Functions
"You can make more friends in two months by becoming interested in other people than you will in two years by trying to get other people interested in you." - Dale Carnegie
The Stats Table

Col1  Col2  Col3  Col4  Col5  Col6
____  ____  ____  ____  ____  ____
   1     1     1    30     1     0
   2     1     1    29     2     5
   3     3    10    28     3    10
   4     3    10    27     4    15
   5     3    10    26     5    20
   6     4    10    25     6    30
   7     5    10    24     7    30
   8     5    10    23     8    30
   9     5    10    22     9    35
  10     5    20    21    10    35
  11     7    20    20    22    40
  12     7    20    19    12    40
  13     9    20    18    13    45
  14     9    20    17    14    45
  15     9    20    16    15    50
  16     9    20    15    14    55
  17    10    20    14    13    55
  18    10    20    13    12    60
  19    10    20    12    11    60
  20    10    20    11     9    65
  21    10    20    10     8    65
  22    10    20     9     7    65
  23    13    20     8     6    70
  24    13    30     7     5    70
  25    13    30     6     4    80
  26    14    40     5     3    85
  27    15    40     4     2    90
  28    15    50     3     1    90
  29    16    50     2     1    95
  30    16    60     1     1   100
Above is the Stats_Table data that we will use in our statistical examples.
The VAR and VARP Functions

SELECT VAR(col1) AS Variance_Example
      ,VARP(col1) AS Var_EntirePopulation
FROM Stats_Table ;

Variance_Example  Var_EntirePopulation
________________  ____________________
77.5              74.92
The VAR and VARP functions return the statistical variance of all the values in the specified expression. VAR treats the values as a sample drawn from a larger population and divides the sum of squared deviations by n - 1. VARP treats the values as the entire population and divides by n. That is why VAR (77.5) is slightly larger than VARP (74.92) for the same column. The expression parameter must be one of the exact or approximate numeric data types, except for the bit data type.
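To see where the two numbers come from, the variance can be recomputed from its definition. The query below is an illustrative sketch, not something from the original examples; the aliases s, m, Mean, Manual_Var and Manual_VarP are made up for this illustration. For Col1 (the values 1 through 30) the mean is 15.5 and the sum of squared deviations is 2247.5, so dividing by 29 gives 77.5 and dividing by 30 gives roughly 74.92.

```sql
-- Recompute VAR and VARP for Col1 from their definitions.
SELECT SUM(SQUARE(s.Col1 - m.Mean)) / (COUNT(*) - 1) AS Manual_Var   -- sample: divide by n - 1
      ,SUM(SQUARE(s.Col1 - m.Mean)) / COUNT(*)       AS Manual_VarP  -- population: divide by n
FROM   Stats_Table AS s
CROSS JOIN (SELECT AVG(CAST(Col1 AS FLOAT)) AS Mean   -- CAST avoids integer averaging
            FROM   Stats_Table) AS m;
```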
A VAR Example

SELECT VAR(col1) AS Col1
      ,VAR(col2) AS Col2
      ,VAR(col3) AS Col3
      ,VAR(col4) AS Col4
      ,VAR(col5) AS Col5
      ,VAR(col6) AS Col6
FROM Stats_Table;

Col1   Col2   Col3    Col4   Col5   Col6
_____  _____  ______  _____  _____  ______
77.5   19.95  197.65  77.5   20.25  747.73
The VAR function returns the statistical variance of all the values in the specified expression. VAR treats the values as a sample of the data population.
A VARP Example

SELECT VARP(col1) AS Col1
      ,VARP(col2) AS Col2
      ,VARP(col3) AS Col3
      ,VARP(col4) AS Col4
      ,VARP(col5) AS Col5
      ,VARP(col6) AS Col6
FROM Stats_Table;

Col1   Col2   Col3    Col4   Col5   Col6
_____  _____  ______  _____  _____  ______
74.92  19.29  191.06  74.92  19.58  722.81
The VARP function returns the statistical variance of all the values in the specified expression. The VARP returns the value based upon the entire data population.
The STDEV and STDEVP Functions

SELECT STDEV(Col1) AS StandDev
      ,STDEVP(Col1) AS StandDevPop
FROM Stats_Table

StandDev  StandDevPop
________  ___________
8.8       8.66

The STDEV function returns the standard deviation, treating the data as a sample of the population. The STDEVP function returns the standard deviation based on the entire population.
The STDEV function returns the standard deviation of all the values in the specified expression, treating those values as a sample of the data population. The STDEVP function returns the standard deviation based upon the entire data population. In both cases the standard deviation is the square root of the corresponding variance. The expression parameter must be one of the exact or approximate numeric data types, except for the bit data type.
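The relationship between the standard deviation and the variance can be checked side by side. This query is a sketch added for illustration; the aliases Sqrt_Of_Var and Sqrt_Of_VarP are invented here. For Col1, each pair of columns should agree, at roughly 8.80 for the sample versions and 8.66 for the population versions.

```sql
-- STDEV is the square root of VAR, and STDEVP is the square root of VARP.
SELECT STDEV(Col1)      AS StandDev
      ,SQRT(VAR(Col1))  AS Sqrt_Of_Var
      ,STDEVP(Col1)     AS StandDevPop
      ,SQRT(VARP(Col1)) AS Sqrt_Of_VarP
FROM   Stats_Table;
```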
A STDEV Example

SELECT STDEV(col1) AS Col1
      ,STDEV(col2) AS Col2
      ,STDEV(col3) AS Col3
      ,STDEV(col4) AS Col4
      ,STDEV(col5) AS Col5
      ,STDEV(col6) AS Col6
FROM Stats_Table;

Col1  Col2  Col3   Col4  Col5  Col6
____  ____  _____  ____  ____  _____
8.8   4.47  14.06  8.8   4.5   27.34
The STDEV function returns the standard deviation, treating the data as a sample of the population.
A STDEVP Example

SELECT STDEVP(col1) AS Col1
      ,STDEVP(col2) AS Col2
      ,STDEVP(col3) AS Col3
      ,STDEVP(col4) AS Col4
      ,STDEVP(col5) AS Col5
      ,STDEVP(col6) AS Col6
FROM Stats_Table;

Col1  Col2  Col3   Col4  Col5  Col6
____  ____  _____  ____  ____  _____
8.66  4.39  13.82  8.66  4.42  26.89
The STDEVP function returns the standard deviation based on the entire population.
Chapter 23 – Systems Views
“In the end we’ll remember not the words of our enemies, but the silence of our friends.” - Martin Luther King, Jr.
System Views

System views describe three things:
1. Metadata
2. System catalog
3. Dynamic processes for the Azure SQL Data Warehouse

There are three types of views within system views:
1. Catalog views
2. Dynamic management views (DMVs)
3. Information schema views

Catalog Views show information about metadata, such as table and column names. The catalog views that are specific to the Azure SQL Data Warehouse have names that begin with sys.pdw_ (note the trailing underscore).

Dynamic Management Views (DMVs) show information about dynamic processes, such as the queries that are currently in progress and memory usage for each node. The name of each Azure SQL Data Warehouse DMV begins with sys.dm_pdw_ (again with a trailing underscore).

Information Schema Views show metadata for the data objects in a particular database. These views have a special schema named INFORMATION_SCHEMA. This schema is contained in each database.
The basics of the Azure SQL Data Warehouse system views are listed above.
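The naming rules above suggest a simple way to browse what is available. The two queries below are a sketch under assumptions: the first assumes that the catalog views and DMVs are themselves listed in sys.all_views on your system, and the second uses the standard INFORMATION_SCHEMA.TABLES view.

```sql
-- Catalog views and DMVs, found by their name prefixes.
-- [_] matches a literal underscore; a bare _ in LIKE is a single-character wildcard.
SELECT name
FROM   sys.all_views
WHERE  name LIKE 'pdw[_]%'
   OR  name LIKE 'dm[_]pdw[_]%'
ORDER BY name;

-- Information schema views: the tables and views in the current database.
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE
FROM   INFORMATION_SCHEMA.TABLES
ORDER BY TABLE_SCHEMA, TABLE_NAME;
```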
sys.all_columns

SELECT name
      ,max_length
      ,precision
      ,scale
FROM sys.all_columns

Name           max_length  precision  scale
_____________  __________  _________  _____
Subscriber_No  4           10         0
Street         30          0          0
City           20          0          0
State          2           0          0
Zip            4           10         0
AreaCode       2           5          0
Phone          4           10         0

The sys.all_columns view shows columns for both user-defined and system objects.
sys.all_objects

SELECT Name
      ,Type_Desc
      ,Create_Date
      ,Modify_Date
FROM sys.all_objects
WHERE name = 'Emp_Intl'

Name      Type_Desc   Create_Date             Modify_Date
________  __________  ______________________  ______________________
Emp_Intl  USER_TABLE  04/30/2015 9:23:16.317  04/30/2015 9:23:16.317

The sys.all_objects view shows user-defined and system objects, including tables and views.
sys.all_sql_modules
The sys.all_sql_modules view shows SQL Server modules, including user-defined and system objects.
sys.all_views

SELECT Name
      ,Type_Desc
      ,Create_Date
      ,Modify_Date
FROM sys.all_views
WHERE name = 'Employee_V'

Name        Type_Desc  Create_Date              Modify_Date
__________  _________  _______________________  _______________________
Employee_V  VIEW       04/30/2015 10:37:20.077  04/30/2015 10:37:20.077

The sys.all_views view shows both user-defined and system views.
sys.columns

Select Object_ID
      ,Name
      ,Max_Length
      ,Precision
      ,Scale
FROM sys.columns
WHERE name = 'First_Name'

Object_ID   Name        Max_Length  Precision  Scale
__________  __________  __________  _________  _____
690101499   First_Name  12          0          0
754101727   First_name  12          0          0
850102069   First_Name  12          0          0
882102183   First_Name  20          0          0
978102525   First_Name  12          0          0
994102582   First_Name  12          0          0
1010102639  First_Name  12          0          0
The sys.columns system view shows columns for user-defined tables and user-defined views. What a great way to check to see if columns with the same name exist in other tables and if they are consistently defined.
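The Object_ID values are hard to read on their own, so a natural next step is to join sys.columns to sys.objects to see which table or view each column belongs to. This is a sketch added for illustration; the aliases Table_Name and Column_Name are not from the original example.

```sql
-- Show the owning object for every column named First_Name.
SELECT o.name AS Table_Name
      ,c.name AS Column_Name
      ,c.max_length
      ,c.precision
      ,c.scale
FROM   sys.columns AS c
INNER JOIN sys.objects AS o
        ON o.object_id = c.object_id
WHERE  c.name = 'First_Name'
ORDER BY o.name;
```

With the object names in view, the one row whose Max_Length is 20 instead of 12 is easy to trace back to its table.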
sys.data_spaces
The sys.data_spaces system view contains a row for each data space. This can be a filegroup or partition scheme.
sys.database_files
The sys.database_files system view returns one row per file of the current Azure SQL Data Warehouse database as stored in the database itself. This is a per-database view.
sys.database_principals

Select Name
      ,Type
      ,Type_Desc
      ,Default_Schema_Name as "Schema"
      ,Create_Date
FROM sys.database_principals
WHERE name in ('public', 'dbo')

Name    Type  Type_Desc      Schema  Create_Date
______  ____  _____________  ______  ______________________
public  R     DATABASE_ROLE  ?       04/08/2003 9:10:42.317
dbo     S     SQL_USER       dbo     04/08/2003 9:10:42.287
The sys.database_principals system view returns a row for each security principal in a database.
sys.database_role_members
The sys.database_role_members system view returns one row for each member of each database role for the Azure SQL Data Warehouse.
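The view itself holds only principal IDs, so it is usually joined back to sys.database_principals twice, once for the role and once for the member. The query below is a sketch of that common pattern; the aliases Role_Name and Member_Name are chosen for illustration.

```sql
-- One row per (role, member) pair; both names come from sys.database_principals.
SELECT r.name AS Role_Name
      ,m.name AS Member_Name
FROM   sys.database_role_members AS rm
INNER JOIN sys.database_principals AS r
        ON r.principal_id = rm.role_principal_id
INNER JOIN sys.database_principals AS m
        ON m.principal_id = rm.member_principal_id
ORDER BY r.name, m.name;
```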
sys.databases
The sys.databases system view is aligned with the corresponding view exposed by SQL Server. The Azure SQL Data Warehouse exposes logical databases rather than the actual physical databases on the various Compute node instances. This view will show you the databases in the system. However, because some features are not supported on the Azure SQL Data Warehouse, some columns have fixed return values.
sys.filegroups
The sys.filegroups system view contains a row for each data space that is a filegroup.
sys.identity_columns
The sys.identity_columns system view shows identity columns.
sys.objects

SELECT name, type_desc, create_date
FROM sys.objects
WHERE name like '%table%'

name                  type_desc   create_date
____________________  __________  ______________________
Customer_table        USER_TABLE  03/17/2015 8:29:41.820
Order_table           USER_TABLE  03/17/2015 8:29:43.283
Student_table         USER_TABLE  03/17/2015 8:29:43.760
Course_table          USER_TABLE  03/17/2015 8:29:44.257
Student_Course_table  USER_TABLE  03/17/2015 8:29:44.713
Sales_table           USER_TABLE  03/17/2015 8:29:45.180
Employee_table        USER_TABLE  03/17/2015 8:29:45.563
Department_table      USER_TABLE  03/17/2015 8:29:45.983
Stats_table           USER_TABLE  03/17/2015 8:29:46.387
Job_table             USER_TABLE  03/17/2015 8:29:46.940
Emp_Job_table         USER_TABLE  03/17/2015 8:29:47.450
Names_table           USER_TABLE  03/17/2015 8:29:47.943
Hierarchy_table       USER_TABLE  03/17/2015 8:29:48.350
The sys.objects system view returns a row for each user-defined object that is created within a database.
sys.partition_range_values
The sys.partition_range_values view contains a row for each range boundary value of a partition function of type R.
sys.schemas
The sys.schemas system view contains a row for each database schema.
sys.server_role_members
The sys.server_role_members system view returns one row for each member of each fixed server role in the Azure SQL Data Warehouse.
sys.sql_logins
The sys.sql_logins system view returns one row for every SQL Server authentication login.
Chapter 24 – Nexus
“I might be just a little bit biased, but because of our long term vision and incredible determination, the Nexus has become what some consider the greatest BI tool on planet Earth!” - Tera-Tom Coffing
Nexus is Now Available on the Microsoft Azure Cloud
Why the Nexus Chameleon should be your query tool of choice:
1) Queries every major system
2) Provides visualization and automatically writes the SQL
3) Can perform cross-system joins with a few clicks of the mouse
4) Converts table structures and moves the table and data between systems
5) Compares and synchronizes databases
6) Can move an entire database of tables or views between systems
7) Has the "Garden of Analysis" to re-query answer sets inside your PC
8) Provides a dashboard of graphs and charts for answer sets

Download the Nexus for a free trial at www.CoffingDW.com and use Nexus in-house or on the Microsoft Azure cloud.
Nexus Queries Every Major System
Nexus is connected to each of the systems above
Priority number one for us was to build the best BI tool and then get it working on every major platform.
Setup of Nexus is as easy as pie
To add a system just right click on the Systems Tree and choose Add data source connection
Two of the reasons Nexus is so popular on cloud platforms are that it queries every major platform and that it is so easy to set up. Just right click on the systems tree and choose "Add data source connection". You can then add all of your systems one by one, and before you know it you are ready to query them all.
Setup of Nexus is as Easy as 1, 2, 3
1 Choose your system type from the drop down menu
2 Hit the Add New Button
3 Pick your driver. The Nexus Chameleon drivers are already installed for you.
Once you have right clicked on the Systems Tree and selected "Add new data source", you will come to the Data Source Connection page (see above). First, choose your Source Type from the drop down menu. Hit the Add New button and choose your driver from the System DSN tab (The Nexus Chameleon drivers are outstanding). Then, hit the CONFIGURE button and put in your IP address, login and password. You are ready to begin querying.
Nexus Data Visualization
“It never made sense to me that the data scientist and the business user couldn't work together on the same playing field. We developed a way for them to work together, by building the Super Join Builder.” - Tera-Tom Coffing
Nexus Data Visualization
1. Right click on any table and choose Super Join Builder
You can write the SQL yourself and Nexus will bring back an answer set, but why not let Nexus write the SQL for you? The Nexus has the best data visualization and it took years of work and millions of lines of code. Just right click on a table in any of your systems trees (above we chose the Addresses table) and then choose SUPER JOIN BUILDER. The table will appear visually and in color. It will show the table name, the columns and their data types. Just check the columns you want on your report and Nexus will build the SQL for you automatically!
Nexus Data Visualization Shows What Tables Can Be Joined
1. Left click on the Add Join drop down
2. The menu shows what tables can be joined together.
Once you see your first table in the Super Join Builder the fun is just beginning. Left Click on the top right of the visible table and select the drop down menu where it says "Add Join". Nexus will show you what tables can be joined. The Addresses table above can be joined to the Subscribers table. The Subscribers table can be joined to the Claims table. The Claims table can be joined to the Providers and Services tables. Be prepared to be amazed at the next page!
Nexus is doing a Five-Table Join
The "Add Join" button showed us the tables that could be joined and we chose them all. Notice that we can now see each table visually (and in color) and their respective columns and data types. Also, notice that we have checked the columns we want on our report. The Nexus has already built the SQL instantly and automatically for you and it does so perfectly. This technology puts the business user on the same level with the data scientist. The next page will show the SQL generated!
Nexus Generates the SQL Automatically
1 By clicking on the SQL tab you can see the SQL that Nexus has generated automatically
This is the SQL that was built automatically from the previous page. Since we are querying an Azure SQL Data Warehouse system, the Nexus built T-SQL to satisfy the query. It does not matter whether the system you are querying is Hadoop, Oracle, SQL Server, Teradata or any other system, the Nexus will build the SQL perfectly for that system. All you have to do now is hit the EXECUTE button and you will receive your answer set.
Nexus Delivers the Report
1. By hitting the EXECUTE button the report was delivered in the Answer Set tab
When you hit the EXECUTE button Nexus executes the query and delivers the report.
Cross-System Joins from Teradata, Oracle and SQL Server
The three top tables are from Teradata and the bottom tables are from Oracle and SQL Server.
The three tables at the top are from Teradata, but the tables at the bottom are from Oracle and SQL Server. When you hit EXECUTE, the Nexus will deliver the report. Nexus not only builds the SQL needed, but the table conversions and data movement to make it happen. Nexus does all of the difficult things for you. You just point and click on the columns you want from the tables and Nexus does the rest. Is that amazing or what?
The Tabs of the Super Join Builder
“The 9 tabs of the Super Join Builder are each dedicated to a single query, but each provides a different function. This makes the automatic writing of the SQL so easy, intuitive, quick and yet powerful.” - Tera-Tom Coffing
The 9 Tabs of the Super Join Builder – Objects Tab

The Objects tab is the first screen you see whenever you right click on any table and choose Super Join Builder. The Objects tab (in red) shows the table, columns and their data types. The Objects tab also allows you to left click on the right corner of any table on the ADD JOIN dropdown to see what other tables are joinable. If you click on a joinable table in the ADD JOIN menu, then that table will appear in the Objects tab as well.

If you check mark any of the columns from any tables in the Objects tab, the SQL will be built and will include those selected columns in the report. Since we have not selected any columns yet, the SQL has not been built. Once we begin to checkmark columns, the SQL will be built.

Above, we first entered the Super Join Builder by right clicking on the Addresses table in our systems tree and choosing Super Join Builder. We then left clicked on the right corner of the Addresses table on the ADD JOIN drop down and selected Subscribers. Both tables then appeared in the Objects tab. We then left clicked on the Subscribers table on the ADD JOIN drop down, and we can see that Claims joins to Subscribers.
Selecting Columns in the Objects Tab
The Objects tab is the first screen you see whenever you right click on any table and choose Super Join Builder. The Objects tab (in red) shows the table, columns and their data types, and allows you to left click on the right corner of any table on the ADD JOIN dropdown to see what other tables are joinable. We have chosen a two table join between the Addresses table and the Subscribers table. Notice that we clicked on the checkbox on the columns Street, City and State of the Addresses table, and also notice that we clicked on the SELECT * of the Subscribers table which auto-clicked all columns. Our answer set (as of now) will come back with 9 columns (Street, City, State, Last_Name, First_Name, Gender, SSN, Member_No and Subscriber_No). The SQL has automatically been generated with each check of a column.
The 9 Tabs of the Super Join Builder – Columns Tab

These are the columns that will be on the report. This is because you checked them in the Objects tab. You can rearrange these columns like a Rubik's cube (click and drag) and the SQL will reflect any and all changes.

These are the columns that you did not check in the Objects tab. You can drag them up if you decide you want any of them on the report.

The Columns tab displays, at the top, the columns that you selected in the Objects tab; these will be on the report. Notice that the colors correspond to their respective tables. It is here that you can change the order of the columns by dragging them into the order that you prefer. At the bottom are the columns that you did not select in the Objects tab. These will not be on the report, but you can drag them up to the top to add them. You can keep rearranging the columns until the report is exactly what you want.
Removing Columns from the Report in the Columns Tab
Drag any column to the trashcan (blue arrow) and it is no longer on the report.
You can remove a series of columns also. Above, we did a CTRL click on Gender and a SHIFT Click on Member_No and all columns between highlight. Keep the SHIFT key down and move them all to the trashcan.
If you want to delete a column from the report, just drag it to the trashcan and it will appear at the bottom in the list in the area of non-selected columns. You can also remove a series of columns. Above, we did a CTRL click on the Gender column and then did a SHIFT click on the Member_No column. If we keep the SHIFT key down, and drag them all together to the trash can, then all three of these columns are removed from the report.
The 9 Tabs of the Super Join Builder – Sorting Tab
The Sorting tab allows you to sort the answer set by simply double clicking on a column or by dragging it up. The columns are listed near the bottom (in color) and the columns at the very bottom were not selected to be on the report, but you can still sort by them. In our example above, we chose the Column State to be the major sort key. The Column State was selected previously to be part of the report. The column Zip is the minor sort key, but as you can see it was not previously selected to be on the report. Now, the Nexus will automatically place an ORDER BY statement at the end of the query. That ORDER BY statement will be ORDER BY State, Zip DESC.
The 9 Tabs of the Super Join Builder – Joins Tab

The drop down box allows you to change the join from an INNER JOIN to a:
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
The Nexus defaults all joins to INNER JOIN, but the Joins tab will allow you to change any of the joins from an INNER JOIN to any OUTER JOIN. Just hit the drop down box (red arrow) and your outer join options await you.
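As an illustration of what changing that drop down means in SQL terms, the sketch below rewrites the Addresses and Subscribers join used elsewhere in this chapter as a LEFT OUTER JOIN. It is hand-written under that assumption and is not actual Nexus output. With a LEFT OUTER JOIN, every Addresses row is returned, and the Subscribers columns come back as NULL where no matching subscriber exists.

```sql
-- Keep every address, even those without a matching subscriber.
SELECT "Add".Street
      ,"Add".City
      ,"Add"."State"
      ,SUB.Last_Name
      ,SUB.First_Name
FROM   SQL_CLASS.ADDRESSES "Add"
LEFT OUTER JOIN SQL_CLASS.SUBSCRIBERS SUB
       ON "Add".Subscriber_No = SUB.Subscriber_No;
```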
The 9 Tabs of the Super Join Builder – Where Tab

The tab shows three groups of columns: columns on the report, indexed columns, and columns not on the report.
The WHERE tab is designed to do two things. First, it shows you the indexes of the tables even if you are using views. This allows users to click on indexed columns and utilize an additional WHERE clause. Secondly, it shows all of the columns already on the report and those that are not on the report. Either way, you can double click on any column and write the WHERE or AND clause. I will demonstrate that on the next page.
Using the WHERE Tab For Additional WHERE or AND
By double clicking on the Subscriber_No, that column is placed down below. I entered the Subscriber_No = 123456778. That will automatically be placed in the SQL. Since that column is an index, the system will retrieve the answer set faster.

If you went to the SQL tab you would see this SQL. Notice it reflects everything we have done over the past several examples.

SELECT "Add".Street, "Add".City, "Add"."State",
       SUB.Last_Name, SUB.First_Name, SUB.Subscriber_No
FROM SQL_CLASS.ADDRESSES "Add"
INNER JOIN SQL_CLASS.SUBSCRIBERS SUB
ON "Add".Subscriber_No = SUB.Subscriber_No
WHERE "Add".Subscriber_No = 123456778
ORDER BY "Add"."State" ASC, "Add".Zip DESC;
The example above shows us double clicking on the Subscriber_No column. Notice (follow the arrow) that an additional WHERE clause was added. The Subscriber_No = awaited us to place a Subscriber_No in and we typed 123456778.
The 9 Tabs of the Super Join Builder – SQL Tab

The SQL tab shows the SQL that Nexus has automatically generated. Every click from every tab can cause a change to the SQL. We first went to the Objects tab, where we chose the Addresses table and the Subscribers table. We chose our columns in the Objects tab, but we then went to the Columns tab and deleted some of the columns. We then went to the Sorting tab and chose our ORDER BY keys of State and Zip DESC. We then went to the WHERE tab and added an additional WHERE clause by choosing the column Subscriber_No and typing in = 123456778. The SQL reflects everything we did.
The 9 Tabs of the Super Join Builder – Answer Set Tab
1. By hitting the EXECUTE button the report was delivered in the Answer Set tab
When you hit EXECUTE, the SQL generated is run on your system and you receive an answer set. The above example is a different example than our previous examples. This reflects just a two-table join.
The 9 Tabs of the Super Join Builder – Analytics Tab

Select all of the columns and then click on the Analytics tab

The Analytics tab is used for Rank, OLAP, and for Group by Grouping Sets, Group by Rollup and Group by Cube queries. It is usually used with a single table. Above, we have right clicked on the Sales_Table and chosen Super Join Builder. We will next click on the Analytics tab to show you how to generate analytics quickly. Turn the page and let's get started.
Analytics Tab

Your three Analytics options are OLAP, Rank and Grouping Sets.
We are in the OLAP tab.
Analytics Tab – OLAP Example
This example will generate an OLAP (Online Analytical Processing) report with calculations such as a Cumulative Sum, Moving Sum, etc. We dragged the Daily_Sales column from the bottom to the OLAP column (top left). We dragged the Product_ID and Sale_Date columns to the sorting area. We dragged the Product_ID column to the Partitioning area and we changed our moving window to a 3. We then checked all of the OLAP functions on the top right, including the OLAP with Partitioning. The next slide will show the SQL automatically generated in the SQL tab.
Analytics Tab – OLAP Example of SQL Generated
SELECT Sal.Product_ID, Sal.Sale_Date, Sal.Daily_Sales, SUM(Sal.Daily_Sales) OVER ( ORDER BY Sal.Product_ID ASC, Sal.Sale_Date ASC ROWS UNBOUNDED PRECEDING) , SUM(Sal.Daily_Sales) OVER ( ORDER BY Sal.Product_ID ASC, Sal.Sale_Date ASC ROWS 2 PRECEDING) , AVG(Sal.Daily_Sales) OVER ( ORDER BY Sal.Product_ID ASC, Sal.Sale_Date ASC ROWS 2 PRECEDING) , Sal.Daily_Sales - SUM(Sal.Daily_Sales) OVER ( ORDER BY Sal.Product_ID ASC, Sal.Sale_Date ASC ROWS BETWEEN 2 PRECEDING AND 2 PRECEDING) , COUNT(*) OVER ( ORDER BY Sal.Product_ID ASC, Sal.Sale_Date ASC ROWS UNBOUNDED PRECEDING) , MIN(Sal.Daily_Sales) OVER ( ORDER BY Sal.Product_ID ASC, Sal.Sale_Date ASC ) , MAX(Sal.Daily_Sales) OVER ( ORDER BY Sal.Product_ID ASC, Sal.Sale_Date ASC ) , SUM(Sal.Daily_Sales) OVER (PARTITION BY Sal.Product_ID ORDER BY Sal.Product_ID ASC, Sal.Sale_Date ASC ROWS UNBOUNDED PRECEDING) , SUM(Sal.Daily_Sales) OVER (PARTITION BY Sal.Product_ID ORDER BY Sal.Product_ID ASC, Sal.Sale_Date ASC ROWS 2 PRECEDING) , AVG(Sal.Daily_Sales) OVER (PARTITION BY Sal.Product_ID ORDER BY Sal.Product_ID ASC, Sal.Sale_Date ASC ROWS 2 PRECEDING) , AVG(Sal.Daily_Sales) OVER ( ORDER BY Sal.Product_ID ASC, Sal.Sale_Date ASC ROWS UNBOUNDED PRECEDING) , Sal.Daily_Sales - SUM(Sal.Daily_Sales) OVER (PARTITION BY Sal.Product_ID ORDER BY Sal.Product_ID ASC, Sal.Sale_Date ASC ROWS BETWEEN 2 PRECEDING AND 2 PRECEDING) , COUNT(*) OVER (PARTITION BY Sal.Product_ID ORDER BY Sal.Product_ID ASC, Sal.Sale_Date ASC ROWS UNBOUNDED PRECEDING) , MIN(Sal.Daily_Sales) OVER (PARTITION BY Sal.Product_ID ORDER BY Sal.Product_ID ASC, Sal.Sale_Date ASC ) , MAX(Sal.Daily_Sales) OVER (PARTITION BY Sal.Product_ID ORDER BY Sal.Product_ID ASC, Sal.Sale_Date ASC ) FROM SQL_CLASS.Sales_table Sal;
The SQL above might take hours to write, but with the Nexus it can be generated in 30 seconds.
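To see what a few of these window clauses return, here is a cut-down sketch you can run yourself. It assumes an in-memory SQLite database as a stand-in for the real system, a tiny made-up Sales_Table, and the column aliases CSum, MSum3 and CSum_By_Product, none of which come from Nexus. It keeps only three of the functions: a cumulative sum, a moving sum with a window of 3 (ROWS 2 PRECEDING), and a cumulative sum that resets for each Product_ID.

```python
import sqlite3

# SQLite (3.25 or later) stands in for the real system; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales_Table (Product_ID INTEGER, Sale_Date TEXT, Daily_Sales REAL);
INSERT INTO Sales_Table VALUES
  (1000, '2000-09-28', 100.0),
  (1000, '2000-09-29', 200.0),
  (1000, '2000-09-30', 300.0),
  (1000, '2000-10-01', 400.0),
  (2000, '2000-09-28',  10.0),
  (2000, '2000-09-29',  20.0);
""")

olap_sql = """
SELECT Product_ID, Sale_Date, Daily_Sales,
       -- running total over the whole answer set
       SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
                              ROWS UNBOUNDED PRECEDING) AS CSum,
       -- moving window of 3: the current row plus the 2 rows before it
       SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
                              ROWS 2 PRECEDING)         AS MSum3,
       -- running total that starts over for each Product_ID
       SUM(Daily_Sales) OVER (PARTITION BY Product_ID
                              ORDER BY Sale_Date
                              ROWS UNBOUNDED PRECEDING) AS CSum_By_Product
FROM Sales_Table
ORDER BY Product_ID, Sale_Date
"""

rows = conn.execute(olap_sql).fetchall()
for row in rows:
    print(row)
```

Notice that without PARTITION BY the totals keep accumulating when Product_ID changes from 1000 to 2000, while the partitioned column starts again from the first row of each product.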
Analytics Tab – Grouping Sets Example
This report will generate Grouping Sets that also include Rollup and Cube. Notice that we dragged the Product_ID column to the Product area. We dragged the Sale_Date column to the Date Column area and the Daily_Sales column to the Sum area. We then checked Grouping Sets, Rollup and Cube on the top right. The report is now ready to be executed.
Analytics Tab – Grouping Sets Answer Set
Notice now that there are three Result Sets. The picture above shows Result Set 3 which is the Group by Grouping Sets. The Result 1 tab will show the Group by Rollup and the Result 2 tab will show the Group by Cube.
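If you are wondering what Rollup and Cube actually compute, the sketch below spells it out. It is an illustration under assumptions, not the SQL Nexus generates: the sample rows are made up, the grouping columns are assumed to be Product_ID and Sale_Date, and SQLite stands in for the real system. Because SQLite has no ROLLUP or CUBE keywords, each grouping level is written as its own SELECT and combined with UNION ALL. On a system that supports them you would write GROUP BY ROLLUP (Product_ID, Sale_Date) or GROUP BY CUBE (Product_ID, Sale_Date) instead.

```python
import sqlite3

# SQLite stands in for the real system; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales_Table (Product_ID INTEGER, Sale_Date TEXT, Daily_Sales REAL);
INSERT INTO Sales_Table VALUES
  (1000, '2000-09-28', 100.0),
  (1000, '2000-09-29', 200.0),
  (2000, '2000-09-28',  10.0),
  (2000, '2000-09-29',  20.0);
""")

# Same groupings as GROUP BY ROLLUP (Product_ID, Sale_Date):
# detail rows, a subtotal per product, and one grand total.
rollup_sql = """
SELECT Product_ID, Sale_Date, SUM(Daily_Sales) FROM Sales_Table
GROUP BY Product_ID, Sale_Date
UNION ALL
SELECT Product_ID, NULL, SUM(Daily_Sales) FROM Sales_Table
GROUP BY Product_ID
UNION ALL
SELECT NULL, NULL, SUM(Daily_Sales) FROM Sales_Table
"""
rollup_rows = conn.execute(rollup_sql).fetchall()

# CUBE adds the one grouping ROLLUP leaves out: a subtotal per Sale_Date.
cube_extra_sql = """
SELECT NULL, Sale_Date, SUM(Daily_Sales) FROM Sales_Table
GROUP BY Sale_Date
"""
cube_rows = rollup_rows + conn.execute(cube_extra_sql).fetchall()

for row in cube_rows:
    print(row)
```

A NULL in a grouping column marks a subtotal row, which is how the Rollup and Cube result sets should be read.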
Nexus Data Movement
“If you have ever had to build a load script or convert table structures between different systems, you have experienced the impossible. We spent 7 years to make sure our users could do it with a single click of a button.” - Tera-Tom Coffing
Moving a Single Table To a Different System
Just Right Click on a table and choose "Move Data".
Just right click on any single table and select Move Data. The data movement screen will appear. Check out the next slide.
The Single Table Data Movement Screen
Lite Speed is for smaller tables, but Warp Speed is for large tables.
This button will show the size and number of rows of the source table
Choose your Target system and put in your login information once and Nexus will remember the next time. Above, we are moving the Addresses table from SQL Server to Teradata. When the EXECUTE button is hit, the table is converted automatically and moved. Simple and easy! Wait until you see the Database Mover! It is next.
Moving an Entire Database To a Different System
Just Right Click on a Database and choose "Move Data".
Just right click on any database and select Move Data. The database mover screen will appear. Check out the next slide.
The Database Mover Screen
Lite Speed is for small tables. Warp Speed is for large tables. Auto chooses Lite or Warp based on the size parameters in the Options tab.
Check the tables or move through the Views and then press the blue button.
Select all the tables, a single table, or some of the tables, or choose to move through the views (bottom left), and then press the blue arrow button. We are moving 19 tables from SQL Server to Teradata. Hit EXECUTE and all of the tables move. Don't forget, though, to check out the Options tab. You can set your parameters there. The next slide will show the Options tab.
The Database Mover Options Tab
The Row Count and Table size parameters take effect for Lite or Warp Speed if you select AUTO on the previous screen.
The Options tab allows you to set more detailed parameters. Once you set them the first time, the Nexus Chameleon will remember them the next time (as defaults). You can change them as you see fit.
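The AUTO choice can be pictured as a simple threshold check. The sketch below is only a guess at the idea, not how Nexus is implemented: the names AutoOptions and choose_speed, the default limits, and the rule that exceeding either limit selects Warp Speed are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoOptions:
    """Hypothetical stand-in for the Row Count and Table Size parameters."""
    max_lite_rows: int = 100_000      # assumed default, not a Nexus value
    max_lite_size_mb: float = 50.0    # assumed default, not a Nexus value


def choose_speed(row_count: int, size_mb: float,
                 opts: AutoOptions = AutoOptions()) -> str:
    """Pick 'Warp' when a table exceeds either limit, otherwise 'Lite'."""
    if row_count > opts.max_lite_rows or size_mb > opts.max_lite_size_mb:
        return "Warp"
    return "Lite"


print(choose_speed(5_000, 1.2))         # small table
print(choose_speed(25_000_000, 900.0))  # large table
```

Changing the two limits in AutoOptions plays the same role as changing the parameters on the Options tab.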
Converting DDL Table Structures
1. Right Click on a Database and a menu appears.
2. Choose CONVERT TABLE STRUCTURES.
3. Pick the Database you want to convert to.
Right Click on a Database
We right clicked on a Teradata database and chose Convert Table Structures and then chose to convert the Teradata tables to Hadoop. Check out the next couple of screens on the following pages.
Converting DDL Table Structures
Check the tables you want converted and press the big blue button.
You can click on the table’s box (red arrow) and it checks all the tables. You can uncheck any table you don't want, but once you have the tables you want converted checked, then you just press the big blue arrow. The tables will move over to the right in the To Be Converted area. Just hit Execute (at the top) and the table structures (DDL) will be converted. This example has converted 19 Teradata tables to Hadoop table structures. Check out the DDL Nexus creates on the next page.
Converting DDL Table Structures
Nexus converts and creates the new DDL. You can log on to the target system and paste in the table structures. This complex and difficult project sometimes takes a month, but with the Nexus it takes a minute.
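To give a feel for what a table structure conversion involves, here is a toy sketch that rewrites a few Teradata column types as Hive types. It is not the Nexus conversion logic. The function names convert_type and hive_ddl, the three-entry mapping table, and the sample Employee_Table columns are all assumptions, and a real conversion has to handle many more types plus indexes, partitioning and storage options.

```python
import re

# A deliberately tiny, assumed mapping; types not listed pass through unchanged.
TERADATA_TO_HIVE = {
    "INTEGER": "INT",
    "BYTEINT": "TINYINT",
    "FLOAT": "DOUBLE",
}


def convert_type(td_type: str) -> str:
    """Map one Teradata type such as 'DECIMAL(9,2)' to a Hive type."""
    match = re.fullmatch(r"\s*([A-Za-z]+)\s*(\(.*\))?\s*", td_type)
    if match is None:
        raise ValueError(f"unrecognized type: {td_type!r}")
    base = match.group(1).upper()
    args = match.group(2) or ""          # keep any (length) or (p,s) as-is
    return TERADATA_TO_HIVE.get(base, base) + args


def hive_ddl(table: str, columns: list[tuple[str, str]]) -> str:
    """Build a CREATE TABLE statement from (column name, Teradata type) pairs."""
    cols = ",\n".join(f"  {name} {convert_type(t)}" for name, t in columns)
    return f"CREATE TABLE {table} (\n{cols}\n)"


ddl = hive_ddl("Employee_Table", [
    ("Employee_No", "INTEGER"),
    ("Last_Name", "VARCHAR(20)"),
    ("Salary", "DECIMAL(9,2)"),
])
print(ddl)
```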
Compare and Synchronize
“Cloud computing will provide a necessity for companies to compare and synchronize tables and data across platforms. Nexus has once again shown that it is ahead of the curve.” - Tera-Tom Coffing
Compare Two Different Databases From Different Systems
Our source system is a SQL Server system
Our target system is a Teradata system
Drag up or down the table names so the target and source tables are aligned
Uncheck any boxes you don't want in the Comparison
Nexus can compare two separate databases table by table across different platforms. Choose your source system and the database and then choose your target system and the database. Line up the tables for comparison and hit EXECUTE!
Comparisons Down to the Column Level
Hit the Columns Tab and see a comparison of each table's columns
Blue column colors indicate the compare key(s)
You can drag columns to rearrange them to align perfectly from source to target
Nexus even shows you the column by column comparisons. You can also move the column names up or down to make sure that everything is aligned down to the column level. Check the columns you want compared and hit EXECUTE!
The Results Tab
These tables had some differences. Hit Go to see the differences.
No Differences between these tables
Scroll Bar so you can see all your tables
The Results Tab will show you all of the table comparisons. If two tables being compared are exactly the same, the result will be NO DIFFERENCES. If there are differences between two tables, you can choose VIEW DIFFERENCES.
View Differences
The In Both With Differences Tab shows rows that exist in both tables, but with a different value in one or more columns
The Full Differences Tab shows the differences in both tables
The In Source, Not Target Tab shows rows that are in the Source, but Not in the Target
The In Target, Not Source Tab shows rows that are in the Target, but Not in the Source
The View Differences Tab will show you the differences between two tables being compared. Above, you can see that there are two extra rows in the Target (Teradata table) that do not reside in the Source (SQL Server table).
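The three difference tabs correspond to three familiar query patterns. The sketch below is an illustration under assumptions, not the SQL Nexus runs: both tables live in one in-memory SQLite database rather than on two systems, the table names Source_Tbl and Target_Tbl and their rows are made up, and Id is assumed to be the compare key.

```python
import sqlite3

# One SQLite database stands in for both systems; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Source_Tbl (Id INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE Target_Tbl (Id INTEGER PRIMARY KEY, City TEXT);
INSERT INTO Source_Tbl VALUES (1, 'Dayton'), (2, 'Atlanta'), (3, 'Austin');
INSERT INTO Target_Tbl VALUES (1, 'Dayton'), (2, 'Athens'),
                              (4, 'Boston'), (5, 'Denver');
""")

# In Source, Not Target: keys that exist only in the source.
in_source_not_target = conn.execute("""
    SELECT s.Id, s.City FROM Source_Tbl s
    WHERE NOT EXISTS (SELECT 1 FROM Target_Tbl t WHERE t.Id = s.Id)
    ORDER BY s.Id
""").fetchall()

# In Target, Not Source: keys that exist only in the target.
in_target_not_source = conn.execute("""
    SELECT t.Id, t.City FROM Target_Tbl t
    WHERE NOT EXISTS (SELECT 1 FROM Source_Tbl s WHERE s.Id = t.Id)
    ORDER BY t.Id
""").fetchall()

# In Both With Differences: same key, different column value.
# Note: <> never matches NULLs, so nullable columns need extra handling.
in_both_with_differences = conn.execute("""
    SELECT s.Id, s.City, t.City FROM Source_Tbl s
    JOIN Target_Tbl t ON t.Id = s.Id
    WHERE s.City <> t.City
    ORDER BY s.Id
""").fetchall()

print(in_source_not_target)
print(in_target_not_source)
print(in_both_with_differences)
```

As in the screen above, the sample target holds two extra rows (Id 4 and 5) that do not reside in the source.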
Synchronizing Differences In the Results Tab
We are synchronizing the Source to the Target.
Hit GO and see the next screen!
The Results tab will either show Differences or No Differences. The Drop Down arrow next to the GO button will allow you to Synchronize the Source to the Target or the Target to the Source. Above, we have chosen to Sync the Source to the Target.
Synchronizing Differences In the Results Tab
We are synchronizing the Source to the Target. You can now hit the Perform Synchronization button
The final Synchronization screen gives you more options, but when you are ready to perform the synchronization you can hit the Perform Synchronization button and the magic will happen.
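Behind the button, a synchronization boils down to deletes, updates and inserts. The sketch below shows one way that could look. It is not what Nexus executes: it assumes that syncing the Source to the Target means making the target match the source, that both tables sit in one in-memory SQLite database, and that Id is the key. The table names and rows are made up.

```python
import sqlite3

# One SQLite database stands in for both systems; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Source_Tbl (Id INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE Target_Tbl (Id INTEGER PRIMARY KEY, City TEXT);
INSERT INTO Source_Tbl VALUES (1, 'Dayton'), (2, 'Atlanta'), (3, 'Austin');
INSERT INTO Target_Tbl VALUES (1, 'Dayton'), (2, 'Athens'),
                              (4, 'Boston'), (5, 'Denver');
""")

with conn:  # all three steps commit together, or roll back on an error
    # 1. Remove rows that exist only in the target.
    conn.execute("""
        DELETE FROM Target_Tbl
        WHERE Id NOT IN (SELECT Id FROM Source_Tbl)
    """)
    # 2. Overwrite rows that exist in both tables but differ.
    conn.execute("""
        UPDATE Target_Tbl
        SET City = (SELECT s.City FROM Source_Tbl s WHERE s.Id = Target_Tbl.Id)
        WHERE EXISTS (SELECT 1 FROM Source_Tbl s
                      WHERE s.Id = Target_Tbl.Id AND s.City <> Target_Tbl.City)
    """)
    # 3. Add rows that exist only in the source.
    conn.execute("""
        INSERT INTO Target_Tbl
        SELECT s.Id, s.City FROM Source_Tbl s
        WHERE s.Id NOT IN (SELECT Id FROM Target_Tbl)
    """)

source_rows = conn.execute("SELECT Id, City FROM Source_Tbl ORDER BY Id").fetchall()
target_rows = conn.execute("SELECT Id, City FROM Target_Tbl ORDER BY Id").fetchall()
print(target_rows)
```

After the three steps a fresh comparison of these two tables would report no differences.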
Hound Dog Compression
“Using Multi-Value compression on a Teradata table is a win-win. Large tables are about 35% smaller, 35% faster and take up about 35% less network traffic. The only negative is you have to figure out the correct algorithm and write the DDL. We spent a long time making this all happen automatically.” - Tera-Tom Coffing
Hound Dog Compression on Teradata
Right Click on a Teradata Database
Save yourself an enormous amount of money by using the Hound Dog Compression tool. It is as easy as a right click on a Teradata database and then choosing Hound Dog Compress Database. Check out the next screen to see how easily it is done.
Hound Dog Compression on Teradata
Check the tables and press the blue button.
You can click on the table’s box (red arrow) and it checks all the tables. You can uncheck any table you don't want, but once you have the tables you want to be compressed just press the big blue arrow. The tables will move over to the right in the Table to Compress area. Just hit Execute (at the top) and the tables will be compressed. You can then look at the dashboard tab (top right) and see the compression savings for every table.
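The heart of Multi-Value compression is picking which values to list in a column's COMPRESS clause. The sketch below shows one simple way to do that for a character column. It is not the Hound Dog algorithm: the function name compress_clause, the rule of keeping the most frequent values above a minimum share, and the sample State data are all assumptions.

```python
from collections import Counter


def compress_clause(values, max_values=3, min_share=0.10):
    """Build a COMPRESS clause from the most frequent character values.

    Keeps at most max_values values, each of which must make up at least
    min_share of the non-NULL rows. Returns '' when nothing qualifies.
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        return ""
    counts = Counter(non_null)
    picked = [v for v, n in counts.most_common(max_values)
              if n / len(non_null) >= min_share]
    if not picked:
        return ""
    # Quote each value as a SQL string literal, doubling embedded quotes.
    literals = ", ".join("'" + str(v).replace("'", "''") + "'" for v in picked)
    return f"COMPRESS ({literals})"


# Made-up sample of a State column: OH is 60% of rows, GA 30%, TX 10%.
states = ["OH"] * 6 + ["GA"] * 3 + ["TX"]
clause = compress_clause(states, max_values=2)
print(f"State CHAR(2) {clause}")
```

The printed line is the column definition you would place in the table's CREATE TABLE statement. Doing this by hand for every column of every large table is the work the tool is meant to save.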