English Pages 540 Year 2006
Professional
SQL Server™ 2005 XML Scott Klein
Professional SQL Server™ 2005 XML
Published by Wiley Publishing, Inc. 10475 Crosspoint Boulevard Indianapolis, IN 46256 www.wiley.com
Copyright © 2006 by Wiley Publishing, Inc., Indianapolis, Indiana
Published simultaneously in Canada
ISBN-13: 978-0-7645-9792-3 ISBN-10: 0-7645-9792-2
Manufactured in the United States of America
10 9 8 7 6 5 4 3 2 1
1MA/SR/RS/QV/IN
Library of Congress Cataloging-in-Publication Data: Klein, Scott, 1966- Professional SQL Server 2005 XML / Scott Klein. p. cm. Includes index. ISBN-13: 978-0-7645-9792-3 (paper/website) ISBN-10: 0-7645-9792-2 (paper/website) 1. SQL server. 2. Client/server computing. 3. XML (Document markup language) I. Title. QA76.9.C55K545 2005 005.2’768--dc22 2005029721
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Legal Department, Wiley Publishing, Inc., 10475 Crosspoint Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4355, or online at http://www.wiley.com/go/permissions.
LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.
For general information on our other products and services please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Trademarks: Wiley, the Wiley logo, Wrox, the Wrox logo, Programmer to Programmer, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. All other trademarks are the property of their respective owners. Wiley Publishing, Inc., is not associated with any product or vendor mentioned in this book.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
About the Author Scott Klein is a software developer and architect, and his passion for SQL Server, .NET, and all things XML led him to Greenville, South Carolina, where he currently works as a SQL/.NET developer for CSI, a software solutions company. He has written several articles for TopXML (www.TopXML.com) and is a frequent speaker at SQL Server and .NET user groups around Greenville and the surrounding areas. When he is not sitting in front of a computer or spending time with his family, he can usually be found aboard his Yamaha at the local motocross track.
Acknowledgments Writing a book is a daunting task. Writing your first book is just downright intimidating. The better the support people you have assisting and guiding you, the easier the task becomes. Therefore, it is only appropriate to thank those individuals who made this project much easier than it could have been. First and foremost, Clay Andres for sticking with the book idea when it seemed like the idea wasn’t going anywhere. A huge thanks to the folks at Wiley for making this book happen. Brian Herrmann, my awesome development editor, was truly that. With my being a first time book author, Brian was a tremendous help and a sheer delight to work with. Thanks, Brian. Thanks also to Jim Minatel, for accepting the book idea and letting me write it, and to Derek Comingore, for technically reviewing this book and providing priceless feedback and help. Thank you, Derek. I would be remiss if I didn’t mention the following individuals for their assistance in providing information. Primarily, I must thank Irwin Dolobowsky, my main contact at Microsoft. Irwin was my go-to guy, a life saver on many occasions. If he didn’t know the answer, he knew who did or would find out who did. Also included in the list of Microsoft people to thank are Michael Rys, Arpan Desai, Srik Raghavan, Mark Fussell, Vineet Rao, and Beysim Sezgin. Thank you, to all of you. Enough cannot be said about the love and support of my family. For my wife, Lynelle, who held the house together for the 8+ months I spent upstairs. And to my children, who were patient with their father knowing that they soon would get their dad back. I love you all. I can only hope the next book is less daunting.
Credits

Senior Acquisitions Editor
Jim Minatel

Development Editor
Brian Herrmann

Production Editors
Jonathan Coppola, Tim Tate

Copy Editor
Kathryn Duggan

Technical Editor
Derek Comingore

Editorial Manager
Mary Beth Wakefield

Production Manager
Tim Tate

Vice President and Executive Group Publisher
Richard Swadley

Vice President and Executive Publisher
Joseph B. Wikert

Project Coordinator
Kristie Rees

Graphics and Production Specialists
Carrie A. Foster, Lauren Goddard, Denny Hager, Joyce Haughey, Jennifer Heleine, Alicia B. South

Quality Control Technicians
Laura Albert, John Greenough

Proofreading and Indexing
TECHBOOKS Production Services
Contents

Introduction

Part I: Introduction to SQL Server 2005 XML

Chapter 1: What’s New in Version 2.0 of the .NET Framework for XML
    System.Xml Version 2.0 Enhancements and New Features
        Performance
            XmlTextWriter and XmlTextReader
            XmlReader and XmlWriter
            XSLT Processing
            XML Schema Validation
        Type Support
        XPathDocument
        XPathNavigator
        XML Query Architecture
        XmlReader, XmlReaderSettings, XmlWriter, and XmlWriterSettings
    Summary

Chapter 2: What’s New in SQL Server 2005 XML
    xml data type
        xml data type Column
        xml Variable
        XML Parameter
        Function Return
    Indexes on the xml data type
        Primary Index
        Secondary Index
    XQuery
        XQuery Structure
        Additional Concepts
    XML Data Modification Language
        Insert
        Delete
        Update
    Transact-SQL Enhancements
Introduction I have a new favorite word, courtesy of a 1961 Robert Heinlein novel titled Stranger in a Strange Land, and emphasized by Rod Paddock in the March/April 2005 CoDe Magazine article titled “Grokking .NET.” The word is Grok, and not only is the meaning profound, the word is just fun to say. In the novel, the word Grok is Martian and means to “understand so thoroughly that the observer becomes a part of the observed,” but it applies to this book as well because this book is intended to help you Grok the new XML technologies in SQL Server 2005. Microsoft is serious about XML, and nowhere is that more evident than in the release of SQL Server 2005, which supports a full-blown new xml data type. This new data type can be used as a column or in variables and stored procedures. It also supports technologies such as XQuery and the XML Data Modification Language, which provide full query and data modification capabilities on the xml data type. The same focus has been taken to support the new xml data type on the client, and significant changes and enhancements have been made in version 2.0 of the .NET Framework as well as Visual Studio 2005. Why put all the work into the backend when you can’t utilize it from the client? For this reason, much of the book’s energy is focused on those changes and improvements. Microsoft also made some significant improvements to SQLXML, and SQL Server 2005 comes with SQLXML 4.0. The majority of these changes were made to support the new xml data type, but some improvements were also made in the security and performance areas to give you a better experience when dealing with XML.
Whom This Book Is For This book is for developers with a desire to learn about this new and exciting technology and how it can be a benefit in their environment. While a previous knowledge of SQL Server 2000, T-SQL, and previous versions of SQLXML will come in handy, it is certainly not a prerequisite to reading this book. A decent understanding of XML and related technologies (such as XQuery) will also be useful when reading this book, but it isn’t necessary.
What This Book Covers The focus of this book is in three primary areas. First and foremost is the new xml data type and server-side XML processing with associated topics such as indexing and querying of the xml data type. The book then turns its focus to the client-side processing of the xml data type with an emphasis on the new and enhanced technologies found in SQLXML 4.0. Lastly, the book takes a look at the new enhancements and changes to the .NET Framework and ADO.NET for the support of the new xml data type and CLR integration in SQL Server 2005.
How This Book Is Structured The book is organized into a number of parts and sections to help you better grasp the new technology coming in SQL Server. The first couple of parts, focusing on SQL Server 2005, lay the foundation for the rest of the book, which builds on that foundation by discussing how the new version of the .NET Framework, Visual Studio 2005, and the integration of the CLR can add tremendous benefit to your environment. This book is structured as follows.
Part I—Introduction to SQL Server 2005 XML
❑ Chapter 1, “What’s New in Version 2.0 of the .NET Framework for XML,” takes a look at a few of the new features included in the new version of the .NET Framework as it pertains to XML.
❑ Chapter 2, “What’s New in SQL Server 2005 XML,” provides an overview of the changes and enhancements between SQL Server 2000 and SQL Server 2005.
❑ Chapter 3, “Installing SQL Server 2005,” provides a quick walkthrough and explanation of installing SQL Server 2005.
Part II—Server-Side XML Processing in SQL Server 2005
❑ Chapter 4, “xml data type,” introduces the xml data type.
❑ Chapter 5, “Querying and Modifying XML Data in SQL Server 2005,” discusses how to query and modify the xml data type.
❑ Chapter 6, “Indexing XML Data in SQL Server 2005,” discusses indexing on the xml data type.
❑ Chapter 7, “XML Schemas in SQL Server 2005,” discusses XML schemas and XML schema collections.
❑ Chapter 8, “Transact-SQL Enhancements to FOR XML and OPENXML,” talks about the T-SQL changes and enhancements in SQL Server 2005.
❑ Chapter 9, “CLR Support in SQL Server 2005,” provides an overview of the CLR integration in SQL Server 2005.
Part III—Client-Side XML Processing in SQL Server 2005
❑ Chapter 10, “Client-Side Support for the xml data type,” discusses the support of the xml data type from the client with topics such as SQLXML classes.
❑ Chapter 11, “Client-Side XML Processing with SQLXML 4.0,” talks about the changes and enhancements to SQLXML 4.0 with a focus on the new SQL Native Client.
❑ Chapter 12, “Creating and Querying XML Views,” talks about XML views and XSD schemas.
❑ Chapter 13, “Updating the XML View Using Updategrams,” digs into the changes and improvements to updategrams.
❑ Chapter 14, “Bulk Loading XML Data Through the XML View,” talks about the XML Bulk Load utility and discusses changes provided by SQLXML 4.0.
❑ Chapter 15, “SQLXML Data Access Methods,” discusses more about the SQL Native Client and other data access methods such as ADO, OLE DB, and ODBC.
❑ Chapter 16, “Using XSLT in SQL Server 2005,” provides an overview and introduction of XSLT.
Part IV—SQL Server 2005, SqlXml, and SOAP
❑ Chapter 17, “Web Service (SOAP) Support in SQL Server 2005,” introduces and discusses SQL Server 2005 endpoints (Web Services).
❑ Chapter 18, “SOAP at the Client,” builds on Chapter 17, discussing how to consume and use a SQL Server 2005 endpoint.
❑ Chapter 19, “Web Service Description Language (WSDL),” introduces and discusses WSDL files, using the built-in files and what to consider when you want to create your own WSDL file.
Part V—SQL Server 2005 and Visual Studio 2005
❑ Chapter 20, “SQL Server 2005 SQLXML Managed Classes,” introduces SQLXML managed classes and how to use them from the client with Visual Studio 2005.
❑ Chapter 21, “Working with Assemblies,” introduces assemblies and discusses how to create and use them in SQL Server 2005 and Visual Studio 2005.
❑ Chapter 22, “Creating .NET Routines,” introduces .NET routines and discusses how to create and use them in SQL Server 2005 and Visual Studio 2005.
❑ Chapter 23, “ADO.NET,” discusses some of the changes and enhancements to ADO.NET 2.0, such as asynchronous command operations, query notifications, and support of the xml data type.
❑ Chapter 24, “ADO.NET 2.0 Guidelines and Best Practices,” provides some guidelines and best practices for ADO.NET 2.0.
❑ Chapter 25, “Case Study — Putting It All Together,” provides a case in which most of the technologies discussed in this book are used.
❑ Appendix A, “XQuery in SQL Server 2005,” provides a brief introduction to the support, syntax, and usage of XQuery in SQL Server 2005.
What You Need to Use This Book All of the examples in this book require the following:
❑ SQL Server 2005
❑ Visual Studio 2005
While it is possible to run the products on separate computers, the examples in this book were done with both products running on the same computer.
Book Conventions To help you get the most from the text and keep track of what’s happening, we’ve used a number of conventions throughout the book.
Boxes like this one hold important, not-to-be forgotten information that is directly relevant to the surrounding text.
Tips, hints, tricks, and asides to the current discussion are offset and placed in italics like this.
As for styles in the text:
❑ We italicize new terms and important words when we introduce them.
❑ We show keyboard strokes like this: Ctrl+A.
❑ We show file names, URLs, and code within the text like so: persistence.properties.
❑ We present code in two different ways: In code examples we highlight new and important code with a gray background. The gray highlighting is not used for code that’s less important in the present context, or has been shown before.
Source Code As you work through the examples in this book, you may choose either to type in all the code manually or to use the source code files that accompany the book. All of the source code used in this book is available for download at http://www.wrox.com. Once at the site, simply locate the book’s title (either by using the Search box or by using one of the title lists) and click the Download Code link on the book’s detail page to obtain all the source code for the book. Because many books have similar titles, you may find it easiest to search by ISBN; for this book, the ISBN is 0-7645-9792-2.
Once you download the code, just decompress it with your favorite compression tool. Alternately, you can go to the main Wrox code download page at http://www.wrox.com/dynamic/books/download.aspx to see the code available for this book and all other Wrox books.
Errata We make every effort to ensure that there are no errors in the text or in the code. However, no one is perfect, and mistakes do occur. If you find an error in one of our books, like a spelling mistake or faulty piece of code, we would be very grateful for your feedback. By sending in errata you may save another reader hours of frustration and at the same time you will be helping us provide even higher quality information. To find the errata page for this book, go to http://www.wrox.com and locate the title using the Search box or one of the title lists. Then, on the book details page, click the Book Errata link. On this page, you can view all errata that have been submitted for this book and posted by Wrox editors. A complete book list including links to each book’s errata is also available at www.wrox.com/misc-pages/booklist.shtml. If you don’t spot “your” error on the Book Errata page, go to www.wrox.com/contact/techsupport.shtml and complete the form there to send us the error you have found. We’ll check the information and, if appropriate, post a message to the book’s errata page and fix the problem in subsequent editions of the book.
p2p.wrox.com For author and peer discussion, join the P2P forums at p2p.wrox.com. The forums are a Web-based system for you to post messages relating to Wrox books and related technologies and interact with other readers and technology users. The forums offer a subscription feature to e-mail you topics of interest of your choosing when new posts are made to the forums. Wrox authors, editors, other industry experts, and your fellow readers are present on these forums. At http://p2p.wrox.com you will find a number of different forums that will help you not only as you read this book, but also as you develop your own applications. To join the forums, just follow these steps:
1. Go to p2p.wrox.com and click the Register link.
2. Read the terms of use and click Agree.
3. Complete the required information to join as well as any optional information you wish to provide and click Submit.
4. You will receive an e-mail with information describing how to verify your account and complete the joining process.
You can read messages in the forums without joining P2P, but in order to post your own messages, you must join.
Once you join, you can post new messages and respond to messages other users post. You can read messages at any time on the Web. If you would like to have new messages from a particular forum e-mailed to you, click the Subscribe to this Forum icon by the forum name in the forum listing. For more information about how to use the Wrox P2P, be sure to read the P2P FAQs for answers to questions about how the forum software works as well as many common questions specific to P2P and Wrox books. To read the FAQs, click the FAQ link on any P2P page.
Part I: Introduction to SQL Server 2005 XML
Chapter 1: What’s New in Version 2.0 of the .NET Framework for XML
Chapter 2: What’s New in SQL Server 2005 XML
Chapter 3: Installing SQL Server 2005
What’s New in Version 2.0 of the .NET Framework for XML You are probably saying to yourself, “Whoa, wait a minute, I thought this book was about XML technology in SQL Server 2005.” Yes, that is true. So why start the book off with a chapter about the XML technology found in version 2.0 of the .NET Framework? Since the inception of the .NET Framework, Microsoft has taken a serious approach to supporting XML, a fact proven by looking at the amount of functionality provided in the System.Xml namespace, a group of classes specifically designed for the reading, writing, and updating of XML. Even in the first version of the .NET Framework, the support for XML was tremendous. The list of supported XML functionality included, but was not limited to, the following:
❑ Integration with ADO.NET
❑ Compliance with W3C standards
❑ Data source querying (XQuery)
❑ XML Schema support
❑ Ease of use
Microsoft set out to create a technology that dealt with data access using XML. Users of System.Xml in version 1.x of the .NET Framework agree that, on the whole, the technology contained a great number of useful classes that made dealing with XML and its related technologies a delight. Even with all of the great advantages with version 1.1, it was not without its shortcomings. First and foremost, performance was an issue. Because of the way XML is processed, any obstacle or holdup in processing had a performance effect on the rest of the application. Security was another issue. For example, in the XML 1.0 specification, no precaution was taken to secure XML, which led to Denial of Service attacks via DTDs. Not good. The XmlTextReader had its own problems in that it could be subclassed and run in semitrusted code.
The inclusion of the CLR (Common Language Runtime) in SQL Server 2005 further strengthens the importance of understanding the XML technology from both sides, server and client. While the primary focus of this book is the support of XML in SQL Server 2005, a small handful of chapters focus on uncovering and understanding XML support in version 2.0 of the .NET Framework, and more important, how to utilize this technology in conjunction with SQL Server 2005 XML to get the most power and efficiency out of your application. The entire goal of XML in version 2.0 of the .NET Framework boils down to a handful of priorities, with performance and W3C compliance at the top of the list. These are immediately followed by topics such as ease of use and pluggability, meaning that the components are based on classes in the .NET Framework that can be easily substituted. Also included in the list is tighter integration with ADO.NET, which allows for datasets to read and write XML using the XmlReader and XmlWriter classes. This chapter outlines some of the major feature enhancements made to the System.Xml namespace in version 2.0 of the .NET Framework. A list of every change made to the System.Xml namespace could take up a very large portion of a book. The goal of this chapter, however, is to highlight the handful of significant changes that you will most likely use on a day-to-day basis to help improve your XML experience.
System.Xml Version 2.0 Enhancements and New Features The following list contains the System.Xml enhancements that are covered in this chapter:
❑ Performance
❑ Type support
❑ XPathDocument
❑ XPathNavigator
❑ XML query architecture
❑ XmlReader, XmlWriter, and XmlReaderSettings
Ideally, this list would include XQuery support. Unfortunately, in a January 2005 MSDN article, Microsoft announced that it would be pulling client-side XQuery support from version 2.0 of the .NET Framework. While the pains of realization set in, their reasons are justifiable. The main reason for pulling XQuery support was timing. XQuery has yet to become a W3C recommendation, which leaves it open to further changes. This put Microsoft in the peculiar situation of trying to meet the requests of its customers while trying to preserve future compatibility. Microsoft did not want to support a technology that could possibly change. That is not to say, however, that you won’t ever see support for client-side XQuery. Microsoft’s goal is to add it back in once XQuery has reached recommendation, which I hope will happen quickly. Time to dig right in. The following section deals with arguably the most important enhancement to version 2.0 of the .NET Framework: performance.
Performance You have to admit that developers like it when things go fast, and the faster the better. Developers absolutely hate waiting. XML performance is no different. This section, then, discusses the places where Microsoft focused the majority of the performance improvements. There isn’t any code in this section to try out, but feel free to run some performance tests using some of the concepts discussed in this section.
XmlTextWriter and XmlTextReader To begin with, the XmlTextWriter and XmlTextReader classes have been completely rewritten to use a common code path, which cuts their call times nearly in half.
XmlReader and XmlWriter The XmlReader and XmlWriter classes can now be created via the Create method. In fact, readers and writers created this way outperform the XmlTextReader and XmlTextWriter, and as is discussed a little bit later, the Create method is now the preferred way of creating readers and writers for XML documents.
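As a rough illustration of the Create method described above, the following sketch writes a small document with XmlWriter.Create and reads it back with XmlReader.Create. The file name suppliers.xml, the element names, and the choice of the temporary folder are assumptions made only for this example.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module CreateMethodSketch
    Sub Main()
        ' Hypothetical file in the temp folder, used only for this sketch.
        Dim filePath As String = Path.Combine(Path.GetTempPath(), "suppliers.xml")

        ' Writer behavior is configured through a settings object.
        Dim writerSettings As New XmlWriterSettings()
        writerSettings.Indent = True

        ' Create returns a writer tuned to the supplied settings.
        Using writer As XmlWriter = XmlWriter.Create(filePath, writerSettings)
            writer.WriteStartElement("Suppliers")
            writer.WriteElementString("SupplierID", "42")
            writer.WriteEndElement()
        End Using

        ' Reader behavior is configured the same way.
        Dim readerSettings As New XmlReaderSettings()
        readerSettings.IgnoreWhitespace = True
        readerSettings.IgnoreComments = True

        ' Print the name of every element the reader encounters.
        Using reader As XmlReader = XmlReader.Create(filePath, readerSettings)
            While reader.Read()
                If reader.NodeType = XmlNodeType.Element Then
                    Console.WriteLine(reader.Name)
                End If
            End While
        End Using
    End Sub
End Module
```

Because the settings objects are separate from the reader and writer, the same settings can be reused for any number of Create calls.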
XSLT Processing XSLT processing performance has dramatically increased in version 2.0 of the .NET Framework. To understand why, you need to understand the XslTransform class. The XslTransform class, found in the System.Xml.Xsl namespace, is the brains behind XSLT. Its job is to transform the contents of one XML document into another XML document that is different in structure. The XslTransform class is the XSLT processor. In version 1.1 of the .NET Framework, the XslTransform class was based on version 3.0 of the MSXML XSLT processor. Since then, version 4.0 of the MSXML XSLT processor came out and included enhancements that vastly improved the performance of the XSLT processor. So what’s up with version 2.0 of the .NET Framework? The idea with version 2.0 of the .NET Framework was to improve XSLT processing even beyond that of the MSXML 4.0 XSLT processor. In order to do this, Microsoft completely rebuilt the XSLT processor from the ground up. The new processor is the XslCompiledTransform class, and it lives in the System.Xml.Xsl namespace. This new class has the same query runtime architecture as does the CLR, which means that the style sheet is compiled down to an intermediate format when it is loaded. There is an upside and downside to this. The downside is that it will take longer to compile your XSLT style sheet. The upside is that the runtime execution is much faster. Because there is no XQuery support at this time, performance improvements in the XslCompiledTransform class are critical since XML filtering and transformation still need to use XSLT and XPath. To help with this, Microsoft added XSLT debugger support in Visual Studio 2005 to debug style sheets. This comes in handy.
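The load-once, run-many pattern behind the compiled processor can be sketched as follows. This is a minimal example rather than anything taken from the book: the style sheet and source document are made-up inline strings, and the class used is XslCompiledTransform from the System.Xml.Xsl namespace.

```vb
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl

Module CompiledTransformSketch
    Sub Main()
        ' Hypothetical style sheet: emits the rider's name as plain text.
        Dim styleSheet As String = _
            "<xsl:stylesheet version='1.0' " & _
            "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" & _
            "<xsl:output method='text'/>" & _
            "<xsl:template match='/'>" & _
            "<xsl:value-of select='/Rider/Name'/>" & _
            "</xsl:template>" & _
            "</xsl:stylesheet>"

        ' Hypothetical source document.
        Dim source As String = "<Rider><Name>Tim Ferry</Name></Rider>"

        ' Load compiles the style sheet; this is the slower, one-time step.
        Dim xslt As New XslCompiledTransform()
        Using styleReader As XmlReader = XmlReader.Create(New StringReader(styleSheet))
            xslt.Load(styleReader)
        End Using

        ' Transform can then be called repeatedly against the compiled form.
        Using sourceReader As XmlReader = XmlReader.Create(New StringReader(source))
            Using result As New StringWriter()
                xslt.Transform(sourceReader, Nothing, result)
                Console.WriteLine(result.ToString())
            End Using
        End Using
    End Sub
End Module
```

Keeping the loaded XslCompiledTransform instance around and reusing it is what lets the faster runtime execution pay back the longer compile step.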
XML Schema Validation There is one major reason why XML Schema validation performance has improved, and that is type support. Type support will be defined in more detail in the next section; however, for XML Schema validation, type support comes into play in a huge way when you try to load or transform an XML document. When an XML document is loaded into a reader and a schema applied to it, CLR types are used to store the XML. This is useful because xs:long is now stored as a CLR long. First, the XML stores better this way. Second, there’s no more of this useless untyped string stuff. Type support also applies when creating an XPathDocument by applying XSLT to an original XPathDocument. In this scenario, the types are passed from one document to another without having to copy to an untyped string and then reparse them back to the original type. This in itself is a tremendous performance boost, especially when linking multiple XML components. Conversion between schema types and CLR types was possible in version 1.1 using the XmlConvert helper class, but conversion support is now extended to any XmlReader, XmlWriter, and XPathNavigator class, discussed in the next section.
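To make the xs:long point concrete, here is a small sketch that validates a document against an inline schema and asks the reader for the typed value. The schema, the Odometer element, and its value are hypothetical; the example assumes a validating reader built with XmlReaderSettings, and the intent is to show the value arriving as a CLR type rather than as a string.

```vb
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Module TypedValidationSketch
    Sub Main()
        ' Hypothetical schema: one element declared as xs:long.
        Dim schemaText As String = _
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" & _
            "<xs:element name='Odometer' type='xs:long'/>" & _
            "</xs:schema>"

        ' Hypothetical instance document.
        Dim documentText As String = "<Odometer>123456789012</Odometer>"

        ' Ask the reader to validate against the schema while reading.
        Dim settings As New XmlReaderSettings()
        settings.ValidationType = ValidationType.Schema
        Using schemaReader As XmlReader = XmlReader.Create(New StringReader(schemaText))
            settings.Schemas.Add(Nothing, schemaReader)
        End Using

        Using reader As XmlReader = XmlReader.Create(New StringReader(documentText), settings)
            reader.MoveToContent()
            ' With schema information available, the content comes back typed.
            Dim typedValue As Object = reader.ReadElementContentAsObject()
            Console.WriteLine(typedValue.GetType().FullName)
        End Using
    End Sub
End Module
```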
Type Support While XQuery support has been removed from version 2.0 of the .NET Framework, many of the XML classes now offer type conversions. Classes such as the XmlReader, XmlWriter, and XPathNavigator are all now type-aware, and support conversion between CLR types and XML schema types. In version 1.0 of the .NET Framework, type conversion was done by using the XmlConvert class, which enabled the conversion of a schema data type to a CLR (or .NET Framework) data type. For example, the following code demonstrates how to convert an xml string value to a CLR Integer data type using the XmlConvert class in version 1.0 of the .NET Framework:

Imports System.Xml

Module XmlConvertExample
    Sub Main()
        ' declare local variables
        Dim xtr As XmlTextReader = New XmlTextReader("c:\testxml.xml")
        Dim SupplierID As Integer

        ' loop through the xml file
        Do While xtr.Read()
            If xtr.NodeType = XmlNodeType.Element Then
                Select Case xtr.Name
                    Case "SupplierID"
                        ' read the element text, then convert it to an Integer
                        SupplierID = XmlConvert.ToInt32(xtr.ReadInnerXml())
                End Select
            End If
        Loop
        xtr.Close()
    End Sub
End Module

While converting an untyped value of an XML node to a .NET Framework data type is still supported in version 2.0 of the .NET Framework, you can accomplish this same thing via a single method call new to version 2.0 of the .NET Framework. Using one of the new ReadElementContentAs methods, such as ReadElementContentAsInt, provides improved performance (because of the single method call) and is easier to use. For example, you could rewrite the previous code as follows:

Imports System.Xml

Module ReadElementContentExample
    Sub Main()
        ' declare local variables
        Dim xtr As XmlTextReader = New XmlTextReader("c:\testxml.xml")
        Dim SupplierID As Integer

        ' loop through the file
        Do While xtr.Read()
            If xtr.NodeType = XmlNodeType.Element Then
                Select Case xtr.Name
                    Case "SupplierID"
                        ' read and convert the element content in one call
                        SupplierID = xtr.ReadElementContentAsInt()
                End Select
            End If
        Loop
        xtr.Close()
    End Sub
End Module
The same principle can be applied to attributes and collections as well. For example, element values (as long as they are separated by spaces) can be read into an array of values such as the following:

Dim i As Integer
' request the element content as an array of Integer values
Dim elementvalues() As Integer = _
    DirectCast(xtr.ReadElementContentAs(GetType(Integer()), Nothing), Integer())
For Each i In elementvalues
    Console.WriteLine(i)
Next i
So far the discussion has revolved around untyped values, meaning that all the values have been read from the XML document and stored as Unicode string values that are then converted into a .NET Framework data type. An XML document associated with an XML schema through a namespace is said to be typed. Type conversion applies to typed XML as well because the values can be stored in the native .NET Framework data type. For example, xs:double types are stored as .NET Double types. No conversion is necessary; again, improving performance. All the examples thus far have used the XmlReader, but in fairness the XmlWriter supports type conversion as well. The new WriteValue method on the XmlWriter class does for writing what the ReadElementContentAs methods do for reading on the XmlReader class. In the following example, the WriteValue method is used to write CLR values to an XML document:

Imports System.Xml

Module WriteValueExample
    Sub Main()
        Dim BikeSize As Integer = 250
        Dim Manufacturer As String = "Yamaha"

        ' indent the output so it is easy to read
        Dim xws As XmlWriterSettings = New XmlWriterSettings
        xws.Indent = True

        Dim xw As XmlWriter = XmlWriter.Create("c:\motocross.xml", xws)
        xw.WriteStartDocument()
        xw.WriteStartElement("Motocross")
        xw.WriteStartElement("Team")
        xw.WriteStartAttribute("Manufacturer")
        xw.WriteValue(Manufacturer)
        xw.WriteEndAttribute()
        xw.WriteStartElement("Rider")
        xw.WriteStartAttribute("Size")
        ' WriteValue accepts the Integer directly
        xw.WriteValue(BikeSize)
        xw.WriteEndAttribute()
        xw.WriteElementString("RiderName", "Tim Ferry")
        xw.WriteEndElement()
        xw.WriteEndElement()
        xw.WriteEndDocument()
        xw.Close()
    End Sub
End Module

Running this code produces the following results in the c:\motocross.xml file:

<?xml version="1.0" encoding="utf-8"?>
<Motocross>
  <Team Manufacturer="Yamaha">
    <Rider Size="250">
      <RiderName>Tim Ferry</RiderName>
    </Rider>
  </Team>
</Motocross>
Now that a lot of the XML classes are type-aware, they are able to raise the schema types with additional conversion support between the schema types and their CLR type counterparts.
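The earlier remark that the same principle applies to attributes can be sketched like this. The example reads the Size attribute of a Rider element as an Integer in a single call; the XML is an inline copy of the Motocross document so that the sketch does not depend on a file, and the module and variable names are made up for illustration.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module TypedAttributeSketch
    Sub Main()
        ' Inline copy of the Motocross document, so no file is required.
        Dim motocrossXml As String = _
            "<Motocross><Team Manufacturer='Yamaha'>" & _
            "<Rider Size='250'><RiderName>Tim Ferry</RiderName></Rider>" & _
            "</Team></Motocross>"

        Using reader As XmlReader = XmlReader.Create(New StringReader(motocrossXml))
            ' Move to the Rider element, then onto its Size attribute.
            If reader.ReadToFollowing("Rider") AndAlso reader.MoveToAttribute("Size") Then
                ' Read the attribute value as an Integer in one call.
                Dim bikeSize As Integer = reader.ReadContentAsInt()
                Console.WriteLine(bikeSize)
            End If
        End Using
    End Sub
End Module
```

ReadContentAsInt works on the node the reader is currently positioned on, which is why the code moves to the attribute first.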
XPathDocument The XPathDocument was included in version 1 of the .NET Framework as an alternative to the DOM for XML document storage. Built on the XPath data model, the primary goal of XPathDocument was to provide efficient XSLT queries. If the purpose of the XPathDocument is XML document storage, then what happened to the DOM? The DOM is still around and probably won’t be going away any time soon. However, there are reasons why an alternative was necessary. First, the acceptance of XML is moving at an extremely fast rate, much faster than the W3C can keep up with the DOM recommendations. Second, the DOM was never really intended for use with XML as a data storage facility, specifically when trying to query the data. The DOM was created at the time when XML was just being adopted and obtaining a foothold in the development communities. Since then, XML acceptance has accelerated greatly and the DOM has not made the adjustments necessary to keep up. For example, XML documents are reaching high levels of capacity and the DOM API is having a hard time adapting to these types of enterprise applications.
8
What’s New in Version 2.0 of the .NET Framework for XML Basically, the DOM has three shortcomings. First, the DOM API is losing its hold on the XML neighborhood with the introduction of XmlReader and XmlWriter as ways to read and write XML documents. Most developers are ready to admit that the DOM is not the friendliest technology to grasp. The System.Xml class provided an easy way to read and write XML documents. Second, the DOM data model is based on XML syntax and query language syntax is not. This makes for inefficient XML document querying. Lastly, application modifications are a must when trying to find better ways to store XML in the application. This is primarily due to the fact that there is no way to store XML documents. Version 2.0 of the .NET Framework has greatly improved the XPathDocument by building on better query support and XPathNavigator API found in version 1. The goal of the XPathDocument in version 2.0 was to build a much better XML store. To do that, a number of improvements were made, including the following: ❑
XmlWriter to write XML content
❑
Capability to load and save XML documents
❑
Capability to accept or reject XML document changes
❑
XML store type support
What you will find is that the XPathDocument has all of the capabilities of the XmlDocument class with the added features of great querying functionality. On top of that, you can work in a disconnected state and track the changes made to the XML document. The next section includes a number of examples to demonstrate loading, editing, and saving XML documents.
XPathNavigator

The XPathNavigator class provides a mechanism for navigating XML content and methods for editing the nodes in the XML tree. In version 1.1 of the .NET Framework, the XPathNavigator class was based purely on version 1.0 of the XPath data model. In version 2.0 of the .NET Framework, the XPathNavigator class is based on the XQuery 1.0 and XPath 2.0 data models. As part of the System.Xml.XPath namespace, the XPathNavigator class allows for very easy XML document navigation and editing. Using the XML document example created previously, the following code loads that XML document and appends a new Rider element using the XmlWriter and XPathNavigator classes:

Dim xpd As XPathDocument = New XPathDocument("c:\motocross.xml")
Dim xpn As XPathNavigator = xpd.CreateNavigator()
' Position the navigator before appending
xpn.MoveToFirstChild()
xpn.MoveToNext()
' AppendChild returns an XmlWriter that writes the new child node
Using xw As XmlWriter = xpn.AppendChild()
    xw.WriteStartElement("Rider")
    xw.WriteAttributeString("Size", "250")
    xw.WriteElementString("RiderName", "Chad Reed")
    xw.WriteEndElement()
    xpd.Save("c:\motocross.xml")
End Using
The move from version 1.0 of XPath to version 2.0 is important for several reasons. First, there are better querying capabilities. For example, version 1.0 of XPath supported only four types, whereas version 2.0 supports 19 types. The second reason is better performance. XQuery 1.0 and XPath 2.0 nearly share the same foundation; XPath 2.0 is a very explicit subset of the XQuery 1.0 language. Because of this close relationship between the two, once you have learned one, you nearly understand the other.
XML Query Architecture

The XML query architecture provides the capability to query XML documents using different methods such as XPath and XSLT (with XQuery to be provided later). The classes that provide this functionality can be found in the System.Xml.Xsl namespace. Part of this functionality is the capability to transform XML data using an XSLT style sheet. In version 2.0 of the .NET Framework, transforming XML data is accomplished with the XslCompiledTransform class, which is the new XSLT processor. The XslCompiledTransform class was mentioned previously during the discussion of performance, which covered how it was created to improve XSLT performance. In this section, the focus is on using the new XSLT processor and its associated methods.

The XslCompiledTransform class replaces the XslTransform class of version 1.x of the .NET Framework, which is now obsolete along with its Load and Transform methods. What replaces them? The XslCompiledTransform is very similar in architecture to the XslTransform class in that it also works in two steps: the Load method compiles the XSLT style sheet specified by the overload parameter, and the Transform method executes the transformation. For example, the following code compiles the style sheet supplied through an XmlReader:

Dim ss As String = "c:\motocross.xsl"
Dim xr As XmlReader = XmlReader.Create(ss)
xr.ReadToDescendant("xsl:stylesheet")
Dim xct As XslCompiledTransform = New XslCompiledTransform()
' Load compiles the style sheet read from the XmlReader
xct.Load(xr)

In this example, you create the XmlReader, and then use its ReadToDescendant method to advance the XmlReader to the next descendant element with the given qualified name. The XslCompiledTransform is then created and its Load method is called with the reader. The next step is to call the Transform method to execute the transformation using the compiled style sheet. Building on the previous example:

Dim ss As String = "c:\motocross.xsl"
Dim xr As XmlReader = XmlReader.Create(ss)
xr.ReadToDescendant("xsl:stylesheet")
Dim xct As XslCompiledTransform = New XslCompiledTransform()
xct.Load(xr)
' Source document and output destination
Dim xpd As XPathDocument = New XPathDocument("c:\motocross2.xml")
Dim xw As XmlWriter = XmlWriter.Create(Console.Out)
' Transform executes the compiled style sheet against the source document
xct.Transform(xpd, xw)
xw.Close()

The Transform method accepts several input types for the source document, including an object that implements the IXPathNavigable interface or a string URI. The IXPathNavigable interface is implemented by the XmlNode and XPathDocument classes, which represent an in-memory cache of the XML data. Both classes provide editing capabilities. The other option is to use the source document URI as the XSLT input. If this is the case, an XmlResolver is used to resolve the URI. Transformations can be applied to an entire document or a node fragment. However, if you’re transforming a node fragment, you need to create an object containing only the node fragment and pass that object to the Transform method.
XmlReader, XmlReaderSettings, XmlWriter, and XmlWriterSettings

Throughout this chapter you have seen a number of examples of how to use the XmlReader and XmlWriter classes. This section highlights a number of new methods that complement the existing methods of both of these classes. The static Create method on both the XmlReader and XmlWriter classes is now the recommended way to create XmlReader and XmlWriter objects. The Create method lets you specify the features that you want these classes to support. As seen previously, you enable and disable features by using the properties of the XmlReaderSettings and XmlWriterSettings classes, which are then passed to the Create method of the XmlReader or XmlWriter class. By using the Create method together with the settings classes, you get the following benefits:

❑ You can specify the features you want the XmlReader and XmlWriter objects to support.
❑ You can add features to existing XmlReader and XmlWriter objects. For example, you can use the Create method to accept another XmlReader or XmlWriter object, and you don’t have to create the original object via the Create method.
❑ You can create multiple XmlReaders and XmlWriters using the same settings with the same functionality. The reverse of that is also true. You can also modify the settings object and create new XmlReader and XmlWriter objects with completely different feature sets.
❑ You can take advantage of certain features only available on XmlReader and XmlWriter objects when created by the Create method, such as better XML 1.0 recommendation compliance.
❑ The ConformanceLevel property of the XmlWriterSettings class configures the XmlWriter to check and guarantee that the XML document being written complies with XML rules. Depending on the level set, you can check the XML document to make sure it is a well-formed XML document. There are three levels:
    ❑ Auto: This level should be used only when you are absolutely sure that the data you are processing will always be well-formed.
    ❑ Document: This level ensures that the data stream being read or written meets the XML 1.0 recommendation and can be consumed by any XML processor; otherwise an exception will be thrown.
    ❑ Fragment: This level ensures that the XML data meets the rules for a well-formed XML fragment (basically, a well-formed XML document that does not have a root element). It also ensures that the XML document can be consumed by any XML processor.
Reading this list, you would think that it couldn’t get any better. In fact, there are additional benefits with some of the items. For example, in some cases when you use the ConformanceLevel property, the writer automatically tries to fix an error instead of throwing an exception. If it finds a mismatched open tag, it will close the tag. It is time to finish this chapter off with an example that utilizes a lot of what you learned:

Dim BikeSize As Integer = 250
Dim Manufacturer As String = "Yamaha"
Dim xws As XmlWriterSettings = New XmlWriterSettings()
xws.Indent = True
xws.ConformanceLevel = ConformanceLevel.Document
Dim xw As XmlWriter = XmlWriter.Create("c:\motocross.xml", xws)
xw.WriteStartDocument()
xw.WriteStartElement("Motocross")
xw.WriteStartElement("Team")
xw.WriteStartAttribute("Manufacturer")
xw.WriteValue(Manufacturer)
xw.WriteEndAttribute()
' First Rider
xw.WriteStartElement("Rider")
xw.WriteStartAttribute("Size")
xw.WriteValue(BikeSize)
xw.WriteEndAttribute()
xw.WriteElementString("RiderName", "Tim Ferry")
xw.WriteEndElement()
' Second Rider
xw.WriteStartElement("Rider")
xw.WriteStartAttribute("Size")
xw.WriteValue(BikeSize)
xw.WriteEndAttribute()
xw.WriteElementString("RiderName", "Chad Reed")
xw.WriteEndElement()
' WriteEndDocument closes the Team and Motocross elements that are still open
xw.WriteEndDocument()
xw.Close()

The preceding example creates an XML document and writes it to a file. That file is then reloaded, and using the XPathEditableNavigator and XPathNavigator, a new node is placed in the XML document and resaved.
Summary

Now that you have an idea of the new XML features that appear in version 2.0 of the .NET Framework, you should also understand why this chapter was included in the book. Microsoft is taking a serious stance on XML technology and it is really starting to show with a lot of the features covered in this chapter. Performance in XML is imperative to overall application performance, so this was a great place to start. As discussed, many improvements were made in this area so that XML performance is not the bottleneck in application performance. You also spent a little bit of time looking at where those performance improvements were made, such as modifications to certain classes sharing the same code path and complete class rewrites.

You read about the new type support added to the XmlReader, XmlWriter, and XPathNavigator classes, which contributes to the overall performance of XML but, more important, makes it much easier to read and write XML without the headaches of data type conversions. You will probably agree that the XPathDocument and XPathEditableNavigator were fun to read about and put to the test. This is some absolutely cool technology that will make working with XML much easier and a lot more fun than it was with the DOM. The DOM isn’t going away, but these technologies are far better suited for XML storage.

The enhancements to the XmlWriter, XmlReader, XmlReaderSettings, and XmlWriterSettings are a welcome improvement, as you learned how easy it is to read, write, and modify XML documents. Last, the topic of XML query architecture was discussed, along with the new XslCompiledTransform class, which replaces the XslTransform class, as well as how to use the new methods on that class. In the next chapter you discover what’s new in SQL Server 2005 XML (which is why you bought the book, right?) and all the new XML support it provides.
What’s New in SQL Server 2005 XML

SQL Server 2000 made great strides in supporting XML and related technologies. When it first came out, it supported the following:

❑ Exposing relational data as XML
❑ Shredding XML documents into row sets
❑ Using XDR schemas to map XML schemas to database schemas
❑ Using XPath to query XML
❑ Using HTTP to query SQL Server data

Subsequent SQLXML web releases were blessed with additional features such as the following:

❑ Updategrams
❑ Client-side FOR XML
❑ SQLXML managed classes
❑ Support for Web Services
❑ Support for XSD schemas

With the most recent release, SQLXML Service Pack 3, there were many additions such as building a web service with SQL Server 2000, querying relational data with XPath, and the inclusion of .NET managed classes, to name a few. This release was a welcome event to developers who were looking to extend this functionality and take it further.
While each service pack provided better XML support, some very nice and needed enhancements and additions were made to SQL Server 2005 that let developers know that Microsoft is serious about supporting XML and XML technologies. This chapter examines the new XML features in SQL Server 2005 and some of the enhancements made to features that already existed in SQL Server 2000. All of these items are discussed in detail in later chapters, but the focus of this chapter is to highlight the new and improved XML features of this release of SQL Server. With SQL Server 2005 there are six major improvements for XML support:

❑ New xml data type
❑ Indexes on xml type columns
❑ XQuery support
❑ XML DML (XML Data Modification Language)
❑ Transact-SQL enhancements (FOR XML and OPENXML)
❑ HTTP SOAP Access

Each topic is discussed in greater detail later in the book, so the goal of this chapter is to familiarize you with these six topics. The first point of discussion is the new xml data type.
xml data type

One of the most important new features of SQL Server 2005 is the addition of an xml data type. This new data type supports the storing of XML documents and XML fragments (discussed in Chapter 4) in a SQL Server database, as well as storing XML in Transact-SQL variables. Overall, there are four major uses for the xml data type:

❑ Column type
❑ Variable type
❑ Parameter type
❑ Function return type

Realistically, there is a fifth use, using the xml data type in a CAST or CONVERT function to convert an expression from one data type to another, which is covered in detail in Chapter 4. The xml data type supports both typed and untyped XML. Simply put, when a collection of XML schemas is associated with the xml data type column, parameter, or variable, it is said to be typed. Otherwise it is said to be untyped. Nothing can really happen without the xml data type, so the following section introduces the xml data type column.
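The difference between typed and untyped XML can be sketched with two variables. This is only an illustration under assumptions: the schema collection name EmployeeSchemaCollection and the Employee schema are hypothetical and do not come from the book's sample database.

```sql
-- Hypothetical schema collection; the name and the schema are illustrative only.
CREATE XML SCHEMA COLLECTION EmployeeSchemaCollection AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="Employee">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="FirstName" type="xsd:string" />
          <xsd:element name="LastName" type="xsd:string" />
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>';
GO

-- Untyped: any well-formed XML document or fragment is accepted.
DECLARE @untyped xml;
SET @untyped = N'<Anything>goes</Anything>';

-- Typed: the value is validated against the schema collection on assignment.
DECLARE @typed xml(EmployeeSchemaCollection);
SET @typed = N'<Employee><FirstName>Scott</FirstName><LastName>Klein</LastName></Employee>';

SELECT @untyped AS UntypedValue, @typed AS TypedValue;
```

Assigning XML that does not match the schema to the typed variable should be rejected, whereas the untyped variable only requires well-formed content.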
xml data type Column

Selecting the xml data type is just like selecting the int or varchar data type when you add a column to a table. It is a built-in data type just like all the other types. Simply select the xml data type from the drop-down list as shown in Figure 2-1.

Figure 2-1

If you are not a visual person and like to sling code, you can also add it by using the following code:

CREATE TABLE Employees (EmployeeID int, EmployeeInfo xml)

Alternatively, if the table is already created and you want to add an xml data type column, you can use this code:

ALTER TABLE Employees ADD EmployeeInfo xml
You don’t have to do anything special when setting the properties of the xml data type. However, you should be aware of one property: the XML schema namespace property. This property is a built-in function that accepts the namespace of a target XML schema, an XML schema collection, or the name of a relational schema. If this value is left empty, an XML instance is automatically mapped that has the necessary XML schemas. It does not return the predefined XML schemas.
xml Variable

Use of the xml data type goes far beyond simply creating a table. You can also use it as a variable. The following syntax demonstrates how to declare an xml variable:

DECLARE @xmlVar xml

Declaring an xml variable is straightforward. The xml data type has numerous uses as a variable. For example, the following code shows how you can create a stored procedure that declares an xml variable:

CREATE PROCEDURE GetEmployeeInfo
    @EmployeeID [int]
WITH EXECUTE AS CALLER
AS
DECLARE @EmployeeInfo xml

Looking briefly at this stored procedure, an xml type variable is declared, which is used to store an XML document or fragment. In addition to using the xml data type as a variable, you can also use it as a parameter, which is the subject of the next section.
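As a minimal sketch of what an xml variable can hold, the following batch assigns a small document to a variable and selects it back. The Employee markup is made up for illustration.

```sql
-- An xml variable is assigned from a string, which is parsed as XML on assignment.
DECLARE @EmployeeInfo xml;
SET @EmployeeInfo = N'<Employee ID="1"><FirstName>Scott</FirstName></Employee>';
SELECT @EmployeeInfo AS EmployeeInfo;

-- A string can also be converted explicitly with CAST.
SELECT CAST(N'<Employee ID="2" />' AS xml) AS Converted;
```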
XML Parameter

Using the same stored procedure as an example, modify it as follows:

CREATE PROCEDURE GetEmployeeInfo
    @EmployeeID [int],
    @EmployeeInfo [xml] OUTPUT
WITH EXECUTE AS CALLER
AS

This example uses the xml data type as an output parameter. The calling application, whether it is SQL Server itself or a .NET application, calls this stored procedure and receives the XML back through the output parameter.
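A complete round trip through an xml output parameter might look like the following sketch. It assumes the Employees table created earlier (EmployeeID int, EmployeeInfo xml); the procedure name GetEmployeeInfoSketch is hypothetical and chosen so that it does not collide with the procedure above.

```sql
-- Hypothetical procedure name; assumes the Employees table created earlier.
CREATE PROCEDURE GetEmployeeInfoSketch
    @EmployeeID int,
    @EmployeeInfo xml OUTPUT
WITH EXECUTE AS CALLER
AS
    -- Copy the stored XML for the requested employee into the output parameter.
    SELECT @EmployeeInfo = EmployeeInfo
    FROM Employees
    WHERE EmployeeID = @EmployeeID;
GO

-- The caller supplies a variable marked OUTPUT to receive the XML.
DECLARE @Info xml;
EXEC GetEmployeeInfoSketch @EmployeeID = 1, @EmployeeInfo = @Info OUTPUT;
SELECT @Info AS EmployeeInfo;
```

If no row matches the requested EmployeeID, @Info simply stays NULL.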
Function Return

Similar to the variable, the xml data type can also be used as a return value. The following example uses the xml data type to return XML from a function. The return type is declared as xml, and the value is returned via the RETURN statement:

CREATE FUNCTION dbo.ReturnXML()
RETURNS xml
WITH EXECUTE AS CALLER
AS
BEGIN
    DECLARE @EmployeeInfo xml
    SET @EmployeeInfo = '
Scott Klein '
    RETURN(@EmployeeInfo)
END
GO

With the function created, it can now be executed as follows:

SELECT dbo.ReturnXML()

The results returned look like the following: ScottKlein

In this example, the return value was hard coded into the function, but the purpose was to illustrate the functionality of the xml data type. In Chapter 4, you learn how to query the xml data type, which you can also build into a function such as the example here. These examples have been quite easy, but in the real world the amount of data being queried is not so small. That is why it is also possible to index the xml data type.
Indexes on the xml data type

Indexes on the xml data type are crucial because xml data type columns are stored as binary large objects, or BLOBs. When you query xml data type columns, these BLOBs are shredded at runtime to evaluate the query if there are no indexes on the column. If there is a lot of data, this can be extremely costly in terms of performance and processing. For this reason, SQL Server 2005 has introduced indexes on xml data type columns.

Primary Index

There are two types of XML indexes: primary and secondary. Creating these indexes is not rocket science, as shown here:

CREATE PRIMARY XML INDEX PriI_Employee_EmployeeInfo
ON Employees(EmployeeInfo)

This example creates a primary XML index on the EmployeeInfo column of the Employees table. A primary XML index is a shredded version of what is in the xml column. When this index is created, it writes several rows of data for each XML BLOB in the column. A clustered index must already exist on the primary key of the table on which the XML index is being created. This is explained in more detail in Chapter 6.
Typically, when a table is dropped from a database, all the columns associated with that table are dropped as well. Not so with an xml column. An xml column with an associated index cannot be deleted or dropped from a table. The index must be removed first before the table can be deleted.
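The prerequisites described above can be sketched end to end. The table name EmployeesIndexed and the index and constraint names are hypothetical; the point is only that the table carries a clustered primary key before the primary XML index is created, and that the index is removed before the xml column is dropped.

```sql
-- Hypothetical table with the clustered primary key that an XML index requires.
CREATE TABLE EmployeesIndexed (
    EmployeeID int NOT NULL
        CONSTRAINT PK_EmployeesIndexed PRIMARY KEY CLUSTERED,
    EmployeeInfo xml
);
GO

-- The primary XML index can now be created on the xml column.
CREATE PRIMARY XML INDEX PriI_EmployeesIndexed_EmployeeInfo
    ON EmployeesIndexed (EmployeeInfo);
GO

-- Remove the XML index first, then drop the xml column.
DROP INDEX PriI_EmployeesIndexed_EmployeeInfo ON EmployeesIndexed;
ALTER TABLE EmployeesIndexed DROP COLUMN EmployeeInfo;
```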
Secondary Index

You can further improve performance by creating a secondary XML index on the same column. It is not required, but it could really improve performance on large amounts of data.

A primary index must exist before a secondary index can be created for a specific column.

This chapter shows only syntax, as Chapter 6 is dedicated to XML indexes and contains plenty of hands-on examples. There are three types of secondary XML indexes: PATH, VALUE, and PROPERTY.

PATH

Use this index when you want to index the paths and node values as the key fields. This can significantly increase query performance. You create a PATH index using the following syntax:

CREATE XML INDEX SecI_Employee_EmployeeInfo_PATH
ON Employees (EmployeeInfo)
USING XML INDEX PriI_Employee_EmployeeInfo FOR PATH

In a PATH secondary index, the path and node values are key columns that make searching by path more efficient.

VALUE

There are two reasons why you would want to use the VALUE index: first, if your queries are based on values, and second, if the path includes a wildcard character or isn’t fully specified. As with the PATH index, using the VALUE index in these situations increases query performance. The key columns for the VALUE index are the node and path values of the primary XML index. Creating a VALUE index is not that much different from creating a PATH index. You need to make some simple changes to the previous code:

CREATE XML INDEX SecI_Employee_EmployeeInfo_VALUE
ON Employees (EmployeeInfo)
USING XML INDEX PriI_Employee_EmployeeInfo FOR VALUE

If your query is retrieving values from an XML document and you don’t know the element or attribute names that contain the values, the VALUE index can be very useful. You’ll notice that in each of these syntax examples, the secondary index was created using the primary index as the “primary” index. This means that these indexes are not individually acting indexes but that they work in tandem to improve query performance.

PROPERTY

The PROPERTY index is built on the key columns of the primary XML index such as Primary Key, path, or node values. The syntax is as follows:

CREATE XML INDEX SecI_Employee_EmployeeInfo_PROPERTY
ON Employees (EmployeeInfo)
USING XML INDEX PriI_Employee_EmployeeInfo FOR PROPERTY

The PROPERTY index is beneficial when your query returns one or multiple values from a single XML instance, such as when you use the value() method of the xml data type.
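The three query shapes described for the secondary indexes can be sketched as follows. These queries assume the Employees table from earlier and a hypothetical document shape of /Employee/FirstName inside the EmployeeInfo column; which index the optimizer actually chooses for a given query is its own decision, so treat the comments as the intended use rather than a guarantee.

```sql
-- PATH: the path is fully specified in exist().
SELECT EmployeeID
FROM Employees
WHERE EmployeeInfo.exist('/Employee/FirstName[. = "Scott"]') = 1;

-- VALUE: the value is known but the path uses a wildcard.
SELECT EmployeeID
FROM Employees
WHERE EmployeeInfo.exist('//*[FirstName = "Scott"]') = 1;

-- PROPERTY: one or more values are extracted from each XML instance with value().
SELECT EmployeeID,
       EmployeeInfo.value('(/Employee/FirstName)[1]', 'nvarchar(50)') AS FirstName
FROM Employees;
```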
XQuery

For SQL Server 2005, Microsoft has added server-side support for XQuery. Based on the existing XPath query language, XQuery is a language that can query structured, and even semi-structured, XML data. Coupled with the xml data type, this allows for quick and efficient storage and retrieval of XML data.
As of this writing, SQL Server 2005 Beta 2 comes with the XQuery language based on the November 2003 Last Call working draft. What does that mean? Primarily, it means that the XQuery found in SQL Server 2005 may be a bit different from the specifications of the final recommendation from the W3C. Not to worry though; the differences are covered later on in the book, as well as what you might find in the final release of SQL Server 2005. Also as of this writing, Microsoft has decided not to ship a client-side XQuery support in the .NET Framework 2.0. Again, what does this mean? It means you get to continue to use all that XSLT and XPath knowledge and experience, at least for the short term. And you thought it wouldn’t pay off.
Server-side support for XQuery means that you get all the added benefits of the XPath language plus additional support for things like better iteration, sorting of results, and the ability to shape the results of your queried XML (typically called construction). The XQuery data model is what drives the XQuery language, which means, just like the xml data type, you can have typed or untyped results as well as XML fragments.
XQuery Structure

In its simplest form, an XQuery expression contains a query prolog (your namespace declaration) and the actual query expression. What follows is a simple example of an XQuery expression:

SELECT Instructions.query('declare namespace MSAW="http://schemas.microsoft.com/
sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
/MSAW:root/MSAW:Location[@LocationID=50]') AS Result
FROM Production.ProductModel
WHERE ProductModelID = 10

The first two lines are actually one line of code and should be entered as such. A hard return was used here to separate them for line continuation and readability only. If you type this syntax in exactly as shown (as two lines), you will receive an error. Before getting deeper into the discussion of XQuery’s structure, run the following SQL statement against the AdventureWorks database:

SELECT Instructions
FROM Production.ProductModel
WHERE ProductModelID = 10

Take the results of the above SQL query statement and save them to your hard drive as Production.xml for future reference.

There are basically two parts to this query. The first part contains the namespace declaration (declare namespace ...) and the second the actual query (/MSAW:root/MSAW:Location[@LocationID=50]). The results of this query are shown in Figure 2-2.

Figure 2-2

What you see is a section (or fragment) of the XML stored in the Instructions column. By specifying the query piece you were able to return just the section of the XML you were looking for. The namespace, a group or collection of elements and attributes with a unique name, is equally important. Namespaces provide the mechanism for mapping elements and attributes within an XML document to an associated schema. Running this query without the namespace would result in an error, such as the following: There is no element named ‘{http://}’
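The role of the prolog can also be tried without AdventureWorks. The following is a minimal sketch against an xml variable; the namespace URI urn:example:instructions, the EX prefix, and the document contents are all made up for illustration.

```sql
DECLARE @x xml;
SET @x = N'<root xmlns="urn:example:instructions">
  <Location LocationID="50"><step>Insert sheet</step></Location>
  <Location LocationID="60"><step>Weld</step></Location>
</root>';

-- The prolog binds the EX prefix to the namespace used by the document;
-- the body then selects the Location element whose LocationID attribute is 50.
SELECT @x.query('declare namespace EX="urn:example:instructions";
    /EX:root/EX:Location[@LocationID=50]') AS Result;

-- Without the prolog, unprefixed names refer to no namespace and match nothing here.
SELECT @x.query('/root/Location[@LocationID=50]') AS NoMatch;
```

Because the variable is untyped, the second query simply returns an empty result rather than raising the schema-related error described above.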
Additional Concepts

There are a few concepts that you need to understand in order to fully grasp how XQuery works. An introduction to XQuery can be found in Chapter 5. Those concepts are the following:

❑ Sequence
❑ Atomization
❑ Quantification
❑ Type promotion

The first of these concepts, sequence, is discussed in the next section.

Sequence

A sequence is simply the result of an XQuery expression that contains a list of XML nodes and fragments as well as XSD types. An item is an individual entry in the sequence and can be a node of one of the following types:

❑ Element
❑ Attribute
❑ Text
❑ Comment
❑ Document
❑ Processing instruction
The following example demonstrates how to construct a query that will return a single element sequence:

SELECT Instructions.query(' This is a test ') AS Result
FROM Production.ProductModel
WHERE ProductModelID = 10

The result of this query returns the following: This is a test

Not very impressive, but it does demonstrate that you have the ability to retrieve specific information from within your XML document. For example, the following query returns the first step (previously shown in Figure 2-2) from your original query:

SELECT Instructions.query('declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
for $Inst in /MSAW:root
return ( {string(($Inst/MSAW:Location[@LocationID=50]/MSAW:step[1])[1])} )
') AS Result
FROM Production.ProductModel
WHERE ProductModelID = 10

You should see the following results: Insert aluminum sheet MS-6061 into tool T-99 framing tool

This query is almost identical to your original query, but what you told it to do was retrieve a more specific value (or sequence) from the XML document.
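To see a few kinds of sequences side by side without depending on AdventureWorks, here is a small sketch against an xml variable. The root and step elements and the Result element name are made up for illustration.

```sql
DECLARE @x xml;
SET @x = N'<root><step>One</step><step>Two</step></root>';

SELECT
    -- A sequence of two element nodes.
    @x.query('/root/step') AS NodeSequence,
    -- A sequence of three atomic values.
    @x.query('(1, 2, 3)') AS AtomicSequence,
    -- A sequence of strings, one per step element.
    @x.query('for $s in /root/step return string($s)') AS StringSequence,
    -- A single constructed element wrapping the text of the first step.
    @x.query('<Result>{ /root/step[1]/text() }</Result>') AS Constructed;
```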
Atomization

Atomization is the process of retrieving the typed value of an item, and many times it is implied. In certain scenarios, atomization allows you to return the value of an item without having to query for it again. The following example queries the MachineHours attribute from the previous example and returns multiple values. The first value is the original queried value. The second value uses the data() function to extract the same value and increments it by 1. The third value matches the second value, but is returned automatically using atomization instead of using the data() function again.

SELECT Instructions.query('declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
for $AW in /MSAW:root/MSAW:Location[2]
return "
')
FROM Production.ProductModel
WHERE ProductModelID = 10

This says, “Take a look at the second Location node and return the MachineHours attribute. Now add 1 to it and return that value.” Your results should look similar to the following:
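Separately from the AdventureWorks query, the three values described above (the original value, an explicit data() call, and implicit atomization) can be sketched against an xml variable. The document and the Hours element with its attribute names are hypothetical.

```sql
DECLARE @x xml;
SET @x = N'<root>
  <Location LocationID="10" MachineHours="3" />
  <Location LocationID="20" MachineHours="2" />
</root>';

SELECT @x.query('
  for $loc in /root/Location[2]
  return <Hours
            original="{ $loc/@MachineHours }"
            explicit="{ data($loc/@MachineHours) + 1 }"
            implicit="{ $loc/@MachineHours + 1 }" />
') AS Result;
```

The explicit and implicit attributes are expected to carry the same value, because the arithmetic operator atomizes the attribute node on its own.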
Quantification

There are two types of quantification, Existential and Universal, which specify semantics for Boolean operators when applied to two sequences. A quantified expression in XQuery uses the following syntax:

{ some | every } <variable> in <Expression> satisfies <Expression>

Universal

The Universal quantifier, written with the every keyword, says that for any two sequences, if all items in the first sequence have a match in the second sequence, then the return value is true. Look at the following example:

SELECT Instructions.query('declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
if (every $AW in //MSAW:Location satisfies $AW/@MachineHours)
then All Locations have Machine Hours
else Not all Locations have Machine Hours
')
FROM Production.ProductModel
WHERE ProductModelID = 10

When you run this, you get the following in return: Not all Locations have Machine Hours

If you still have your Production.xml file open, take a look at each Location element and notice that not every one has a MachineHours attribute, so your results are correct. Now make the following changes to the then and else clauses and run the query again:

SELECT Instructions.query('declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
if (every $AW in //MSAW:Location satisfies $AW/@LocationID)
then All Locations have a LocationID
else Not all Locations have a LocationID
')
FROM Production.ProductModel
WHERE ProductModelID = 10

What results did you get? What you should get back is a slightly different message than in the first example, stating that all locations have a LocationID, as follows: All Locations have a LocationID

The reason is that in this last example the universal quantifier returned a value of true, meaning that for each location, a LocationID was found. That was not the case in the first example.

Existential

The Existential quantifier, written with the some keyword, says that for any two sequences, if an item in the first sequence has a match in the second sequence based on the comparison operator used, then the return value is true. The following example looks to see if any of the pictures in the ProductModel table has an angle of front:

SELECT CatalogDescription.value('declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelDescription";
if (some $AW in //MSAW:ProductDescription/MSAW:Picture satisfies $AW/MSAW:Angle="front")
then "True"
else "False"
', 'varchar(5)') AS PictureAngleFront
FROM Production.ProductModel
WHERE ProductModelID = 35

Your results should look like the following:

PictureAngleFront
True

There are two changes in this code you need to be aware of. First is the schema declaration. Second, you are looking to see if any of the pictures have an Angle of front, so your XQuery statement uses some instead of every. If you had used every and not all of the Picture elements had an Angle of front, then your return value would be False.
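The two quantifiers can be compared directly on a small, self-contained document. In this sketch the document and the Result element name are made up; the second Location deliberately lacks a MachineHours attribute.

```sql
DECLARE @x xml;
SET @x = N'<root>
  <Location LocationID="10" MachineHours="3" />
  <Location LocationID="20" />
</root>';

SELECT
    -- every: true only if each Location has a MachineHours attribute.
    @x.query('
      if (every $loc in /root/Location satisfies $loc/@MachineHours)
      then <Result>All Locations have Machine Hours</Result>
      else <Result>Not all Locations have Machine Hours</Result>
    ') AS EveryResult,
    -- some: true if at least one Location has a MachineHours attribute.
    @x.value('
      if (some $loc in /root/Location satisfies $loc/@MachineHours)
      then "True" else "False"
    ', 'varchar(5)') AS SomeResult;
```

With this document the every test should take the else branch, while the some test should return True, because only one of the two Location elements carries the attribute.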
Type Promotion

Type promotion allows for type casting in numeric expressions when the values are of different numeric types or when one of the values is untyped. For example, you may want to compare numerical values and determine which value is higher or lower. In these cases, you can cast the two numbers similar to the following:

max(xs:long("1.0"), xs:integer("2.0"))

In the preceding example, both values were typed, but what if you needed to type cast an untyped value with a typed value? The following example shows how to cast a typed value with an untyped value:

max(xdt:untypedAtomic(35), xs:integer("2.0"))
This section barely touched these topics, but for good reason: Chapter 5 and Appendix A are dedicated to XPath, XQuery, and the querying of the xml data type and how they are used in SQL Server 2005. This chapter is just intended to whet your appetite.
XML Data Modification Language

XQuery is a powerful language for querying XML data, but it has its limitations. The biggest is that it cannot modify XML documents. To compensate, Microsoft extended the XQuery implementation in SQL Server 2005 with the XML Data Modification Language (XML DML), which gives developers the ability to insert, update, and delete XML documents and fragments. XML DML is built on and around the XQuery language as defined by the W3C, and it can be used anywhere the xml data type is used, through the modify method. Using XML DML is as simple as adding one of the following three keywords to your XQuery statement (the keywords are case-sensitive and must be lowercase):

❑ insert
❑ delete
❑ replace value of (update)

The following sections show you some easy examples.
Insert

The basic syntax for inserting a node or nodes into an XML document looks like this:

    insert Expression1 ( {as first | as last} into | after | before Expression2 )
Now take a look at an example that inserts an element into an XML document. Suppose you had the following XML document:

    <Root>
      <Employee>
        <EmployeeInformation />
      </Employee>
    </Root>

You want to insert some employee information into the XML document. The following example inserts a FirstName element into the XML document underneath the EmployeeInformation element:

    insert <FirstName>Evel</FirstName> into (/Root/Employee/EmployeeInformation)[1]

Your XML document now looks like the following:

    <Root>
      <Employee>
        <EmployeeInformation>
          <FirstName>Evel</FirstName>
        </EmployeeInformation>
      </Employee>
    </Root>

To try this example, type the following code into Query Builder and execute it:

    DECLARE @xmldoc xml
    SET @xmldoc = '
    <Root>
      <Employee>
        <EmployeeInformation />
      </Employee>
    </Root>'
    SET @xmldoc.modify('
      insert <FirstName>Evel</FirstName>
      into (/Root/Employee/EmployeeInformation)[1]')
    GO
You didn’t need to specify as first or as last because the element was the first child added. The as first and as last keywords are covered in depth in Chapter 5, along with inserting attributes.
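As a preview, here is a minimal sketch of how as first, as last, and an attribute insert might look against the same document shape. The LastName, Title, and EmployeeID names and their values are assumptions made up for illustration.

```sql
DECLARE @xmldoc xml;
SET @xmldoc = '
<Root>
  <Employee>
    <EmployeeInformation>
      <FirstName>Evel</FirstName>
    </EmployeeInformation>
  </Employee>
</Root>';

-- Append a new child after the existing FirstName element.
SET @xmldoc.modify('
  insert <LastName>Knievel</LastName>
  as last into (/Root/Employee/EmployeeInformation)[1]');

-- Place a new child in front of all existing children.
SET @xmldoc.modify('
  insert <Title>Mr.</Title>
  as first into (/Root/Employee/EmployeeInformation)[1]');

-- Add an attribute to the Employee element.
SET @xmldoc.modify('
  insert attribute EmployeeID {"1"}
  into (/Root/Employee)[1]');

SELECT @xmldoc;
```

Because the target element already has children here, as first or as last is needed to tell SQL Server where the new element goes.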
Delete

The syntax for deleting a node or nodes is really simple:

    delete Expression

Use the previous example and delete the element you just added. Here is the original code with the delete code added (the SELECT statements and the second call to modify):

    DECLARE @xmldoc xml
    SET @xmldoc = '
    <Root>
      <Employee>
        <EmployeeInformation />
      </Employee>
    </Root>'
    SELECT @xmldoc
    SET @xmldoc.modify('
      insert <FirstName>Evel</FirstName>
      into (/Root/Employee/EmployeeInformation)[1]')
    SELECT @xmldoc
    SET @xmldoc.modify('
      delete /Root/Employee/EmployeeInformation/FirstName')
    SELECT @xmldoc
    GO
Execute this in Query Builder. Each SELECT @xmldoc returns a row that you can click to see the XML at that point. When you click the first result, you see the original document. Click the second result and you see the same document with the FirstName element added. Click the third result and you see that the element has been removed. As with insert, you can delete attributes and much more.
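Deleting an attribute or a comment follows the same pattern. The following is a small sketch; the EmployeeID attribute and the comment are hypothetical additions to the sample document, included only so there is something to delete.

```sql
DECLARE @xmldoc xml;
SET @xmldoc = '
<Root>
  <Employee EmployeeID="1">
    <EmployeeInformation>
      <!-- temporary note -->
      <FirstName>Evel</FirstName>
    </EmployeeInformation>
  </Employee>
</Root>';

-- Remove the EmployeeID attribute from the Employee element.
SET @xmldoc.modify('delete /Root/Employee/@EmployeeID');

-- Remove every comment node under EmployeeInformation.
SET @xmldoc.modify('delete /Root/Employee/EmployeeInformation/comment()');

SELECT @xmldoc;
```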
Update

You can update the contents of an XML document with the modify method. The syntax looks like this:

    replace value of Expression1 with Expression2

Expression1 is the node whose value you want to update. Expression2 is the new value of the node.

The replace value of syntax updates the value of a node, not the node itself. It doesn’t make sense to update a node or element as a whole; typically you just delete the offending node.
Use your previous XML document to illustrate updating the value of a node. Make the appropriate changes to the XML as follows:

    DECLARE @xmldoc xml
    SET @xmldoc = '
    <Root>
      <Employee>
        <EmployeeInformation>
          <FirstName>Scott</FirstName>
        </EmployeeInformation>
      </Employee>
    </Root>'
    SELECT @xmldoc
    SET @xmldoc.modify('
      replace value of (/Root/Employee/EmployeeInformation/FirstName/text())[1]
      with "Calvin"')
    SELECT @xmldoc
    GO
Execute this in Query Builder. When you click the first result, you see the original document with Scott as the value of the FirstName element. Click the second result and you see the same document with the value of the element updated to Calvin. While you have barely scratched the surface of XML DML, you should start to see the flexibility and power it adds to XQuery. You’ll learn more about this in Chapter 5.
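The same statement can target an attribute instead of a text node. The sketch below assumes a hypothetical EmployeeID attribute on the Employee element; it is not part of the earlier sample document.

```sql
DECLARE @xmldoc xml;
SET @xmldoc = '
<Root>
  <Employee EmployeeID="1">
    <EmployeeInformation>
      <FirstName>Scott</FirstName>
    </EmployeeInformation>
  </Employee>
</Root>';

-- Expression1 must identify a single node, hence the [1] predicate.
SET @xmldoc.modify('
  replace value of (/Root/Employee/@EmployeeID)[1]
  with "2"');

SELECT @xmldoc;
```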
Transact-SQL Enhancements

If you have used FOR XML and OPENXML before, you’ll wholeheartedly welcome the Transact-SQL changes to both of them. SQL Server 2000 introduced FOR XML as a clause of the SELECT statement and OPENXML as a rowset provider over XML documents. The FOR XML clause supported three modes: RAW, AUTO, and EXPLICIT. The RAW mode created a single element per row returned and did not allow nesting. The AUTO mode generated nesting based on the SELECT statement. The EXPLICIT mode gave you greater control over the shape of your XML. The downside of FOR XML before SQL Server 2005 was that its results could be consumed only on the client side. And it wasn’t the easiest thing to figure out, especially if you were trying to generate somewhat complex EXPLICIT structures. Fortunately, there is SQL Server 2005. This section covers the changes and enhancements to FOR XML and OPENXML in the new version of SQL Server.
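As a quick refresher on the two simpler modes, the following sketch runs RAW and AUTO against a throwaway table variable. The table variable, its alias, and its single row are assumptions made up for this illustration.

```sql
-- Hypothetical scratch data; nothing here touches a real table.
DECLARE @Employees TABLE (
    EmployeeID int,
    FirstName  varchar(25),
    LastName   varchar(25)
);
INSERT INTO @Employees VALUES (1, 'Fred', 'Flintstone');

-- RAW: one generic row element per row, columns as attributes.
SELECT EmployeeID, FirstName, LastName
FROM @Employees
FOR XML RAW;

-- AUTO: the element is named after the table alias (Employee here).
SELECT EmployeeID, FirstName, LastName
FROM @Employees AS Employee
FOR XML AUTO;
```

Both queries return attribute-centric XML by default; the only visible difference for a single table is the element name.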
FOR XML

In SQL Server 2005 many improvements and new features were added to make FOR XML more useful, including the following:

❑ Integrating FOR XML with the xml data type
❑ Nesting FOR XML expressions
❑ The new PATH mode
❑ Assigning FOR XML results
This section covers these enhancements in the order in which they appear in the list, beginning with integration of FOR XML with the xml data type.
xml data type Integration

The addition of the xml data type in SQL Server means that FOR XML can return its results directly as that type. You can request that the results of a FOR XML query be returned as an xml data type by specifying the new TYPE directive. For example:

    SELECT EmployeeID, FirstName, LastName
    FROM Employees
    ORDER BY EmployeeID
    FOR XML AUTO, TYPE

The result is a single xml value that contains one Employees element per row, with each column returned as an attribute.
Nesting FOR XML Expressions

SQL Server 2000 supported the FOR XML clause at the top level of the SELECT statement only. This meant that any results returned to you needed further manipulation on the client. SQL Server 2005 now lets FOR XML queries return their results as the xml data type for server-side processing. This means that you can write nested queries where the inner query returns its results to the outer query as an xml data type. For this example, drop and re-create the Employees table, create the EmployeePhone table, and add the associated data:

    DROP TABLE Employees
    GO
    CREATE TABLE [dbo].[Employees](
        [EmployeeID] [int] NOT NULL,
        [FirstName] [varchar](25) NULL,
        [LastName] [varchar](25) NULL
    ) ON [PRIMARY]
    GO
    CREATE TABLE [dbo].[EmployeePhone](
        [EmployeePhoneID] [int] NOT NULL,
        [EmployeeID] [int] NOT NULL,
        [CellPhoneNumber] [varchar](15) NULL,
        [HomePhoneNumber] [varchar](15) NULL
    ) ON [PRIMARY]
    GO
    INSERT INTO Employees (EmployeeID, FirstName, LastName)
    VALUES (1, 'Fred', 'Flintstone')
    GO
    INSERT INTO Employees (EmployeeID, FirstName, LastName)
    VALUES (2, 'Barney', 'Rubble')
    GO
    INSERT INTO EmployeePhone (EmployeePhoneID, EmployeeID, CellPhoneNumber, HomePhoneNumber)
    VALUES (1, 1, '555-BED-ROCK', '555-555-5555')
    GO
The following example illustrates a nested query using FOR XML:

    SELECT EmployeeID, FirstName, LastName,
        (SELECT CellPhoneNumber, HomePhoneNumber
         FROM EmployeePhone ep
         WHERE ep.EmployeeID = e.EmployeeID
         FOR XML AUTO, TYPE)
    FROM Employees e
    WHERE e.EmployeeID = 1
    FOR XML AUTO, TYPE
In the preceding example, the inner SELECT statement queries the employee’s phone numbers and, because the TYPE directive was supplied, returns them to the outer query as an xml data type instance, which is guaranteed to be well-formed. The outer query then combines its own results with those of the inner query to produce a well-formed XML document.
PATH Mode

The PATH mode is a new addition to FOR XML in SQL Server 2005. Do you remember how difficult it was to find your way around the EXPLICIT mode? Wouldn’t it be nice to have the same flexibility and functionality without the complications? Fortunately, that is what the PATH mode does. It provides the flexibility of the EXPLICIT mode in a much easier fashion. The PATH mode treats column names and column aliases as XPath expressions that indicate how the values are mapped to XML. The following example, while quite simple, illustrates the syntax of the PATH mode:

    SELECT ContactID, FirstName, LastName
    FROM Person.Contact
    WHERE ContactID = 218
    FOR XML PATH
The PATH mode generates element-centric results by default, so the results from this query look like the following:

    <row>
      <ContactID>218</ContactID>
      <FirstName>Scott</FirstName>
      <LastName>Colvin</LastName>
    </row>
In the Beta 2 build used for this book, namespaces are not supported when generating XML using the PATH mode. The PATH mode and the rest of the FOR XML enhancements are discussed in much more detail in Chapter 8.
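To see how column aliases act as XPath expressions, consider the following sketch. It uses a made-up table variable rather than Person.Contact, and the Employee, Employees, and Name element names are assumptions chosen for illustration.

```sql
-- Hypothetical scratch data for the example.
DECLARE @Employees TABLE (
    EmployeeID int,
    FirstName  varchar(25),
    LastName   varchar(25)
);
INSERT INTO @Employees VALUES (1, 'Fred', 'Flintstone');
INSERT INTO @Employees VALUES (2, 'Barney', 'Rubble');

SELECT EmployeeID AS '@ID',        -- leading @ maps the column to an attribute
       FirstName  AS 'Name/First', -- a slash creates a nested element
       LastName   AS 'Name/Last'   -- same parent, so it joins the Name element
FROM @Employees
ORDER BY EmployeeID
FOR XML PATH('Employee'), ROOT('Employees');
```

Each row should come back as an Employee element with an ID attribute and a nested Name element holding First and Last, all wrapped in a single Employees root. Producing the same shape with the EXPLICIT mode would require a universal table with Tag and Parent columns.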
Assigning FOR XML Results

The results of a FOR XML query can now be assigned to a variable of the xml data type, as well as inserted into an xml data type column.
For example, you could assign the following FOR XML query result to a variable as follows:

    CREATE TABLE [dbo].[Vendor](
        [VendorID] [int] NOT NULL,
        [VendorName] [varchar](25) NULL,
        [VendorAddress] [varchar](25) NULL,
        [VendorContact] [varchar](25) NULL
    ) ON [PRIMARY]
    GO
    DECLARE @xmlvar xml
    SET @xmlvar = (SELECT VendorID, VendorName, VendorAddress, VendorContact
                   FROM Vendor
                   FOR XML AUTO, TYPE)
Just as easily, you can insert the results of a FOR XML query directly into a table as follows:

    CREATE TABLE [dbo].[VendorInfo](
        [VendorInfoID] [int] IDENTITY(1, 1) NOT NULL,
        [VendorInfo] xml NULL
    ) ON [PRIMARY]
    GO
    INSERT INTO VendorInfo (VendorInfo)
    SELECT (SELECT VendorName, VendorAddress, VendorContact
            FROM Vendor
            FOR XML AUTO, TYPE)
    GO
In the preceding example, the VendorInfo column is an xml data type column. In addition to these enhancements, further enhancements have been made to the RAW and EXPLICIT modes. These FOR XML enhancements allow you to do the following:

❑ Specify a row element name
❑ Retrieve element-centric XML
❑ Specify the root element

The EXPLICIT mode enhancements include the following:

❑ CDATA directive with an element name
❑ elementxsinil column mode

All of these enhancements are discussed in detail in Chapter 8, so stay tuned.
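The first three items can be combined in a single RAW query. The following is a minimal sketch, assuming a made-up table variable and made-up Vendor and Vendors element names.

```sql
-- Hypothetical scratch data; VendorContact is deliberately NULL.
DECLARE @Vendor TABLE (
    VendorID      int,
    VendorName    varchar(25),
    VendorContact varchar(25)
);
INSERT INTO @Vendor VALUES (1, 'Acme Supply', NULL);

SELECT VendorID, VendorName, VendorContact
FROM @Vendor
FOR XML RAW('Vendor'),    -- row element name instead of the default row
        ELEMENTS XSINIL,  -- element-centric; NULLs become xsi:nil="true"
        ROOT('Vendors');  -- wrap the rows in a single root element
```

Without XSINIL, the NULL VendorContact column would simply be omitted from the output; with it, an empty VendorContact element marked xsi:nil="true" is emitted.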
HTTP SOAP Access

SQL Server 2005 provides the capability to send HTTP SOAP requests directly to SQL Server without going through an IIS server. This includes the capability to execute Transact-SQL statements and stored procedures (including extended stored procedures) as well as user-defined functions. This functionality works only if SQL Server 2005 is running on Windows Server 2003. Just as important, SQL Server can function as its own Web Service. This allows any Web Service application to access SQL Server, reduces the need for a firewall through its built-in security, and makes use of the Web Service infrastructure by applying predefined schemas to query results in native XML format. No examples are given in this chapter, but Chapters 17 and 18 are dedicated specifically to HTTP SOAP access.
Summary

The main purpose of this chapter was to give you a brief look into the new features and enhancements in SQL Server 2005. A large portion of the chapter examined the new xml data type as it applies to columns, variables, and parameters, and the impact it has on the rest of the XML topics. You spent some time looking at how you can index the xml data type to gain performance using primary and secondary indexes. You also learned how SQL Server 2005 stores xml instances and what that means when querying these columns, especially when dealing with large amounts of data. Additionally, this chapter included enough coverage of XQuery to give you a basic understanding of its functionality. However, because XQuery is such an integral part of SQL Server 2005, you’ll spend a good portion of Chapter 5 on it. XQuery can’t be introduced without also introducing the XML Data Modification Language. As you discovered in this chapter, XML DML makes up for some of the things lacking in XQuery, and you saw how to put its functionality to good use. Like XQuery, though, a couple of pages do not do XML DML justice, so it is also covered in more depth in Chapter 5.

After discussing the xml data type, XQuery, and XML DML, you learned about the Transact-SQL enhancements to FOR XML. As you have probably figured out by now, these improvements could not have come at a better time. FOR XML’s integration with the xml data type was a great blessing, and together with the PATH mode and other enhancements, your FOR XML life just got a lot easier. This chapter barely scratched the surface, though, and FOR XML is covered in detail in Chapter 8. Last, you learned a little bit about the HTTP SOAP capabilities in SQL Server 2005. Chapters 17 and 18 are dedicated to this topic, so this chapter simply introduced you to some of the highlights and features that SQL Server 2005 supports in this area. In the next chapter, you’ll learn how to install and configure SQL Server 2005.
Installing SQL Server 2005

The first two chapters of the book highlighted some of the features that are new to SQL Server 2005 and XML, as well as what you can look forward to in the .NET Framework 2.0. Both of those topics will come in very handy later on in the book, so keep all of that newfound information in the back of your mind when reading later chapters. This chapter walks you through the installation of SQL Server 2005 step by step so that you can put it to good use throughout the rest of the book. Fortunately, you will find that the installation is not significantly different from previous versions of SQL Server. If you have already installed it or don’t want to try it out on your own, you can skip to Chapter 4. There are, however, a few notable differences in the installation of SQL Server 2005, so you might want to read this chapter to become familiar with them. The version used throughout this book is SQL Server 2005 Developer Edition Beta 2, build 3790. Installation requirements aren’t covered in this book, but you can find information regarding hardware and software requirements on Microsoft’s website at www.microsoft.com/sql/ or on the SQL Server installation CD.
Where to Get SQL Server 2005 Beta 2 Express Edition

Unless you have an MSDN subscription (Universal, Enterprise, or Professional), the only thing to work with is the SQL Server 2005 Beta 2 Express Edition, available at www.microsoft.com/downloads/details.aspx?FamilyID=62B348BB-0458-4203-BB03-8BE49E16E6CD&displaylang=en. The SQL Server Express Edition is the next version of MSDE. The Beta 2 version is an evaluation version and is good for up to 18 months from the date of installation. After the evaluation period is over, no SQL Server services will start.

Like its bigger brothers (SQL Server 2005 Enterprise Edition and SQL Server 2005 Developer Edition), the Express Edition also needs the .NET Framework 2.0. But unlike its bigger brothers, the Express Edition does not install it. You have to do that yourself, at least in Beta 2. You can get the .NET Framework 2.0 from www.microsoft.com/downloads/details.aspx?familyid=B7ADC595-717C-4EF7-817B-BDEFD6947019&displaylang=en.
Installing SQL Server 2005

If Autorun does not start the installation automatically, begin by running Setup.exe in the root of the CD/DVD. The first screen to appear is the Welcome/Start screen. Here you have several options. This is a good place to review the requirements for running SQL Server 2005 by selecting the Review hardware and software requirements link. To begin the actual installation, click the Run the SQL Server Installation Wizard link shown in Figure 3-1.
Figure 3-1
The first part of the installation installs software that is necessary prior to installing SQL Server 2005. Three components are required before the installation of SQL Server 2005 can begin: the .NET Framework 2.0, Microsoft SQL Native Client, and Microsoft SQL Server 2005 Beta 2 Setup Support Files, as shown in Figure 3-2.
Figure 3-2
The first prerequisite component is the .NET Framework 2.0. As discussed in Chapter 1, SQL Server 2005 uses version 2.0 of the .NET Framework. Luckily, SQL Server 2005 installs version 2.0 for you. Once all the prerequisite components have been installed, click Finish. At this point the SQL Server installation begins, as shown in Figure 3-3.
Figure 3-3
Click Next to begin the installation of SQL Server 2005. Figure 3-4 shows an important step in the installation process: the System Configuration Check, or SCC. It checks a total of twelve items to make sure that the system on which you are installing SQL Server is configured correctly.
Figure 3-4
Luckily, this process does not take long, and it actually makes multiple checks at one time. The SCC is quite thorough and does not allow the installation to continue if certain requirements are not met. For example, if the system on which SQL Server 2005 is being installed has less than 128MB of RAM, the SCC does not allow the installation to continue. Other conditions generate only a warning: a processor slower than 600 MHz, or between 128MB and 256MB of RAM, is reported but does not stop the installation. For a complete list of SCC checks, see the online help under the topic system configuration checker. The Continue button is available only if all checks succeed or if the failed checks are non-fatal. For any failed check, the report includes a resolution for the blocking issue. If everything passes and you are given the green light, click Continue.

The next screen, shown in Figure 3-5, allows you to choose which components you want to install. Obviously you want to choose the first option, SQL Server, as that is the minimum required component to run SQL Server 2005. You may also want to select the Workstation components, Books Online and development tools option, which installs some of the components and tools with which to administer SQL Server 2005. Selecting a component on this screen selects the minimum set of its features, similar to a Typical installation.
Figure 3-5
One of the new options in the installation is the capability to install SQL Server 2005 as a virtual server. In SQL Server Setup, a virtual server is a failover cluster instance: clients connect to a single virtual server name while the instance itself can run on any node of a Windows cluster. Select the Install as virtual server option if you are installing SQL Server on a cluster. Also included in the installation is Reporting Services, which was a separate installation for SQL Server 2000. For a more detailed installation, click the Advanced button, which displays a detailed list of items, shown in Figure 3-6. This screen lets you select individual features for the components you chose in the previous step, rather than the generic components themselves. For example, selecting the SQL Server component on the previous screen basically tells the installer that you want to install SQL Server 2005. This screen lets you select detailed installation options for the SQL Server component, such as Replication and Full-Text Search. If you overlooked a component on the previous screen, you can either select it here or click the Back button to select it. Selecting the component on the Feature Selection screen is preferable because you can choose its features at the same time without going back.
Figure 3-6
Any feature preceded by a red X means that component will not be installed. Any feature preceded by a white box means that feature will be installed. Any feature preceded by a gray box typically means that you can expand that node and select subfeatures. Most likely, some of those subfeatures have not been selected for installation. You should expand that tree node and preview those subfeatures, as there might be some that you want to install.
Be sure that you install the sample databases, primarily the AdventureWorks database, because you will be working with them throughout the book. They can be installed by expanding the Documentation and Samples node and selecting the Databases node.
If you have already installed SQL Server 2005 and are running the installation again, this screen is used to add and remove features. Click the Next button once you are satisfied with your feature selection. The next screen in the installation process, shown in Figure 3-7, allows you to select the instance under which you would like to install SQL Server 2005. Just as with SQL Server 2000, you can run multiple instances of SQL Server 2005 on a computer.
Figure 3-7
The Default instance is the default selection. If you install SQL Server 2005 a second time and select the Default instance, the installation will ask you if you want to upgrade your existing Default instance. The same goes for a Named instance. If you type in a Named instance that already exists, the installer will ask you if you would like to upgrade that instance. Each instance of SQL Server runs in its own specific space. In other words, it has its own set of services with its own settings, such as collation and other options. Except for the first installation, you should select a Named instance for each subsequent installation and give that installation a unique instance name, unless you plan on upgrading the desired instance to add or remove features. Once you have selected the instance in which to install SQL Server, click Next. The next step in the installation process is the Service Account setup. This screen (see Figure 3-8) lets you define which services run under which account. You can customize each service to start under a specific account or you can use the built-in System account. This screen also lets you determine which services are automatically started when the SQL Server 2005 computer is started. Once you have configured the services, click Next. The next step in the installation process is the selection of the Authentication Mode, as shown in Figure 3-9. This step defines the credentials with which you will be connecting (authenticating) to SQL Server 2005.
Figure 3-8
Figure 3-9
Just as in the previous version of SQL Server, SQL Server 2005 gives you two options for authentication: Windows Authentication Mode or Mixed Mode authentication. Windows Authentication connects the user to SQL Server through a Windows user account. SQL Server validates the account credentials (user name and password) via the Windows operating system. Mixed Mode authentication allows the user to connect either via Windows authentication or SQL Server authentication. If you select Mixed Mode authentication, be aware that there are some changes in SQL Server 2005. New for this release is Strong Password enforcement. No longer will SQL Server allow you to get away with blank passwords or using “password” as the password. For example, if you type in “password” for the sa password, you receive the message shown in Figure 3-10.
Figure 3-10
In fact, SQL Server 2005 does not allow the following as passwords for the sa account:

❑ Blank passwords
❑ The word “Password” or “password”
❑ The word “Admin” or “admin”
❑ The word “Administrator” or “administrator”
❑ The word “sysadmin” or “Sysadmin”
❑ The acronym “sa”

All passwords used must meet a certain set of requirements before SQL Server lets you use them. Any password must meet three of the following four requirements:

❑ Must contain uppercase letters
❑ Must contain lowercase letters
❑ Must contain numbers
❑ Must contain non-alphanumeric characters, such as #, $, &, or @
As the error message suggests, see Authentication Mode in the Books Online for more information on strong passwords. The read is well worth your time. Once you have set your authentication mode, click Next.
The next step in the installation process is setting the collation. Figure 3-11 shows the options available for setting collation and sort order for SQL Server 2005. Collation specifies the SQL Server sorting behavior, meaning how character strings are sorted and compared. If you don’t have any specific sorting or case-sensitivity needs, the default sort order works for most installations.
Figure 3-11
Use the top part of this screen when the installation of SQL Server must match the collation settings of another instance of SQL Server or the Windows locale settings of another computer running SQL Server. Use the SQL collation section for backward compatibility with earlier versions of SQL Server. You should select this option if you want to match the settings of SQL Server 8.0 (SQL Server 2000), 7.0, or earlier.
SQL collation cannot be used with Analysis Services. If you select to install SQL Server Analysis Services, SQL Server tries to match the best Windows collation for Analysis Services, based on the SQL collation you select. If the SQL Server collation and Analysis Services collation do not match, your results may not be consistent. Your best bet is to use Windows collation for both.
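If you need to match the collation of an existing instance, you can look it up before running setup. The following sketch uses the built-in SERVERPROPERTY function and the fn_helpcollations function; the LIKE pattern is only an example filter.

```sql
-- Collation of the instance you are currently connected to.
SELECT SERVERPROPERTY('Collation') AS ServerCollation;

-- Browse the available SQL collations (names that start with SQL_).
SELECT name, description
FROM fn_helpcollations()
WHERE name LIKE 'SQL[_]%'
ORDER BY name;
```

Run the first query on the instance whose settings you want to copy, then pick the same collation name on the setup screen.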
To select separate collation settings for SQL Server and Analysis Services, select the Customize for each service account check box. This enables the drop-down list of services from which to select the desired service. Select the service, and then select your collation and sort order. After you select the appropriate sort order, click Next. The next screen in the installation is the Report Server setup. If you did not select Reporting Services on the Feature Selection screen, you will not see this screen. If you did select Reporting Services, this step allows you to specify how a Report Server instance is installed. You can install the default configuration, which installs and configures Report Server for you, or you can install Report Server only and configure it after the installation is complete by using the Reporting Services Configuration Tool. If you select the option to install the default configuration, clicking the Details button displays a screen with information for the default configuration (see Figure 3-12).
Figure 3-12
After you configure this screen, click Next. The next screen in the wizard is the Error and Usage Report Settings (see Figure 3-13), which allows you to automatically send feedback to Microsoft for any errors generated or features used. Microsoft uses these error reports to improve SQL Server functionality. All information is treated as confidential.
Figure 3-13
If you choose to send Feature Usage data (the second check box in Figure 3-13), SQL Server is configured to occasionally send a report to Microsoft containing information about how you are using SQL Server 2005. This information is also treated confidentially. After you make your selections on this screen, click Next. The next screen in the setup wizard is the overview of the options you selected during the configuration of the setup (see Figure 3-14). You can look over the items you selected, and by clicking the Back button you can change any of them.
Figure 3-14
If you are satisfied with the selections you made, click Install. At this point the installation begins and you should see a screen that displays the installation progress, similar to Figure 3-15. At the end of the installation, you may be required to reboot. At this point, SQL Server 2005 is installed and you are almost ready to go. Why almost? You need to set a couple of configuration items before you use some of the examples in this book. By default, SQL Server 2005 does not accept remote connections. So if you plan to run SQL Server 2005 and Visual Studio 2005 on separate computers, you need to tell SQL Server that connections will be coming in from a remote computer. You can find this setting, along with most other SQL Server 2005 configuration items, by opening the Surface Area Configuration form, shown in Figure 3-16.
Figure 3-15
Figure 3-16
First, to tell SQL Server to accept remote connections, select the top option, Surface Area Configuration for Services and Connections. This opens the corresponding form, shown in Figure 3-17.
Figure 3-17
To enable remote connections, select the Remote Connections option on the left side of the form. You will notice that by default, SQL Server accepts only local connections. To enable remote connections, click the Local and Remote Connections radio button. This will allow you to select three connection options. Typically, using TCP/IP only will suffice, but if your environment requires a different selection, make the selection and click OK. The next step is to enable the CLR, which is turned off by default. Back on the main screen (shown previously in Figure 3-16), select the bottom option, Surface Area Configuration for Features to open the form displayed in Figure 3-18. To enable the CLR, select the CLR Integration option on the left and then click the Enable CLR Integration check box on the right. Click OK to save the changes. Now you are ready to go!
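If you prefer a script to the Surface Area Configuration form, CLR integration can also be turned on with the sp_configure system stored procedure. This is a minimal sketch; it assumes you are connected with permission to change server-level settings.

```sql
-- Turn on CLR integration (the clr enabled option is 0 by default).
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;

-- Confirm the change: run_value should now be 1.
EXEC sp_configure 'clr enabled';
```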
Figure 3-18
Summary

This chapter walked you through the basic steps of installing SQL Server 2005, highlighted some areas of detail, and pointed out some differences between SQL Server 2005 and previous versions of SQL Server. With SQL Server installed, you can now easily work with the examples throughout the book. By now you should have a good grasp of what’s new in SQL Server 2005 and an idea of what’s coming up in later chapters. The next few chapters deal with topics that, while not directly specific to SQL Server, will be of great benefit to you in the last half of this book. In particular, the next chapter covers the new xml data type.

Part II: Server-Side XML Processing in SQL Server 2005

Chapter 4: xml data type
Chapter 5: Querying and Modifying XML Data in SQL Server 2005
Chapter 6: Indexing XML Data in SQL Server 2005
Chapter 7: XML Schemas in SQL Server 2005
Chapter 8: Transact-SQL Enhancements to FOR XML and OPENXML
Chapter 9: CLR Support in SQL Server 2005
xml data type
In Chapter 2, you learned about the xml data type and some of the functionality that it exposed, such as some of the xml data type methods and untyped versus typed XML. This chapter, however, discusses aspects of the xml data type not covered in Chapter 2, as well as elaborates on most of the topics introduced in that chapter. The addition of the xml data type provides tremendous support for XML data processing, including native support for XML. This means that XML documents, fragments, and values can be stored natively in SQL Server using the xml data type. The xml data type also simplifies modifying XML data. The goals of this chapter are to examine the xml data type in depth and to expose much more of the functionality that it provides. This chapter covers the following topics:
❑ Typed versus untyped XML
❑ Altering the xml data type column
❑ xml data type methods
❑ Defaults, constraints, and computed columns on xml data type columns
❑ Creating views
❑ XML settings options
❑ Best practices
A lot of the examples throughout this book use the AdventureWorks database that comes with SQL Server 2005. However, some of the examples require the creation of new tables and other objects and refer to a database called Wrox. If you prefer not to use the AdventureWorks database for these examples, feel free to create a new database called Wrox.
Chapter 4
untyped versus typed XML
XML comes in two flavors, untyped and typed, and the xml data type in SQL Server 2005 supports them both. This section highlights the differences between untyped and typed XML, as well as some scenarios when you would want to use one over the other.
untyped XML
In simple terms, untyped XML means that no schema is associated with an XML document. In reality though, you may have a schema that is perfectly valid for an XML document but have chosen for one reason or another not to associate the schema with the XML document. There are a number of reasons you would not want to associate a schema with an XML document, and in a lot of cases it makes sense not to make the association. For example:
❑ Client-side XML validation
❑ Unsupported server schema components
❑ Not well-formed or invalid XML
It is also perfectly suitable to use untyped XML when there is no schema present at all. In any of these cases, the XML document is checked to see if it is well-formed prior to mapping to the xml data type. Be aware that there will be performance issues when using untyped XML because of the node conversions at runtime. Node values are stored internally as strings, so a conversion is needed whenever a query treats them as another type.
In Chapter 2, you saw some examples that showed you how to create untyped XML such as columns, variables, and parameters. The xml data type column that was created in the Employee table had no schema or schema collection associated with it, so it is an untyped xml data type column. As explained previously, creating an xml column in a table is fairly straightforward, as shown here:
CREATE TABLE Employee (
    [EmployeeID] [int] NOT NULL,
    [EmployeeInfo] [xml] NOT NULL
) ON [PRIMARY]
GO
Inserting into an untyped xml data type column is not rocket science either, as shown in the following example:
DECLARE @xmlvar varchar(200)
SET @xmlvar = 'HoratioHornblower0 5/01/1850'
INSERT INTO Employee (EmployeeID, EmployeeInfo)
VALUES (1, @xmlvar)
GO
If you were to then query the new Employee table, the results shown in Figure 4-1 would be returned.
Figure 4-1
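The well-formedness check on untyped XML can be seen directly. The following is a minimal sketch; the element names (Employee, FirstName, LastName) are hypothetical and are not taken from the chapter's listing. It assigns one well-formed and one malformed string to an xml variable and reports the error raised by the second.

```sql
-- Hypothetical markup for illustration only.
DECLARE @good varchar(200), @bad varchar(200), @x xml;
SET @good = '<Employee><FirstName>Horatio</FirstName><LastName>Hornblower</LastName></Employee>';
SET @bad  = '<Employee><FirstName>Horatio</Employee>';  -- mismatched end tag

SET @x = @good;            -- implicit varchar-to-xml conversion succeeds
SELECT @x AS WellFormed;

BEGIN TRY
    SET @x = @bad;         -- conversion fails because the string is not well-formed
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage;
END CATCH;
```

No schema is involved here: untyped XML is only required to be well-formed, so any element names would be accepted.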
Chapter 2 also discussed using the xml data type as variables and parameters, and the preceding example demonstrates how to use a variable to insert untyped XML into an xml data type column. Also in Chapter 2, you saw a portion of a stored procedure that accepts an xml data type parameter. Using that code, combined with the preceding code example, the following example demonstrates using the xml data type as a parameter. Using the following code, create a stored procedure called AddEmployee:
CREATE PROCEDURE AddEmployee
    @xmlvar [xml]
WITH EXECUTE AS OWNER
AS
INSERT INTO Employee (EmployeeID, EmployeeInfo)
VALUES (2, @xmlvar)
Then in a query window execute the following code:
DECLARE @xmlvar varchar(200)
SET @xmlvar = 'HortensePowdermaker03/01/1932'
EXEC AddEmployee @xmlvar
GO
Again, if you were to query the Employee table you would see the results shown in Figure 4-2.
Figure 4-2
While untyped XML may have its place, there are a number of good reasons why you should consider typed XML storage.
typed XML
When a schema collection that describes XML data is associated with an XML document, the XML document is said to be typed. In reality, this is the best scenario because it allows for the association of a collection of XML schemas with an XML column, which automatically validates the XML. There are several advantages to using an XML schema. First, XML validation is automatic. Regardless of whether you are assigning XML to a variable or inserting XML into an XML column, SQL Server automatically applies the schema to the XML for validation. The result of this is better performance because node values are not converted at runtime. Second, XML storage is minimized because the information about the types of elements and attributes is provided in the schema itself, thus providing better conversion interpretation about the values stored.
Using typed XML is not all that different from using an untyped xml data type other than the fact that a schema collection is required to have a typed xml data type. This applies to the XML column, parameter, and variable. The untyped examples used previously can be modified to be typed very easily. For example, given the following XML document and schema, creating a table that has a schema collection associated to the xml data type column is quite easy. Suppose you wanted to store the following XML in the xml data type column in your Employee table:
The first step is to create the necessary schema used to create an XML schema collection. The schema collection you create is then used when you create the table. Based on the above XML document, create the following schema:
You then use the CREATE XML SCHEMA COLLECTION statement to create the schema collection, named EmployeeSchemaCollection. The XML schema collection must exist prior to associating it with an xml data type column, parameter, or variable. XML schema collections are discussed in detail in Chapter 7. After you’ve created the XML schema collection, you can then use it in the creation of your new table, as illustrated in the following code. The process of associating the schema collection to the xml data type column now makes the column a typed column:
CREATE TABLE Employee (
    [EmployeeID] [int] NOT NULL,
    [EmployeeInfo] [xml] (EmployeeSchemaCollection) NOT NULL
) ON [PRIMARY]
GO
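To make the sequence concrete, here is a minimal sketch of creating a schema collection and using it to type a variable. The collection name DemoEmployeeSchemaCollection and the element names are hypothetical stand-ins, not the chapter's actual schema; HireDate is declared as xsd:string to keep the sketch simple.

```sql
-- Hypothetical schema: an Employee element with three string children.
CREATE XML SCHEMA COLLECTION DemoEmployeeSchemaCollection AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Employee">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="FirstName" type="xsd:string" />
        <xsd:element name="LastName" type="xsd:string" />
        <xsd:element name="HireDate" type="xsd:string" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>';
GO

-- A typed variable validates every value assigned to it.
DECLARE @emp xml (DemoEmployeeSchemaCollection);
DECLARE @valid nvarchar(400), @invalid nvarchar(400);
SET @valid   = N'<Employee><FirstName>Horatio</FirstName><LastName>Hornblower</LastName><HireDate>05/01/1850</HireDate></Employee>';
SET @invalid = N'<Employee><FirstName>Horatio</FirstName></Employee>';  -- LastName and HireDate missing

SET @emp = @valid;         -- passes validation
BEGIN TRY
    SET @emp = @invalid;   -- well-formed, but rejected by schema validation
END TRY
BEGIN CATCH
    SELECT ERROR_MESSAGE() AS ValidationError;
END CATCH;
```

The second assignment is well-formed XML, so it would be accepted by an untyped variable; it fails here only because the schema requires all three child elements.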
The other method of associating a schema or schema collection to an xml data type column is via the SQL Server Management Studio when creating the table or adding a column, as shown in Figure 4-3.
Figure 4-3
The default selection is to have no schema collection, which makes the column untyped. This is perfectly acceptable and the pros and cons of doing so were explained earlier in the chapter. The preferred option is to select a schema collection to associate to the column. Any schema collections created prior to creating the table (or adding the column to the table) appear in the drop-down list available for selection.
The sys.sys schema collection is the default schema collection if one is not specified and helps determine how well-formed an XML instance is. Schema collections are discussed in Chapter 7. In this chapter, it is only necessary to discuss associating a schema collection to an xml data type.
Making Changes to the xml data type Column
Altering the xml data type column is completely allowable, with support provided by the ALTER TABLE statement. An xml data type column can be changed from untyped to typed and vice versa, as well as changed from a character string type column to an xml data type column (typed or untyped). The following example illustrates how to alter a column from a string type to an xml type:
/* create the original table */
CREATE TABLE Customer (
    [CustomerID] [int] PRIMARY KEY,
    [CustomerName] [varchar] (100)
)
GO
/* Insert data into the table */
INSERT INTO Customer (CustomerID, CustomerName)
VALUES (1, '')
GO
/* Change the CustomerName column type to XML type */
ALTER TABLE Customer
ALTER COLUMN CustomerName xml
GO
This change is allowed because the value inserted into the CustomerName column prior to changing the data type is well-formed XML and is accepted by the xml data type. The following is also allowed:
/* create the original table */
CREATE TABLE Customer (
    [CustomerID] [int] PRIMARY KEY,
    [CustomerName] [varchar] (100)
)
GO
/* Insert data into the table */
INSERT INTO Customer (CustomerID, CustomerName)
VALUES (2, 'Fast Freddys Five Finger Discount')
GO
/* Change the CustomerName column type to XML type */
ALTER TABLE Customer
ALTER COLUMN CustomerName xml
GO
Querying the Customer table returns results similar to Figure 4-4.
Figure 4-4
The statements executed without error, but why did they work? The reason this example works is because no schema was specified when the column was altered to an xml data type column, thus making it untyped.
Converting from untyped to typed
An xml data type column can be changed from one type to another (untyped to typed and vice versa). The following example illustrates changing an xml data type column from untyped to typed:
-- First, create the table untyped (no XML schema associated with the xml column)
CREATE TABLE Employee (
    [EmployeeID] [int] NOT NULL,
    [EmployeeInfo] [xml] NOT NULL
) ON [PRIMARY]
GO
-- Now, make it a typed column - THE SCHEMA COLLECTION MUST EXIST FIRST!
ALTER TABLE Employee
ALTER COLUMN EmployeeInfo xml (EmployeeSchemaCollection)
GO
As the comments in the code state and as explained in the previous section, the XML schema collection must exist prior to associating it with an xml data type column, parameter, or variable. When this statement is executed, all XML data in the EmployeeInfo column is validated against the schemas in the specified schema collection. There are two things to keep in mind when converting to a typed XML column. First, if any invalid XML documents are found during the validation, the conversion from untyped to typed halts and the conversion does not take place. Second, when altering an xml column from a string or untyped type, the conversion could take a while on tables with large amounts of data. It should be obvious that the preferred choice is to create the column as typed to begin with, but it is perfectly acceptable to create untyped columns as needs dictate.
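The reverse direction, typed back to untyped, is not shown in the listing above. A minimal sketch, assuming the Employee table from the preceding example, is to repeat the ALTER COLUMN without naming a schema collection; the sys.columns query afterwards is one way to confirm the result.

```sql
-- Revert EmployeeInfo to untyped XML by omitting the schema collection.
-- NOT NULL is restated so the column keeps its original nullability.
ALTER TABLE Employee
    ALTER COLUMN EmployeeInfo xml NOT NULL;
GO

-- xml_collection_id is 0 for an untyped xml column and nonzero for a typed one.
SELECT name, xml_collection_id
FROM sys.columns
WHERE object_id = OBJECT_ID('Employee');
GO
```

No validation is needed in this direction, since any instance that satisfied the schema is also well-formed.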
xml data type Methods
The xml data type comes with five methods that support the querying and modification of XML instances. These xml data type methods are extremely useful when they are used together. Very rarely will you use these methods by themselves, and as the examples demonstrate, the real power and flexibility behind the xml data type is when these methods are used together.
This section examines these five methods:
❑ query()
❑ value()
❑ exist()
❑ nodes()
❑ modify()
The following section discusses each of these methods in detail, beginning with the query() method.
query()
If your goal is to return parts or sections of an XML instance, then the query() method is the method of choice. The query() method executes a query by evaluating an XQuery expression against the elements and attributes in an XML instance. Results are returned as untyped XML. The syntax for the query() method is as follows:
query('XQueryExpression')
The query() method can be run against any XML instance, such as an xml data type variable or column. For example, the following example uses the query() method to return a portion of an XML instance from an xml data type variable:
DECLARE @xmlvar xml
SET @xmlvar = '
Tim Ferry
Chad Reed
'
SELECT @xmlvar.query('/Motocross/Team/Rider')
In the preceding example, an xml data type variable is declared and an XML instance is assigned to that variable. The last line of code uses the query() method to specify an XQuery expression against the xml data type variable and select a portion of the XML instance. The XQuery expression in the example is asking for everything under the Team node; thus the query returns all of the Rider information. The results are shown in Figure 4-5.
Figure 4-5
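As a self-contained illustration of query(), the following sketch assumes an instance shaped the way the path /Motocross/Team/Rider/RiderName implies. The exact markup, including the Manufacturer attribute value, is an assumption made for this sketch and may differ from the chapter's document.

```sql
DECLARE @xmlvar xml;
-- Assumed markup: one Team with a Manufacturer attribute and two Rider elements.
SET @xmlvar = '
<Motocross>
  <Team Manufacturer="Yamaha">
    <Rider><RiderName>Tim Ferry</RiderName></Rider>
    <Rider><RiderName>Chad Reed</RiderName></Rider>
  </Team>
</Motocross>';

-- Returns both Rider elements, with their children, as one untyped XML fragment.
SELECT @xmlvar.query('/Motocross/Team/Rider') AS Riders;

-- A longer path narrows the result to the RiderName elements only.
SELECT @xmlvar.query('/Motocross/Team/Rider/RiderName') AS RiderNames;
```

Both statements return a single row with a single xml column; query() never splits its result into multiple rows.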
The query() method can also be used when querying an xml data type column. The following example uses the query() method to return a section of an XML instance from an xml data type column:
SELECT Instructions.query('declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventureworks/ProductModelManuInstructions";
{ /MSAW:root/MSAW:Location[1] }
') as Result
FROM Production.ProductModel
WHERE ProductModelID=7
A portion of the results looks like the following:
Work Center - 10 Frame FormingThe following instructions pertain to Work Center 10. ...
Like the other query() method examples, this example uses an XPath expression to query the first (as denoted by the [1] predicate) location node from the Instructions column in the Production.ProductModel table. A predicate is somewhat similar to a WHERE clause. It provides further filtering on a node-set. In the previous example, the predicate said, “where the location is the first Location.” The query() method is very valuable and flexible when querying the xml data type and the goal is to return a portion of an XML document.
value()
The value() method is useful when you want to extract node values from an XML instance, particularly an xml data type column, variable, or parameter. It returns the value that the XQuery expression evaluates to. The syntax for this method is as follows:
value(XQueryExpression, SQLType)
The first parameter is the XQuery expression that locates the node value within the XML instance. The second parameter is a string literal naming the SQL type to which that value is converted. The following example uses the value() method to extract an attribute from an XML instance:
DECLARE @xmlvar xml
DECLARE @Team varchar(50)
SET @xmlvar = '
Tim Ferry
Chad Reed
'
SET @Team = @xmlvar.value('(/Motocross/Team/@Manufacturer)[1]', 'varchar(50)')
SELECT @Team
The result returned from this is the word Yamaha. The XQuery expression in this example selects the first Manufacturer attribute on the /Motocross/Team path and returns its value. The next example uses the value() method to return a node value from the XML instance:
DECLARE @xmlvar xml
DECLARE @Team varchar(50)
SET @xmlvar = '
Tim Ferry
Chad Reed
'
SET @Team = @xmlvar.value('(/Motocross/Team/Rider/RiderName)[1]', 'varchar(50)')
SELECT @Team
The result returned from this statement is the name of the first rider, Tim Ferry. Just like the previous example, the XQuery expression in the value() method specifies the first RiderName node, signified by the [1] predicate, from which to obtain the results. In the SET @Team statement, change the code to look like the following:
SET @Team = @xmlvar.value('(/Motocross/Team/Rider/RiderName)[2]', 'varchar(50)')
Now rerun the entire code. The result should now be the second rider, Chad Reed. The positional predicate is required because of static typing: the value() method must be able to determine at compile time that the expression returns at most one node. Static typing, which determines an expression’s return type, is covered in Chapter 5. Both of the preceding examples used the value() method to query an xml data type variable. More common scenarios require the querying of data in an xml data type column. So, the following example uses the value() method to query an xml data type column:
SELECT Instructions.value('declare namespace msaw="http://schemas.microsoft.com/sqlserver/2004/07/adventureworks/ProductModelManuInstructions";
(//msaw:Location/@LocationID)[1]', 'int') as Result
FROM Production.ProductModel
WHERE Instructions IS NOT NULL
The results from this query list the first LocationID found in the Instructions column of each row.
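The role of the positional predicate can be sketched as follows. The markup is assumed for illustration (two Team elements with hypothetical Manufacturer values) and is not the chapter's document.

```sql
DECLARE @xmlvar xml;
-- Assumed markup with two teams.
SET @xmlvar = '
<Motocross>
  <Team Manufacturer="Yamaha"><Rider><RiderName>Tim Ferry</RiderName></Rider></Team>
  <Team Manufacturer="Honda"><Rider><RiderName>Kevin Windham</RiderName></Rider></Team>
</Motocross>';

-- Wrapping the path in parentheses and adding [n] guarantees a single item.
SELECT @xmlvar.value('(/Motocross/Team/@Manufacturer)[1]', 'varchar(50)') AS FirstTeam,
       @xmlvar.value('(/Motocross/Team/@Manufacturer)[2]', 'varchar(50)') AS SecondTeam,
       @xmlvar.value('(/Motocross/Team/@Manufacturer)[3]', 'varchar(50)') AS NoSuchTeam;  -- NULL: empty sequence

-- Without the predicate the statement is rejected at compile time, because the
-- path could return more than one attribute:
-- SELECT @xmlvar.value('/Motocross/Team/@Manufacturer', 'varchar(50)');
```

An index past the end of the sequence is not an error; value() simply returns NULL.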
Type Conversion
The value() method uses, when necessary, the CONVERT function of T-SQL to implicitly convert XQuery expression results from the XSD type to its corresponding SQL type. The following table shows the XSD data type to SQL Server 2005 data type mappings to help make the necessary conversion in your program.

XSD             SQL Server
boolean         bit
decimal         numeric
double          float
float           real
string          nvarchar(4000), nvarchar(max)
NOTATION        nvarchar
QName           nvarchar
duration        varbinary
dateTime        varbinary
time            varbinary
date            varbinary
gYearMonth      varbinary
gYear           varbinary
gMonthDay       varbinary
gDay            varbinary
gMonth          varbinary
hexBinary       varbinary
base64Binary    varbinary
anyURI          varbinary
The value() method is a very versatile component of the xml data type and when combined with other xml data type methods, it proves to be even more valuable.
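The second argument of value() is what drives the conversion. The following is a small sketch using a hypothetical Order element (not from the chapter) whose attributes are read back as three different SQL types.

```sql
DECLARE @x xml;
-- Hypothetical instance; the attribute values are plain strings in untyped XML.
SET @x = '<Order Id="42" Total="19.95" Shipped="1" />';

SELECT @x.value('(/Order/@Id)[1]',      'int')          AS Id,
       @x.value('(/Order/@Total)[1]',   'numeric(9,2)') AS Total,
       @x.value('(/Order/@Shipped)[1]', 'bit')          AS Shipped;
```

Because the instance is untyped, each attribute value starts out as a string and is converted to the requested SQL type when the query runs.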
exist()
The exist() method allows you to check for the existence of a specific XML fragment in an XML instance. The return result is 1 if it exists, and 0 if it does not. The syntax for the exist() method is as follows:
exist('XQueryExpression')
Consider the following example:
DECLARE @xmlvar xml
DECLARE @bitvar bit
SET @xmlvar = '
Tim Ferry
Chad Reed
'
SET @bitvar = @xmlvar.exist('/Motocross/Team[@Manufacturer eq xs:string("Yamaha")]')
SELECT @bitvar
In the execution of this code, the exist() method returns a 1 because it finds the value of Yamaha in the XML instance. Change the manufacturer to Suzuki and run the code again. The exist() method returns a 0 because it does not find a value of Suzuki in the XML instance. Using that same example, the exist() method can look for node values as well, as illustrated by the following code:
DECLARE @xmlvar xml
DECLARE @bitvar bit
SET @xmlvar = '
Kevin Windham
Mike LaRocco
Jeremy McGrath
'
SET @bitvar = @xmlvar.exist('/Motocross/Team/Rider/RiderName[text()[1] eq xs:string("Kevin Windham")]')
SELECT @bitvar
As with the first example, the exist() method in this example returns a 1 because it finds Kevin Windham in the XML instance. The following two examples use the exist() method with a typed XML instance (the previous examples used untyped XML instances). The first uses the exist() method against an XML variable:
DECLARE @intvar int
DECLARE @xmlvar xml (Production.ManuInstructionsSchemaCollection)
SELECT @xmlvar = Instructions
FROM Production.ProductModel
WHERE ProductModelID = 7
SET @intvar = @xmlvar.exist('
declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventureworks/ProductModelManuInstructions";
/MSAW:root/MSAW:Location[@LocationID=30]
')
SELECT @intvar
As with the previous examples, the exist() method returns a 1. If you change the value compared against the @LocationID attribute to one that is not present, such as 80, and rerun the code, the exist() method returns a 0 because it cannot find a Location node with a LocationID attribute of 80, whereas it did find one with a value of 30. The second example modifies the previous example, still using the exist() method, but against an xml data type column:
SELECT Instructions.exist('
declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventureworks/ProductModelManuInstructions";
/MSAW:root/MSAW:Location[@LocationID=50]
')
FROM Production.ProductModel
WHERE ProductModelID = 10
The results of this query also return a value of 1 because the XPath expression inside the exist() method finds a LocationID with a value of 50. The exist() method can also be used in the WHERE clause, as follows:
SELECT ProductModelID, Name
FROM Production.ProductModel
WHERE Instructions.exist('
declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventureworks/ProductModelManuInstructions";
/MSAW:root/MSAW:Location[@LocationID=60]
') = 1
In this example the SELECT statement selects non-XML columns with the WHERE clause supplying the XQuery expression in the exist() method. The query says to return the ProductModelID and Name columns where a LocationID value of 60 exists within the XML in the Instructions column. The results return two rows:

ProductModelID  Name
7               HL Touring Frame
10              LL Touring Frame

The exist() method is preferred over the value() method when evaluating predicates, that is, expressions that evaluate to TRUE or FALSE. UNKNOWN is also a possible predicate result, and when the expression returns one of these three, it is good practice to use the exist() method rather than the value() method. For example, if you know for certain that the query expression returns a value (rather than TRUE or FALSE), then the value() method is the way to go. On the other hand, if you are checking to see if a certain node, attribute, or value exists, use the exist() method. The xml data type methods so far have dealt with specific values within an XML instance, but what about the times you need to return the results as relational data? This is where the nodes() method comes in.
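The behavior described in this section can be sketched in one self-contained batch. The markup below is assumed for illustration, and the comparisons use the general = operator rather than the eq form shown in the listings.

```sql
DECLARE @xmlvar xml;
-- Assumed markup: one Yamaha team with two riders.
SET @xmlvar = '
<Motocross>
  <Team Manufacturer="Yamaha">
    <Rider><RiderName>Tim Ferry</RiderName></Rider>
    <Rider><RiderName>Chad Reed</RiderName></Rider>
  </Team>
</Motocross>';

SELECT @xmlvar.exist('/Motocross/Team[@Manufacturer = "Yamaha"]')            AS HasYamaha,    -- 1
       @xmlvar.exist('/Motocross/Team[@Manufacturer = "Suzuki"]')            AS HasSuzuki,    -- 0
       @xmlvar.exist('/Motocross/Team/Rider/RiderName[. = "Chad Reed"]')     AS HasChadReed;  -- 1
```

Each call answers a yes/no question about the instance, which is why exist() fits naturally in a WHERE clause.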
nodes()
The term shredding in XML terms means converting an xml data type instance into relational data. The nodes() method puts this term to very good use. The purpose of the nodes() method is to specify which nodes are mapped to a new dataset row. The general syntax of the nodes() method looks like the following:
nodes (XQuery) as Table(Column)
The XQuery parameter specifies the XQuery expression. If the expression returns nodes, then the nodes are included in the result set. Likewise, if the result of the expression is empty, then the result set is also empty. The Table(Column) parameter supplies the table name and column name of the final result set. This first example uses an xml data type variable:
DECLARE @xmlvar xml
SET @xmlvar='
Tim Ferry Chad Reed David Vuillemin
Kevin Windham Mike LaRacco Jeremy McGrath
Ricky Carmichael Broc Hepler
James Stewart Michael Byrne
'
SELECT Motocross.Team.query('.') AS RESULT
FROM @xmlvar.nodes('/Motocross/Team') Motocross(Team)
The results are returned as a single result set with four rows, as shown in Figure 4-6.
Figure 4-6
In this example, the nodes() method identifies the nodes in the results of the XQuery expression, returning them as a rowset, with each team being a row. The nodes() method basically said, “Break out each Team into a row,” thus making a result set. Each row in the rowset is a logical copy of the original XML instance. The node in each row, in this case the Team node, matches one of the nodes specified in the XQuery expression. The query() method in this example is used together with the nodes() method to return the appropriate results. The query() method is the method used to query the XML document, and the nodes() method defines how the results are sent back. The query() method can also take an absolute path expression, which means that the query starts on the root node. In the following example, an absolute path expression is used:
DECLARE @xmlvar xml
SET @xmlvar='
Tim Ferry Chad Reed David Vuillemin
Kevin Windham Mike LaRacco Jeremy McGrath
Ricky Carmichael Broc Hepler
James Stewart Michael Byrne
'
SELECT Motocross.Team.query('/Motocross/Team') AS RESULT
FROM @xmlvar.nodes('/Motocross/Team') Motocross(Team)
The results are quite a bit different now. The result set from the first query had each Team in its own row. The results from the absolute path query return four rows with all four Teams in each row (basically four rows for every context node, as shown in Figure 4-7).
Figure 4-7
You should be starting to see the real power behind these methods. When they are used individually they are extremely powerful. When used together, the functionality they provide is nearly endless.
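The difference between a relative and an absolute path inside nodes() can be sketched with a smaller instance. The markup, the Honda value, and the alias T(c) are assumptions made for this sketch.

```sql
DECLARE @xmlvar xml;
-- Assumed markup: two teams, riders stored directly as Rider text.
SET @xmlvar = '
<Motocross>
  <Team Manufacturer="Yamaha"><Rider>Tim Ferry</Rider><Rider>Chad Reed</Rider></Team>
  <Team Manufacturer="Honda"><Rider>Kevin Windham</Rider></Team>
</Motocross>';

-- Relative path: "." is the context node, so each row holds exactly one Team.
SELECT T.c.query('.') AS ThisTeam
FROM @xmlvar.nodes('/Motocross/Team') AS T(c);

-- Absolute path: the query restarts at the root, so every row repeats all Teams.
SELECT T.c.query('/Motocross/Team') AS AllTeams
FROM @xmlvar.nodes('/Motocross/Team') AS T(c);
```

Both statements return two rows, one per Team matched by nodes(); only the content of each row differs.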
modify()
All the other methods focus on getting data out of an XML instance. The modify() method’s function in life, on the other hand, is to modify xml type variables or columns. This method takes an XML Data Modification Language (XML DML) statement as a parameter to perform the necessary operation (insert, update, or delete). XML DML was introduced in Chapter 2 and is covered in greater detail in Chapter 5. The syntax of the modify() method looks like this:
modify(XML_DML)
The modify() method of the xml data type allows you to insert, update (replace value of), and delete content within an XML instance. The modify() method uses the XML DML to provide those actions on the XML instance. The following example shows you how to use the modify() method:
DECLARE @xmldoc xml
SET @xmldoc = '
'
SET @xmldoc.modify('
insert Knievel into (/Root/Employee/EmployeeInformation)[1]')
SELECT @xmldoc
GO
In this example, an xml data type variable is defined and an XML document is assigned to that variable. The modify() method is then executed against that xml data type variable to insert a new node and value. The results of the modify() method on the XML document are as follows:
Knievel
If you recall from Chapter 2, a number of examples used the modify() method with XML DML to modify the XML content, so this should not be new. While the previous example is fairly simple, don’t worry, because an entire section is dedicated to this method and XML DML in Chapter 5. Now that you’re somewhat familiar with all the xml data type methods, the next section shows you how to combine some of the methods within a single statement.
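The three XML DML operations can be sketched against one variable. The element and attribute names below (FirstName, LastName, EmployeeID) are hypothetical; only the /Root/Employee/EmployeeInformation path comes from the listing above.

```sql
DECLARE @xmldoc xml;
-- Hypothetical starting document.
SET @xmldoc = '
<Root>
  <Employee EmployeeID="1">
    <EmployeeInformation>
      <FirstName>Evel</FirstName>
    </EmployeeInformation>
  </Employee>
</Root>';

-- insert: add a new element as the last child of EmployeeInformation.
SET @xmldoc.modify('insert <LastName>Knievel</LastName> into (/Root/Employee/EmployeeInformation)[1]');

-- replace value of: change the text of an existing node.
SET @xmldoc.modify('replace value of (/Root/Employee/EmployeeInformation/FirstName/text())[1] with "Robert"');

-- delete: remove the EmployeeID attribute.
SET @xmldoc.modify('delete /Root/Employee/@EmployeeID');

SELECT @xmldoc AS Modified;
```

Note that modify() is invoked through SET for a variable (and through UPDATE ... SET for a column) rather than through SELECT, because it changes the instance in place and returns nothing.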
Combining Methods
The following example combines the value(), query(), and nodes() methods in a single statement against an xml data type variable. The value() method gets the Manufacturer, the query() method gets the riders for the specific Team, and the nodes() method tells the query to return the results as a rowset:
DECLARE @xmlvar xml
SET @xmlvar='
Tim Ferry Chad Reed David Vuillemin
Kevin Windham Mike LaRacco Jeremy McGrath
Ricky Carmichael Broc Hepler
James Stewart Michael Byrne
'
--SELECT Motocross.Team.query('.')
SELECT Motocross.Team.value('@Manufacturer', 'varchar (50)') as Manufacturer,
    Motocross.Team.query('Rider') as Team
FROM @xmlvar.nodes('/Motocross/Team') Motocross(Team)
The results of the statement are shown in Figure 4-8.
Figure 4-8
In this example, the query(), value(), and nodes() methods were used to return the results shown. The nodes() method breaks the XML instance into one row per Team, the value() method returns the individual Manufacturer values, and the query() method returns the Rider elements for each Team. Look at one more example using a combination of some of the xml data type methods. The following example uses the exist() method to check to see if any of the Teams have any riders, and if they do not, they aren’t included in the results:
DECLARE @xmlvar xml
SET @xmlvar='
Tim Ferry Chad Reed David Vuillemin
Kevin Windham Mike LaRacco Jeremy McGrath
Ricky Carmichael Broc Hepler
'
--SELECT Motocross.Team.query('.')
SELECT Motocross.Team.value('@Manufacturer', 'varchar (50)') as Manufacturer
FROM @xmlvar.nodes('/Motocross/Team') Motocross(Team)
WHERE Motocross.Team.exist('Rider') = 1
The results of the statement appear in Figure 4-9.
Figure 4-9
In this example, the value(), nodes(), and exist() methods are used to determine which of the manufacturers have riders. The nodes() method breaks the XML instance into one row per Team, and the exist() method tests each of those rows for a Rider element. Where the result of the exist() method is true for a Team, the value() method returns the name of that manufacturer.
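A self-contained version of this combination might look like the following sketch. The markup and the rider-less KTM team are assumptions added so that the exist() filter has something to exclude.

```sql
DECLARE @xmlvar xml;
-- Assumed markup: one team with riders and one without.
SET @xmlvar = '
<Motocross>
  <Team Manufacturer="Yamaha"><Rider>Tim Ferry</Rider><Rider>Chad Reed</Rider></Team>
  <Team Manufacturer="KTM" />
</Motocross>';

SELECT T.c.value('@Manufacturer', 'varchar(50)') AS Manufacturer,  -- scalar per row
       T.c.query('Rider')                        AS Riders         -- XML fragment per row
FROM @xmlvar.nodes('/Motocross/Team') AS T(c)                       -- one row per Team
WHERE T.c.exist('Rider') = 1;                                       -- keep teams that have riders
```

Only the Yamaha row is returned; removing the WHERE clause would also return KTM with an empty Riders fragment.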
Using Operators with Methods
The APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression. In simple terms, the outer table expression is the left input and the table-valued function (here, the nodes() method) is the right input. The right input is evaluated for every row of the left input, and the final column list is the columns of the left input followed by the columns returned by the right input. There are two forms of the APPLY operator: OUTER APPLY and CROSS APPLY. Considering the previous example, you can obtain the same results by using the APPLY operator. The OUTER APPLY operator applies the nodes() method to each row, but the caveat is that it also returns rows for which the nodes() method produces nothing, with NULL values in the columns from nodes(). To compensate for this, you need to add a condition to the WHERE clause, as follows:
DECLARE @xmlvar xml
SET @xmlvar='
Tim Ferry Chad Reed David Vuillemin
Kevin Windham Mike LaRacco Jeremy McGrath
Ricky Carmichael Broc Hepler
'
SELECT DISTINCT Motocross.Team.value('@Manufacturer', 'varchar (50)') as Manufacturer
FROM @xmlvar.nodes('/Motocross/Team') Motocross(Team)
OUTER APPLY Motocross.Team.nodes('./Rider') AS Motocross2(Team2)
WHERE Motocross2.Team2 IS NOT NULL
By adding the OUTER APPLY operator and modifying the WHERE clause, the returned results will look exactly like what was shown in Figure 4-9. The other APPLY operator, CROSS APPLY, does away with modifying the WHERE clause. It applies the nodes() method to each row in the result set, but returns only those rows from the first table-column pair for which the nodes() method produces rows (much like the exist() method). For example:
SELECT Motocross.Team.value('@Manufacturer', 'varchar (50)') as Manufacturer
FROM @xmlvar.nodes('/Motocross/Team') Motocross(Team)
CROSS APPLY Motocross.Team.nodes('./Rider') AS Motocross2(Team2)
Any column that the nodes() method returns cannot be used directly; it can be used only with the xml data type methods or in IS NULL and IS NOT NULL checks. This means the following is not allowed:
SELECT Motocross.Team
FROM @xmlvar.nodes('/Motocross/Team') Motocross(Team)
So far all the examples have dealt with xml data type variables. The following example uses an xml data type column with the nodes() method. In this example, the value() and query() methods are used to return values from the nodes in the result set. For each given location the SELECT clause returns the LocationID and any tools at the given location:
SELECT Instruct.value('@LocationID','int') as LocationID,
    Instruct.query('declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventureworks/ProductModelManuInstructions";
    MSAW:step/MSAW:tool') as Tool
FROM Production.ProductModel
CROSS APPLY Instructions.nodes('
declare namespace MSAW2="http://schemas.microsoft.com/sqlserver/2004/07/adventureworks/ProductModelManuInstructions";
/MSAW2:root/MSAW2:Location') as PPM(Instruct)
WHERE ProductModelID=53
The results are shown in Figure 4-10.
Figure 4-10
In this example, CROSS APPLY returns only those rows of the outer table, Production.ProductModel, for which the Instructions.nodes() query produces rows; the result includes the tool “small wrench.” The results are returned through a result set from the table-valued function. In this example, the table-valued function is the right input and the outer table expression is the left input of the expression. For every row in the left input, the right input is evaluated and the resulting rows are joined for the final results. Time to move on to defaults, constraints, and computed columns as they relate to xml data type columns.
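The difference between the two APPLY forms is easiest to see side by side. The sketch below uses a hypothetical table variable @teams with an xml column instead of AdventureWorks, so it can be run anywhere; the markup and names are assumptions.

```sql
-- Hypothetical data: team 2 has no Rider elements.
DECLARE @teams TABLE (TeamID int, TeamInfo xml);
INSERT INTO @teams VALUES (1, '<Team Manufacturer="Yamaha"><Rider>Tim Ferry</Rider><Rider>Chad Reed</Rider></Team>');
INSERT INTO @teams VALUES (2, '<Team Manufacturer="KTM" />');

-- CROSS APPLY: one row per rider; team 2 produces no rows and drops out.
SELECT t.TeamID, r.n.value('.', 'varchar(50)') AS Rider
FROM @teams AS t
CROSS APPLY t.TeamInfo.nodes('/Team/Rider') AS r(n);

-- OUTER APPLY: team 2 is kept, with NULL in the Rider column.
SELECT t.TeamID, r.n.value('.', 'varchar(50)') AS Rider
FROM @teams AS t
OUTER APPLY t.TeamInfo.nodes('/Team/Rider') AS r(n);
```

Adding WHERE r.n IS NOT NULL to the second statement makes it return the same rows as the first, which is the compensation described earlier in this section.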
Defaults, Constraints, and Computed Columns
As with any other data type column in a table, an xml data type column can have defaults and constraints applied to it, as well as being used as a computed column.
Defaults
There are two ways to apply defaults to an xml data type column. The first is to implicitly cast the data to an XML type as follows:
CREATE TABLE Employee (
    [EmployeeID] [int] NOT NULL,
    [EmployeeInfo] [xml] NOT NULL DEFAULT N''
) ON [PRIMARY]
GO
The second method is to explicitly convert the XML using the CAST function as follows:
CREATE TABLE Employee (
    [EmployeeID] [int] NOT NULL,
    [EmployeeInfo] [xml] NOT NULL DEFAULT CAST(N'<Employee/>' As xml)
) ON [PRIMARY]
GO
Seems easy enough, but what purpose would adding a default to an xml data type column serve? The answer lies in looking at other data type columns that have defaults, but with much more functionality. Suppose that in your Employee table you had the columns for the employee first name, last name, hire date, and so on. There really is no downside to this approach, but suppose that instead of all those columns, you had an xml data type column called EmployeeInfo and on that column you applied a default that contained a shell of an XML instance such as the following:
In this scenario, instead of inserting data into different columns, you could just as easily open the XML instance using XmlReader and insert into the XML document the appropriate nodes, such as the first name and last name nodes using the update method of XML DML. Even better, why not have the entire XML instance stored in the column and use the update method of XML DML and just update the appropriate nodes with the data? The XML instance might look like this:
. . .
Instead of updating many columns, you are updating only a single column. The result is a performance gain and an easier method of updating an XML document.
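The single-column update described above can be sketched with the modify() method and two XML DML statements. The shell document and its FirstName and LastName elements are assumptions for this illustration, since the original XML instance is not shown here.

```sql
-- Hypothetical shell document; element names are assumptions.
DECLARE @EmployeeInfo xml
SET @EmployeeInfo = N'<Employee><FirstName /><LastName>Placeholder</LastName></Employee>'

-- insert: add a text node to the empty FirstName element
SET @EmployeeInfo.modify('insert text{"Damon"} into (/Employee/FirstName)[1]')

-- replace value of: overwrite the existing LastName text
SET @EmployeeInfo.modify('replace value of (/Employee/LastName/text())[1] with "Williams"')

SELECT @EmployeeInfo AS EmployeeInfo
```

Against a table, the same expressions would go in an UPDATE statement of the form UPDATE Employee SET EmployeeInfo.modify('...') WHERE EmployeeID = 1, so only the one xml column is touched.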
Constraints
Constraints allow you to define how you want SQL Server to enforce database integrity. In other words, constraints allow you to specify the type of allowable data to be inserted into the columns in your database. In SQL Server 2005, you can add constraints to xml data type columns, thereby limiting the XML values being added to the column. You can define these constraints as column-level constraints and table-level constraints. These constraints apply to both typed and untyped XML.
Column-Level Constraints
Column-level constraints are applied to the specific column whose data you want to limit, and can reference only that single column. For an xml data type column, adding a constraint entails using a CHECK constraint and specifying the query expression to check. For example, the following code demonstrates how to apply a constraint on an xml column:

CREATE TABLE Employee
(
    [EmployeeID] [int] NOT NULL,
    [EmployeeInfo] [xml] CHECK (EmployeeInfo.exist('/Employee/@EmployeeID') = 1)
) ON [PRIMARY]
GO
This constraint states that any XML instance added to this column must have an Employee element with an EmployeeID attribute. The XML instance must have both of these in order for the insert to be successful. For example, the following INSERT succeeds because of the EmployeeID attribute on the Employee element:

INSERT INTO Employee (EmployeeID, EmployeeInfo)
VALUES (1, '<Employee EmployeeID="1">Damon</Employee>')
GO
The following example, however, does not allow the INSERT because the EmployeeID attribute is missing:

INSERT INTO Employee (EmployeeID, EmployeeInfo)
VALUES (1, '<Employee>Damon</Employee>')
GO
The error message returned from the execution of this statement says that the INSERT statement conflicted with the CHECK constraint, and therefore the INSERT fails because no EmployeeID attribute was supplied. Column-level constraints are also useful when validating node values within the XML document. For example, a constraint could reject an XML document such as the following, in which the FirstName and LastName values under the Employee element are duplicates:
<Employee>
    <FirstName>Williams</FirstName>
    <LastName>Williams</LastName>
</Employee>
The constraint for this looks like the following:

CONSTRAINT NameCheck CHECK (EmployeeInfo.exist('/Employee[FirstName=LastName]') = 0)
This constraint says to look at the FirstName and LastName elements underneath the Employee element and make sure their values are not equal.
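Because this chapter also notes that check constraints do not accept xml data type methods directly, the column-level checks shown here may need to be wrapped in a user-defined function before SQL Server accepts them. A minimal sketch follows, assuming the hypothetical names HasEmployeeID and EmployeeChecked.

```sql
-- Hypothetical function name; wraps the exist() method so that it
-- can be called from a CHECK constraint.
CREATE FUNCTION dbo.HasEmployeeID(@xmlvar xml)
RETURNS bit
AS
BEGIN
    -- exist() returns 1 when the path selects at least one node
    RETURN @xmlvar.exist('/Employee/@EmployeeID')
END
GO

-- Hypothetical table name, used only for this sketch.
CREATE TABLE EmployeeChecked
(
    EmployeeID   int NOT NULL,
    EmployeeInfo xml CHECK (dbo.HasEmployeeID(EmployeeInfo) = 1)
)
GO

-- Has the attribute, so the CHECK constraint is satisfied.
INSERT INTO EmployeeChecked (EmployeeID, EmployeeInfo)
VALUES (1, '<Employee EmployeeID="1">Damon</Employee>')
GO

-- No EmployeeID attribute, so this INSERT should conflict with the CHECK constraint.
INSERT INTO EmployeeChecked (EmployeeID, EmployeeInfo)
VALUES (2, '<Employee>Damon</Employee>')
GO
```

The same wrapper pattern would apply to the FirstName and LastName comparison: the function body would return the result of exist() for that path and the constraint would compare it to 0.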
Table-Level Constraints
A table-level constraint means that more than one column is included in the constraint. These types of constraints are good for further enforcing the integrity of the data in your tables. For example, you could use the value() method of the xml data type to check the value of an element or attribute of an XML document to see if it contains specific information before inserting or updating a table. One of the limitations of check constraints, however, is that they do not support calling any of the xml data type methods directly. As mentioned in Chapter 2, the workaround for this is to create a user-defined function that wraps the xml data type method and then use the UDF in the creation of the table. The following example creates a simple UDF that will be used later on the Employee table. It compares the EmployeeID attribute in the XML with an EmployeeID value passed in from the row:

CREATE FUNCTION xmludf(@xmlvar xml, @EmployeeID int)
RETURNS bit
AS
BEGIN
    -- 1 when the attribute matches the column value, otherwise 0
    RETURN CASE
        WHEN @xmlvar.value('(/Employee/@EmployeeID)[1]', 'int') = @EmployeeID THEN 1
        ELSE 0
    END
END
GO

Once the UDF is created, it can then be applied as a constraint to a table. The following creates an Employee table with a table-level constraint that uses the UDF created earlier:

CREATE TABLE Employee
(
    [EmployeeID] [int] NOT NULL,
    [EmployeeInfo] [xml],
    CONSTRAINT EmployeeInfoValidate CHECK (dbo.xmludf(EmployeeInfo, EmployeeID) = 1)
) ON [PRIMARY]
GO
The constraint applied to the Employee table specifies that, for any XML instance stored in the EmployeeInfo column, the EmployeeID attribute is compared to the corresponding row’s value in the EmployeeID column. For example, the following code sample inserts a row into the Employee table with an EmployeeID of 1 and an EmployeeID attribute of 21, which will fail:

INSERT INTO Employee (EmployeeID, EmployeeInfo)
VALUES (1, '<Employee EmployeeID="21">Damon</Employee>')
GO
The insert in the previous example fails because the EmployeeID column value does not match the EmployeeID attribute of the Employee element in the XML document. However, the following INSERT succeeds:

INSERT INTO Employee (EmployeeID, EmployeeInfo)
VALUES (1, '<Employee EmployeeID="1">Damon</Employee>')
GO
This example succeeds because the EmployeeID column value matches the EmployeeID attribute of the Employee element in the XML document. Constraints are valuable because they are a great way to enforce the validity of data. Constraints on xml data type columns, however, are much more useful because it is possible to enforce XML integrity not only of the existing column, but of your entire table as well.
Computed Columns
Never let it be said that the xml data type is not flexible. Not only is it possible to apply defaults and constraints to an xml data type column, but the XML contained in the column can be used in creating computed columns as well. Computed columns are virtual columns whose values are computed from an expression that uses one or more other columns in the same table. For example, it is possible to convert the value from a string column to an xml column as follows:

CREATE TABLE Table1
(
    Column1 varchar(200),
    Column2 AS CAST(Column1 AS xml)
)
GO
The preceding example reads the value from Column1 and uses it to compute the value for Column2. The catch in this example is that in order for this to work, the data in Column1 must be well-formed XML. It is also possible to go the other way, meaning you can convert xml to a string, as follows:

CREATE TABLE Table2
(
    Column1 xml,
    Column2 AS CAST(Column1 AS varchar(500))
)
GO
The CAST function was used in both of these examples to explicitly convert one data type to another data type. Although this is nice, the real power comes from the capability to read XML instance node values and use those values to create computed columns. xml data type methods cannot be used to create computed columns directly, so you must utilize other methods in the creation of computed columns, such as using UDFs to wrap the xml data type method.
Since xml data type methods can’t be used to create computed columns, the simple solution is to create a user-defined function that queries the value from the XML instance and then use the function in the CREATE TABLE statement. The first step is to create the user-defined function as follows:

CREATE FUNCTION GetNodeValue(@xmlvar xml)
RETURNS int
AS
BEGIN
    RETURN @xmlvar.value('(/Employee/@EmployeeID)[1]', 'int')
END
GO
The second step is to create the table:

CREATE TABLE Employee
(
    EmployeeInfo xml,
    EmployeeID AS dbo.GetNodeValue(EmployeeInfo)
)
GO
You must qualify the user-defined function with its schema name (dbo in this example) in the CREATE TABLE statement. If the schema name is left off, SQL Server tries to resolve the name as a built-in function, does not recognize it, and generates an error. The next step is to insert the XML data into the table (the EmployeeInfo column):

INSERT INTO Employee (EmployeeInfo)
VALUES ('Robin')
GO
Now query the Employee table and review the results. The results should look like Figure 4-11.
Figure 4-11
In the previous example, the user-defined function was created and used in the CREATE TABLE statement as the computed column for the second column (EmployeeID). The user-defined function uses the value() method to query the XML instance for a specific value, in this case the EmployeeID attribute of the Employee node. When a record is inserted into the table, the user-defined function pulls the value from the XML instance, and that value is used as the value for the EmployeeID column.
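The whole round trip can be sketched end to end as follows. The object names GetEmployeeIdSketch and EmployeeComputedSketch, and the attribute value 7, are assumptions chosen for this illustration so that the sketch does not collide with the objects created above.

```sql
-- Hypothetical function name; wraps value() for use in a computed column.
CREATE FUNCTION dbo.GetEmployeeIdSketch(@xmlvar xml)
RETURNS int
AS
BEGIN
    RETURN @xmlvar.value('(/Employee/@EmployeeID)[1]', 'int')
END
GO

-- Hypothetical table name; EmployeeID is computed from the xml column.
CREATE TABLE EmployeeComputedSketch
(
    EmployeeInfo xml,
    EmployeeID AS dbo.GetEmployeeIdSketch(EmployeeInfo)
)
GO

-- Only the xml column is supplied; the attribute value 7 is made up.
INSERT INTO EmployeeComputedSketch (EmployeeInfo)
VALUES ('<Employee EmployeeID="7">Robin</Employee>')
GO

-- EmployeeID should reflect the attribute stored in the XML instance.
SELECT EmployeeID, EmployeeInfo FROM EmployeeComputedSketch
GO
```

If the XML instance has no EmployeeID attribute, value() returns NULL and the computed column is NULL for that row.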
This functionality also makes it possible (and quite simple) to use the query() method to query entire XML fragments to be used for computed columns. Modify the user-defined function as follows:

DROP FUNCTION GetElementInfo
GO
CREATE FUNCTION GetElementInfo(@xmlvar xml)
RETURNS xml
AS
BEGIN
    RETURN @xmlvar.query('root/Employee')
END
GO
The table also needs to change a bit:

DROP TABLE Employee
GO
CREATE TABLE Employee
(
    EmployeeInfo xml,
    EmployeeID AS dbo.GetElementInfo(EmployeeInfo)
)
GO
The last step is to insert a row into the table with a somewhat sizable XML instance:

INSERT INTO Employee (EmployeeInfo)
VALUES ('Robin')
GO
Query the Employee table again to view the results, as shown in Figure 4-12.
Figure 4-12
In this example, the EmployeeID column is defined as a computed column whose expression is the user-defined function, so its data type is the xml type that the function returns. The UDF uses the query() method to return the root/Employee fragment of the XML document. When data is inserted into the EmployeeInfo column, the UDF executes and sets the value of the EmployeeID column to the fragment it finds in what was inserted. The EmployeeID column is therefore a computed column of the xml data type. By now you should start to have a grasp on computed columns using the xml data type column, and you can move on to other matters, such as views, which are covered in the next section.
Creating Views
Views can be created using an xml data type column. Since the contents of a view are based on a query, this makes it very enticing to use with the xml data type because the view has access to all the xml data type functionality, such as the value() and query() methods. The following example illustrates building a view that queries an xml data type column and uses the value() method to return values. First, a little clean-up:

DROP TABLE Motocross
GO
CREATE TABLE Motocross
(
    [MotocrossID] [int] NOT NULL,
    [MotocrossInfo] [xml] NOT NULL
) ON [PRIMARY]
GO
Now insert some data: INSERT INTO Motocross (MotocrossID, MotocrossInfo) VALUES (1, ‘
Tim Ferry Chad Reed David Vuillemin
Kevin Windham Mike LaRacco Jeremy McGrath
Ricky Carmichael Broc Hepler
James Stewart Michael Byrne
‘) GO
After inserting the data, the next step is to create the view:

CREATE VIEW GetTeamInfo AS
SELECT MotocrossInfo.value('(/Motocross/Team/@Manufacturer)[1]', 'varchar(40)') AS Team,
       MotocrossInfo.value('(/Motocross/Team)[1]', 'varchar(40)') AS Riders
FROM Motocross
Now that the view is created, you can query it:

SELECT * FROM GetTeamInfo
The results should look like this:

Team     Riders
------   -----------------------------------
Yamaha   Tim Ferry Chad Reed David Vuillemin
Views are a great way to filter the data coming from the xml data type column. Other than not being able to use the xml data type in a distributed partitioned view (see the “xml data type Best Practices” and “Limitations” sections), there are no limitations when using the xml data type in a view. A partitioned view uses the UNION ALL operator to combine member tables that have the same structure. In a distributed partitioned view, those member tables are stored on a group of independent instances of SQL Server rather than within a single instance.
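The view above returns only the first team. A sketch of a view that returns one row per team and rider can combine the nodes() method with CROSS APPLY. The table and view names, the Rider element name, and the second team are assumptions for this illustration.

```sql
-- Hypothetical table name and sample data; the Rider element name
-- and the "TeamB" entry are made up for this sketch.
CREATE TABLE MotocrossSketch
(
    MotocrossID   int NOT NULL,
    MotocrossInfo xml NOT NULL
)
GO

INSERT INTO MotocrossSketch (MotocrossID, MotocrossInfo)
VALUES (1, '
<Motocross>
  <Team Manufacturer="Yamaha">
    <Rider>Tim Ferry</Rider>
    <Rider>Chad Reed</Rider>
  </Team>
  <Team Manufacturer="TeamB">
    <Rider>Rider X</Rider>
  </Team>
</Motocross>')
GO

-- One row per Team element, then one row per Rider within each team.
CREATE VIEW TeamRidersSketch AS
SELECT T.Team.value('@Manufacturer', 'varchar(40)') AS Team,
       R.Rider.value('.', 'varchar(40)') AS Rider
FROM MotocrossSketch
CROSS APPLY MotocrossInfo.nodes('/Motocross/Team') AS T(Team)
CROSS APPLY T.Team.nodes('Rider') AS R(Rider)
GO

SELECT * FROM TeamRidersSketch
GO
```

Each nodes() call shreds the XML into a rowset, so the view behaves like an ordinary relational table to anything that queries it.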
XML Settings Options
Certain settings affect how XML behaves in SQL Server. This behavior applies to xml data type variables and columns. The following table lists the settings that must be configured and the required value for each setting. If these settings are not configured as shown in the table, all queries and modifications on the xml data type will fail.

SET Option                 Required Value
-------------------------  --------------
NUMERIC_ROUNDABORT         OFF
ANSI_PADDING               ON
ANSI_WARNINGS              ON
ANSI_NULLS                 ON
ARITHABORT                 ON
CONCAT_NULL_YIELDS_NULL    ON
QUOTED_IDENTIFIER          ON
These options can be set by running the appropriate T-SQL. For example:

SET ANSI_PADDING ON
GO
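As a sketch, all seven options from the table can be set for the current session in one batch and then confirmed with DBCC USEROPTIONS, which lists the SET options in effect for the connection.

```sql
-- Session-level settings required for xml data type operations.
SET NUMERIC_ROUNDABORT OFF
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET ANSI_NULLS ON
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
GO

-- List the SET options that are active for this connection.
DBCC USEROPTIONS
GO
```

These are session settings, so a client library that connects with different defaults would need the same statements at the start of its connection.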
xml data type Best Practices
As with everything else in technology, there are some things to consider when using anything new. XML support in SQL Server 2005 is no different, and while it is a fantastic addition, it is always a good idea to know what some of the best practices and limitations are. The following sections detail some of the things to take into consideration as you plan to move forward with SQL Server 2005.
Why and Where
The intent of this book is not to persuade anyone to use SQL Server 2005 XML technology over relational storage. Chances are, however, that if you are reading this you are either considering using XML in your databases or are already doing so. If you are in the first group, those who are considering using SQL Server 2005 XML, the purpose of this section is to highlight the reasons for and benefits of storing XML in SQL Server 2005. Consider using the XML data model if any of the following conditions are met:
❑ You are “platform independent.” XML does not care what platform or operating system you are using.
❑ There is a lack of consistency in the structure of your data. If your data structure changes frequently, you should strongly consider the XML data model.
❑ Your data is in hierarchical format, a collection of strictly nested sets or nodes.
❑ The order of your data is important.
typed versus untyped
This chapter spent quite a bit of time covering the differences between typed and untyped XML, but didn’t really discuss which to use in a particular scenario. Thus, the purpose of this section is to give you some guidance as to when you should use one over the other. Regardless of whether you use typed or untyped, SQL Server is going to check for well-formed XML anyway. At times, however, one method is a better solution than the other. You should use the untyped XML data type if the following criteria are met:
❑ There are no schemas associated with the XML data.
❑ You want data validation to happen on the client rather than on the server.
You should use typed XML under the following conditions:
❑ You want XML data validation to take place on the server rather than on the client.
❑ You want to utilize the query optimizations.
❑ You want to utilize the storage optimizations.
❑ You want to utilize the compilation of type information.
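The difference in server-side validation can be sketched with one untyped and one typed variable. The schema collection name EmployeeSchemaSketch and the schema itself are assumptions written for this illustration.

```sql
-- Hypothetical schema collection: an Employee element with string
-- content and a required integer EmployeeID attribute.
CREATE XML SCHEMA COLLECTION EmployeeSchemaSketch AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="Employee">
      <xsd:complexType>
        <xsd:simpleContent>
          <xsd:extension base="xsd:string">
            <xsd:attribute name="EmployeeID" type="xsd:int" use="required" />
          </xsd:extension>
        </xsd:simpleContent>
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>'
GO

DECLARE @untyped xml
DECLARE @typed xml(EmployeeSchemaSketch)

-- Untyped: only well-formedness is checked, so "abc" is accepted.
SET @untyped = '<Employee EmployeeID="abc">Damon</Employee>'

-- Typed: the instance is valid against the schema, so it is accepted.
SET @typed = '<Employee EmployeeID="1">Damon</Employee>'

-- Typed: "abc" is not an xsd:int, so this assignment should fail validation.
SET @typed = '<Employee EmployeeID="abc">Damon</Employee>'
GO
```

The last assignment is expected to raise a validation error, which is the server-side check that the typed option buys you.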
Constraints
As stated previously, constraints are a great way to limit the type of data that is permitted in an XML instance. For example, consider using a constraint under either of the following conditions:
❑ Any time business rule logic cannot be included (or is not allowed) in an XML schema. In these circumstances the logic can be moved to a constraint using xml data type methods.
❑ Any time columns other than xml data type columns are included in the constraint.
There is a downside to using constraints, and that is that you cannot use any of the xml data type methods directly when you specify a constraint. If you do need to specify such a constraint, the solution is to create a UDF around the xml data type method. When you create the constraint, you can specify the function in the constraint.
Limitations
As much as XML fans would like to say that the xml data type is perfect, there are a few limitations, which are displayed in the following list:
❑ An xml data type instance cannot exceed 2GB.
❑ The xml data type cannot be used in a distributed partitioned view.
❑ An xml data type column cannot be used in a PRIMARY KEY or FOREIGN KEY constraint.
❑ An xml data type column cannot be used in a UNIQUE constraint.
❑ The xml data type cannot be cast or converted to the text or ntext data type with the CAST or CONVERT functions.
❑ Since XML has its own encoding, COLLATE is not supported.
❑ The xml data type cannot be used in a GROUP BY clause.
❑ An xml data type column cannot be a key column in a clustered or non-clustered index.
❑ Only string data types can be cast to the xml data type.
❑ The xml data type does not preserve namespace prefixes.
Summary
This chapter took an in-depth look at the xml data type and its implementation in SQL Server 2005, as this is the foundation for the rest of the book. You delved into typed and untyped XML and learned the importance of determining how XML instances are stored in SQL Server, as well as some of the considerations to keep in mind when making that decision. Likewise, you learned how to alter the xml data type column. This can be very beneficial if you are considering moving toward XML storage or are currently storing XML in your database and would like to migrate to the xml data type column.
From there, you focused on the xml data type methods and how to use them to query XML instances. Understanding these methods prepares you for the upcoming chapters on querying and modifying XML data using XQuery and XML DML. Building on the xml data type column theme, the chapter’s focus shifted to using defaults, constraints, and computed columns to further enhance the xml data type column for greater usability, especially when applied to using the xml data type methods. Equally important is understanding the best way to put this new knowledge to use in a given situation, so the last part of the chapter focused on providing some insight on when to use the xml data type and outlined some of its limitations. Building on all of this new knowledge, the next chapter discusses querying and modifying XML data.
Querying and Modifying XML Data in SQL Server 2005
A sizable section of Chapter 4 dealt with the xml data type methods, which are used to extract data from an XML document. Each of those methods, except for the modify() method, takes an XQuery expression as a parameter. The XQuery expression is what really determines what data is returned from the XML document. This chapter focuses on querying the xml data type and modifying data in XML instances, whether in variables or in xml data type columns. Both of these topics were introduced briefly in Chapter 2, but it is necessary to spend much more time on each one in order to fully grasp the implementation of XQuery in SQL Server 2005. The first part of this chapter focuses primarily on the built-in XQuery support in SQL Server 2005, while the second half delves into the modification of XML documents using XML DML. The intent of this chapter is to give you a good understanding of the XQuery implementation in SQL Server 2005. It does not go into every aspect of XQuery, which could fill a book by itself.
XQuery
The XQuery language provides the capability to query well-formed XML documents. Combined with the added benefit of SQL Server 2005 providing native XML storage via the xml data type, XML documents can be queried natively in SQL Server. As of SQL Server 2005 Beta 2, the support for XQuery is based on the Last Call working draft of the W3C XQuery Language of November 2003.
Chapter 2 spent a few pages reviewing the structure of an XQuery expression, as well as some of the expressions and terms used in the XQuery language. This section briefly reviews what was introduced in Chapter 2; provides some new information on XQuery Prolog, XQuery Path expressions, and XQuery XML construction; and then introduces topics and examples of other XQuery features, namely the FLWOR statement and XQuery sorting.
XQuery Structure and Concepts Review
The XQuery language is a case-sensitive language defined by the W3C and is built on XPath expressions that allow for the querying of XML documents. This section briefly introduces XQuery and its syntax, and the components that make up an XQuery query. Here are a few of the syntax rules:
❑ XQuery is case-sensitive.
❑ XQuery elements, attributes, and variables must be valid XML names.
❑ XQuery string values can be within double (" ") or single (' ') quotes.
❑ XQuery variables begin with the $ symbol.
As explained in Chapter 2, there are two main parts to an XQuery query. The first part is the XQuery Prolog (discussed in more detail in the “XQuery Prolog” section), which is simply a namespace declaration, such as the following:

declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
The second part of the XQuery query is the body of the query, the query expression, as follows:

/MSAW:root/MSAW:Location[@LocationID=50]
When put together, the entire XQuery expression looks like the following:

SELECT Instructions.query('
    declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
    /MSAW:root/MSAW:Location[@LocationID=50]
') AS Location
FROM Production.ProductModel
WHERE ProductModelID = 47
All of these parts are necessary to have a true XQuery expression, including one of the xml data type methods, such as the query() method shown in the example. The power behind an XQuery expression is its ability to query deep into an XML document and retrieve any piece of information, whether it’s from an XML variable or from data stored in an xml data type column. Chapter 2 also touched briefly on some of the concepts and terms of XQuery such as Sequence, Atomization, Quantification, and Type promotion, which are reviewed in subsequent sections.
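The Prolog and body can also be tried without the AdventureWorks database by running them against an xml variable. In this sketch the namespace URI urn:example:instructions and the sample document are assumptions made for the illustration.

```sql
-- Hypothetical document and namespace, used only for this sketch.
DECLARE @x xml
SET @x = N'<MI:root xmlns:MI="urn:example:instructions">
  <MI:Location LocationID="10"><MI:step>Cut</MI:step></MI:Location>
  <MI:Location LocationID="50"><MI:step>Weld</MI:step></MI:Location>
</MI:root>'

-- Prolog: bind the MSAW prefix to the namespace URI.
-- Body: select the Location element whose LocationID attribute is 50.
SELECT @x.query('
    declare namespace MSAW="urn:example:instructions";
    /MSAW:root/MSAW:Location[@LocationID=50]
') AS Location
```

Note that the prefix declared in the Prolog (MSAW) does not have to match the prefix used inside the document (MI); only the namespace URI has to match.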
Sequence
As defined in Chapter 2, a sequence is the result returned from an XQuery expression, made up of nodes and fragments called items. For example, consider the following query:

SELECT Instructions.query('
    declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
    for $Inst in /MSAW:root
    return (
        {string(($Inst/MSAW:Location[@LocationID = 50]/MSAW:step[1])[1]) } ,
        {string(($Inst/MSAW:Location[@LocationID = 50]/MSAW:step[2])[1]) }
    )
') AS Steps
FROM Production.ProductModel
WHERE ProductModelID=47
The results are shown in Figure 5-1.
Figure 5-1
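A self-contained variant of this query can be run against an xml variable. The sample document and the Step1 and Step2 element names are assumptions for this sketch; the point is that the parentheses group the two constructed elements into one sequence returned by the for expression.

```sql
-- Hypothetical document; Step1 and Step2 are made-up element names.
DECLARE @x xml
SET @x = N'<root>
  <Location LocationID="50">
    <step>first</step>
    <step>second</step>
  </Location>
</root>'

-- The parentheses make both constructors part of the return clause.
SELECT @x.query('
    for $Inst in /root
    return (
        <Step1>{ string(($Inst/Location[@LocationID=50]/step[1])[1]) }</Step1>,
        <Step2>{ string(($Inst/Location[@LocationID=50]/step[2])[1]) }</Step2>
    )
') AS Steps
```

Without the parentheses, the comma would end the for expression after the first constructor, leaving $Inst undefined in the second one.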
Taking a look at the XQuery expression, this example queries the first two <step> elements and wraps the two results in parentheses. What would happen if the parentheses were removed? Go ahead and try it. What happened? It returned an error, because without the parentheses the return keyword is bound only to the first element. The comma has lower precedence than the for...return expression, so the second element falls outside the for expression, where $Inst is not defined. The solution is either to put the parentheses back or to remove the second element. This is a fairly simple example. However, suppose you were to execute the following query:

SELECT Instructions
FROM Production.ProductModel
WHERE ProductModelID = 47
A small portion of the results are shown here: Work Center - 50 Frame FormingThe following instructions pertain to Work Center 10. (Setup hours = .0, Labor Hours = 3.5, Machine Hours = 0, Lot Sizing = 1) Slide the stem onto the handlebar centering it over the knurled section.
Take care not to scratch the handlebar.
Notice that this query returned a lot of information that is difficult to decipher. It’s one big XML document from a single column. The preceding results are only a small portion of the entire results returned. Wouldn’t it be great if there were a way to query specific data out of all that information? Well, there is: XQuery. If the schema in the Prolog supported it, the XQuery expression could have queried a level deeper into the XML document and pulled out the material node information.
Atomization
Returning the typed value of an item is called atomization. A common scenario of atomization occurs when you use the data() function to return a typed value of a specific node. Going back to the example in Chapter 2, the query expression returned the value of a node both explicitly, via the data() function, and automatically, via atomization:

SELECT Instructions.query('
    declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
    for $AW in /MSAW:root/MSAW:Location[2]
    return
')
FROM Production.ProductModel
WHERE ProductModelID = 47
In this example, the first value is the attribute MachineHours. The second value is the same value returned using the data() function, and the third value is automatically returned using atomization. Therefore, the data() function is not needed. The use of the data() function is completely optional in XQuery, although it does improve the readability of an XQuery expression.
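The explicit and implicit forms can be compared side by side on an xml variable. The sample document and the Explicit and Implicit attribute names are assumptions for this sketch.

```sql
-- Hypothetical document, used only for this sketch.
DECLARE @x xml
SET @x = N'<root><Location LocationID="20" MachineHours="1.75" /></root>'

-- Explicit: data() extracts the typed value of the attribute node.
-- Implicit: the attribute node is atomized automatically when it is
-- used inside the attribute value of the constructed element.
SELECT @x.query('
    for $L in /root/Location[1]
    return
        <Location
            Explicit="{ data($L/@MachineHours) }"
            Implicit="{ $L/@MachineHours }" />
') AS Result
```

Both attributes of the constructed element should carry the same value, which is why the data() call can be left out.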
Quantification
Quantification comes in two flavors: Existential and Universal. Existential means that, for any two sequences, TRUE is returned when any item in the first sequence has a match in the second sequence. Universal means that, for any two sequences, TRUE is returned only if every item in the first sequence has a match in the second sequence. In the following example, a Universal quantified expression uses the xml data type value() method instead of the query() method to compare two sequences, checking to see if every Location (sequence 1) has a MachineHours attribute (sequence 2):

SELECT Instructions.value('
    declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
    ( if (every $loc in //MSAW:root/MSAW:Location satisfies $loc/@MachineHours)
      then "YEP!"
      else "NOPE!" [1])
', 'varchar(5)') AS ReturnValue
FROM Production.ProductModel
WHERE ProductModelID = 47
The results from this query return NOPE! because not every Location has a MachineHours attribute. Change the XQuery quantified expression from every to some (an Existential quantification) and rerun the query. What are your results?
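The two quantifiers can be compared directly on a small xml variable. In this sketch the sample document is an assumption, and the Boolean result of each quantified expression is returned as a string through value().

```sql
-- Hypothetical document: only the first Location has MachineHours.
DECLARE @x xml
SET @x = N'<root>
  <Location LocationID="10" MachineHours="2" />
  <Location LocationID="20" />
</root>'

SELECT
    -- Universal: true only if every Location has the attribute.
    @x.value('every $loc in /root/Location satisfies $loc/@MachineHours',
             'varchar(5)') AS EveryResult,
    -- Existential: true if at least one Location has the attribute.
    @x.value('some $loc in /root/Location satisfies $loc/@MachineHours',
             'varchar(5)') AS SomeResult
```

With this document, the every form should report false and the some form should report true, since one of the two Location elements lacks the attribute.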
Type Promotion
Type promotion allows the implicit casting of one numeric type to another, or of an untyped value to a typed value. Casting can be explicit or implicit; however, there are certain rules that you need to follow when type casting, regardless of whether you are explicitly or implicitly casting values.
Explicit Casting
There are a number of rules you need to follow when explicitly casting. These castings are not supported:
❑ Casting to or from list types. For example, you cannot cast to or from xs:ENTITIES.
❑ Casting to or from xs:QName and xs:NOTATION.
❑ Casting to or from duration subtypes xdt:yearMonthDuration and xdt:dayTimeDuration.
What type of casting is allowed? A built-in type can be cast to another built-in type.
Implicit Casting
The following rules apply when you are implicitly casting values:
❑ You can cast decimals to a float.
❑ You can cast a float to a double.
❑ You can cast numerical types (built-in) to their base type.
❑ You cannot implicitly cast string types.
❑ You cannot implicitly cast numeric types to string.
XQuery Prolog
As stated earlier, an XQuery expression contains two parts: the Prolog and the body. The Prolog is a combination of namespace and schema declarations that define the query processing environment. The body of an XQuery expression contains the actual expression that specifies which values you want returned from the XML document. The following example, taken from earlier in this chapter, shows both parts:

SELECT Instructions.query('
    declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
    /MSAW:root/MSAW:Location[@LocationID=50]
') AS Location
FROM Production.ProductModel
WHERE ProductModelID = 47
The first step in setting a Prolog is declaring a namespace prefix. You do this by using the declare keyword followed by the name of your namespace prefix. The name is then used in the body of the expression. The following example demonstrates declaring a namespace: declare namespace MSAW
The Prolog in the code example is the following:

declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
The Prolog is followed by the body of the expression:

/MSAW:root/MSAW:Location[@LocationID=50]
The purpose of declaring a namespace is first to define a prefix to use in the body of the query, and second, to associate the prefix with a namespace URI (Uniform Resource Identifier), which in this case identifies the corresponding XSD schema. In the event that you have been attacked by a severe case of writer’s block and for whatever reason you cannot think of a namespace prefix, don’t worry; you can always use the declare default element namespace declaration instead. It binds a default namespace to all unprefixed element names in the query. The following example illustrates how to use this default namespace:

SELECT CatalogDescription.query('
    declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelDescription";
    /ProductDescription/Features
') AS Result
FROM Production.ProductModel
WHERE ProductModelID=28
When you run this query, all elements in the results are now prefixed with the default namespace defined in the Prolog. Here’s a portion of these results: These are the product highlights.
1 yearparts and labor
5 years maintenance contact available through dealer
By declaring a default namespace, you can bind each element name to the default namespace without the need to specify a prefix. This lets you write your expression without the need to specify the prefix at every turn, such as the case in the first example in this section where a namespace prefix was declared and used throughout the query.
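The effect of the default namespace declaration can be sketched on an xml variable by running the same path three ways. The namespace URI urn:example:product and the sample document are assumptions for this illustration.

```sql
-- Hypothetical document whose elements are in a default namespace.
DECLARE @x xml
SET @x = N'<ProductDescription xmlns="urn:example:product">
  <Features><Warranty>1 year</Warranty></Features>
</ProductDescription>'

SELECT
    -- Default element namespace: no prefix needed in the path.
    @x.query('declare default element namespace "urn:example:product";
              /ProductDescription/Features') AS WithDefault,
    -- Prefixed namespace: every step carries the PD prefix.
    @x.query('declare namespace PD="urn:example:product";
              /PD:ProductDescription/PD:Features') AS WithPrefix,
    -- No declaration: the unqualified names match nothing.
    @x.query('/ProductDescription/Features') AS NoDeclaration
```

The first two columns should return the same Features element, while the third should come back empty because the elements in the document are not in the empty namespace.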
XQuery Path Expressions
Path expressions in XQuery locate nodes in an XML document. The nodes, regardless of whether they are elements, attributes, or other node types, are always returned by the path expression in the same order as in the XML document, without duplicate nodes. A path expression can be either relative or absolute. A relative path expression consists of one or more steps separated by one (/) or two (//) slash marks. Using the CatalogDescription column in the Production.ProductModel table in the Adventure Works database for this example, a relative expression would look like the following:

child::Manufacturer
In this example expression, child is the axis; it refers to the children of the current (context) node being searched, which in this example is the <ProductDescription> node. The expression returns the <Manufacturer> element children of the <ProductDescription> node. Absolute path expressions begin with slashes (one or two) and can be followed by a relative path (the relative path is optional). For example, using the same table and column, the following is an absolute path expression:

/child::ProductDescription/child::Manufacturer
The expression begins with a slash, which tells the expression to start at the root node, and is then followed by a relative path expression. The expression starts at the root node and returns all <Manufacturer> child nodes of the <ProductDescription> child nodes of the root node. Absolute paths that start with a single slash (/) are not necessarily followed by a relative path expression. For example, if the expression contains a single slash only, the entire XML document is returned. Path expressions consist of one or more steps. A step is a level in the XML hierarchy. For example, the following expression contains a single step:

/MSAW:Location
The following expression has two steps:

/MSAW:Location/MSAW:step

The first example returns all <Location> nodes underneath the root node. The second example returns all <step> child elements for each <Location> element. Steps in path expressions can be of two different types: Axis step or General step. In SQL Server 2005, General steps in path expressions are not supported, so they are not covered in this book.
Axis Step
There are two parts to an Axis step: the axis and the node test. The axis specifies the direction in which to search, and the node test defines the names or types of the nodes to be selected. There are six types of axes:
❑ child: Returns the children of the context node.
❑ descendant: Returns all descendants of the context node.
❑ parent: Returns the parent of the context node.
❑ attribute: Returns the attributes of the context node.
❑ self: Returns the context node itself.
❑ descendant-or-self: Returns the context node and all of its descendants.
The following example illustrates using a child axis to query child nodes of the parent node:

SELECT CatalogDescription.query('
    declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelDescription";
    /child::MSAW:ProductDescription/child::MSAW:Manufacturer
')
FROM Production.ProductModel
WHERE ProductModelID=35
The results are shown in Figure 5-2.
Figure 5-2
In this example, there are two steps in the expression. The first step, ProductDescription, is a child of the root node. The second step, Manufacturer, is a child of the ProductDescription node. The query then returns the Manufacturer node, including all of its child nodes.
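The axes listed above can be tried without the AdventureWorks schema. The following is a minimal sketch against an untyped xml variable; the Catalog document and its element names are made up for illustration and do not come from the sample database.

```sql
-- Hypothetical document used only for this sketch.
DECLARE @doc xml
SET @doc = '
<Catalog>
  <Product ProductID="1">
    <Name>Road Bike</Name>
    <Specs><Weight>9</Weight></Specs>
  </Product>
</Catalog>'

-- child axis: Name elements that are direct children of Product
SELECT @doc.query('/child::Catalog/child::Product/child::Name')

-- descendant axis: Weight elements at any depth below Catalog
SELECT @doc.query('/child::Catalog/descendant::Weight')

-- parent axis: step from Weight back up to its parent element (Specs)
SELECT @doc.query('/Catalog/Product/Specs/Weight/parent::*')

-- attribute axis: read the ProductID attribute as a scalar value
SELECT @doc.value('(/Catalog/Product/attribute::ProductID)[1]', 'int')

-- descendant-or-self axis, written with the // abbreviation
SELECT @doc.query('//Name')
```

The attribute is read with value() rather than query() because query() cannot return a bare attribute node at the top level of its result.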
Node Test
The node test is a condition that every node selected by a step must satisfy. For example, the following expression returns only those child elements of the ProductDescription node whose element name is Manufacturer:
    /child::ProductDescription/child::Manufacturer
Node test conditions can be specified by either node name or node type. For example, the following query expression returns the Name element by specifying the node name to query:
    SELECT CatalogDescription.query('
      declare namespace PD="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelDescription";
      /child::PD:ProductDescription/child::PD:Manufacturer/child::PD:Name
    ')
    FROM Production.ProductModel
    WHERE ProductModelID=35
In this example, the Name node value is returned because the Name element is specified in the path expression. The path expression contains node tests. The results of this query are as follows: AdventureWorks
A node test can be a node name (illustrated by the example) or a node type. A node test where the condition is of a node type returns only those nodes where the type is specified in the query, such as the following: /child::PD:ProductDescription/child::PD:Manufacturer/child::comment()
This query returns all comment types found within the Manufacturer node. Node tests can be useful when you are not sure exactly what nodes are in your XML document and you want to query and test for specific nodes.
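As a rough illustration of the difference between name tests and node type tests, the following sketch runs several tests against the same small document. The Manufacturer document here is invented for the example and is untyped, so no namespace declaration is needed.

```sql
-- Hypothetical document used only for this sketch.
DECLARE @doc xml
SET @doc = '
<Manufacturer>
  <!-- primary supplier -->
  <Name>AdventureWorks</Name>
  <Copyright>2002</Copyright>
</Manufacturer>'

-- name test: only the Name element
SELECT @doc.query('/Manufacturer/Name')

-- wildcard name test: every child element, whatever its name
SELECT @doc.query('/Manufacturer/*')

-- node type test: only comment nodes
SELECT @doc.query('/Manufacturer/comment()')

-- node type test: only the text node inside Name
SELECT @doc.query('/Manufacturer/Name/text()')

-- node(): any kind of child node (elements and the comment here)
SELECT @doc.query('/Manufacturer/node()')
```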
XQuery XML Construction
XQuery constructors can be used inside an XQuery expression to build XML. Constructors are available for elements and attributes as well as the other components of an XML document, and they let you define the shape of the XML that a query returns. Construction is most useful when you retrieve data dynamically from your database. For example, the following queries all the manufacturing steps of the second Location element and wraps the returned data in a newly constructed element:
    SELECT Instructions.query('
      declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
      { /MSAW:root/MSAW:Location[2]/MSAW:step }
    ') as Location
    FROM Production.ProductModel
    WHERE ProductModelID=47
The results are shown in Figure 5-3.
Figure 5-3
Click the link to better view the results (see Figure 5-4). The namespaces have been removed for readability:
Figure 5-4
The results of the query, which are all of the steps for the second location, were returned inside of the constructed element. Constructors also provide the ability to construct attributes. Building on the previous example, modify the expression as follows:
    SELECT Instructions.query('
      declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
      { /MSAW:root/MSAW:Location[2]/MSAW:step }
    ') as Location
    FROM Production.ProductModel
    WHERE ProductModelID=47
The results look the same, except there are two attributes added to the element, as illustrated in Figure 5-5.
Figure 5-5
Make one more modification to get rid of the namespace and return only the string value of each manufacturing step:
    SELECT Instructions.query('
      declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
      { for $var in /MSAW:root/MSAW:Location[2]/MSAW:step return string($var) }
    ') as Location
    FROM Production.ProductModel
    WHERE ProductModelID=47
The results are shown in Figure 5-6, and have been formatted for better readability.
Figure 5-6
Computed element and attribute names are not fully supported in this release, but will be supported at product release time. Until then, you can use string literals to define the names.
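To see element and attribute construction end to end, here is a small self-contained sketch. The document, the FirstLocation wrapper, and the id and stepCount attribute names are all assumptions chosen for illustration; they are not taken from the AdventureWorks examples above.

```sql
-- Hypothetical document and element names, used only for this sketch.
DECLARE @doc xml
SET @doc = '
<root>
  <Location LocationID="10"><step>Cut</step><step>Weld</step></Location>
  <Location LocationID="20"><step>Paint</step></Location>
</root>'

SELECT @doc.query('
  <FirstLocation id="{ (/root/Location[1]/@LocationID)[1] }"
                 stepCount="{ count(/root/Location[1]/step) }">
    { /root/Location[1]/step }
  </FirstLocation>')
```

Expressions in curly braces are evaluated and their results placed into the literal XML around them: inside an attribute value they become the attribute's text, and inside element content they become child nodes of the constructed element.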
FLWOR Statement
The FLWOR (pronounced flower) statement is the syntax for iterating over a sequence within an XQuery expression. FLWOR is an acronym that stands for FOR, LET, WHERE, ORDER BY, and RETURN. LET is not supported in the current beta release of SQL Server 2005.
A FLWOR statement is made up of the following components:
❑ An input sequence (constructed XML nodes are not accepted as input)
❑ A FLWOR variable (for example, FOR $var)
❑ An optional WHERE clause
❑ An optional ORDER BY clause
❑ A RETURN expression
Using the code sample from the previous section, the following example queries all the step elements for the second Location node:
    SELECT Instructions.query('
      declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
      for $var in //MSAW:root/MSAW:Location[2]/MSAW:step
      return string($var)
    ') as Steps
    FROM Production.ProductModel
    WHERE ProductModelID=47
The results from the query are shown in Figure 5-7.
Figure 5-7
This query follows the FLWOR syntax because it uses the components that a FLWOR statement requires: the for clause, the return expression, and an input sequence specified by the XPath expression. The preceding example uses a single variable bound to a single input sequence. The following example uses the same code sample from earlier, but uses multiple variables, each bound to an input sequence. Note that XQuery keywords are case-sensitive, so return must be written in lowercase:
    SELECT Instructions.query('
      declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
      for $var in //MSAW:root/MSAW:Location[2]/MSAW:step,
          $mat in $var/MSAW:material
      return
      { $mat }
    ') as Material
    FROM Production.ProductModel
    WHERE ProductModelID=47
The results are as follows:
handlebar components
The difference between the previous two examples is that the latter defines two variables in the for clause. The first expression returns all of the steps for the second location. The second path expression, assigned to the $mat variable, returns all results for all the material elements within the $var variable. In this example, there is only a single occurrence of the material element. The WHERE clause in the FLWOR statement is not the where clause following the FROM clause in the preceding example. The where clause is included within the query expression and provides the capability to limit or filter the results returned. For example, the following queries all Locations where the number of step nodes is greater than two:
    SELECT Instructions.query('
      declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
      for $var in /MSAW:root/MSAW:Location
      where count($var/MSAW:step) > 2
      return
      { $var/@LocationID }
    ') as Location
    FROM Production.ProductModel
    WHERE ProductModelID=47
The results of this query, shown in Figure 5-8, include only those LocationIDs where the number of manufacturing steps is greater than 2.
Figure 5-8
The filtering for this query is provided by the where clause in the query expression:
    where count($var/MSAW:step) > 2
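The same filtering pattern can be sketched against a small untyped document, which makes the effect of the where clause easy to check by eye. The document below and the Busy wrapper element are invented for this illustration.

```sql
-- Hypothetical document and wrapper element, used only for this sketch.
DECLARE @doc xml
SET @doc = '
<root>
  <Location LocationID="10"><step>A</step><step>B</step><step>C</step></Location>
  <Location LocationID="20"><step>A</step></Location>
  <Location LocationID="30"><step>A</step><step>B</step><step>C</step><step>D</step></Location>
</root>'

-- Keep only locations with more than two steps; the attribute placed
-- inside the constructor becomes an attribute of the new element.
SELECT @doc.query('
  for $loc in /root/Location
  where count($loc/step) > 2
  return <Busy>{ $loc/@LocationID }</Busy>')
```

Only locations 10 and 30 satisfy the condition, so the location with a single step is filtered out before the return expression runs.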
XQuery Sorting
Sorting in XQuery is provided by the order by clause of the FLWOR statement. In fact, the order by clause can only be used in a FLWOR statement. When an order by clause is specified without a direction, the results are sorted in ascending order by default, but you can specify the optional keywords ascending or descending as well. Sorting with order by is not limited to one kind of value. For example, a query expression can sort on an element value, attribute value, or element name. Using the Adventure Works database again, the following example retrieves all the alternate phone numbers for a specific person from the Person.Contact table and sorts them:
    SELECT AdditionalContactInfo.query('
      declare namespace ct="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ContactTypes";
      declare namespace ci="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ContactInfo";
      for $var in /ci:AdditionalContactInfo//ct:telephoneNumber
      order by $var/ct:number[1]
      return $var
    ') As Result
    FROM Person.Contact
    WHERE ContactID=1
The results are shown in Figure 5-9.
Figure 5-9
In the preceding example, the optional sort direction (ascending or descending) was left off, so the phone numbers were sorted in ascending order. In the next example, instead of sorting by a node value, the results are sorted by the MachineHours attribute:
    SELECT Instructions.query('
      declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
      for $MH in /MSAW:root/MSAW:Location
      order by $MH/@MachineHours descending
      return
      { $MH/@LocationID } { $MH/@MachineHours }
    ') as Location
    FROM Production.ProductModel
    WHERE ProductModelID = 47
The results (see Figure 5-10) show the Locations, the LocationID attribute, and the MachineHours attribute sorted in descending order by MachineHours.
Figure 5-10
This final example shows how to sort by element name. The query expression queries the first manufacturing step of the first location to retrieve all child elements in ascending order:
    SELECT Instructions.query('
      declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
      for $var in /MSAW:root/MSAW:Location[1]/MSAW:step[1]/*
      order by local-name($var)
      return $var
    ') as Result
    FROM Production.ProductModel
    WHERE ProductModelID=47
The results show the element names sorted in ascending order: aluminum sheet MS-2259 T-50 Tube Forming tool
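The sorting variations above can also be tried on a small untyped document. This is a sketch only: the Team document and its Bike attribute are made up for illustration.

```sql
-- Hypothetical document used only for this sketch.
DECLARE @doc xml
SET @doc = '
<Team>
  <Rider Bike="Yamaha">Chad Reed</Rider>
  <Rider Bike="Suzuki">Ricky Carmichael</Rider>
  <Rider Bike="Kawasaki">James Stewart</Rider>
</Team>'

-- Sort by element value; ascending is the default direction
SELECT @doc.query('
  for $r in /Team/Rider
  order by string($r)
  return $r')

-- Sort by attribute value, descending
SELECT @doc.query('
  for $r in /Team/Rider
  order by $r/@Bike descending
  return $r')
```

Because the document is untyped, both sorts compare the values as strings. Without an order by clause the riders would simply come back in document order.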
While not nearly a complete discourse on XQuery technology, I hope this section gave you enough information about the XQuery implementation in SQL Server 2005 to be able to readily retrieve data from your XML documents.
XML Data Modification Language
The XML Data Modification Language (XML DML) was introduced in Chapter 2 with some examples to highlight some of its features. The purpose of this section is to discuss the topics that were not covered in Chapter 2, provide more examples, and list any limitations of XML DML. As listed in Chapter 2, XML DML adds three keywords to XQuery to enable data modification. The keywords are case-sensitive. They are as follows:
❑ insert
❑ delete
❑ replace value of
insert
The insert keyword allows for the insertion of one or more nodes into an existing XML document. The placement of the new nodes is determined by the syntax used in the expression. The basic syntax for the insert keyword is as follows:
    insert Expression1 ( {as first | as last} into | after | before Expression2 )
Expression1 is the node or nodes to be inserted into the XML document. This expression can be an XML instance or an XQuery expression. When specifying multiple nodes, you must wrap the nodes in parentheses and separate them with commas. If Expression1 evaluates to one or more values rather than nodes, those values are inserted as a single text node.
The into keyword signifies that the nodes in Expression1 are inserted into the node identified by Expression2 as child nodes. If Expression2 already has child nodes, use the as first or as last keyword to specify where among them the new nodes are placed. When inserting attributes, the as first and as last keywords are ignored.
The before and after keywords insert the nodes in Expression1 as siblings of the node identified by Expression2 rather than as its children. The before keyword inserts the nodes directly before that node, and the after keyword inserts them directly after it.
Expression2 identifies the node in the XML document relative to which the nodes in Expression1 are inserted. Unlike Expression1, Expression2 must be an XQuery expression that returns a single node that already exists in the document; if it returns more than one node, the insert fails.
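In all of the following examples, Expression1 is the node that is to be inserted. Here's the first example:
    <FirstName>Evel</FirstName>
Expression2 is the location where the nodes in Expression1 are added, which is the following:
    (/Root/Employee/EmployeeInformation)[1]
The XML document used in Chapter 2 to demonstrate the insert keyword is used again here:
    <Root>
      <Employee EmployeeID="1">
        <EmployeeInformation></EmployeeInformation>
      </Employee>
    </Root>
The first example simply inserts a new node into the listed XML. The following code inserts a new LastName node into the XML instance:
    DECLARE @xmldoc xml
    SET @xmldoc = '
    <Root>
      <Employee EmployeeID="1">
        <EmployeeInformation></EmployeeInformation>
      </Employee>
    </Root>'
    SET @xmldoc.modify('
      insert <LastName>Knievel</LastName>
      into (/Root/Employee/EmployeeInformation)[1]')
    SELECT @xmldoc
    GO
The results of the SELECT @xmldoc statement look like the following:
    <Root>
      <Employee EmployeeID="1">
        <EmployeeInformation>
          <LastName>Knievel</LastName>
        </EmployeeInformation>
      </Employee>
    </Root>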
Using the as first keyword, modify the original code to look like the following:
    DECLARE @xmldoc xml
    SET @xmldoc = '
    <Root>
      <Employee EmployeeID="1">
        <EmployeeInformation></EmployeeInformation>
      </Employee>
    </Root>'
    --SELECT @xmldoc
    SET @xmldoc.modify('
      insert <LastName>Knievel</LastName>
      into (/Root/Employee/EmployeeInformation)[1]')
    SET @xmldoc.modify('
      insert <FirstName>Evel</FirstName>
      as first into (/Root/Employee/EmployeeInformation)[1]')
    SELECT @xmldoc
    GO
Running this query returns the results shown in Figure 5-11.
Figure 5-11
The as first keyword added the FirstName element as the first element in the parent EmployeeInformation element. To finish off this example, make the following changes to the code and rerun the query:
    DECLARE @xmldoc xml
    SET @xmldoc = '
    <Root>
      <Employee EmployeeID="1">
        <EmployeeInformation></EmployeeInformation>
      </Employee>
    </Root>'
    SET @xmldoc.modify('
      insert <LastName>Knievel</LastName>
      into (/Root/Employee/EmployeeInformation)[1]')
    SET @xmldoc.modify('
      insert <FirstName>Evel</FirstName>
      as first into (/Root/Employee/EmployeeInformation)[1]')
    SET @xmldoc.modify('
      insert <JobTitle>Daredevil</JobTitle>
      as last into (/Root/Employee/EmployeeInformation)[1]')
    SELECT @xmldoc
    GO
Figure 5-12 displays the results.
Figure 5-12
Using the as last keyword, the JobTitle element was inserted into the XML instance as the last element in Expression2. Inserting multiple elements is nearly identical in operation to inserting single elements. The following example takes the original XML instance (Expression2) and inserts the FirstName, LastName, and JobTitle elements (Expression1) into the XML instance:
    DECLARE @xmldoc xml
    SET @xmldoc = '
    <Root>
      <Employee EmployeeID="1">
        <EmployeeInformation></EmployeeInformation>
      </Employee>
    </Root>'
    SET @xmldoc.modify('
      insert (
        <FirstName>Evel</FirstName>,
        <LastName>Knievel</LastName>,
        <JobTitle>Daredevil</JobTitle>
      )
      into (/Root/Employee/EmployeeInformation)[1]')
    SELECT @xmldoc
    GO
The results of this query are exactly the same as the results shown in Figure 5-12. The difference is that it took only a single insert to add the elements into the XML instance. Adding attributes is not much different from adding elements. The following example inserts an attribute into the XML instance of the previous results:
    DECLARE @xmldoc xml
    SET @xmldoc = '
    <Root>
      <Employee EmployeeID="1">
        <EmployeeInformation>
          <FirstName>Evel</FirstName>
          <LastName>Knievel</LastName>
          <JobTitle>Daredevil</JobTitle>
        </EmployeeInformation>
      </Employee>
    </Root>'
    SET @xmldoc.modify('
      insert attribute BusesJumped {"14"}
      into (/Root/Employee[@EmployeeID=1])[1]')
    DECLARE @Status varchar(10)
    SET @Status = 'Success'
    SET @xmldoc.modify('
      insert attribute JumpStatus {sql:variable("@Status")}
      into (/Root/Employee[@EmployeeID=1])[1]')
    SELECT @xmldoc
    GO
In this example, two different attributes are added to the XML instance. The first one, BusesJumped, is added with the literal string value 14. The second attribute, JumpStatus, is added by assigning the value to a T-SQL variable and using the sql:variable function to pass the variable to the modify method. The attributes are successfully added, as shown in Figure 5-13.
Figure 5-13
The next example adds a new element into an untyped XML column. The following example uses the Motocross table created earlier to insert an element into the XML instance stored in the xml data type column. The XML instance in the MotocrossInfo column in the Motocross table looks like Figure 5-14 (which shows only the important piece of the XML instance):
Figure 5-14
The following code inserts a new Rider element under the Team Suzuki node. The as last keyword makes the new node the last child, as follows:
    UPDATE Motocross
    SET MotocrossInfo.modify('
      insert <Rider>Sebastien Tortelli</Rider>
      as last into (/Motocross/Team)[3]')
    GO
When you re-query the Motocross table, the results now look like Figure 5-15.
Figure 5-15
Elements and attributes can also be added using conditional expressions, such as the following (the if, then, and else keywords must be lowercase):
    SET @xmldoc.modify('
      insert
        if (/Root/Employee[@EmployeeID=1])
        then attribute BusesJumped {"14"}
        else ()
      as first into (/Root/Employee[@EmployeeID=1])[1]')
This example is very similar to a previous example where you added the BusesJumped attribute. The difference is that here the addition of the attribute is wrapped in a conditional expression. If an Employee element with an EmployeeID value of 1 exists, the new attribute is added; otherwise, nothing is inserted. Deleting elements and attributes is as easy as adding them, as shown in the next section.
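The before and after placements described in the syntax discussion are not exercised by the examples so far. The following sketch combines them with as first and an attribute insert on one small document; the inserted comment text and the Status attribute are assumptions made up for illustration.

```sql
-- Starting document follows the Employee structure used in this section.
DECLARE @doc xml
SET @doc = '
<Root>
  <Employee EmployeeID="1">
    <EmployeeInformation>
      <LastName>Knievel</LastName>
    </EmployeeInformation>
  </Employee>
</Root>'

-- as first into: new node becomes the first child of EmployeeInformation
SET @doc.modify('
  insert <FirstName>Evel</FirstName>
  as first into (/Root/Employee/EmployeeInformation)[1]')

-- after: new node becomes the sibling immediately following LastName
SET @doc.modify('
  insert <JobTitle>Daredevil</JobTitle>
  after (/Root/Employee/EmployeeInformation/LastName)[1]')

-- before: a comment node placed as a sibling just ahead of EmployeeInformation
SET @doc.modify('
  insert <!-- verified -->
  before (/Root/Employee/EmployeeInformation)[1]')

-- attributes carry no position, so only into applies (hypothetical attribute)
SET @doc.modify('
  insert attribute Status {"Active"}
  into (/Root/Employee)[1]')

SELECT @doc
```

Each target expression is wrapped in parentheses and followed by [1] so that it identifies exactly one node, which is what the insert keyword requires of Expression2.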
delete
You can delete nodes from an XML instance by using the delete keyword. As explained in Chapter 2, the syntax is straightforward. Here it is again for your review:
    delete Expression
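The following example deletes a node from an xml data type variable:
    DECLARE @xmldoc xml
    SET @xmldoc = '
    <Root>
      <Employee EmployeeID="1">
        <EmployeeInformation>
          <FirstName>Evel</FirstName>
          <LastName>Knievel</LastName>
          <JobTitle>Daredevil</JobTitle>
        </EmployeeInformation>
      </Employee>
    </Root>'
    SET @xmldoc.modify('
      delete /Root/Employee/EmployeeInformation/JobTitle')
    SELECT @xmldoc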
Running this query removes the JobTitle node from the XML instance. The following example adds a second delete expression to also remove the EmployeeID attribute from the XML instance:
    DECLARE @xmldoc xml
    SET @xmldoc = '
    <Root>
      <Employee EmployeeID="1">
        <EmployeeInformation>
          <FirstName>Evel</FirstName>
          <LastName>Knievel</LastName>
          <JobTitle>Daredevil</JobTitle>
        </EmployeeInformation>
      </Employee>
    </Root>'
    SET @xmldoc.modify('
      delete /Root/Employee/EmployeeInformation/JobTitle')
    SET @xmldoc.modify('
      delete /Root/Employee/@EmployeeID')
    SELECT @xmldoc
Compare the results in Figure 5-16 with those shown previously in Figure 5-13, and you’ll notice that both the EmployeeID attribute and JobTitle node have been removed from the XML instance.
Figure 5-16
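The last example of this section illustrates deleting a node from an xml data type column. The predicate restricts the delete to the one Rider node being removed:
    UPDATE Motocross
    SET MotocrossInfo.modify('
      delete /Motocross/Team[3]/Rider[. = "Sebastien Tortelli"]')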
When you query the Motocross table, the results show that the Rider node with a value of Sebastien Tortelli has been deleted.
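For readers without the Motocross table at hand, the following sketch shows the common delete patterns on an xml variable. The document is a cut-down, single-team version of the Motocross structure, assumed here for illustration.

```sql
-- Reduced, single-team version of the Motocross document (assumed shape).
DECLARE @doc xml
SET @doc = '
<Motocross>
  <Team Manufacturer="Suzuki">
    <Rider>Ricky Carmichael</Rider>
    <Rider>Broc Hepler</Rider>
    <Rider>Sebastien Tortelli</Rider>
  </Team>
</Motocross>'

-- delete by value: only the Rider whose text matches
SET @doc.modify('
  delete /Motocross/Team/Rider[. = "Sebastien Tortelli"]')

-- delete by position: the last Rider that remains
SET @doc.modify('
  delete (/Motocross/Team/Rider)[last()]')

-- delete an attribute
SET @doc.modify('
  delete /Motocross/Team/@Manufacturer')

SELECT @doc
```

Unlike insert and replace value of, the delete expression may match several nodes, and every matching node is removed. A path without a predicate, such as /Motocross/Team/Rider, would therefore delete all riders at once.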
replace value of
The replace value of keyword, used through the modify() method, allows for the in-place update of a node value in an XML instance. Chapter 2 covered the syntax of the replace value of keyword, but it is shown again here for review and additional information:
    replace value of Expression1 with Expression2
Expression1 is the node whose value is being updated. Only a single node can be expressed; if multiple nodes are expressed, an error is generated. Expression2 is the new value of the node.
The following example updates a node in an XML instance with a new value. Using the previous Employee example, the JobTitle value is updated as follows:
    DECLARE @xmldoc xml
    SET @xmldoc = '
    <Root>
      <Employee EmployeeID="1">
        <EmployeeInformation>
          <FirstName>Evel</FirstName>
          <LastName>Knievel</LastName>
          <JobTitle>Daredevil</JobTitle>
        </EmployeeInformation>
      </Employee>
    </Root>'
    -- update the text of the JobTitle element
    SET @xmldoc.modify('
      replace value of (/Root/Employee/EmployeeInformation/JobTitle[1]/text())[1]
      with "Retired"')
    SELECT @xmldoc
The results of the SELECT statement (see Figure 5-17) show that the JobTitle value has been changed from Daredevil to Retired:
Figure 5-17
Attributes can also be updated as shown in the following example (also note the change in information in the XML instance):
    DECLARE @xmldoc xml
    SET @xmldoc = '
    <Root>
      <Employee EmployeeID="1">
        <EmployeeInformation>
          <FirstName>Robby</FirstName>
          <LastName>Knievel</LastName>
          <JobTitle>Son</JobTitle>
        </EmployeeInformation>
      </Employee>
    </Root>'
    -- update the text of the JobTitle element
    SET @xmldoc.modify('
      replace value of (/Root/Employee/EmployeeInformation/JobTitle[1]/text())[1]
      with "Daredevil"')
    -- update the value of the EmployeeID attribute
    SET @xmldoc.modify('
      replace value of (/Root/Employee/@EmployeeID)[1]
      with "2"')
    SELECT @xmldoc
This example updates both the EmployeeID attribute and the JobTitle value, as shown in Figure 5-18. In the previous examples, a [1] is added to the end of the target expression being updated. Since only a single node can be updated, the [1] specifies which node to update. In these examples there is only one matching node, so the [1] simply guarantees to SQL Server that the target expression returns a single node.
Figure 5-18
However, in the case of multiple nodes with the same name, such as the Motocross examples where there are multiple Rider nodes, the positional value selects which node is updated. A value of [1] updates the first node, while a [2] updates the second node. For example, the following code updates the second Rider node of the third Team node of the Motocross table (the column is untyped):
    UPDATE Motocross
    SET MotocrossInfo.modify('
      replace value of (/Motocross/Team[3]/Rider/text())[2]
      with "Davi Millsaps"')
This example demonstrates the replace value of statement, which updates the name of the second rider for team Suzuki from Broc Hepler to Davi Millsaps. The modified results are shown here: Jeremy McGrath
Ricky Carmichael Davi Millsaps Sebastien Tortelli
James Stewart Michael Byrne
The first expression identifies the node whose value is to be replaced and must be a single node. An error is generated if multiple nodes are found in the results of the query. If the result of the first expression is empty, no replacement is made. The second expression specifies the new value of the node, either a single value or a list of values. In the case where it is a list of values, the old value is replaced with the list.
The last example in this section uses a conditional expression to determine the new value. In the following example, the expression counts the riders for the first team and, depending on the number found, sets the Manufacturer attribute of the Team element to a different value:
    DECLARE @xmldoc xml
    SET @xmldoc = '
Tim Ferry Chad Reed David Vuillemin
Kevin Windham Mike LaRacco Jeremy McGrath
Ricky Carmichael Broc Hepler Sebastien Tortelli
James Stewart Michael Byrne
    '
    SET @xmldoc.modify('
      replace value of (/Motocross/Team[1]/@Manufacturer)[1]
      with (
        if (count(/Motocross/Team[1]/Rider) = 3)
        then "Team Yamaha"
        else "Yamaha"
      )')
    SELECT @xmldoc
The results shown in Figure 5-19 illustrate that, because the first team has three riders, the Manufacturer attribute on its Team element has changed from Yamaha to Team Yamaha.
Figure 5-19
This example used the count() function to count the number of child nodes. While this example is fairly simplistic, conditional expressions in XQuery can use any XQuery function available. For example, an expression could check the value of a node, and the conditional expression could base its decision on that value.
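The idea of basing the new value on an existing node value can be sketched as follows. This is an illustration only: the @title variable and the replacement values "0" and "1" are assumptions, and the document reuses the Employee structure from earlier in the section.

```sql
DECLARE @doc xml, @title varchar(20)
SET @title = 'Retired'   -- hypothetical value supplied from T-SQL
SET @doc = '
<Root>
  <Employee EmployeeID="1">
    <EmployeeInformation>
      <JobTitle>Daredevil</JobTitle>
    </EmployeeInformation>
  </Employee>
</Root>'

-- New value taken from a T-SQL variable through sql:variable()
SET @doc.modify('
  replace value of (/Root/Employee/EmployeeInformation/JobTitle/text())[1]
  with sql:variable("@title")')

-- New value chosen by testing the value of another node
SET @doc.modify('
  replace value of (/Root/Employee/@EmployeeID)[1]
  with (
    if ((/Root/Employee/EmployeeInformation/JobTitle)[1] = "Retired")
    then "0"
    else "1"
  )')

SELECT @doc
```

Both target expressions end in [1], which keeps each one limited to the single node that replace value of is allowed to update.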
Summary
The purpose of this chapter was to build on the related topics that were discussed in Chapter 2. The implementation of XQuery support in SQL Server 2005, with the addition of the XML DML, makes the querying and modification of the xml data type quite easy. The XQuery language is quickly becoming a very popular and common XML querying language, and it would be wise to start learning it. This chapter got you started with that endeavor by explaining the syntax and structure of XQuery and discussing the concepts and terms used with XQuery, such as sequences and atomization. That was followed by an in-depth discussion of the XML DML (Data Modification Language), which is an extension of the XQuery language used to modify the data within an XML instance. You learned about the three case-sensitive keywords (insert, delete, and replace value of), which allow you to modify XML document content. From here it is time to learn how to improve performance when querying XML documents by indexing the xml data type.
Indexing XML Data in SQL Server 2005
Indexing is not new to SQL Server; it has been a feature since the early versions. An index is a structure associated with a table that allows for quick retrieval of data. This b-tree structure contains keys built from one or more columns in the table. With the introduction of the xml data type in SQL Server 2005, indexing an xml data type column is just as important as, if not more important than, indexing a column of any other data type.
In SQL Server 2005, XML instances are stored as BLOBs (binary large objects) in the xml data type column, and each instance can be up to 2GB in size. That is a lot of XML data. Querying these XML instances can be a serious undertaking, and without an index on the column, the XML instance is converted to relational data at query time. This is called shredding, and repeating it for every query is not the best way to retrieve data from a table. An XML index is not a conventional b-tree index over the column values. Instead, an XML index is a persisted, shredded representation of the XML instances contained in the xml column.
Indexing the xml data type was first mentioned in Chapter 2 as an introduction to this topic. This chapter assumes that you have at least a basic understanding of how to create indexes and how they work, so no time is spent covering that, as it is outside the scope of this book. Instead, this chapter focuses entirely on indexing the xml data type in SQL Server 2005 by covering the following:
❑ Creating primary and secondary XML indexes
❑ Indexing XML content
❑ Modifying and deleting XML indexes
❑ Option settings for XML indexes
❑ Best practices for XML indexes
Chapter 6
Primary XML Index
In most of the examples thus far, the queries retrieved data from the Instructions column in the Production.ProductModel table in the AdventureWorks sample database. Looking in SQL Server Management Studio, you can see that there is indeed an index on the Instructions column (see Figure 6-1).
Figure 6-1
Figure 6-2 shows the properties of that index.
Figure 6-2
Looking at Figure 6-2, you can see that it is an index on the Instructions column in the ProductModel table, but the most important piece of information to gather from the figure is that it is a primary XML index. This section discusses creating primary XML indexes. Take a look at the following query against the Instructions column:
    SELECT Instructions.query('
      declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
      for $var in /MSAW:root/MSAW:Location[@LocationID=50]
      return $var
    ') as Location
    FROM Production.ProductModel
    WHERE Instructions.exist('
      declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
      //MSAW:Location[@LocationID=50]') = 1
Now imagine what would happen if this index did not exist. This example employs the exist() method to look in the Instructions column for a LocationID with a value of 50, as expressed in the path expression. Without an index on the Instructions column, the exist() method must shred and interrogate the XML instance in every row of the table looking for that value. That is a very time-consuming process. Creating indexes on an xml data type column greatly improves query performance. The basic syntax for creating a primary XML index is as follows:
    CREATE [PRIMARY] XML INDEX Indexname ON Tablename (xml_Columnname)
Indexname is the name of the primary XML index to be created. Tablename is the table on which to create the new primary XML index. Finally, xml_Columnname is the column on which to create the new primary XML index. In a query window, run the following SQL, which drops the Employee table if it exists, recreates it, and then adds a primary XML index on the xml data type column EmployeeInfo:
    if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[Employee]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    DROP TABLE [dbo].[Employee]
    GO
    CREATE TABLE [dbo].[Employee](
      [EmployeeID] [int] NOT NULL,
      [EmployeeInfo] [xml] NOT NULL
    ) ON [PRIMARY]
    GO
    CREATE PRIMARY XML INDEX PriI_Employee_EmployeeInfo ON Employee(EmployeeInfo)
    GO
Didn't work, did it? What's missing? Modify the SQL to add the primary key constraint shown here, and then rerun the query:
    if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[Employee]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    DROP TABLE [dbo].[Employee]
    GO
    CREATE TABLE [dbo].[Employee](
      [EmployeeID] [int] NOT NULL,
      [EmployeeInfo] [xml] NOT NULL,
      CONSTRAINT [PK_Employee] PRIMARY KEY CLUSTERED
      (
        [EmployeeID] ASC
      ) ON [PRIMARY]
    ) ON [PRIMARY]
    GO
    CREATE PRIMARY XML INDEX PriI_Employee_EmployeeInfo ON Employee(EmployeeInfo)
    GO
This time it worked successfully because before a primary XML index can be created, a clustered index must exist on a primary key column (see commandment 4 in the "10 Commandments of XML Index Creation" section later in this chapter). The clustered primary key is what ties the rows of the XML index back to the rows of the base table, and it ensures that if the base table is partitioned, the XML index is partitioned along with the table. Figure 6-3 shows the results of the table and index creation.
Figure 6-3
As you can see from Figure 6-3, the code created a Non-Clustered index on the EmployeeInfo column. It is not necessary to specify in the CREATE statement whether to create a Clustered or Non-Clustered index because by specifying PRIMARY, the CREATE INDEX statement knew that there was already a Clustered index on the primary key, and to create the PRIMARY XML index as Non-Clustered. Once the index is created you can go into its properties and try to change it to a Clustered index, only to have SQL Server balk at you for trying to drop the existing Clustered index, thus breaking the rule stated previously about needing a clustered primary key to create the index. Trying to drop the primary index with secondary indexes associated to it also generates an error.
Secondary XML Index
Secondary XML indexes can be added to xml data type columns to provide additional query performance. Having a primary index on an xml data type column without any secondary XML indexes may not prove to be beneficial, especially for columns with large XML documents. Querying a large XML instance based on path values can be very time consuming, and a single primary index may not provide the best performance. In these cases, adding secondary XML indexes specifically designed for certain query expressions can prove to be very beneficial.
There are three different types of secondary XML indexes depending on what you are querying, but regardless of the secondary XML index type, a primary XML index must exist prior to creating any secondary XML index. The three types of secondary XML indexes are:
❑ PATH
❑ VALUE
❑ PROPERTY
As shown in Chapter 2, the basic syntax of a secondary XML index looks like the following:
    CREATE XML INDEX SecondaryXMLIndexName ON TableName ( xml_ColumnName )
    USING XML INDEX PrimaryXMLIndexName FOR [PATH | VALUE | PROPERTY]
SecondaryXMLIndexName is the name of the secondary XML index to be created. TableName is the table on which to create the new secondary XML index. xml_ColumnName is the column on which to create the new secondary XML index. PrimaryXMLIndexName is the primary XML index on which to base the secondary XML index.
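To show the order of operations in one place, the following sketch creates a table with a clustered primary key, a primary XML index, and one secondary index of each type, and then lists them from the sys.xml_indexes catalog view. The table and index names (dbo.IndexDemo and so on) are hypothetical and used only for this illustration.

```sql
-- Hypothetical table and index names, used only for this sketch.
CREATE TABLE dbo.IndexDemo (
    DemoID   int NOT NULL CONSTRAINT PK_IndexDemo PRIMARY KEY CLUSTERED,
    DemoInfo xml NOT NULL
)
GO

-- The primary XML index must exist before any secondary XML index
CREATE PRIMARY XML INDEX PXML_IndexDemo_DemoInfo
    ON dbo.IndexDemo (DemoInfo)
GO

-- One secondary XML index of each type, all based on the primary XML index
CREATE XML INDEX IXML_IndexDemo_Path ON dbo.IndexDemo (DemoInfo)
    USING XML INDEX PXML_IndexDemo_DemoInfo FOR PATH
CREATE XML INDEX IXML_IndexDemo_Value ON dbo.IndexDemo (DemoInfo)
    USING XML INDEX PXML_IndexDemo_DemoInfo FOR VALUE
CREATE XML INDEX IXML_IndexDemo_Property ON dbo.IndexDemo (DemoInfo)
    USING XML INDEX PXML_IndexDemo_DemoInfo FOR PROPERTY
GO

-- List the XML indexes on the table; secondary_type_desc is NULL for the primary
SELECT name, secondary_type_desc
FROM sys.xml_indexes
WHERE object_id = OBJECT_ID('dbo.IndexDemo')
```

Whether all three secondary indexes are worth their storage and maintenance cost depends on the queries actually run against the column, which is what the following sections help you decide.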
PATH
You use the PATH secondary XML index when using path expressions in your query. You can determine that you need a PATH secondary XML index by looking at the WHERE clause of your SQL statement. If there is an exist() method on the xml column, you are using a path expression and could most likely benefit from this type of index. For example, the Instructions column in the Production.ProductModel table already has a primary XML index on it, and the following code adds a secondary PATH XML index:
    CREATE XML INDEX SecI_PM_I_PATH ON Production.ProductModel(Instructions)
    USING XML INDEX PXML_ProductModel_Instructions FOR PATH
    GO
Figure 6-4 shows that the new PATH secondary XML index has been created.
Figure 6-4
The PATH secondary XML index created is based on the primary XML index called PXML_ProductModel_Instructions and improves path expression queries made on this column. For example, the following query (used previously) uses the exist() method to check for the existence of a location with a LocationID attribute value of 50:
    SELECT Instructions.query('
      declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
      for $var in /MSAW:root/MSAW:Location[@LocationID=50]
      return $var
    ') as Location
    FROM Production.ProductModel
    WHERE Instructions.exist('
      declare namespace MSAW="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions";
      //MSAW:Location[@LocationID=50]') = 1
Because the primary XML index already holds the shredded form of each XML instance, the instance does not need to be shredded at query time, and the PATH secondary index lets SQL Server locate the matching path quickly, making the query faster.
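If you would rather confirm the new index from a query window than from Object Explorer, the sys.xml_indexes catalog view can be queried. The following is a minimal sketch that assumes the AdventureWorks sample database and the index names used in this chapter:

```sql
-- List the XML indexes defined on Production.ProductModel.
-- secondary_type_desc is NULL for a primary XML index and
-- PATH, VALUE, or PROPERTY for a secondary XML index.
SELECT name,
       secondary_type_desc,
       using_xml_index_id   -- index_id of the primary XML index a secondary index is based on
FROM sys.xml_indexes
WHERE object_id = OBJECT_ID('Production.ProductModel')
ORDER BY index_id;
```

The same query can be rerun after each CREATE XML INDEX statement in this section to see the VALUE and PROPERTY indexes appear.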
VALUE

When querying for specific values in an XML instance, you should use the VALUE index, especially when the name of the node or element isn't exactly known or the path includes a wildcard character. Add a VALUE index onto the CatalogDescription column by running the following SQL statement:

CREATE XML INDEX SecI_PM_I_VALUE
ON Production.ProductModel(CatalogDescription)
USING XML INDEX PXML_ProductModel_CatalogDescription
FOR VALUE
GO
Figure 6-5 shows that the new VALUE secondary XML index was created.
Figure 6-5
Just like the PATH index, the VALUE index is based on the primary XML index and improves value-based queries made on this column. For example, the following query uses the exist() method with a value comparison to return the ProductModelID and Name columns from the Production.ProductModel table if the picture size "small" is found:

WITH XMLNAMESPACES (
  'http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelDescription' AS p1)
SELECT ProductModelID, Name
FROM Production.ProductModel
WHERE CatalogDescription.exist('//p1:Picture/p1:Size[.="small"]') = 1
The following partial results list product models whose catalog description includes a picture of size "small":

ProductModelID   Name
19               Mountain-100
23               Mountain-500
25               Road-150
There will be times when you are looking for a specific piece of data and want to query off that key piece of information. The VALUE index helps improve query performance when those times arise.
PROPERTY

The intent of the VALUE index is to speed up searches for single values within an XML instance. If you are searching for multiple values, such as "find all manufacturing steps for a specific Location" or "find the steps and material for a specific Location," the VALUE index is not adequate. The PROPERTY index is made exactly for this type of search. For this example you need to add a PROPERTY index onto the Instructions column by running the following SQL statement:

CREATE XML INDEX SecI_PM_I_PROPERTY
ON Production.ProductModel(Instructions)
USING XML INDEX PXML_ProductModel_Instructions
FOR PROPERTY
GO
Figure 6-6 shows that the new PROPERTY secondary XML index was created.
Figure 6-6
Consider the following query:

WITH XMLNAMESPACES (
  'http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions' AS PD)
SELECT ProductModelID,
  Instructions.value('(/PD:root/PD:Location/@LocationID)[1]', 'int') AS LocationID,
  Instructions.value('(/PD:root/PD:Location/@MachineHours)[1]', 'int') AS MachineHrs
FROM Production.ProductModel
WHERE ProductModelID = 7
The query produces three columns, as shown in the following results. The value of the first column comes from the ProductModelID column in the table. The values of the second and third columns, however, are returned from the XQuery, which retrieved the LocationID and MachineHours attributes from the first location:

ProductModelID   LocationID   MachineHrs
7                10           3
Because multiple values were retrieved from the same XML instance, the PROPERTY index was utilized in the query. The PROPERTY index comes into play with the value() method of the xml data type, and it is most beneficial when the primary key value is known, in this case the ProductModelID column.

Secondary indexes are a great way to improve query performance, especially when the XML instances are large. Because multiple secondary indexes can be applied to a column, it is a good idea to apply the different types as needed. In addition, indexes can also be applied to content, which is discussed in the next section.
Content Indexing

In addition to creating primary and secondary XML indexes on the xml data type column, you can create and use full-text indexes on the column. While the primary and secondary XML indexes index the nodes, paths, and values of the XML instance, a full-text index indexes the text content of the instance and ignores the XML markup. Attribute values are treated as part of the markup and are not full-text indexed.

Unlike XML indexes, only one full-text index is allowed per table. That single index can cover more than one column, but a table cannot have a second full-text index.

Because an XML index and a full-text index can exist on the same xml column, they can be used together to query an XML instance. In that case, the full-text search is applied first, and then an XQuery expression is applied to sift deeper. The requirements for creating a full-text index are similar to those of the primary XML index in that a unique key index (typically the primary key) must already be defined on the table for which the full-text index is created.
The basic syntax for a full-text index is as follows:

CREATE FULLTEXT INDEX ON TableName ( xml_ColumnName )
KEY INDEX IndexName

TableName refers to the table on which the full-text index is being created. xml_ColumnName is the name of the column to which the full-text index will be applied. IndexName is the name of the unique primary key index.

Before a full-text index can be created, a full-text catalog must exist in the database, because every full-text index belongs to a catalog. A database can contain one or more catalogs. The following example first creates a full-text catalog in which to store the full-text index, and then creates a full-text index on the Instructions column of the Production.ProductModel table:

CREATE FULLTEXT CATALOG FTC AS DEFAULT
GO
CREATE FULLTEXT INDEX ON Production.ProductModel(Instructions)
KEY INDEX PK_ProductModel_ProductModelID
ON FTC
GO
Figure 6-7 displays the results of the CREATE FULLTEXT CATALOG statement. The full-text catalog FTC has been created in the AdventureWorks database.
Figure 6-7
Double-click the FTC full-text catalog, or right-click it and select Properties, to display the FTC Properties page. On the left side of the Properties page, select Tables/Views. This page (see Figure 6-8) shows that a full-text index was created on the Production.ProductModel table using the unique index PK_ProductModel_ProductModelID on the Instructions column. The list on the right shows only those tables that have a full-text index, so any table with a full-text index is displayed there automatically.
Figure 6-8
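The same information shown on the Properties page can be read from the full-text catalog views. The query below is a sketch under the assumption that the FTC catalog and the full-text index from the previous example exist in the current database:

```sql
-- Show each full-text index with its catalog, table, indexed column,
-- and the unique key index it is built on.
SELECT c.name                                 AS CatalogName,
       OBJECT_NAME(fi.object_id)              AS TableName,
       COL_NAME(fic.object_id, fic.column_id) AS ColumnName,
       ix.name                                AS KeyIndexName
FROM sys.fulltext_indexes AS fi
JOIN sys.fulltext_catalogs AS c
  ON c.fulltext_catalog_id = fi.fulltext_catalog_id
JOIN sys.fulltext_index_columns AS fic
  ON fic.object_id = fi.object_id
JOIN sys.indexes AS ix
  ON ix.object_id = fi.object_id
 AND ix.index_id  = fi.unique_index_id;
```

Because a table can have only one full-text index, each table appears once per indexed column in the output.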
To get the best performance out of full-text searches, in certain scenarios it is possible to combine a full-text search with an XML index. The first step is to filter the XML values using the SQL full-text search, and then query the filtered values.
In the following section, you put the full-text index to use.
CONTAINS()

You use the CONTAINS predicate to search character data for word or phrase matches. It can also conduct what is called a proximity search, looking for a word that is near another word. For example, the following statement uses CONTAINS to search the Instructions column of the Production.ProductModel table for the phrase "Inspection Specification":

SELECT ProductModelID, Instructions
FROM Production.ProductModel
WHERE CONTAINS(Instructions, '"Inspection Specification"')
The results of the query are shown in Figure 6-9.
Figure 6-9
In this example, the query returned four rows because four rows contain the phrase "Inspection Specification" within the Instructions column. The CONTAINS predicate searched the XML document in the Instructions column of each row and returned only those rows that contain the phrase. The results can be narrowed further with an additional AND condition, such as the following, which returns a single row:

SELECT ProductModelID, Instructions
FROM Production.ProductModel
WHERE CONTAINS(Instructions, '"Inspection Specification"')
AND ProductModelID = 47
In the following example, the CONTAINS predicate looks for multiple phrases by including an OR condition inside the search string:

SELECT ProductModelID, Instructions
FROM Production.ProductModel
WHERE CONTAINS(Instructions, '"Inspection Specification" OR "Securely tighten the spindle"')
The query returns the rows whose instructions contain the phrase "Inspection Specification" or "Securely tighten the spindle". This time it returns the same four rows as previously shown in Figure 6-9 plus an additional row, ProductModelID 53.

The next example uses the NEAR operator to look for the word Inspect near the word Specification. NEAR indicates that the word or phrase on its left side must be in close proximity to the word or phrase on its right side:

SELECT ProductModelID, Instructions
FROM Production.ProductModel
WHERE CONTAINS(Instructions, 'Inspect NEAR Specification')
The query returns seven rows, as shown in Figure 6-10, indicating that seven rows contain the word Inspect near the word Specification.
Figure 6-10
As stated earlier, a full-text search can be combined with an XQuery expression to filter the results further. XQuery also has its own contains() function, which should not be confused with the full-text CONTAINS predicate. The following example queries the CatalogDescription column of the Production.ProductModel table and uses the XQuery contains() function to look for a specific value:
SELECT ProductModelID, CatalogDescription.query('
  declare namespace pd="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelDescription";
  { /pd:ProductDescription/@ProductModelID }
  { /pd:ProductDescription/pd:Summary }
') as Result
FROM Production.ProductModel
WHERE CatalogDescription.value('
  declare namespace pd="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelDescription";
  contains( (/pd:ProductDescription/pd:Summary//*/text())[1], "smooth-shifting")', 'bit') = 1
The results of this query are shown in Figure 6-11.
Figure 6-11
In this example, the XQuery contains() function finds and filters the results; the full-text index is not involved. The SELECT part of the query returns the ProductModelID and the entire Summary node for the rows that satisfy the WHERE clause. The WHERE clause is where the filtering takes place: the value() method evaluates an XQuery that applies contains() to the first text node under the Summary node in the CatalogDescription column, looking for the text "smooth-shifting", and returns the outcome as a bit. If the text is found, the query returns the information requested by the SELECT portion of the query.

As you create and use the indexes, you will probably find a need to modify them to better fit your application. The topic of modifying indexes is discussed next.
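To bring the full-text index itself into the picture, the full-text CONTAINS predicate can sit in the same WHERE clause as an xml data type method. The following is only a sketch: it assumes the full-text index created earlier on the Instructions column, and it assumes the text of interest appears inside a step element, so the path may need adjusting for your data:

```sql
WITH XMLNAMESPACES (
  'http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelManuInstructions' AS MI)
SELECT ProductModelID
FROM Production.ProductModel
-- Full-text search narrows the candidate rows first.
WHERE CONTAINS(Instructions, '"Inspection Specification"')
-- The XQuery expression then checks where in the document the text occurs
-- (assumed location: inside a step element).
  AND Instructions.exist('//MI:step[contains(string(.), "Inspection")]') = 1;
```

Keep in mind that the two searches do not have identical semantics. Full-text search matches words with stemming, while the XQuery contains() function performs a literal substring match.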
Altering XML Index

Once created, an XML index can be altered. Typically, once an index is in place, it is not necessary to alter it, but if the occasion arises, altering an index is supported with very few exceptions. The syntax for altering an XML index is as follows:

ALTER INDEX IndexName
ON TableName
SET ( option )

IndexName is the name of the index to alter. TableName is the table on which the index is defined. option is the index option that is being changed.
The following is a list of available XML index options:
❑ PAD_INDEX
❑ FILLFACTOR
❑ SORT_IN_TEMPDB
❑ STATISTICS_NORECOMPUTE
❑ DROP_EXISTING
❑ ALLOW_ROW_LOCKS
❑ ALLOW_PAGE_LOCKS
❑ MAXDOP
❑ ONLINE
All but FILLFACTOR and MAXDOP take an ON/OFF value. For example:

PAD_INDEX = ON
The FILLFACTOR option, an integer percentage from 1 to 100 (0 is treated the same as 100), tells the SQL Server engine how full to make each leaf-level index page when the index is created or rebuilt. The remainder of each page is left as free space for future growth. For example, the following tells SQL Server to fill each page to 75 percent, leaving 25 percent free:

FILLFACTOR = 75
The FILLFACTOR option can be used only when the index is first created or rebuilt.

The MAXDOP option, an integer value between 0 and 64, overrides the max degree of parallelism configuration option for the duration of the index operation, limiting the number of processors used in a parallel plan. For example, the following sets the MAXDOP option to a value of 1, which executes the plan serially:

MAXDOP = 1
The following example rebuilds the primary XML index on the Employee table with the SORT_IN_TEMPDB option set to ON:

ALTER INDEX PriI_Employee_EmployeeInfo
ON Employee
REBUILD WITH ( SORT_IN_TEMPDB = ON )
GO
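Rebuilding an index drops and recreates the index. This is not a bad thing, as it removes fragmentation and reorders the index rows. Options such as SORT_IN_TEMPDB, FILLFACTOR, and PAD_INDEX take effect only as part of a rebuild. If you don't want to rebuild the index, you can change the options that do not require a rebuild (ALLOW_ROW_LOCKS, ALLOW_PAGE_LOCKS, and STATISTICS_NORECOMPUTE) with the SET clause:

ALTER INDEX PriI_Employee_EmployeeInfo
ON Employee
SET ( ALLOW_PAGE_LOCKS = ON )
GO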
Several options can be set at one time. For example:

ALTER INDEX PriI_Employee_EmployeeInfo
ON Employee
SET ( ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, STATISTICS_NORECOMPUTE = OFF )
GO
Each of the options can have an impact on your index and resulting performance, so it would be wise to experiment with some of these options to determine which options will benefit your application and environment best.
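As a starting point for that kind of experiment, the following sketch rebuilds an XML index with a few options and then reads the stored settings back. It assumes the AdventureWorks primary XML index used earlier in this chapter; the option values are illustrative only, not recommendations:

```sql
-- Rebuild the primary XML index, filling leaf pages to 75 percent
-- (25 percent free) and running the rebuild on a single processor.
ALTER INDEX PXML_ProductModel_Instructions
ON Production.ProductModel
REBUILD WITH ( FILLFACTOR = 75, PAD_INDEX = ON, MAXDOP = 1 );
GO

-- Read back the option values stored for the XML indexes on the table.
SELECT name, fill_factor, is_padded, allow_row_locks, allow_page_locks
FROM sys.xml_indexes
WHERE object_id = OBJECT_ID('Production.ProductModel');
GO
```

MAXDOP applies only to the rebuild operation itself and is not stored with the index, which is why it has no column in the catalog view.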
Setting Options for XML Indexing

In Chapter 4, you saw a list of settings for the xml data type, but they are listed again in the following table because they also apply to indexes on the xml data type column.

SET Option                 Required Value
NUMERIC_ROUNDABORT         OFF
ANSI_PADDING               ON
ANSI_WARNINGS              ON
ANSI_NULLS                 ON
ARITHABORT                 ON
CONCAT_NULL_YIELDS_NULL    ON
QUOTED_IDENTIFIER          ON
These options must be set as shown when creating an XML index; otherwise, indexes will fail to be created or modified, and no data will be able to be inserted or modified in xml data type columns.
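A session can set these options explicitly before it creates an XML index or modifies xml data. The following sketch shows one way to do so and to verify the result with the SESSIONPROPERTY function:

```sql
-- Apply the settings required for XML indexes to the current session.
SET NUMERIC_ROUNDABORT OFF;
SET ANSI_PADDING, ANSI_WARNINGS, ANSI_NULLS, ARITHABORT,
    CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON;
GO

-- Verify the settings: 1 means ON, 0 means OFF.
SELECT SESSIONPROPERTY('NUMERIC_ROUNDABORT')      AS NumericRoundAbort,
       SESSIONPROPERTY('ANSI_PADDING')            AS AnsiPadding,
       SESSIONPROPERTY('ANSI_WARNINGS')           AS AnsiWarnings,
       SESSIONPROPERTY('ANSI_NULLS')              AS AnsiNulls,
       SESSIONPROPERTY('ARITHABORT')              AS ArithAbort,
       SESSIONPROPERTY('CONCAT_NULL_YIELDS_NULL') AS ConcatNullYieldsNull,
       SESSIONPROPERTY('QUOTED_IDENTIFIER')       AS QuotedIdentifier;
GO
```

Client libraries differ in the defaults they apply at connection time, so checking the session properties is a quick way to rule out a settings problem when an XML index operation fails.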
Best Practices

There are two main reasons why you would want to index your xml data type columns:
❑ You plan to execute queries against the xml data type column.
❑ The amount of data in the xml data type column is large.
Storing XML instances in an xml data type column does not automatically necessitate putting an index on that column. If you don’t plan to execute queries on that column, then putting an index on the column does not make sense. However, if the needs of your organization match both of the items listed here, there are a few things to remember when creating an XML index.
Ten Commandments of XML Index Creation

You must adhere to the following when creating an XML index:
❑ A primary XML index is created on exactly one xml data type column. For example, column ColA can have its own primary XML index, and column ColB can have its own primary XML index. However, ColA and ColB cannot share the same primary XML index.
❑ You cannot modify the primary key of a table if a primary XML index exists on the same table. To modify the primary key, you must drop the XML index prior to modifying the primary key.
❑ A table cannot have an XML index and a non-XML index with the same name.
❑ Any table on which you are creating a primary XML index must already have a clustered index on the primary key.
❑ You must drop an XML index prior to changing the xml data type column from typed to untyped (or untyped to typed).
❑ You cannot create XML indexes on xml type variables or on xml type columns in views.
❑ XML index names are subject to the same restrictions as view names.
❑ The ARITHABORT option cannot have the value of OFF when you create an XML index. The xml data type methods fail in any query if this value equals OFF.
❑ You can use the DROP_EXISTING option only to drop and recreate an index of the same kind. DROP_EXISTING can drop a primary XML index and create a new primary XML index, or drop a secondary XML index and create a new secondary XML index. It cannot drop a primary and create a secondary, nor drop a secondary and create a primary.
❑ An XML index always uses the same filegroup or partitioning as its table. You cannot specify these separately for the XML index.
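Several of these rules can be seen together in a short script. The following sketch uses a hypothetical table, dbo.XmlIndexDemo, and hypothetical index names that do not appear elsewhere in this book:

```sql
-- The base table needs a clustered primary key before a primary XML index can be created.
CREATE TABLE dbo.XmlIndexDemo (
    DemoID   int NOT NULL CONSTRAINT PK_XmlIndexDemo PRIMARY KEY CLUSTERED,
    DemoInfo xml NOT NULL
);
GO

-- One primary XML index per xml column.
CREATE PRIMARY XML INDEX PXML_XmlIndexDemo_DemoInfo
ON dbo.XmlIndexDemo (DemoInfo);
GO

-- A secondary XML index is always based on the primary XML index.
CREATE XML INDEX SXML_XmlIndexDemo_DemoInfo_Path
ON dbo.XmlIndexDemo (DemoInfo)
USING XML INDEX PXML_XmlIndexDemo_DemoInfo
FOR PATH;
GO

-- DROP_EXISTING replaces a secondary index with a secondary index.
CREATE XML INDEX SXML_XmlIndexDemo_DemoInfo_Path
ON dbo.XmlIndexDemo (DemoInfo)
USING XML INDEX PXML_XmlIndexDemo_DemoInfo
FOR PATH
WITH ( DROP_EXISTING = ON );
GO

-- Clean up. Dropping the table also drops its XML indexes.
DROP TABLE dbo.XmlIndexDemo;
GO
```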
Summary

In this chapter, you learned the different types of XML indexes and how they can be applied and used on an xml data type column. You also learned in what situations it is beneficial to apply the different types of indexes, as well as how to alter the indexes and the options available when altering an index. This chapter also talked about the best practices of applying an XML index and the settings options that are needed for creating an XML index. In the next chapter, you learn about XML schema collections.
XML Schemas in SQL Server 2005

With the introduction of the xml data type in SQL Server 2005, the capability to natively validate XML documents and instances stored internally in SQL Server provides developers with many more XML validation options than they have had in the past. This flexibility helps developers determine how and where XML validation takes place. By having the ability to move XML validation into SQL Server, developers now have more control over how XML is handled, both from the client and the server.

In SQL Server 2005, the concept of XML schema collections is introduced to validate instances of XML in the xml data type column or xml data type variable. The XML schema collection is just that, a collection of XML schemas that validates XML instances and performs type checking when XML data is stored in the database.

The focus of this chapter is the creation and management of XML schema collections, the XML schema preprocessor tool, and best practices when using XML schema collections. Specifically, this chapter covers the following topics:
❑ Managing XML schema collections (creating, dropping, and altering)
❑ Viewing XML schema collections
❑ Permissions on XML schema collections (grant, deny, and revoke)
❑ Guidelines and limitations
Chapter 7
Managing XML Schema Collections

The management of XML schema collections is provided through enhancements to the DDL (Data Definition Language). Using the DDL, you can create, drop, and alter XML schema collections. Likewise, you can manage permissions to the XML schema collections.

The XML schemas are stored internally, associated with xml data type columns and variables, and used to validate XML instances, and they ensure that the XML data is typed correctly when stored in the database. When an XML instance is stored in the database, SQL Server uses the schema collection for validation. Depending on the results of the validation, the instance is either accepted and stored, or rejected and not stored.

Think of an XML schema collection in the same terms as a table or any other object in SQL Server. It can be created, dropped, and even altered just like other objects. When an XML schema collection is created, the schemas are automatically imported into the collection. Other schemas can be added to an existing collection, and schemas can be removed from a collection as well. All schema collections are stored in SQL Server system tables.

The following section examines the creation, deletion, and modification of XML schema collections.
Creating XML Schema Collections

You create XML schema collections by using the following DDL syntax:

CREATE XML SCHEMA COLLECTION RelationalSchema.SqlIdentifier AS Expression
In the syntax, RelationalSchema is the name of the relational schema in which the collection is created. It is optional; if a relational schema is not specified, the default relational schema of the current user is used. SqlIdentifier is the name of the schema collection. Expression is the XML schema itself, supplied as a string constant or as a scalar variable of varchar, nvarchar, varbinary, or xml type.

Don't confuse a relational schema with an XML schema. In this case, a relational schema contains database objects such as tables, views, and stored procedures. Relational schemas have owners, and the owner can be a database user or database role. For example, there is a dbo relational schema. The following is an example using a relational schema when creating an XML schema collection:

CREATE XML SCHEMA COLLECTION dbo.MotocrossCollectionTest AS...
For more information on relational schemas, consult the SQL Server documentation. Before jumping in to some examples, I should make a few comments regarding the components created when a schema is imported into the database. When an XML schema collection is created, several components related to the schema are imported into the database. The components are stored in a number of SQL Server system tables and include the following:
❑ Attributes
❑ Elements
❑ Type definitions
What is important to remember is that when a schema is imported into the database, the schema is not stored, but rather the components themselves are stored. Each component falls into one of the following categories:
❑ ATTRIBUTE
❑ ELEMENT
❑ TYPE
❑ ATTRIBUTE GROUP
❑ MODELGROUP
These categories define how a schema is stored in the database. When a schema is added or imported in the database, the schema is parsed and each component is stored by its type. The schema itself as a whole is not stored. For example, an tag is not stored, but its components — such as values or attributes — are stored by the appropriate type. The last tidbit of information is that a schema can be imported into the collection with or without a namespace. For example, the following code creates an XML schema collection and imports a schema that does not contain a namespace: CREATE XML SCHEMA COLLECTION MotocrossCollection AS ‘
’ GO
The CREATE XML SCHEMA COLLECTION statement does not let you specify the database in which to create the schema collection; the collection is created in the current database, so make sure you are in the correct database when creating the schema collection. Figure 7-1 shows the newly created schema collection in the appropriate database.
Figure 7-1
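The stored components and their categories can also be listed with the XML schema catalog views. The following query is a sketch that assumes the MotocrossCollection schema collection from the preceding example exists in the current database:

```sql
-- List the named components stored for the MotocrossCollection schema collection.
-- symbol_space_desc reports the category (for example ELEMENT, ATTRIBUTE, or TYPE),
-- and kind_desc reports the specific kind of component.
SELECT c.name,
       c.symbol_space_desc,
       c.kind_desc
FROM sys.xml_schema_collections AS sc
JOIN sys.xml_schema_components  AS c
  ON c.xml_collection_id = sc.xml_collection_id
WHERE sc.name = 'MotocrossCollection'
  AND c.name IS NOT NULL
ORDER BY c.symbol_space_desc, c.name;
```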
In the example, the Manufacturer, NationalNumber, Type, and Class attributes belong to the ATTRIBUTE category, and the Team, Rider, and Name elements belong to the ELEMENT category. Now that the schema is created, you can use it for XML instance validation. For example, it can be applied to an xml data type column, as shown in Figure 7-2.
Figure 7-2
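In Transact-SQL, binding a schema collection to a column is done by naming the collection in parentheses after the xml data type. The following sketch creates the Motocross table used in the INSERT examples of this section; the table and column names come from those examples, while the data types and nullability are assumptions:

```sql
-- A typed xml column: every instance stored in MotocrossInfo
-- is validated against the MotocrossCollection schema collection.
CREATE TABLE Motocross (
    MotocrossID   int NOT NULL PRIMARY KEY,        -- assumed data type
    MotocrossInfo xml (MotocrossCollection) NULL   -- typed xml column
);
GO
```

The schema collection must already exist in the current database when the table is created.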
Now any time an XML instance is inserted into this column, it is validated against the schema collection. If an XML instance passes the validation, it is inserted into the database. For example, the following XML instance passes the validation and is inserted into the table because it contains the appropriate elements and attributes as defined by the XML schema:

INSERT INTO Motocross (MotocrossID, MotocrossInfo)
VALUES (1, '
Chad Reed
David Vuillemin
Tim Ferry
Kelly Smith
Brock Sellards
Brett Metcalf
Danny Smith
James Stewart
Michael Byrne
Ricky Carmichael
Davi Millsaps
Broc Hepler
Sebastien Tortelli
Jeremy McGrath
Ernesto Fonseca
Travis Preston
Andrew Short
Mike LaRocco
Kevin Windham
')
GO
However, the following XML instance does not pass validation and is not inserted into the table, because each rider has a BikeSize attribute, which is not defined by the schema:

INSERT INTO Motocross (MotocrossID, MotocrossInfo)
VALUES (2, '
Tim Ferry Chad Reed David Vuillemin
Kevin Windham Mike LaRacco Jeremy McGrath
Ricky Carmichael Broc Hepler Sebastien Tortelli
James Stewart Michael Byrne
')
GO
If you try to insert this XML document, the following error is generated:

XML Validation: Undefined or prohibited attribute specified: 'BikeSize'
Another way to create the schema collection is to assign the schema to a variable and use that variable in the CREATE XML SCHEMA COLLECTION statement. The following example creates an xml data type variable, sets the schema syntax to the variable, and then uses the variable in the CREATE XML SCHEMA COLLECTION statement (to save space and for better readability, most of the schema syntax has been left out):

DECLARE @xmlvar xml
SET @xmlvar = '

1000 ORDER BY ContactID FOR XML AUTO, TYPE
Figure 8-20 displays the results of the query.
Figure 8-20
In this example, the ContactID, FirstName, and LastName values are returned as attributes because the outer query did not specify the ELEMENTS directive. The inner query returned the Title and Birthdate columns as elements because the ELEMENTS directive was specified on its FOR XML clause.

The following example also uses a nested FOR XML statement, but it names the inner column data() so that all the SalesOrderIDs associated with each Contact are returned by the inner SELECT as a space-separated list of values. Those results do not need to be returned as an xml data type, so the TYPE directive is left off. The results are passed to the outer SELECT, which queries the ContactID and FirstName columns of the Person.Contact table:

SELECT ContactID as "@ContactID",
  FirstName as "@ContactName",
  (SELECT SalesOrderID as "data()"
   FROM Sales.SalesOrderHeader
   WHERE SalesOrderHeader.ContactID = Contact.ContactID
   FOR XML PATH ('')) as "@SalesOrderIDs"
FROM Person.Contact
FOR XML PATH('SalesOrders')
Partial results from this query are shown in Figure 8-21.
Figure 8-21
This query returns everyone from the Person.Contact table, so you can add a WHERE clause to further filter the results if necessary.

The final example is a bit more complicated, in that it nests two FOR XML queries within the outer SELECT statement:

SELECT TOP 5 SalesOrderID, SalesPersonID, CustomerID,
  (SELECT TOP 2 SalesOrderDetail.SalesOrderID, ProductID, OrderQty, UnitPrice
   FROM Sales.SalesOrderDetail, Sales.SalesOrderHeader
   WHERE SalesOrderDetail.SalesOrderID = SalesOrderHeader.SalesOrderID
   FOR XML AUTO, TYPE),
  (SELECT *
   FROM (SELECT Employee.EmployeeID As SalesPersonID, Contact.FirstName, Contact.LastName
         FROM Person.Contact, HumanResources.Employee
         WHERE Contact.ContactID = Employee.EmployeeID) As SalesPerson
   WHERE SalesPerson.SalesPersonID = SalesOrderHeader.SalesPersonID
   FOR XML AUTO, TYPE, ELEMENTS)
FROM Sales.SalesOrderHeader
FOR XML AUTO, TYPE
A portion of the results is shown in Figure 8-22.
Figure 8-22
XSD Schema Generation

Just when you think that the new FOR XML features can't get any better, the capability to generate an inline schema for your FOR XML query comes along. How easy is it, you ask? Simply add the XMLSCHEMA keyword to the end of the FOR XML clause to generate an XSD schema. The syntax looks like this:

FOR XML AUTO, XMLSCHEMA
However, you need to follow a couple of rules when using the XMLSCHEMA keyword:
❑ You can use the XMLSCHEMA keyword only in the RAW and AUTO modes.
❑ When nesting FOR XML statements, you can use the XMLSCHEMA keyword only on the outer, or top-level, query.
When you specify the XMLSCHEMA keyword, both the schema and the XML data results are returned, with the schema preceding the XML data. The following example returns both an XML instance and the corresponding XSD schema:

SELECT ContactID, FirstName, LastName
FROM Person.Contact
WHERE ContactID = 218
FOR XML AUTO, XMLSCHEMA
In the following results, the schema precedes the XML data, which is listed at the end of the results:
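The XMLSCHEMA keyword also accepts an optional target namespace URI, and it works in RAW mode as well. The following sketch reuses the same contact row; the row element name Contact and the namespace URI are made-up values for illustration:

```sql
-- RAW mode with a named row element and an inline XSD schema.
-- The URI passed to XMLSCHEMA becomes the target namespace of the
-- generated schema (the value here is hypothetical).
SELECT ContactID, FirstName, LastName
FROM Person.Contact
WHERE ContactID = 218
FOR XML RAW('Contact'), XMLSCHEMA('urn:example-com:contacts');
```

If no URI is supplied, SQL Server generates a target namespace for the schema automatically.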
Things to Watch Out For In dealing with XML, it is possible to have attributes and elements with the same name. For example, the following is perfectly acceptable in XML:
Howard Hughes
Daffy Duck
To test this, run the following query: DECLARE @test xml SET @test = ‘ Howard Hughes
Daffy Duck
’ SELECT @test
When you run this query, you get the XML shown at the beginning of this section. However, the following query generates an error trying to deal with the same-name columns:

SELECT Contact.ContactID, SalesOrderHeader.SalesOrderID, SalesOrderHeader.ContactID, Contact.FirstName
FROM Sales.SalesOrderHeader, Person.Contact
WHERE SalesOrderHeader.ContactID = Contact.ContactID
AND Contact.ContactID = 218
FOR XML RAW, XMLSCHEMA
The solution is simply to add the ELEMENTS directive to the FOR XML statement, like so:

SELECT Contact.ContactID, SalesOrderHeader.SalesOrderID, SalesOrderHeader.ContactID, Contact.FirstName
FROM Sales.SalesOrderHeader, Person.Contact
WHERE SalesOrderHeader.ContactID = Contact.ContactID
AND Contact.ContactID = 218
FOR XML RAW, XMLSCHEMA, ELEMENTS
It is important to remember that the XSINIL directive is permitted when generating schemas. Without it, a column that returns a NULL value produces no element in the XML results. Adding the XSINIL directive ensures that the element is still returned for a NULL column and that the generated schema declares it accordingly. The following example illustrates this:

SELECT Contact.ContactID, SalesOrderHeader.SalesOrderID, SalesOrderHeader.ContactID, Contact.FirstName, Contact.MiddleName
FROM Sales.SalesOrderHeader, Person.Contact
WHERE SalesOrderHeader.ContactID = Contact.ContactID
AND Contact.ContactID = 226
FOR XML RAW, XMLSCHEMA, ELEMENTS XSINIL
The example specifies that an element is still included for any NULL value returned from the query. In this example, the MiddleName column does not have a value, but the column is still included with the following value:
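The effect of XSINIL can be seen in isolation with a query that needs no tables. This is a minimal sketch using literal values rather than the AdventureWorks data:

```sql
-- Without XSINIL the NULL column produces no element at all.
SELECT 1 AS ContactID, CAST(NULL AS nvarchar(50)) AS MiddleName
FOR XML RAW, ELEMENTS;

-- With XSINIL the NULL column still produces a MiddleName element,
-- marked with the xsi:nil attribute set to true.
SELECT 1 AS ContactID, CAST(NULL AS nvarchar(50)) AS MiddleName
FOR XML RAW, ELEMENTS XSINIL;
```

In the second result, the row element also carries a declaration for the xsi namespace prefix, because the xsi:nil attribute depends on it.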
That covers the FOR XML changes. Microsoft made a number of OPENXML changes as well, so on to that.
OPENXML

Two improvements were made to the OPENXML clause in SQL Server 2005. The first is the capability to pass an xml data type to the sp_xml_preparedocument stored procedure. The second enhancement is the capability to use the new data types in the WITH clause.

In the following example, the @xmlvar variable is declared as an xml data type and filled with an XML document, which is then passed to the sp_xml_preparedocument stored procedure. The OPENXML statement is then used to query the XML document to retrieve the attributes of all three Employee elements:
DECLARE @xmldocDoc int
DECLARE @xmlvar xml
SET @xmlvar = N'
José Carreras
Plácido Domingo
Luciano Pavarotti
'
-- Create an internal representation of the XML document.
EXEC sp_xml_preparedocument @xmldocDoc OUTPUT, @xmlvar
-- Execute a SELECT statement using OPENXML rowset provider.
SELECT *
FROM OPENXML (@xmldocDoc, '/ROOT/Employee', 1)
WITH (EmployeeID int,
      ManagerID int,
      NationalIDNumber varchar(15))
EXEC sp_xml_removedocument @xmldocDoc
The results of the SELECT statement are shown in Figure 8-23.
Figure 8-23
The example demonstrates both enhancements to the OPENXML statement by passing an xml data type variable to the sp_xml_preparedocument stored procedure, and by using the new data types in the WITH clause. Finally, make the following modifications to the example code:

DECLARE @xmldocDoc int
DECLARE @xmlvar xml
SET @xmlvar = N'
José Carreras
Plácido Domingo
Luciano Pavarotti
'
-- Create an internal representation of the XML document.
EXEC sp_xml_preparedocument @xmldocDoc OUTPUT, @xmlvar
-- Execute a SELECT statement using OPENXML rowset provider.
SELECT *
FROM OPENXML (@xmldocDoc, '/ROOT/Employee/Tenor', 1)
WITH (Position varchar(15),
      Solo varchar(15))
EXEC sp_xml_removedocument @xmldocDoc
Now rerun the query. You should see results similar to those shown in Figure 8-24.
Figure 8-24
Using the OPENXML clause to go deeper into the XML document, the Position and Solo attributes were queried and returned, as displayed in the figure.
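For reference, the whole OPENXML pattern fits in a few lines when the document is small. The following self-contained sketch uses a made-up two-employee document rather than the one from the preceding examples; the element and attribute names are assumptions chosen to match the WITH clause:

```sql
DECLARE @doc int;   -- handle to the parsed document
DECLARE @x   xml;   -- xml variable passed directly to sp_xml_preparedocument

SET @x = N'<ROOT>
  <Employee EmployeeID="1" ManagerID="16" />
  <Employee EmployeeID="2" ManagerID="6" />
</ROOT>';

-- Parse the document and obtain a handle.
EXEC sp_xml_preparedocument @doc OUTPUT, @x;

-- The flag value 1 requests attribute-centric mapping, so each
-- column in the WITH clause is read from the attribute of the same name.
SELECT EmployeeID, ManagerID
FROM OPENXML(@doc, '/ROOT/Employee', 1)
WITH (EmployeeID int,
      ManagerID  int);

-- Always release the handle to free the memory held by the parsed document.
EXEC sp_xml_removedocument @doc;
```

The query returns one row per Employee element, with the attribute values converted to the data types named in the WITH clause.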
Summary

In this chapter, you learned about the improvements to the FOR XML and OPENXML statements. The FOR XML statement comes packed with a lot of new functionality that makes shaping your data and creating XML instances much easier. The PATH mode, as you learned, is much easier to learn than the EXPLICIT mode, and the results are just as pleasing with a much shorter learning curve. The TYPE directive is a very nice addition as well, giving you the option of returning the query results as an xml data type, with very little restriction. The enhancements to the RAW and AUTO modes, specifically the capability to specify the root element, return element-centric XML, and specify the row element name, come in very useful when you want to determine the shape of your XML results.
169
Chapter 8 Of all the excellent features in this chapter, the two most important are the capability to automatically generate schemas and nesting FOR XML queries. These two alone should make any developer’s life infinitely easier, and probably should put the EXPLICIT mode out of business. Not as many enhancements were made to OPENXML, but the few that were made were very nice, such as the addition of a capability to pass xml data type to the sp_xml_preparedocument stored procedure. The next chapter discusses CLR enhancements in SQL Server 2005 and how those enhancements benefit you when dealing with XML in SQL Server.
Chapter 9: CLR Support in SQL Server 2005
If you look at the biggest improvements made in SQL Server 2005, the top two would have to be the addition of native support for the xml data type and the integration of the Common Language Runtime (CLR). It's up to you to decide which improvement is more important, but regardless of the order, these are the top two from the developer's perspective.
What is the CLR? Good question. The CLR is the heart and soul of the Microsoft .NET Framework. It provides the environment for the execution of all .NET Framework code. The CLR is also the foundation for many of the built-in services that your programs require in order to run, such as exception handling, thread and memory management, and just-in-time (JIT) compilation, in which code is compiled when it is needed.
A term that you need to remember is managed code. What is managed code? Any code that runs within the CLR is called managed code. At run time, the JIT compiler compiles managed code down to native code, which helps performance.
Why is this important? By integrating the CLR into SQL Server, developers can now write stored procedures and other objects, compile them into managed code, and use them right from SQL Server. This does not mean that Microsoft has set out to put every DBA out of a job, so before all you DBAs out there panic, continue reading, especially the section "The Great Debate." However, neither this chapter nor this book discusses the CLR in any great detail. There are already books out there that do that.
Up until this chapter, the focus of the book has been on the native xml data type support. This chapter changes direction and focuses on the integration of the CLR in SQL Server 2005.
When your boss comes to you and says that he or she heard that SQL Server 2005 comes with CLR integration and asks you what you think about using it in some of your application development, how will you answer? You can't really expect to learn everything there is to know about the integration of the CLR in a single chapter. That is why this chapter is merely an overview of, or introduction to, the integration of the CLR in SQL Server 2005. It is introduced here as a preface to later chapters in the book, which go into more detail about these topics, and simply to whet your appetite. The topics discussed in this chapter are the following:
❑ Overview of Common Language Runtime integration
❑ T-SQL language limitations
❑ Introduction to managed code
❑ Advantages of CLR integration
❑ Choosing between T-SQL and managed code
❑ Security
The Great Debate
The integration of the CLR in SQL Server 2005 has caused a great stir in the development community, among database administrators and database developers as well as the developers writing front-end applications. There have been great debates between those who are for the integration and those who are against it. Many rumblings have come from those who say that the integration of the CLR into SQL Server signifies the demise of T-SQL, while the other camp wonders whether the integration was even necessary (and whether it might be dangerous).
This book does not jump onto either bandwagon, but rather presents the material in such a way that you will be able to make your own decision, one that is best for your applications. The purpose of this chapter (and other chapters later in the book) is not to persuade you one way or the other, but rather to give you the information you need to make an intelligent decision about when and how to use the CLR over T-SQL and vice versa. Follow-up chapters later in the book dig into the detail of how to use the CLR in SQL Server 2005.
Nearly every developer, whether a SQL developer or a front-end object-oriented developer, agrees that T-SQL is great at data access and at manipulating that data in the database. SQL Server is awesome; most developers can agree on that. But these same developers should also admit that T-SQL is not as complete a programming language as C# or Visual Basic .NET. The SQL programming language is built around data access and data manipulation at the database level. It does a tremendous job of that, but it has its limitations. This is where an object-oriented programming language can step in and complement T-SQL. Not replace, complement.
Integration Overview
The integration of the CLR gives database programmers the capability to use business-oriented languages such as C# and Visual Basic .NET within SQL Server 2005. Using this integration, database developers can create database objects such as stored procedures, triggers, and user-defined functions in these common business-oriented languages as an alternative to T-SQL.
This integration provides functionality not found before in SQL Server, such as preemptive threading and memory garbage collection (returning unused memory back to the operating system). While SQL Server and the CLR differ in the way they handle issues such as threads and memory, understanding their integration can be an advantage to you as a developer when trying to get the most out of your application.
The goals of integrating the CLR in SQL Server 2005 come down to a handful of items. Listed in no particular order, they are as follows:
❑ Scalability
❑ Reliability
❑ Performance
❑ Security
❑ Memory management services (such as garbage collection)
Scalability
As mentioned previously, SQL Server and the CLR have different mechanisms for handling memory and other processes. When running user code inside SQL Server, the last thing you want to do is degrade performance by causing a conflict between two competing processes. For example, SQL Server uses a non-preemptive threading model (threads occasionally yield execution), whereas the CLR uses a preemptive threading model. As another example, the CLR cannot tell the difference between physical and virtual memory, but SQL Server can, because physical memory limits can be set and memory is therefore managed by SQL Server.
Careful thought must be given when writing user code that operates inside SQL Server. User code that manages things like memory and threading itself can conflict with the same functionality in SQL Server and cause serious scalability issues.
Reliability
Also known as safety, reliability states that any code running in the CLR should not compromise the integrity of the SQL Server database engine in which the process is running. An example of this is a process that changes the structure of a database.
Performance
What good does it do to run .NET code in SQL Server that runs worse than its T-SQL equivalent? Any managed code running in SQL Server must perform as well as, or better than, the native T-SQL code. By using the CLR in SQL Server, you bring the data and the code closer together. The code then benefits from the processing power of the server, and in many cases you will see an increase in performance.
Security
Any user code running in SQL Server needs a way to access machine resources that are outside of the database engine. One of the main security reasons for utilizing the CLR in SQL Server is that much less data needs to leave the server, lessening the risk of exposing your data. Managed code running in the database needs to adhere to the same authentication and authorization rules that govern access to database objects such as tables or stored procedures. This ensures that no unwanted processes can gain access to the database engine and database components without going through the correct channels.
Later on in the book, Chapter 21 to be exact, you will learn about assemblies in SQL Server 2005. As a quick introduction, an assembly is a SQL Server hosted DLL or EXE that, when created in SQL Server, is assigned one of three levels of security (SAFE, EXTERNAL_ACCESS, or UNSAFE) that define the context in which the assembly can run. As you will learn later, each level either strengthens or lessens the security context in which the assembly runs. This will have a definite impact on how you utilize CLR security in your environment, so don't skip that chapter.
Limitations of T-SQL
Most diehard T-SQL developers will tell you that whatever you can do in .NET can be done in T-SQL, and they might go as far as to say that they can do it better. While the validity of that statement will be argued until the end of time, the reality is that other programming languages, such as C# and Visual Basic, are more complete.
That is not to say that T-SQL is inferior by any means. Previously this chapter mentioned what T-SQL does well: data access and set-based operations within a database. In fact, SQL Server 2005 gives you the ability to write recursive queries, which rely on the capability of a common table expression to reference itself. SQL Server 2005 also comes with new analytical functions (such as RANK and ROW_NUMBER) and relational operators (such as APPLY, PIVOT, and UNPIVOT), which are used to manipulate a table-valued expression into another table. All of these new features in SQL Server 2005 prove that T-SQL continues to grow and is taken seriously.
ANSI SQL is based on open standards that are not owned by a single company, making it easy to use with any RDBMS that complies with the ANSI standards. That said, though, T-SQL does have its limitations. For example, the following are not available in T-SQL but are readily available in .NET:
❑ Arrays
❑ Collections
❑ FOR EACH loops
❑ Classes
While this list is by no means complete, it gives you an idea of the major differences between T-SQL and other mature programming languages.
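As a rough illustration of the four items in the list, the following console sketch uses an array, a generic collection, a For Each loop, and a class together. It is a minimal sketch only; the Product class, the module name, and the sample values are hypothetical and are not taken from AdventureWorks.

```vb
Imports System
Imports System.Collections.Generic

' Hypothetical class used only for this illustration.
Public Class Product
    Public ProductId As Integer
    Public ProductName As String

    Public Sub New(ByVal newId As Integer, ByVal newName As String)
        ProductId = newId
        ProductName = newName
    End Sub
End Class

Module LanguageFeaturesSketch
    Sub Main()
        ' An array with three elements.
        Dim colors() As String = {"Black", "Red", "Silver"}

        ' A strongly typed collection that grows as items are added.
        Dim products As New List(Of Product)
        products.Add(New Product(1, "Sample Bike"))
        products.Add(New Product(2, "Sample Helmet"))

        ' For Each walks the array and the collection without any cursor.
        For Each colorName As String In colors
            Console.WriteLine(colorName)
        Next

        For Each item As Product In products
            Console.WriteLine(item.ProductId.ToString() & ", " & item.ProductName)
        Next
    End Sub
End Module
```

None of these constructs has a direct T-SQL counterpart, which is the gap the CLR integration is meant to fill.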
For example, looping through a list of Product records one row at a time in T-SQL typically requires a cursor, as shown in the following example:
USE AdventureWorks
GO
DECLARE Product_Cursor CURSOR FOR
    SELECT ProductID, Name
    FROM Production.Product
    WHERE Color IS NOT NULL
    ORDER BY Name
OPEN Product_Cursor
FETCH NEXT FROM Product_Cursor
WHILE @@FETCH_STATUS = 0
BEGIN
    -- DO SOMETHING WITH THE DATA
    FETCH NEXT FROM Product_Cursor
END
CLOSE Product_Cursor
DEALLOCATE Product_Cursor
To accomplish this same thing in .NET requires the following:
Imports System
Imports System.Data.OleDb

Module Module1
    Sub Main()
        Dim tsql As String = "SELECT ProductID, Name FROM Production.Product WHERE Color IS NOT NULL ORDER BY Name"
        ' Replace the user name and password with your own.
        Dim connstr As String = "Provider=SQLOLEDB;Server=(local);Database=AdventureWorks;UID=sa;PWD=hackthis"
        Dim conn As New OleDbConnection(connstr)
        Dim cmd As New OleDbCommand(tsql, conn)
        Try
            conn.Open()
            Dim rdr As OleDbDataReader = cmd.ExecuteReader()
            ' Read() advances to the next row and returns False after the last row.
            While rdr.Read()
                Console.WriteLine(rdr.Item(0).ToString() & ", " & rdr.Item(1).ToString())
            End While
            rdr.Close()
        Catch ex As Exception
            Console.WriteLine(ex.Message)
        Finally
            conn.Close()
        End Try
    End Sub
End Module
The purpose of this section was certainly not to paint the T-SQL language as an inferior language. Actually, it was the opposite: T-SQL is a strong and powerful language for data access and manipulation. The new features in SQL Server 2005 discussed here attest to that. The integration of the CLR is there as a complement to the already strong SQL language, making it that much stronger. So breathe a sigh of relief, DBAs; you're not out of a job.
Introduction to Managed Code
Managed code is simply code that runs within the CLR. Prior to SQL Server 2005, it was not possible to mix database engine processes with the CLR with any amount of success, but SQL Server 2005 has integrated the CLR and provides the capability to run safe user code within the confines of a database engine process.
Every CLR-compliant language compiles its code down to what is called MSIL, or Microsoft Intermediate Language. The CLR can run this compiled code because implementation differences are gone, regardless of how a feature is used or presented in the specific language. For example, a System.String in one language is a System.String in another language when it is compiled.
COM (Component Object Model) was the first great step toward not having to write everything from the ground up. COM supplied the foundation for higher-level software and services. It allowed you to build your application by using components written by others. For example, you could buy a third-party grid or calendar control so you didn't have to create one from scratch. It sped up application development and provided functionality you would otherwise have had to spend the time to develop yourself.
As cool as it is, COM has its limitations. Have you ever tried to pass a string value from a VB application to a C++ application? Typically the work had to be done on the C++ side because VB hates working outside the box. These limitations don't exist with the CLR and managed code. Even better, running managed code within the CLR has been extended to SQL Server 2005.
If you have done any work with the .NET Framework and the CLR, you know how easy it is to work with. With the CLR integrated into SQL Server 2005, this same flexibility is accessible from right inside the database engine, providing the capability to write stored procedures, triggers, and user-defined functions in managed code. And don't forget user-defined types and aggregates.
For example, you can use .NET to create your own type and use it in SQL Server (you'll see an example of this in Chapter 22). Take the following code, for example:
Imports System
Imports System.Data
Imports System.Data.Sql
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server

Public Class SampleTestClass
    Public Shared Sub TestMessage()
        ' Send a text message back to the caller through the SQL Server pipe.
        Microsoft.SqlServer.Server.SqlContext.Pipe().Send("This Stuff ROCKS!")
    End Sub
End Class
You can compile this simple class into a CLR stored procedure and, using an assembly, execute it via a standard T-SQL stored procedure. This simple example walks you through how to do just that. Open Visual Studio 2005 and select Create New Project. The New Project screen, depicted in Figure 9-1, is displayed. Under Project Types on the left side of the screen, select Database. Then in the Templates section on the right, select SQL Server Project. Give the project a name (such as TestAssembly), browse to where you want to save the project, and then click OK.
Figure 9-1
After clicking OK, you might get a new dialog window named New Database Reference. This dialog is used to inform your project which SQL Server you want to deploy your project to. The only information required is the server name where SQL Server 2005 is running, the username and password you want to use to connect, and the database in which to deploy your project. After you fill in that information, click OK. The project is then created in Visual Studio. After the project is created, right-click the solution name in the Solution Explorer window, select Add, and then select New Item from the menu. This opens the Add New Item dialog shown in Figure 9-2.
Figure 9-2
In the Add New Item dialog, select Stored Procedure and then give the stored procedure a name. In this example, the stored procedure is called TestProc. After you name the stored procedure, click Add. When you click Add, Visual Studio automatically opens and displays your stored procedure with a lot of the necessary code already filled in. Figure 9-3 shows what the template looks like when it is first opened and displayed.
Figure 9-3
The two lines of code to pay attention to in the figure are the following:
Public Shared Sub TestProc ()
The first line tells this assembly that when compiled, it will be compiled into a stored procedure. The second line is the name of the stored procedure when compiled, TestProc.
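Because the figure itself is not reproduced in the text, the following is a sketch of what the generated Visual Basic template typically looks like. It assumes the default SQL Server Project template in Visual Studio 2005; the class name StoredProcedures and the placeholder comment are assumptions and may differ in your environment.

```vb
Imports System
Imports System.Data
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server

Partial Public Class StoredProcedures
    ' Line 1: the attribute marks the method as a CLR stored procedure.
    <Microsoft.SqlServer.Server.SqlProcedure()> _
    Public Shared Sub TestProc()
        ' Line 2 (above): the method name becomes the stored procedure name.
        ' Add your code here
    End Sub
End Class
```

The method must be Public and Shared so that SQL Server can call it without creating an instance of the class.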
Add the line of code shown in Figure 9-4. Notice that the code in the figure looks similar to the sample code presented at the beginning of this section.
Figure 9-4
The next step is to build the solution, which compiles this code into a DLL (referred to as the assembly) behind the scenes. From the Build menu, select Build Solution, as shown in Figure 9-5.
Figure 9-5
The final step in this process is to deploy the assembly. Deploying the assembly automatically creates the assembly reference and the stored procedure in SQL Server 2005. From the Build menu, select Deploy Solution, as shown in Figure 9-6.
Figure 9-6
Open Microsoft SQL Server Management Studio and open a new query window. Be sure to select the database in which the assembly was deployed. In the query window, execute the following T-SQL statement:
EXEC TestProc
Figure 9-7 shows the T-SQL statement and the results of the execution of the stored procedure.
Figure 9-7
While this example was quick and easy, it does a very good job of showing the capabilities of developing SQL Server objects using the .NET Framework.
Advantages of CLR Integration
Earlier in the chapter, you were introduced to some of the limitations of T-SQL and how those limitations are better served by taking advantage of what the CLR provides. You saw a small list of the shortcomings of T-SQL, including a lack of support for arrays, collections, and other things that are more than supported in the CLR. This section briefly discusses some of the advantages of using managed code over T-SQL.
What you need to understand is that, while there are some definite advantages to using the CLR over T-SQL, it is not a "fix all" for every situation or scenario. T-SQL is specifically designed for quick data access, manipulation, and data management. It is exceptionally good at that. It was not designed, however, to provide support for collections, arrays, or classes, for example. As stated previously, this type of functionality can be imitated using T-SQL, but there are typically performance issues associated with doing so.
Managed code offers what T-SQL does not: much better support for string manipulation and complex logic. It bridges the gap between what SQL doesn't do well and what .NET does well, and it offers real object-oriented functionality within SQL Server via the integration of the CLR into SQL Server 2005. The functionality of the .NET Framework can now be accessed via managed code within SQL Server. Stored procedures and triggers have access to classes in the .NET Framework that were not accessible before.
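As a sketch of the string-manipulation advantage described above, a CLR user-defined function can wrap a regular expression, something T-SQL has no native equivalent for. The function name IsValidPostalCode, the class name, and the pattern are hypothetical and chosen only for illustration; the block assumes it is compiled inside a SQL Server Project like the one built earlier in the chapter.

```vb
Imports System
Imports System.Data.SqlTypes
Imports System.Text.RegularExpressions
Imports Microsoft.SqlServer.Server

Partial Public Class UserDefinedFunctions
    ' Hypothetical scalar function: True when the input looks like a
    ' five-digit postal code with an optional four-digit extension.
    <SqlFunction(IsDeterministic:=True, IsPrecise:=True)> _
    Public Shared Function IsValidPostalCode(ByVal input As SqlString) As SqlBoolean
        ' Propagate NULL instead of throwing when the column value is NULL.
        If input.IsNull Then
            Return SqlBoolean.Null
        End If
        Return New SqlBoolean(Regex.IsMatch(input.Value, "^\d{5}(-\d{4})?$"))
    End Function
End Class
```

Once deployed, a function like this could be called from a T-SQL query in the same way as any other scalar function, keeping the set-based work in T-SQL and the pattern matching in managed code.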
The CLR inspects all user code before it is executed to verify that it is safe, meaning that it won't break anything when it executes, something that SQL Server does not do on its own. For example, the CLR checks that user code does not read memory that has not been written to.
The integration of the CLR also provides some object-oriented capabilities that T-SQL in SQL Server 2005 does not. Encapsulation, inheritance, and polymorphism are three object-oriented features that currently do not exist in T-SQL. Each of these is defined in the following list:
❑ Encapsulation: The capability to contain and control a group of related items. For example, a class can contain a number of related methods and properties, controlled by the class. All the methods, properties, and events are treated as a single object.
❑ Inheritance: The capability for one class to inherit from another class. In other words, inheritance is the ability to create a new class based on an already existing class. For example, if class B inherits from class A, class B gains access to all the methods and properties of class A, plus any others that it has defined itself.
❑ Polymorphism: The capability to have multiple classes, each with its own methods and properties, which are used in distinct ways even though the names of the methods or properties are the same. A base class may have a method called GetEmployeeInfo, for example. Polymorphism lets you create one or more classes from the base class with each new class implementing its own version of the GetEmployeeInfo method, with each class being used interchangeably.
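The three definitions above can be sketched in a few lines of Visual Basic. This is an illustrative console sketch only; the Employee and Manager classes and the sample names are hypothetical, and only the GetEmployeeInfo method name comes from the text.

```vb
Imports System
Imports System.Collections.Generic

' Encapsulation: the name is private and exposed only through the class.
Public Class Employee
    Private _name As String

    Public Sub New(ByVal employeeName As String)
        _name = employeeName
    End Sub

    Public ReadOnly Property Name() As String
        Get
            Return _name
        End Get
    End Property

    Public Overridable Function GetEmployeeInfo() As String
        Return "Employee: " & _name
    End Function
End Class

' Inheritance: Manager gains the Name property from Employee.
Public Class Manager
    Inherits Employee

    Public Sub New(ByVal employeeName As String)
        MyBase.New(employeeName)
    End Sub

    ' Polymorphism: the same method name with its own implementation.
    Public Overrides Function GetEmployeeInfo() As String
        Return "Manager: " & Name
    End Function
End Class

Module OopSketch
    Sub Main()
        Dim staff As New List(Of Employee)
        staff.Add(New Employee("Ann"))
        staff.Add(New Manager("Bob"))

        ' Each object answers the same call in its own way.
        For Each person As Employee In staff
            Console.WriteLine(person.GetEmployeeInfo())
        Next
    End Sub
End Module
```

The loop treats both objects as Employee instances, yet the Manager object runs its own override, which is the interchangeability the polymorphism definition describes.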
Even though these three features do not exist in T-SQL, the CLR integration extends this functionality to SQL Server 2005 and adds an extra benefit to its already excellent features. With all of this information in mind, you must wonder how to choose between T-SQL and the CLR. Well, read on.
Choosing Between T-SQL and Managed Code
With the integration of the CLR in SQL Server 2005, the line that separates what is commonly known as the Business Logic Tier and the Data Tier just got a little fuzzier. That certainly is not meant to be taken negatively; it just means that you will need to do a little more homework when choosing where to do what, and your homework just got a little more complicated.
Choosing where to put middle tier logic and database access logic used to be fairly easy. It is not so obvious now with the CLR being integrated into SQL Server, but with that comes added functionality and flexibility that can certainly enhance your applications.
Choosing between T-SQL and managed code is not a cut-and-dried decision. As explained previously, T-SQL does some things phenomenally well, and managed code does other things well. That doesn't mean you should throw all data retrieval functionality into a T-SQL stored procedure.
When doing data retrieval, T-SQL is the way to go. Leave the data manipulation that SQL doesn't do well to the managed code side of things, especially if there is complex logic being processed on the returned data.
Many developers make decisions like this based on the amount of data being handled. For example, if you know that a certain call to the database always returns a single record, why put that in a stored procedure? Obviously there are other things to consider, such as the compiled execution plans that SQL Server provides, but every situation is different and more research is required to find the best-laid plan.
The other thing to take into consideration is where the code is executed. Is the client the best place for certain logic, or does that same logic perform better on the server? With SQL Server 2005, both T-SQL and managed code can be run on the server, bringing the added benefit of server processing power, as well as shortening the gap between data and code.
Is your application web-based or Windows-based? This also has an effect on where the logic is placed. Don't discount the client, as workstation computers are very well powered and can handle a lot of the application processing without being brought to their knees. This means that a lot of the application processing can be offloaded to the client, freeing up the server for other tasks. Keep in mind that managed code can run on either the client or the server, but T-SQL can only run on the server.
Security
There are now two security models inside SQL Server 2005. The first is the SQL Server security model, which is built around user authentication. This is not new; it has been in place since way back. The second is the CLR security model, which is a code-access security model. The two are combined to support all the features of both SQL Server and the CLR inside SQL Server 2005.
The combination of the two security models secures access between CLR and non-CLR objects operating in SQL Server. When a call is made between objects running on the server, both models may step in to manage the security of the objects. The calls between these objects are called links. There are three types of links:
❑ Invocation
❑ Table-access
❑ Gated
Invocation
Invocation links refer to the invocation of code. This could be from, for example, a CLR stored procedure being executed, or a user calling a T-SQL stored procedure. EXECUTE permissions are checked when these types of links are executed.
Table-Access
Table-access links refer to the retrieval or modification of data in a table, view, or table-valued function. These types of links require INSERT, SELECT, DELETE, or UPDATE permissions.
Gated
In gated links, permissions are not checked once relationships have been verified. When a link is made between two objects, permissions on the second object are checked only at the creation of the first object. In SQL Server 2000, gated links are used for computed columns and fulltext-indexed columns. In SQL Server 2005, gated links are used in the CLR to define a T-SQL entry point into assemblies. This simply means that in order to execute a T-SQL entry point in a CLR-defined assembly, only appropriate permissions are checked on the T-SQL entry point and not the assembly.
CLR Security Integration Goals
Integrating the CLR into SQL Server 2005 was a large process, and the following security goals were at the top of the list:
❑ Any managed code running in SQL Server should not compromise the integrity and stability of SQL Server.
❑ Any managed code running in SQL Server should not have unauthorized access to data or other code in the database.
❑ There should be a method for restricting user code from accessing resources outside of the server.
❑ Any user code running in SQL Server should not have unauthorized access to system resources simply by running in a SQL Server engine process.
Summary
The same question posed to you at the beginning of the chapter is now asked of you again. If your boss comes to you and says that he or she heard that SQL Server 2005 comes with CLR integration and asks you what you think about using it in some of your application development, how will you answer now? Let's hope your answer will be a simple, "It depends," as you then explain to him or her that simply utilizing this technology may not be the best solution for your application and that you need to perform careful research before jumping in.
Using the CLR in SQL Server 2005 can be a great benefit to your application if used wisely and appropriately in certain situations. Finding that "certain situation" takes time, but it can add great benefits when you find it.
This chapter marks the end of the section on server-side XML processing. The next several chapters deal with client-side XML processing. Chapter 10 specifically covers client-side support for the xml data type.
Part III: Client-Side XML Processing in SQL Server 2005
Chapter 10: Client-Side Support for the xml data type
Chapter 11: Client-Side XML Processing with SQLXML 4.0
Chapter 12: Creating and Querying XML Views
Chapter 13: Updating the XML View Using Updategrams
Chapter 14: Bulk Loading XML Data Through the XML View
Chapter 15: SQLXML Data Access Methods
Chapter 16: Using XSLT in SQL Server 2005
Client-Side Support for the xml data type
The last nine chapters have focused primarily on the xml data type within SQL Server 2005, from the xml data type itself to indexing and querying the xml data type. By now you should have a good grasp of the xml data type from the perspective of SQL Server. It is now time to change focus and look at it from the other side, the client side.
This part of the book deals strictly with XML from the client side, starting with this chapter, which discusses client-side support for the xml data type and introduces the SqlXml class. The SqlXml class is the means by which the client can interface with the xml data type. In this chapter, the following topics will be discussed:
❑ The SqlXml class and the CreateReader method
❑ Updating and inserting data with the SqlXml class
SqlXml Class
The SqlXml class is a new class in the System.Data.SqlTypes namespace. This class represents XML data retrieved from, or stored in, SQL Server. One of the benefits of the SqlXml class is that it exposes its content through an XmlReader-derived type, providing fast, forward-only access to XML data. The SqlXml class implements the INullable interface, which is what allows the SqlTypes to represent null values. The general syntax for obtaining a SqlXml instance is as follows:
Dim xml As SqlXml = SqlDataReader.GetSqlXml([column/index])
The GetSqlXml method, a method of the SqlDataReader class, returns the value of a specified column as an XML value. It takes a zero-based column ordinal that specifies the column whose data you want to return. There are various methods available on the SqlXml class, but the most important method when dealing with XML is the CreateReader method, which is outlined in the following section.
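Because SqlXml implements INullable, it is worth checking IsNull before reading a value, since an xml column such as one declared NULL can legitimately hold no data. The following console sketch shows the idea without a database connection; the module name, the Describe helper, and the sample XML are assumptions made only for illustration.

```vb
Imports System
Imports System.Data.SqlTypes
Imports System.IO
Imports System.Xml

Module SqlXmlNullSketch
    Sub Main()
        ' SqlXml.Null represents a database NULL for the xml data type.
        Dim missing As SqlXml = SqlXml.Null

        ' A SqlXml instance built from an in-memory XmlReader.
        Dim source As XmlReader = XmlReader.Create(New StringReader("<Team name=""Demo"" />"))
        Dim present As New SqlXml(source)

        Describe(missing)
        Describe(present)
    End Sub

    ' Hypothetical helper: prints a marker for NULL, otherwise the XML text.
    Sub Describe(ByVal data As SqlXml)
        If data.IsNull Then
            Console.WriteLine("(NULL)")
        Else
            Console.WriteLine(data.Value)
        End If
    End Sub
End Module
```

The same IsNull test applies to a SqlXml returned by GetSqlXml, where reading the value of a NULL column without checking would raise an exception.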
Introducing the CreateReader Method
The CreateReader method is a public method on the SqlXml class. It is what gets or returns the value of the XML, always in the form of an XmlReader. It supports XML documents as well as XML fragments. The general syntax for using the CreateReader method is as follows:
Dim sdr As SqlDataReader
Dim xml As SqlXml = sdr.GetSqlXml(0)
Dim xmlrdr As XmlReader = xml.CreateReader
The first line declares a SqlDataReader variable; in working code, the reader is returned by a call such as SqlCommand.ExecuteReader. The second line retrieves a SqlXml instance from the first column of the current row. The third line uses the CreateReader method of the SqlXml class to create an XmlReader. The XmlReader can then be used to read and parse through XML documents and fragments.
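To see CreateReader at work on a fragment without needing a database, the following console sketch builds a SqlXml instance from an in-memory XmlReader and then walks it. It is a sketch under stated assumptions: the module name and the sample fragment are hypothetical, and the source reader is explicitly configured with ConformanceLevel.Fragment so that it accepts more than one top-level element.

```vb
Imports System
Imports System.Data.SqlTypes
Imports System.IO
Imports System.Xml

Module CreateReaderSketch
    Sub Main()
        ' A fragment: two top-level elements and no single root element.
        Dim fragment As String = "<Rider>Sample One</Rider><Rider>Sample Two</Rider>"

        ' Allow the source reader to accept fragments.
        Dim settings As New XmlReaderSettings()
        settings.ConformanceLevel = ConformanceLevel.Fragment

        Dim source As XmlReader = XmlReader.Create(New StringReader(fragment), settings)
        Dim data As New SqlXml(source)

        ' CreateReader returns a forward-only XmlReader over the stored XML.
        Dim xmlrdr As XmlReader = data.CreateReader()
        While xmlrdr.Read()
            If xmlrdr.NodeType = XmlNodeType.Element Then
                Console.WriteLine(xmlrdr.Name)
            End If
        End While
        xmlrdr.Close()
    End Sub
End Module
```

In an application, the SqlXml instance would normally come from SqlDataReader.GetSqlXml rather than from a constructor, but the reading loop is the same.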
Using the SqlXml Class
Before the example begins, the table and data need to be created and populated. Open a query window in SQL Server Management Studio and execute the following T-SQL statements:
IF OBJECT_ID('Motocross', 'U') IS NOT NULL
    DROP TABLE Motocross
GO
CREATE TABLE Motocross (
    [TeamID] [int] IDENTITY(1,1) NOT NULL,
    [TeamInfo] [xml] NULL,
    CONSTRAINT [PK_Motocross] PRIMARY KEY CLUSTERED
    (
        [TeamID] ASC
    ) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT INTO MOTOCROSS (TeamInfo) VALUES ('
Chad Reed 22
David Vuillemin 12
Tim Ferry 15
Kelly Smith 123
Brock Sellards 18
Brett Metcalf 256
Danny Smith 31
')
GO
INSERT INTO MOTOCROSS (TeamInfo) VALUES ('
James Stewart 259
Michael Byrne 26
')
GO
INSERT INTO MOTOCROSS (TeamInfo) VALUES ('
Ricky Carmichael 4
Davi Millsaps 188
Broc Hepler 60
Sebastien Tortelli 103
')
GO
INSERT INTO MOTOCROSS (TeamInfo) VALUES ('
Jeremy McGrath 2
Ernesto Fonseca 24
Travis Preston 70
Andrew Short 51
')
GO
Now that the data is in place, you are ready to write some code. Open the Visual Studio test application you have been using and open the main form in Design View. Add a new button and text box to the form. Set the following properties for the button:
Property    Value
Text        SqlXml Class
Name        cmdSqlXml
Location    12, 12
Next, set the properties for the text box as follows:
Property    Value
Name        txtResults
Multiline   True
ScrollBars  Vertical
Location    12, 62
Size        446, 196
With the properties set on the controls, double-click the button to view the code behind it. To begin the example, first make sure that the following Imports statements are declared in the declaration section of your form:
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports System.Xml
Next, add the following code in the click event of the button:
Dim Connection As SqlConnection
Dim Command As SqlCommand
Connection = New SqlConnection
Command = New SqlCommand
Try
    'ENTER YOUR OWN USER NAME AND PASSWORD
    Connection.ConnectionString = "Server=localhost;Database=Wrox;UID=;PWD="
    Connection.Open()
    Command.Connection = Connection
    Command.CommandText = "SELECT TeamInfo FROM Motocross"
    Dim r As SqlDataReader = Command.ExecuteReader
    r.Read()
    Dim xml As SqlXml = r.GetSqlXml(0)
    Dim xmlrdr As XmlReader = xml.CreateReader
    xmlrdr.Read()
    Me.txtResults.Text = xmlrdr.ReadOuterXml()
Catch ex As Exception
    MessageBox.Show(ex.Message)
End Try
Command.Dispose()
Connection.Close()
Run the project by pressing F5 or by selecting Start Debugging from the Debug menu. When the form comes up, click the SqlXml Class button. Figure 10-1 shows the results of the query.
Figure 10-1
In this example, the SqlDataReader reads the selected rows from the Motocross table, and then the GetSqlXml method is called to retrieve the value from the first column (specified by the ordinal 0). The CreateReader method is then called on the SqlXml instance to expose its XML content as an XmlReader, which provides access to the content of the XML document. The retrieved XML content is then displayed in the text box as shown in the previous figure. The example displayed only the first record retrieved from the table, although the query actually returned all four records. All four can be displayed by looping through the result set. The SqlDataReader, XmlReader, and SqlXml classes are still needed to return the data as before, but this time it is necessary to loop through the result set. Modify the Click event code for the SqlXml button as follows:
Dim Connection As SqlConnection
Dim Command As SqlCommand
Connection = New SqlConnection
Command = New SqlCommand
Try
    'ENTER YOUR OWN USER NAME AND PASSWORD
    Connection.ConnectionString = "Server=localhost;Database=Wrox;UID=;PWD="
    Connection.Open()
    Command.Connection = Connection
    Command.CommandText = "SELECT TeamInfo FROM Motocross"
    Dim r As SqlDataReader = Command.ExecuteReader
    Dim xml As SqlXml
    Dim xmlrdr As XmlReader
    Dim StrVal As String = ""
    'Read every row and append its XML, followed by a separator line
    Do While r.Read()
        xml = r.GetSqlXml(0)
        xmlrdr = xml.CreateReader
        xmlrdr.Read()
        StrVal += xmlrdr.ReadOuterXml() + vbCrLf + "-----------------------" + vbCrLf
    Loop
    Me.txtResults.Text = StrVal
Catch ex As Exception
    MessageBox.Show(ex.Message)
End Try
Command.Dispose()
Connection.Close()
The changes made to the Click event now display all of the records. A SqlDataReader is still created, but a Do While loop now iterates over every record returned. On each iteration, Read() advances the SqlDataReader to the next record. The GetSqlXml and CreateReader methods are then used, as in the previous example, to read the XML from the first column. Figure 10-2 shows the TeamInfo column returned from all the rows.
Figure 10-2
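The SqlXml and CreateReader pattern can also be tried without a database connection. The following is a minimal sketch, assuming a console module rather than the form, and using made-up XML strings (the Team and Rider element names are hypothetical, not the structure of the TeamInfo column). Each string is wrapped in a SqlXml value, standing in for what GetSqlXml(0) would return, and is then read back through CreateReader.

```vb
Imports System
Imports System.Data.SqlTypes
Imports System.IO
Imports System.Text
Imports System.Xml

Module SqlXmlReaderSketch
    Sub Main()
        ' Hypothetical stand-ins for the values of an xml column.
        Dim rows() As String = { _
            "<Team><Rider>Rider A</Rider></Team>", _
            "<Team><Rider>Rider B</Rider></Team>"}

        Dim output As New StringBuilder()

        For Each row As String In rows
            ' Wrap the text in a SqlXml value, standing in for GetSqlXml(0).
            Dim value As New SqlXml(XmlReader.Create(New StringReader(row)))

            ' CreateReader returns an XmlReader over the XML held by the SqlXml value.
            Using rdr As XmlReader = value.CreateReader()
                ' Skip to the first element before reading its markup.
                rdr.MoveToContent()
                output.AppendLine(rdr.ReadOuterXml())
                output.AppendLine("-----------------------")
            End Using
        Next

        Console.Write(output.ToString())
    End Sub
End Module
```

The Using block disposes each XmlReader as soon as its row has been read, which is one reasonable way to avoid holding readers open while looping over many rows.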
Returning the entire XML document of one or all of the records is great functionality if that is the requirement, but what if the requirement is to return a particular value or set of values from the XML document? This can be accomplished just as easily with only a little modification. In this next example, each rider and their associated bike class is returned for the first row in the table. Modify the Click event code behind the button as follows:
Dim Connection As SqlConnection
Dim Command As SqlCommand
Connection = New SqlConnection
Command = New SqlCommand
Try
    'ENTER YOUR OWN USER NAME AND PASSWORD
    Connection.ConnectionString = "Server=localhost;Database=Wrox;UID=;PWD="
    Connection.Open()
    Command.Connection = Connection
    Command.CommandText = "SELECT TeamInfo FROM Motocross"
    Dim r As SqlDataReader = Command.ExecuteReader
    r.Read()
    Dim xml As SqlXml = r.GetSqlXml(0)
    Dim xmlrdr As XmlReader = xml.CreateReader
    Do While xmlrdr.Read()
        'Write out every attribute of the current node
        Dim i As Integer
        For i = 0 To xmlrdr.AttributeCount - 1
            xmlrdr.MoveToAttribute(i)
            Me.txtResults.Text += xmlrdr.Name + "=" + xmlrdr.Value + vbCrLf
        Next i
        xmlrdr.MoveToContent()
        If xmlrdr.Name = "Name" Then
            Me.txtResults.Text += xmlrdr.Name + "=" + xmlrdr.ReadElementString + vbCrLf
        End If
        'Return to the element that owns the current attribute, if any.
        xmlrdr.MoveToElement()
    Loop
Catch ex As Exception
    MessageBox.Show(ex.Message)
End Try
Command.Dispose()
Connection.Close()
Figure 10-3 shows each rider with his associated bike class. This example uses the same method as the previous examples. A SqlDataReader is used to retrieve the data from the xml data type column TeamInfo. The GetSqlXml method is then used to access the xml data field in the rowset of the SqlDataReader.
Figure 10-3
The CreateReader method is used to retrieve the XML content of the SqlXml value as an XmlReader, which is then used to read and display the results in the text box as shown in Figure 10-3. The code loops through the XML document, reading each element and attribute and looking for a specific element name. Every time the specified element is found, the value of that element is read, as is the corresponding Class attribute, and those values are written to the text box. Updating and inserting data using the SqlXml class is not much different from the previous examples, and is actually quite easy. The following examples demonstrate updating existing records and inserting new records using the SqlXml class.
Updating Data with the SqlXml Class
The first example in this section updates an existing xml data type column. First, insert a new row into the Motocross table with a NULL value for the xml data type column. Open a SQL query window in SQL Server Management Studio and execute the following INSERT statement:
INSERT INTO Motocross SELECT NULL
GO
To verify the data in the table, run a query to select all the data from the table. Figure 10-4 shows the table with five rows, including the row with a NULL value for the TeamInfo column you just inserted.
Figure 10-4
The NULL value in the record just created is updated and replaced with a valid XML document in the following example. For this next example, add a new button to the form, setting the following properties:
Property    Value
Text        SqlXml Class 2
Name        cmdSqlXml2
Location    118, 12
Double-click the newly added button to view the code behind it, and add the following code:
Dim Connection As SqlConnection
Dim Command As SqlCommand
Dim XmlStr As String
Connection = New SqlConnection
Command = New SqlCommand
XmlStr = "" & _
    "Grant Langston8" & _
    ""
Try
    'ENTER YOUR OWN USER NAME AND PASSWORD
    Connection.ConnectionString = "Server=localhost;Database=Wrox;UID=;PWD="
    Connection.Open()
    Command.Connection = Connection
    Command.CommandText = "UPDATE Motocross SET TeamInfo = @xmlvar WHERE TeamID = 5"
    'Add an xml-typed parameter and supply its value as a SqlXml instance
    Dim sqlparam As SqlParameter = Command.Parameters.Add("@xmlvar", Data.SqlDbType.Xml)
    sqlparam.Value = New SqlXml(New XmlTextReader(XmlStr, XmlNodeType.Document, Nothing))
    Command.ExecuteNonQuery()
    Me.txtResults.Text = "SUCCESS!"
Catch ex As Exception
    MessageBox.Show(ex.Message)
End Try
Command.Dispose()
Connection.Close()
Save the project and run the program. Click the new button you just added. The word “SUCCESS!” will display in the text box to let you know the code has finished running and that the update was successful. To verify the update, open a SQL query window in SQL Server Management Studio and execute a query to return all the rows in the Motocross table. Figure 10-5 shows the results of the update.
Figure 10-5
In this example, you used the SqlXml class together with an UPDATE statement to replace the NULL value in the record you created earlier. You used the SqlCommand class to execute the UPDATE statement, which contains the @xmlvar placeholder for a SqlParameter. You used the Parameters collection of the command to add that parameter, and you specified its type as xml by passing the Data.SqlDbType.Xml enumeration value. The SqlXml class supplies the parameter value, with the XmlTextReader class providing the XML document held in the XmlStr variable. When the UPDATE statement executes, the value of @xmlvar is passed to it and written to the TeamInfo column, replacing the NULL value.
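The way the command and its xml parameter are assembled can be examined without opening a connection. The following is a minimal sketch under a few assumptions: it runs as a console module, the XML string is a made-up placeholder, XmlReader.Create is used in place of XmlTextReader, and the key value is passed as a second, hypothetical @id parameter instead of being written into the SQL text.

```vb
Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports System.IO
Imports System.Xml

Module SqlXmlParameterSketch
    Sub Main()
        ' Hypothetical document; substitute the XML you want stored.
        Dim xmlStr As String = "<Team><Rider>Rider A</Rider></Team>"

        Using cmd As New SqlCommand( _
            "UPDATE Motocross SET TeamInfo = @xmlvar WHERE TeamID = @id")

            ' xml-typed parameter whose value is a SqlXml built from an XmlReader.
            Dim xmlParam As SqlParameter = cmd.Parameters.Add("@xmlvar", SqlDbType.Xml)
            xmlParam.Value = New SqlXml(XmlReader.Create(New StringReader(xmlStr)))

            ' Hypothetical second parameter for the key of the row to update.
            cmd.Parameters.Add("@id", SqlDbType.Int).Value = 5

            ' No connection is opened; the sketch only lists the assembled parameters.
            For Each p As SqlParameter In cmd.Parameters
                Console.WriteLine(p.ParameterName & " : " & p.SqlDbType.ToString())
            Next
        End Using
    End Sub
End Module
```

To run the statement, the command would still need an open SqlConnection and a call to ExecuteNonQuery, as in the chapter's example.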
Inserting Data with the SqlXml Class
This next example uses the same technique to insert a new record into the Motocross table. Modify the code behind the SqlXml Class 2 button used in the previous example as follows:
Dim Connection As SqlConnection
Dim Command As SqlCommand
Dim XmlStr As String
Connection = New SqlConnection
Command = New SqlCommand
XmlStr = "" & _
    "Mike LoRocco5" & _
    "Kevin Windham14" & _
    ""
Try
    'ENTER YOUR OWN USER NAME AND PASSWORD
    Connection.ConnectionString = "Server=localhost;Database=Wrox;UID=;PWD="
    Connection.Open()
    Command.Connection = Connection
    Command.CommandText = "INSERT INTO Motocross (TeamInfo) VALUES (@xmlvar)"
    'Add an xml-typed parameter and supply its value as a SqlXml instance
    Dim sqlparam As SqlParameter = Command.Parameters.Add("@xmlvar", Data.SqlDbType.Xml)
    sqlparam.Value = New SqlXml(New XmlTextReader(XmlStr, XmlNodeType.Document, Nothing))
    Command.ExecuteNonQuery()
    Me.txtResults.Text = "SUCCESS!"
Catch ex As Exception
    MessageBox.Show(ex.Message)
End Try
Command.Dispose()
Connection.Close()
Run the program and click the button. As in the previous example, the text box displays “SUCCESS!” when the code executes. In a query window in SQL Server Management Studio, execute a query to select all the rows from the Motocross table. Figure 10-6 shows the newly added record.
Figure 10-6
This example was really no different from the previous one, except that you issued an INSERT statement instead of an UPDATE statement. The method of creating and passing the parameters, using the SqlXml class and XmlTextReader, remains the same.
Summary
The purpose of this chapter was to introduce you to the SqlXml class of the System.Data.SqlTypes namespace, and to show you how you can interface with the xml data type on the client side. The chapter provided an overview of the SqlXml class, introduced the CreateReader method, and gave the general syntax to show how both SqlXml and CreateReader are used to query and read XML documents on the client. The remainder of the chapter was dedicated to a number of examples using the SqlXml class and CreateReader method. In the next chapter, you’ll learn about client-side processing with SQLXML 4.0.
Client-Side XML Processing with SQLXML 4.0
The previous chapter focused on client-side support for the xml data type and briefly introduced the technologies contained in SQLXML. Realistically, the xml data type in SQL Server is no good if you can’t do anything with it from the client side. The next few chapters focus on SQLXML and XML processing. When SQLXML 3.0 was released, it provided many welcome capabilities, such as Web Service support, as well as enhancements to existing features like XML bulk load, updategrams, and annotated XSD schemas. SQLXML 4.0, which comes with SQL Server 2005, adds features such as client-side support for the xml data type and the new SQL Native Client provider, and builds on existing features such as client-side formatting with FOR XML. The next five chapters focus entirely on client-side processing with SQLXML 4.0, with the focus of this chapter being the enhancements made to SQLXML and an introduction to the new SQL Native Client. The SQL Native Client is discussed in more detail in Chapter 20 when the topic of SQLXML data access methods is highlighted. The topics of discussion for this chapter are as follows:
❑ Introduction to SQL Native Client
❑ ADO and SQLXML 4.0 classes
❑ Client-side formatting with FOR XML
Chapter 11
SQL Native Client
SQL Server 2005 introduces a new technology, the SQL Native Client, which combines and replaces the earlier data access technologies, such as the ODBC driver and the OLE DB provider. This new data access client is an API that combines both the ODBC driver and the OLE DB provider into a single interface. Besides combining these components, this DLL, called sqlncli.dll, also includes additional features such as support for the xml data type and UDTs (user-defined types). In earlier versions of SQLXML, query execution over HTTP was accomplished using SQLXML virtual directories and the SQLXML ISAPI filter. During the installation of previous versions of SQLXML, a utility called the IIS Virtual Directory Management for SQLXML, shown in Figure 11-1, was installed, which gave users the ability to configure IIS virtual directories and run templates via HTTP.
Figure 11-1
Both of these components were removed from SQLXML 4.0 and replaced with two options. The first option is the native SQL Server 2005 Web Service functionality, discussed in detail in Chapter 17. The second option is to utilize the SQL Native Client and the ADO (ActiveX Data Objects) extensions built into SQLXML 4.0, which are discussed in this chapter. Both of these technologies accomplish the same task, but give the developer more XML formatting options. Likewise, it does not take a complete application rewrite to accommodate either of these options. The design goal of the SQL Native Client is to provide an easy way to access data from SQL Server, regardless of whether you are using OLE DB or ODBC. Since the SQL Native Client combines both of those into a single interface, a developer can easily adapt to this new client without completely rewriting the application or changing any of the data access components. The SQL Native Client uses many of the components of MDAC (Microsoft Data Access Components) and will work with version 2.6 or higher, or any version that is installed with Windows 2000 SP3 or later. As you will see in the next section, the SQL Native Client also works with ADO, providing access to all SQL Native Client functionality via ADO. The following list details the benefits and features of using the SQL Native Client:
❑ xml data type: Provides support for the xml data type on the client side.
❑ User-defined types: Provides support for UDTs on the client side.
❑ Execution of multiple result sets: Provides the capability to execute and return multiple result sets via a single connection.
❑ Asynchronous operation: Methods now return immediately, eliminating calling-thread blocking issues.
❑ Password expiration: Users can now change their expired passwords without administrator intervention.
SQL Native Client and MDAC Differences
Both the SQL Native Client and MDAC provide access to SQL Server, but you need to understand the numerous differences between them. The SQL Native Client incorporates many of the MDAC components, but it is specifically designed to work with the new features and enhancements made to SQL Server 2005. For example, MDAC by itself does not support the SQL Server 2005 xml data type, but the SQL Native Client does. Following is a list that highlights some of the areas where the SQL Native Client and MDAC differ:
❑ SQL Native Client does not support connection pooling, memory management, and other MDAC-accessible features.
❑ SQL Native Client does not support SQLXML integration.
❑ SQL Native Client supports only SQL Server version 7.0 and higher.
❑ SQL Native Client supports only the OLE DB and ODBC interfaces.
❑ To make distribution easier, all the necessary data access functionality in the SQL Native Client has been included in a single DLL interface.
This list is not exhaustive, as it is intended to highlight the bigger differences between the two technologies. The intent of the SQL Native Client is to simplify data access to SQL Server. The client tools for SQL Server are available for those who need a broader range of data access.
Deployment Considerations
The SQL Native Client is installed by default when you install SQL Server 2005, and you can also install it as a separate component for client installations. The installation file, SQLNCLI.msi, is on the SQL Server 2005 installation CD; you use it to install the SQL Native Client on client computers. Because this component has its own installer, the SQL Native Client can be more easily distributed, and even included in an application’s installation routine. For now, the SQL Native Client install runs in silent mode only.
xml Data Type Support
When you query an xml data type column using the SQL Native Client, the results are returned either as a text stream or an ISequentialStream. The ISequentialStream interface is the preferred method for reading and writing BLOBs (Binary Large Objects). If you recall Chapter 4’s discussion of the xml data type, XML documents and XML instances are stored in an xml data type column as BLOBs, and therefore can be returned on the client side either as strings or by using the ISequentialStream interface.
CreateReader()
The SqlXml class also contains a method called CreateReader(), which returns the results of a query as an XmlReader instance, ready to read the XML. This access is available via ADO.NET 2.0 of the .NET Framework and is discussed in greater detail in Chapter 23.
SQLXML 4.0 Queries with ADO
One of the biggest design goals of SQLXML 4.0 was to make data access easier without having to rewrite an entire application. As stated previously, one of the options for data access is to utilize the ADO extensions built into SQLXML 4.0. These extensions were first introduced in early versions of the Microsoft Data Access Components (MDAC) library, so they are available as long as MDAC 2.6 or later is present. This section demonstrates a couple of examples using SQLXML with ADO to query data and format the returned information into XML using client-side XML formatting. Open a query window in SQL Server Management Studio, type in the following T-SQL statement, and execute it against the AdventureWorks database. This code creates a stored procedure called GetProducts, which returns the ProductID, Name, and ProductNumber columns:
CREATE PROCEDURE GetProducts AS
SELECT ProductID, Name, ProductNumber
FROM Production.Product
GO
In Visual Studio 2005, create a new Visual Basic Windows project. Name the project SqlCliTestApp and click OK. The project creates a new form called Form1. Open this form in design view, and from the toolbox on the left side of the designer, drop a text box and button onto the form. For the text box, set the following properties to the corresponding values:
Property    Value
Multiline   True
ScrollBars  Vertical
Width       391
Height      109
Name        txtResults
For the button, set the following properties to the corresponding values:
Property    Value
Name        cmdADOExample
Text        ADO Example
Prior to entering any code, you need to add the appropriate references. From the Project menu, select Add Reference. In the Add Reference dialog, select the COM tab and scroll down and select Microsoft ActiveX Data Objects 2.8 Library (see Figure 11-2). Click OK.
Figure 11-2
Now that you have the appropriate references set, double-click the button on Form1 to display the code window. In the Click event for the cmdADOExample button, enter the following code:
Dim InStream As ADODB.Stream
Dim conn As ADODB.Connection
Dim cmd As ADODB.Command
Dim strconn As String
Dim dbGuid As String
Dim Userid As String
Dim Password As String
InStream = New ADODB.Stream
conn = New ADODB.Connection
cmd = New ADODB.Command
'Dialect GUID that marks the command text as an XML template
dbGuid = "{5d531cb2-e6ed-11d2-b252-00c04f681b71}"
strconn = "Provider=SQLXMLOLEDB;Data Provider=SQLNCLI;Server=localhost;" & _
    "Database=AdventureWorks"
Userid = "Type your username here"
Password = "Type your password here"
Me.Cursor = Cursors.WaitCursor
Try
    conn.Open(strconn, Userid, Password)
    cmd.ActiveConnection = conn
    cmd.CommandText = "" & _
        "EXEC GetProducts FOR XML NESTED" & _
        ""
    InStream.Open()
    cmd.Dialect = dbGuid
    cmd.Properties("Output Stream").Value = InStream
    '1024 = adExecuteStream: the results are written to the output stream
    cmd.Execute(, , 1024)
    InStream.Position = 0
    InStream.Charset = "utf-8"
    Me.txtResults.Text = InStream.ReadText()
    conn.Close()
Catch ex As Exception
    MessageBox.Show(ex.Message.ToString)
End Try
Me.Cursor = Cursors.Default
To run this example, press F5 or select Start from the Debug menu. When Form1 opens, click the ADO Example button. When you click the button, a connection is made to the specified SQL Server 2005 instance and database. The template query is then passed to the command object and executed. The first thing to notice is that a command object that uses the SQLXMLOLEDB provider can execute only to a Stream; this is why the code assigns a Stream to the Output Stream property before calling Execute. Focus on the template query for a moment. There are two pieces that should stand out. The first is the client-side-xml=1 attribute on the query element of the template. This tells SQL Server that the XML formatting will be done on the client side. When the stored procedure is executed, the results are returned to the middle tier for XML formatting. The second item to take note of is the FOR XML NESTED clause after the stored procedure name. Even though the FOR XML clause is passed to SQL Server, it is ignored because the client-side-xml attribute is set to 1. When the results are returned from SQL Server, the FOR XML clause is then applied to the results. The remaining code sets the Stream position and character set, and then displays the results in the text box.
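To make the shape of such a template concrete, here is a minimal sketch that builds a template string and checks it with XmlDocument before it would be assigned to CommandText. The ROOT element name and the exact markup are assumptions based on the SQLXML template format (a sql:query element in the urn:schemas-microsoft-com:xml-sql namespace carrying the client-side-xml attribute); treat it as an illustration, not as the listing from this chapter.

```vb
Imports System
Imports System.Xml

Module TemplateSketch
    Sub Main()
        ' Assumed template shape: a root element that declares the SQLXML
        ' template namespace and wraps the query in a sql:query element.
        Dim template As String = _
            "<ROOT xmlns:sql=""urn:schemas-microsoft-com:xml-sql"">" & _
            "<sql:query client-side-xml=""1"">" & _
            "EXEC GetProducts FOR XML NESTED" & _
            "</sql:query>" & _
            "</ROOT>"

        ' LoadXml throws an XmlException if the template is not well-formed.
        Dim doc As New XmlDocument()
        doc.LoadXml(template)

        Dim ns As New XmlNamespaceManager(doc.NameTable)
        ns.AddNamespace("sql", "urn:schemas-microsoft-com:xml-sql")

        ' Locate the query element and show the two pieces discussed in the text.
        Dim query As XmlElement = _
            CType(doc.SelectSingleNode("/ROOT/sql:query", ns), XmlElement)
        Console.WriteLine(query.GetAttribute("client-side-xml"))
        Console.WriteLine(query.InnerText)
    End Sub
End Module
```

Checking the template for well-formedness on the client is optional, but it tends to surface quoting mistakes in the concatenated string before the command is ever sent.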
Figure 11-3 shows the results of the query.
Figure 11-3
The previous example used a SQL Server stored procedure to query and return data from a table, and then passed the results back to the client for formatting. The same can be accomplished by passing in a T-SQL statement directly, as shown in the following example:
Dim InStream As ADODB.Stream
Dim conn As ADODB.Connection
Dim cmd As ADODB.Command
Dim strconn As String
Dim dbGuid As String
Dim Userid As String
Dim Password As String
InStream = New ADODB.Stream
conn = New ADODB.Connection
cmd = New ADODB.Command
'Dialect GUID that marks the command text as an XML template
dbGuid = "{5d531cb2-e6ed-11d2-b252-00c04f681b71}"
strconn = "Provider=SQLXMLOLEDB;Data Provider=SQLNCLI;Server=localhost;" & _
    "Database=AdventureWorks"
Userid = "Type your SQL Server Login here"
Password = "Type your SQL Server Password HERE"
Me.Cursor = Cursors.WaitCursor
Try
    conn.Open(strconn, Userid, Password)
    cmd.ActiveConnection = conn
    'Note the trailing space after ProductNumber so that FROM is not run into it
    cmd.CommandText = "" & _
        "" & _
        "SELECT ProductID, Name, ProductNumber " & _
        "FROM Production.Product FOR XML NESTED" & _
        ""
    InStream.Open()
    cmd.Dialect = dbGuid
    cmd.Properties("Output Stream").Value = InStream
    '1024 = adExecuteStream: the results are written to the output stream
    cmd.Execute(, , 1024)
    InStream.Position = 0
    InStream.Charset = "utf-8"
    Me.txtResults.Text = InStream.ReadText()
    conn.Close()
Catch ex As Exception
    MessageBox.Show(ex.Message.ToString)
End Try
Me.Cursor = Cursors.Default
Run the project by pressing F5. The results are the same; the only difference is that the first example executed a stored procedure while the second passed in the T-SQL statement. The main thing to notice is that the FOR XML clause at the end of each is the same. In the first example, the FOR XML clause was appended to the stored procedure call, and the XML formatting was done at the client. The same principle applies to the second example, where the FOR XML clause was appended to the T-SQL statement. The entire T-SQL statement, including the FOR XML clause, is passed to SQL Server for execution, but the FOR XML clause is ignored because the client-side-xml attribute is set to a value of 1. Just as in the first example, the query is executed on the server and the results are passed back to the middle tier for formatting, and the formatted XML is then passed to the client. Both of these examples utilize the SQL Native Client as the data provider, which is specified in the connection string. This section provided a quick introduction to SQLXML 4.0 and ADO and how they can be used to return and format XML on the client. Now that you have the foundation, it is time to dig deeper into client-side formatting with FOR XML.
Client-Side Formatting with FOR XML
Formatting XML on the client side is not new. The FOR XML clause has been around for quite a while and has provided great benefits when it comes to client-side XML formatting. There are two main reasons why you would want to consider client-side XML formatting. First, client-side formatting provides a more balanced workload on the server. By letting the client provide the formatting, the server is freed up for other processes. Second, existing stored procedures do not have to be modified for client-side XML formatting. As long as the stored procedure returns a single result set, client-side XML formatting can be applied to the results returned from the stored procedure. The first example in the previous section used a stored procedure to query data from the Production.Product table using the following syntax:
EXEC GetProducts FOR XML NESTED
The GetProducts stored procedure, as seen a few pages ago, contains no XML formatting because the XML formatting happens on the client side. Thus, the GetProducts stored procedure did not have to be modified. This section focuses entirely on the client-side enhancements to FOR XML and walks through some examples to help you become familiar with the client-side FOR XML clause. The next section begins by discussing the SQLXML architecture and then moves on to a deeper discussion of client-side formatting with some examples.
SQLXML Architecture
XML documents can be formatted on either the client side or the server side. In server-side formatting, which was covered in depth in Chapter 8, the command is sent from the client to the server. The server processes the command, formats the results as XML, and sends them back to the client. There are two options when using server-side formatting. The first is to use the SQLXMLOLEDB provider, which uses the new Sqlxml4.dll that is installed when you install SQLXML 4.0. This new DLL is similar to the previous version of the DLL, sqlxmlx.dll. It provides all the necessary XML formatting capabilities and extensions to format your query results into XML. The second option is to use the SQLOLEDB provider, which comes with MDAC (Microsoft Data Access Components) 2.6 or later. The SQLOLEDB provider includes the same SQLXML functionality as the earlier Sqlxml.dll (notice this did not say Sqlxml4.dll). In both of these scenarios, SQL Server 2005 is required. If you want to use SQLOLEDB and still get the Sqlxml4 flexibility, the SQLXML version needs to be set in the SQLOLEDB connection object. Regardless of which provider you use, the XML is formatted on the server and returned to the client. For client-side XML formatting, SQLXML 4.0 uses the SQLXMLOLEDB provider, which passes the command from the client to the server for execution. SQL Server 2005 generates a rowset with the results and hands it back to the client, where the formatting is performed against the returned rowset.
Choosing Between Client-Side and Server-Side XML Formatting
There are a handful of formatting differences in SQLXML to weigh when deciding between client-side and server-side XML formatting. First and foremost, client-side formatting does not support queries that generate multiple result sets. This was true of SQLXML 3.0 and still applies in SQLXML 4.0. For example, the following query generates an error:
Adding annotations in an XSD schema, like the one here, specifies the XML to relational mapping. In the example, the sql:relation annotation specifies the table in which this schema retrieves its information. The sql:field annotations specify from which columns in the table the schema retrieves its information for the specified elements or attributes. sql:field cannot specify an empty element. Now that you’ve added the annotations, you can query this XML view using XPath returning an XML document. Querying XML views is covered later in this chapter.
sql:relation
You use the sql:relation annotation to map a node in the XSD schema to a database table, specifically the table on which the XML document and schema are based. The value of the annotation holds the name of the table. When specified on an element node, the sql:relation annotation applies to all other elements and attributes within the complex type definition under which it is specified.
The syntax for specifying a sql:relation annotation is as follows:
sql:relation="tablename"
In this example, the sql:relation annotation maps the XSD schema to the Production.Product table in the AdventureWorks database by specifying the table name for the value of the annotation as follows:
sql:relation="Production.Product"
In this example, the sql:relation annotation was used to map the Production.Product table to the XML node in the XSD schema:
xmlns:sql=”urn:schemas-microsoft-com:mapping-schema”>
...
There are times when a table name or column name is valid in SQL Server but not in XML. For example, a name such as Product Name, which contains a space, is valid in SQL Server but is not a valid XML name. In these cases, the sql:relation annotation comes in handy, as it can be used to specify the mapping to the table:
There is really not much to the sql:relation annotation, but you can do a lot more with the XML view. For example, you can relate elements within an XML document to each other with the sql:relationship annotation, which is the subject of the next section.
sql:relationship
The sql:relationship annotation provides the capability to relate elements within an XML document and nest elements hierarchically. In an XSD schema, this annotation nests the elements by the primary and foreign key relationships of the tables on which the schema is based, or in other words, the tables to which the elements map. The syntax for using the sql:relationship annotation is as follows:
When you specify the sql:relationship annotation in an XSD schema, the following must be present:
❑ A parent table and a child table
❑ A join condition
For example, a product can have multiple product reviews; therefore, the element for a product can have product review subelements. Continuing the example, the product element maps to the Production.Product table and the product review element maps to the Production.ProductReview table, linked together via the ProductID, thus satisfying the join condition. The relationship between the elements is handled by the sql:relationship annotation.
The following list contains the attributes available with the sql:relationship annotation, which provide the relationship mapping between the tables. These attributes can be used only with the sql:relationship annotation:
❑ name: The name of the relationship; must be unique.
❑ parent: The name of the parent table.
❑ child: The name of the child table.
❑ parent-key: The parent key (or primary key) of the parent table.
❑ child-key: The child key (or foreign key) of the child table.
The parent attribute is optional, and when it is not included, the name of the parent table is retrieved from the XML document based on the child hierarchy. It is possible for the parent-key attribute to contain more than one column, and in these cases the names are included, separated by a space. Position matters here, as each value maps, in order, to the corresponding child key. It is also possible for the child-key attribute to contain more than one column, and just like the parent-key attribute, the names are separated by a space. Again, position matters, as each value maps, in order, to the corresponding parent key. The following annotated schema, using a named relationship, illustrates the relationship mapping between the Product and ProductReview tables using the ProductID column as the key linking the two tables, as identified in the relationship element. As stated previously, a product can have multiple product reviews, so in this example the sql:relationship annotation is on the product review subelement:
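As a minimal sketch of how a named relationship and its keys fit together, the following code holds a small annotated schema in a string and lists the relationship it declares. The element names (Product, ProductReview), the relationship name (ProdReview), and the choice of attributes are hypothetical and chosen only for illustration; the annotation layout follows the SQLXML mapping-schema format as an assumption.

```vb
Imports System
Imports System.Xml

Module RelationshipSketch
    Sub Main()
        ' Hypothetical annotated schema; element and relationship names are
        ' illustrative only.
        Dim schema As String = _
            "<xsd:schema xmlns:xsd=""http://www.w3.org/2001/XMLSchema""" & _
            " xmlns:sql=""urn:schemas-microsoft-com:mapping-schema"">" & _
            "<xsd:annotation><xsd:appinfo>" & _
            "<sql:relationship name=""ProdReview""" & _
            " parent=""Production.Product"" parent-key=""ProductID""" & _
            " child=""Production.ProductReview"" child-key=""ProductID""/>" & _
            "</xsd:appinfo></xsd:annotation>" & _
            "<xsd:element name=""Product"" sql:relation=""Production.Product"">" & _
            "<xsd:complexType><xsd:sequence>" & _
            "<xsd:element name=""ProductReview""" & _
            " sql:relation=""Production.ProductReview""" & _
            " sql:relationship=""ProdReview"" maxOccurs=""unbounded"">" & _
            "<xsd:complexType>" & _
            "<xsd:attribute name=""ReviewerName"" type=""xsd:string""/>" & _
            "</xsd:complexType></xsd:element>" & _
            "</xsd:sequence>" & _
            "<xsd:attribute name=""ProductID"" type=""xsd:integer""/>" & _
            "</xsd:complexType></xsd:element>" & _
            "</xsd:schema>"

        Dim doc As New XmlDocument()
        doc.LoadXml(schema)

        Dim ns As New XmlNamespaceManager(doc.NameTable)
        ns.AddNamespace("sql", "urn:schemas-microsoft-com:mapping-schema")

        ' List each named relationship and the keys that join its tables.
        For Each rel As XmlElement In doc.SelectNodes("//sql:relationship", ns)
            Console.WriteLine(rel.GetAttribute("name") & ": " & _
                rel.GetAttribute("parent") & "(" & rel.GetAttribute("parent-key") & ") -> " & _
                rel.GetAttribute("child") & "(" & rel.GetAttribute("child-key") & ")")
        Next
    End Sub
End Module
```

For a composite key, parent-key and child-key would each hold several space-separated column names, with the first parent column pairing with the first child column, and so on.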
As stated earlier, the example uses a named relationship for the mapping, but the same results could be obtained using an unnamed relationship as follows:
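As a rough sketch under the same assumptions (AdventureWorks tables, illustrative attribute choices), an unnamed relationship can be declared directly inside the child element's own annotation, so no name or sql:relationship reference is needed:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name="Product" sql:relation="Production.Product">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="ProductReview" sql:relation="Production.ProductReview">
          <xsd:annotation>
            <xsd:appinfo>
              <!-- Unnamed relationship: applies only to this element -->
              <sql:relationship parent="Production.Product"
                                parent-key="ProductID"
                                child="Production.ProductReview"
                                child-key="ProductID" />
            </xsd:appinfo>
          </xsd:annotation>
          <xsd:complexType>
            <xsd:attribute name="ProductReviewID" type="xsd:integer" />
            <xsd:attribute name="Rating" type="xsd:integer" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
      <xsd:attribute name="ProductID" type="xsd:integer" />
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```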
In this example, the <sql:relationship> element is unnamed, but the results are the same. A portion of the results is shown here:
...
In this example, the sql:relationship annotation is used to specify the relationship between the Production.Product table and the Production.ProductReview table. In that annotation, the parent and child attributes specify the parent and child tables, and the parent-key and child-key attributes specify the keys on which the two tables are joined. Don't worry about how to run these examples just yet; they're for explanation purposes and you'll get your hands on some examples shortly.

The next example joins three tables, chaining them together with the sql:relationship annotation to create two relationships:
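One way such a chained (indirect) relationship could be written is sketched below. The relationship names ProdCat and ProdSubCat come from the chapter's discussion; the key column names (ProductCategoryID, ProductSubcategoryID) are assumptions taken from the AdventureWorks sample database, and the sketch places sql:key-fields only on the category element:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
  <xsd:annotation>
    <xsd:appinfo>
      <!-- ProductCategory (parent) to ProductSubCategory (child) -->
      <sql:relationship name="ProdCat"
                        parent="Production.ProductCategory"
                        parent-key="ProductCategoryID"
                        child="Production.ProductSubCategory"
                        child-key="ProductCategoryID" />
      <!-- ProductSubCategory (parent) to Product (child) -->
      <sql:relationship name="ProdSubCat"
                        parent="Production.ProductSubCategory"
                        parent-key="ProductSubcategoryID"
                        child="Production.Product"
                        child-key="ProductSubcategoryID" />
    </xsd:appinfo>
  </xsd:annotation>

  <xsd:element name="ProductCategory"
               sql:relation="Production.ProductCategory"
               sql:key-fields="ProductCategoryID">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Two relationship names, in join order: category to subcategory, then subcategory to product -->
        <xsd:element name="Product"
                     sql:relation="Production.Product"
                     sql:relationship="ProdCat ProdSubCat">
          <xsd:complexType>
            <xsd:attribute name="Name" type="xsd:string" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
      <xsd:attribute name="ProductCategoryID" type="xsd:integer" />
      <xsd:attribute name="Name" type="xsd:string" />
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```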
In this example, the sql:key-fields annotation (more on that in the next section) is used to identify the CategoryID and SubCategoryID columns that uniquely identify each row in the relationship. A portion of the results is shown here:
...
...
...
...
Pay close attention to the sql:relationship annotation on the <Product> element. The annotation specifies two values, which are the names of the two relationships. As stated previously, the order of these two values is important. The ProdCat relationship defines the parent-child relationship between the Production.ProductCategory and Production.ProductSubCategory tables. The ProdSubCat relationship defines the parent-child relationship between the Production.ProductSubCategory and Production.Product tables.

You should have a good grasp now of how the sql:relation and sql:relationship annotations work, so it's time to look at the sql:key-fields annotation.
sql:key-fields
There are two reasons why you would use the sql:key-fields annotation in a schema. First, it is a great way to ensure that the appropriate nesting hierarchy is created, and in these cases it is best to use the annotation on elements that map to tables. Second, you use the sql:key-fields annotation when an element contains a sql:relationship annotation that defines an element and its corresponding child element, but no primary key is specified in the parent element. In these cases the sql:key-fields annotation is required to ensure the proper nesting. The sql:key-fields annotation syntax is as follows:

sql:key-fields="uniquekeycolumns"

The value for this annotation is the column, or columns, that uniquely identify each row in the relationship. If a single column is used, it could be the primary key of the table. For example, the syntax would look like the following for a single-column value:

sql:key-fields="EmployeeID"

In a multiple-column value, each key column is separated by a space, as follows:

sql:key-fields="EmployeeID FirstName"
The following example uses the sql:key-fields annotation to produce proper nesting in the results, as there is no hierarchy specified by the sql:relationship annotation. The sql:key-fields annotation is necessary to identify products in the Production.Product table distinctively:
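A minimal sketch of this case, assuming default mapping (the element name matches the table name and the subelement names match column names), could look like this. The choice of Name and ProductNumber as subelements is an illustrative assumption:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
  <!-- No sql:relationship here; sql:key-fields names the column that identifies each product row -->
  <xsd:element name="Production.Product" sql:key-fields="ProductID">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Name" type="xsd:string" />
        <xsd:element name="ProductNumber" type="xsd:string" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```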
The next example uses the sql:key-fields to specify the key fields in both the Production .ProductModel and Production.Product tables to ensure that the two tables have the correct hierarchy and the proper node nesting in the resulting XML:
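The following is a sketch only, assuming the AdventureWorks ProductModelID column links the two tables. The relationship name ModelProduct and the mapped attributes are hypothetical:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
  <xsd:annotation>
    <xsd:appinfo>
      <sql:relationship name="ModelProduct"
                        parent="Production.ProductModel"
                        parent-key="ProductModelID"
                        child="Production.Product"
                        child-key="ProductModelID" />
    </xsd:appinfo>
  </xsd:annotation>

  <!-- Key fields on the parent identify each ProductModel row for nesting -->
  <xsd:element name="ProductModel"
               sql:relation="Production.ProductModel"
               sql:key-fields="ProductModelID">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Key fields on the child identify each Product row under its model -->
        <xsd:element name="Product"
                     sql:relation="Production.Product"
                     sql:relationship="ModelProduct"
                     sql:key-fields="ProductID">
          <xsd:complexType>
            <xsd:attribute name="ProductID" type="xsd:integer" />
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
      <xsd:attribute name="ProductModelID" type="xsd:integer" />
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```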
In this example, the sql:key-fields annotation is used to properly establish the hierarchy between the two tables and obtain the appropriate nesting for the results. A portion of the results shows the ProductIDs for the corresponding ProductModelIDs:
...
Now that you have a good understanding of how relationships can be used in XML schemas, you’re ready to get started with some coding examples.
Querying XML Views
Querying XML views is really no different from the querying you have done in previous chapters. Up until now, you have executed queries primarily using T-SQL statements, either via a stored procedure or in-line T-SQL. The queries in the following examples use XML views just like the ones you have learned about so far in the chapter. In a text editor such as Notepad, enter the following and save the file as SqlField.xml in the C:\Wrox\ directory:
xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
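Only a fragment of the schema listing survives above. Based on how the chapter later describes this schema (a ProductID attribute, ProductName and ProductNumber elements mapped with sql:field, and sql:relation on the Product element), a plausible sketch of SqlField.xml is shown here; the xsd type choices are assumptions:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
  <!-- sql:relation maps the Product element to the Production.Product table -->
  <xsd:element name="Product" sql:relation="Production.Product">
    <xsd:complexType>
      <xsd:sequence>
        <!-- sql:field maps each element to a column -->
        <xsd:element name="ProductName" sql:field="Name" type="xsd:string" />
        <xsd:element name="ProductNumber" sql:field="ProductNumber" type="xsd:string" />
      </xsd:sequence>
      <!-- Default mapping: attribute name matches the ProductID column -->
      <xsd:attribute name="ProductID" type="xsd:integer" />
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```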
In Visual Studio 2005, open the SqlCliTestApp project that you created for the examples in Chapter 11. Open Form1 in Design mode, add a button to the form, and set the caption property to XML View. Double-click the button you just added to display the code window. In the click event, enter the following code (similar to the code entered in Chapter 11). Make sure the path to the XML document is specified correctly:

Dim InStream As ADODB.Stream
Dim conn As ADODB.Connection
Dim cmd As ADODB.Command
Dim strconn As String
Dim dbGuid As String
Dim Userid As String
Dim Password As String

InStream = New ADODB.Stream
conn = New ADODB.Connection
cmd = New ADODB.Command
dbGuid = "{5d531cb2-e6ed-11d2-b252-00c04f681b71}"
strconn = "Provider=SQLXMLOLEDB;Data Provider=SQLNCLI;Server=localhost;" & _
          "Database=AdventureWorks"
Userid = "Type your SQL Server Login HERE"
Password = "Type your SQL Server Password HERE"
Me.Cursor = Cursors.WaitCursor
Try
    conn.Open(strconn, Userid, Password)
    cmd.ActiveConnection = conn
    ' Assumed template: an XPath query wrapped in sql:xpath-query,
    ' pointing at the mapping schema saved earlier
    cmd.CommandText = "<ROOT xmlns:sql='urn:schemas-microsoft-com:xml-sql'>" & _
                      "<sql:xpath-query mapping-schema='C:\Wrox\SqlField.xml'>" & _
                      "/Product" & _
                      "</sql:xpath-query></ROOT>"
    InStream.Open()
    cmd.Dialect = dbGuid
    cmd.Properties("Output Stream").Value = InStream
    cmd.Execute(, , 1024)
    InStream.Position = 0
    InStream.Charset = "utf-8"
    Me.txtResults.Text = InStream.ReadText()
    conn.Close()
Catch ex As Exception
    MessageBox.Show(ex.Message)
End Try
Me.Cursor = Cursors.Default
Figure 12-1 shows the results from the query.
Figure 12-1
As specified in the schema, ProductID, mapped to the ProductID column, is an attribute of the Product element. The ProductName and ProductNumber elements are mapped to the Name and ProductNumber columns using the sql:field annotation, while the sql:relation annotation is used to map the Product element to the Production.Product table.

This next example uses the sql:relationship annotation to hierarchically nest the schema elements in the result based on the primary and foreign key relationships. Drawing on the first sql:relationship example earlier in the chapter, type the following code into your text editor and save it as sqlrelationship.xml in the C:\Wrox directory:
Next, modify the click event code for the XML View button in the SqlCliTestApp project as follows:

Dim InStream As ADODB.Stream
Dim conn As ADODB.Connection
Dim cmd As ADODB.Command
Dim strconn As String
Dim dbGuid As String
Dim Userid As String
Dim Password As String

InStream = New ADODB.Stream
conn = New ADODB.Connection
cmd = New ADODB.Command
dbGuid = "{5d531cb2-e6ed-11d2-b252-00c04f681b71}"
strconn = "Provider=SQLXMLOLEDB;Data Provider=SQLNCLI;Server=vssql2005;" & _
          "Database=AdventureWorks"
Userid = "Type your SQL Server Login HERE"
Password = "Type your SQL Server Password HERE"
Me.Cursor = Cursors.WaitCursor
Try
    conn.Open(strconn, Userid, Password)
    cmd.ActiveConnection = conn
    ' Assumed template: XPath query against the sqlrelationship.xml mapping schema
    cmd.CommandText = "<ROOT xmlns:sql='urn:schemas-microsoft-com:xml-sql'>" & _
                      "<sql:xpath-query mapping-schema='C:\Wrox\sqlrelationship.xml'>" & _
                      "/Product[@ProductID=937]" & _
                      "</sql:xpath-query></ROOT>"
    InStream.Open()
    cmd.Dialect = dbGuid
    cmd.Properties("Output Stream").Value = InStream
    cmd.Execute(, , 1024)
    InStream.Position = 0
    InStream.Charset = "utf-8"
    Me.txtResults.Text = InStream.ReadText()
    conn.Close()
Catch ex As Exception
    MessageBox.Show(ex.Message)
End Try
Me.Cursor = Cursors.Default
Clicking the XML View button should display the results shown in Figure 12-2.
Figure 12-2
The values returned show the relationship between the two tables. The Product element comes from the Production.Product table, returning the individual ProductID as an attribute. The sql:relationship annotation is used to define the relationship between the Product and ProductReview tables, with the associated ProductReview table and columns being mapped accordingly to the elements and attributes.

This next example builds on the previous example, mapping relationships among three tables, as shown in the second sql:relationship annotation example. Open your text editor, type the following schema, and save it as sqlrelationship2.xml in the C:\Wrox directory:

Modify the click event code for the XML View button and then run the program:

' Assumed template: XPath query against the sqlrelationship2.xml mapping schema
cmd.CommandText = "<ROOT xmlns:sql='urn:schemas-microsoft-com:xml-sql'>" & _
                  "<sql:xpath-query mapping-schema='C:\Wrox\sqlrelationship2.xml'>" & _
                  "/ProductCategory" & _
                  "</sql:xpath-query></ROOT>"
Clicking the XML View button should display the results shown in Figure 12-3.
Figure 12-3
In this example, the sql:relationship annotation was used to map the relationship among the ProductCategory, ProductSubCategory, and Product tables. For each ProductCategory, the associated product names are returned.

The next two examples use the sql:key-fields annotation. The first example uses the annotation to ensure proper nesting. In this example, no sql:relationship annotation is specified; thus no hierarchy is defined. To fulfill the requirement of having a distinctly identified key (in this case, the ProductID), it is necessary to specify the sql:key-fields annotation. Open your text editor and type the following, saving it as sqlkeyfield.xml in the C:\Wrox directory:

In the code behind the XML View button, change the following code and then run the program:

' Assumed template: XPath query against the sqlkeyfield.xml mapping schema
cmd.CommandText = "<ROOT xmlns:sql='urn:schemas-microsoft-com:xml-sql'>" & _
                  "<sql:xpath-query mapping-schema='C:\Wrox\sqlkeyfield.xml'>" & _
                  "/Production.Product" & _
                  "</sql:xpath-query></ROOT>"
Clicking the XML View button should display the results shown in Figure 12-4.
Figure 12-4
In this example, the sql:key-fields annotation provides the proper nesting and forming of the results because no sql:relationship annotation is specified.

The next example continues with the sql:key-fields annotation, but uses it to ensure the proper hierarchy. In this example, you use the sql:key-fields annotation to help uniquely identify the proper hierarchy because the sql:relationship annotation information does not provide the primary key of the parent table in the parent element. Open your text editor and type in the following, saving it as sqlkeyfield2.xml in the C:\Wrox directory:

Now modify the code behind the XML View button and make the following changes:

' Assumed template: XPath query against the sqlkeyfield2.xml mapping schema
cmd.CommandText = "<ROOT xmlns:sql='urn:schemas-microsoft-com:xml-sql'>" & _
                  "<sql:xpath-query mapping-schema='C:\Wrox\sqlkeyfield2.xml'>" & _
                  "/ProductModel" & _
                  "</sql:xpath-query></ROOT>"
Run the SqlCliTestApp program and click the XML View button. The displayed results should look like Figure 12-5.
Figure 12-5
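The results can be filtered by adding a predicate to the XPath statement, similar to the following:

' Assumed template; the predicate restricts the results to a single product model
cmd.CommandText = "<ROOT xmlns:sql='urn:schemas-microsoft-com:xml-sql'>" & _
                  "<sql:xpath-query mapping-schema='C:\Wrox\sqlkeyfield2.xml'>" & _
                  "/ProductModel[@ProductModelID=20]" & _
                  "</sql:xpath-query></ROOT>"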
This example used an XPath query against the XSD schema to return specific information, in this case key information such as the ProductID. Since a sql:relationship annotation was also specified, the sql:key-fields annotation is used to ensure the correct hierarchy and nesting of the elements. By now you should have a good grasp on how XML schema views work and how to use the annotations to produce the appropriate relationships and relational mappings. You can now move on to learn how best to use your new knowledge.
Best Practices
Security can be a big factor when using annotated schemas, so there are a few items to consider. Both default mapping and explicit mapping expose database information such as table and column names. Default mapping is the bigger concern because the element and attribute names are the table and column names, respectively, so you must consider the ramifications of making the schemas publicly available if there is such a need. The alternative is to give non-meaningful names to the elements and attributes in the schema and explicitly map them back to the corresponding tables and columns.
Summary
The purpose of this chapter was to create the basic foundation and building blocks for understanding XML views and how to query them. You have seen the basic makeup of an XML view, and the requirements necessary to map the elements and attributes back to the corresponding table and columns. You have seen how to define relationships when two or more tables are included in the schema, and even how to set the appropriate hierarchy when the relationship information is not enough to appropriately define the relationship. You have also seen how to apply the technology you learned in Chapter 11, using ADO to execute SQLXML queries, to query these views and return the necessary information. Chapter 13 builds on what you learned in this chapter and discusses using the updategram to update the XML view.
Updating the XML View Using Updategrams
This chapter focuses on updating a database directly using updategrams and the XML view. The information in this chapter builds on the previous two chapters, in which you learned about using ADO and OLEDB to query a database via the SQL Native Client.

Updategrams, introduced in SQL Server 2000, are another option you can use to update, delete, and insert data in your database. Before updategrams, the first option was to use the OPENXML feature of SQLXML. Both updategrams and OPENXML provide the same results, that is, inserting, deleting, and updating of data. The difference between the two is that updategrams use XML views and the mapping schema, which you learned about in Chapter 12, to provide the functionality. The schema contains the mapping of elements and attributes back to the tables and columns in the database. In particular, this chapter covers the following topics:
❑ Overview and structure of updategrams
❑ Mapping schemas and updategrams
❑ Using updategrams to modify data
❑ Passing parameters
❑ Updategram concurrency
❑ NULL handling
❑ Updategram security
❑ Guidelines and limitations
Overview and Structure
Updategrams offer the capability, through XML, to modify a database directly without shredding the XML document the way OPENXML does. Although OPENXML is great for dealing with rowset providers by generating operational statements to modify data, updategrams don't do any XML document shredding. Updategrams work directly against XML views (which you learned about in Chapter 12) and their associated mapping schemas (which you learned about in Chapter 7). The mapping schemas contain the information that is required to bind, or map, the elements and attributes back to the associated tables and columns for quick and efficient processing.

The structure of an updategram is simply a template with a predefined set of tags that give a before and after picture of the data when the updategram is executed. The basic syntax of an updategram looks like this:
...
...
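The markup of the syntax listing is not legible above, so here is a minimal skeleton based on the pieces the chapter goes on to describe (the sync, before, and after blocks, the urn:schemas-microsoft-com:xml-updategram namespace, and the updgrm prefix). The root element name ROOT is an assumption; any well-formed root should do:

```xml
<ROOT xmlns:updgrm="urn:schemas-microsoft-com:xml-updategram">
  <updgrm:sync>
    <updgrm:before>
      <!-- rows as they exist now (the before state) -->
    </updgrm:before>
    <updgrm:after>
      <!-- rows as they should exist after execution (the after state) -->
    </updgrm:after>
  </updgrm:sync>
</ROOT>
```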
The following list defines each piece of the updategram and explains what the syntax means.
❑ <sync> block: The <sync> block encompasses the <before> and <after> blocks, and can contain multiple <before> and <after> blocks. If multiple <before> and <after> blocks are within the <sync> block, then each <before> and <after> needs to be designated as a pair. An updategram can contain multiple <sync> blocks, each forming a transactional unit. In other words, everything in the <sync> block executes, or nothing executes. The failure of one <sync> block to execute does not affect the execution of other <sync> blocks.
❑ <before>: Defines the state of the data as it currently is, prior to execution. This is commonly known as the before state.
❑ <after>: Defines the state of the data after the execution occurs. This is commonly known as the after state.
❑ Namespace: The <sync>, <before>, and <after> keywords used in an updategram are provided by the urn:schemas-microsoft-com:xml-updategram namespace. The namespace prefix is not predetermined, and can be chosen by you. In the previous structure example, the prefix is defined as updgrm.

Mapping Schemas and Updategrams
As you learned in Chapter 12, the schema can have either implicit or explicit mapping. Implicit mapping simply means that a mapping schema has not been specified, so the updategram takes on implicit mapping. Explicit mapping means that the elements and attributes in the updategram have been explicitly mapped to the elements and attributes in the mapping schema. The default mapping is implicit mapping (covered in the following section), meaning that proper nesting of elements and subelements is essential. Each element in the <before> and <after> blocks maps to a table, and the subelements and attributes map to columns.
Implicit Mapping
The following example illustrates implicit mapping. The <Production.Product> element implicitly maps to the Production.Product table and the Name attribute implicitly maps to the Name column in the Production.Product table:
This next example uses implicit mapping to update the same record:
In this example, no mapping schema is associated with the updategram, so it takes on implicit mapping. When you use implicit mapping, the <Production.ProductModel> element maps to the Production.ProductModel table and the Name attribute maps to the corresponding column in the Production.ProductModel table.
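As an illustration of implicit mapping, the following sketch renames one row in Production.ProductModel. The ProductModelID and Name values are hypothetical, and the root element name is an assumption:

```xml
<ROOT xmlns:updgrm="urn:schemas-microsoft-com:xml-updategram">
  <!-- No mapping-schema attribute, so element and attribute names must match table and column names -->
  <updgrm:sync>
    <updgrm:before>
      <!-- Identifies the existing row (hypothetical key value) -->
      <Production.ProductModel ProductModelID="1" />
    </updgrm:before>
    <updgrm:after>
      <!-- New value for the Name column (hypothetical) -->
      <Production.ProductModel Name="Classic Vest (Renamed)" />
    </updgrm:after>
  </updgrm:sync>
</ROOT>
```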
Explicit Mapping
Explicit mapping, as mentioned previously, simply means associating a mapping schema with the updategram. The following example is a mapping schema that is then mapped to an updategram. For the sake of this example, this mapping schema is called exampleupdgrmschema.xml:
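A sketch of what exampleupdgrmschema.xml might contain follows. It assumes a Product element mapped to Production.Product whose ModelID attribute is mapped to the ProductModelID column; both the attribute set and that column name are assumptions based on the AdventureWorks sample database:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name="Product" sql:relation="Production.Product">
    <xsd:complexType>
      <xsd:attribute name="ProductID" sql:field="ProductID" type="xsd:integer" />
      <!-- Schema-side name ModelID, assumed to map to the ProductModelID column -->
      <xsd:attribute name="ModelID" sql:field="ProductModelID" type="xsd:integer" />
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```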
The next step is to map the schema to an updategram that explicitly maps the elements and attributes to the table and columns. The updategram, referencing the mapping schema in the second line, looks like the following:
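Under the same assumptions, an updategram that references the mapping schema on its second line could be sketched as follows; the ProductID and ModelID values are hypothetical:

```xml
<ROOT xmlns:updgrm="urn:schemas-microsoft-com:xml-updategram">
  <updgrm:sync mapping-schema="exampleupdgrmschema.xml">
    <updgrm:before>
      <!-- Element and attribute names come from the mapping schema, not the table -->
      <Product ProductID="1" />
    </updgrm:before>
    <updgrm:after>
      <Product ModelID="2" />
    </updgrm:after>
  </updgrm:sync>
</ROOT>
```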
With the mapping in place, the updategram updates the ModelID column in the Product table with the value specified in the ModelID attribute in the <after> block.
Modifying Data
As with all the examples so far in this book, the examples in this section are executed using the functionality you learned in Chapter 12, using ADO and SQLXML to execute updategrams. To execute these examples, the SqlCliTestApp application needs to be modified (so that the earlier code and examples stay intact). Open the SqlCliTestApp application, and then open the form in design view. Add a button to the form. Set the Text property to "updategram" and set the Name property to cmdupdategram. Double-click the button you just added to display the code window. In the click event for the updategram button, enter the following code:

Dim InStream As ADODB.Stream
Dim conn As ADODB.Connection
Dim cmd As ADODB.Command
Dim strconn As String
Dim dbGuid As String
Dim Userid As String
Dim Password As String
Dim cmdText As String

InStream = New ADODB.Stream
conn = New ADODB.Connection
cmd = New ADODB.Command
dbGuid = "{5d531cb2-e6ed-11d2-b252-00c04f681b71}"
strconn = "Provider=SQLXMLOLEDB;Data Provider=SQLNCLI;Server=localhost;" & _
          "Database=AdventureWorks"
Userid = "Type your SQL Server Login HERE"
Password = "Type your SQL Server Password HERE"
Me.Cursor = Cursors.WaitCursor
Try
    conn.Open(strconn, Userid, Password)
    cmd.ActiveConnection = conn
    cmdText = "" 'This will be filled in later!
    cmd.CommandText = cmdText
    InStream.Open()
    cmd.Dialect = dbGuid
    cmd.Properties("Output Stream").Value = InStream
    cmd.Execute(, , 1024)
    conn.Close()
    MessageBox.Show("Record Inserted")
Catch ex As Exception
    MessageBox.Show(ex.Message)
End Try
Me.Cursor = Cursors.Default
Now you are ready for some examples. Most of the examples in this section use implicit mapping, but a few use explicit mapping.
Inserting Data
There are three ways to specify the data you want to insert. They are as follows:
❑ Attribute-centric
❑ Element-centric
❑ Mixed mode
Attribute-Centric
Attribute-centric mapping specifies all the columns in which to insert data as attributes of the table element. For example, the following code uses attribute-centric mapping to add a row to the Production.ProductCategory table:
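A minimal sketch of an attribute-centric insert, assuming implicit mapping, looks like this. The category name Scooters is a hypothetical value, and the root element name is an assumption:

```xml
<ROOT xmlns:updgrm="urn:schemas-microsoft-com:xml-updategram">
  <updgrm:sync>
    <!-- An empty before block means the row does not exist yet, so this is an insert -->
    <updgrm:before>
    </updgrm:before>
    <updgrm:after>
      <!-- The column value is supplied as an attribute of the table element -->
      <Production.ProductCategory Name="Scooters" />
    </updgrm:after>
  </updgrm:sync>
</ROOT>
```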
In this example, only the Name column needs to be specified because the ProductCategoryID column is an identity column, so the value is automatically generated. The value of the rowguid column is also automatically generated, and the ModifiedDate column in the table has a default set on it so that when a record is inserted into the table, the current date is automatically applied to the column. A multi-column insert would look like the following:
In both of these examples, the column values are listed as attributes to the table element.
Element-Centric
Element-centric mapping simply means that the column values are listed as subelements of the table element, as follows:
Go-Karts
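Only the Go-Karts value survives from the listing above. Assuming it is a Name inserted into Production.ProductCategory through implicit mapping, the element-centric form might be sketched as:

```xml
<ROOT xmlns:updgrm="urn:schemas-microsoft-com:xml-updategram">
  <updgrm:sync>
    <updgrm:before>
    </updgrm:before>
    <updgrm:after>
      <!-- The column value is supplied as a subelement of the table element -->
      <Production.ProductCategory>
        <Name>Go-Karts</Name>
      </Production.ProductCategory>
    </updgrm:after>
  </updgrm:sync>
</ROOT>
```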
Another example of element-centric mapping would appear like the following:
10000000.00 5000.00 10.0 750000.00 900000.00
Mixed Mode
Mixed mode means that both attribute-centric and element-centric mapping can exist in the same updategram, as shown below:
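A mixed-mode insert could be sketched as follows, with one column given as an attribute and another as a subelement. The target table (Production.ProductSubcategory) and both values are hypothetical choices for illustration:

```xml
<ROOT xmlns:updgrm="urn:schemas-microsoft-com:xml-updategram">
  <updgrm:sync>
    <updgrm:before>
    </updgrm:before>
    <updgrm:after>
      <!-- ProductCategoryID is attribute-centric; Name is element-centric -->
      <Production.ProductSubcategory ProductCategoryID="1">
        <Name>Go-Karts</Name>
      </Production.ProductSubcategory>
    </updgrm:after>
  </updgrm:sync>
</ROOT>
```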
-1 Then
        Return UpdateWsdlForVS2005(strWsdlOrg)
    End If
    Return strWsdlOrg
End Function

Private Shared Sub ReturnWSDL(ByVal strWSDL As String, ByVal spPipe As SqlPipe)
    Dim iMaxLength As Integer = 4000
    Dim oMetaData(1) As SqlMetaData
    oMetaData(0) = New SqlMetaData("XML_F52E2B61-18A1-11d1-B105-00805F49916B", _
        SqlDbType.NVarChar, iMaxLength, 1033, SqlCompareOptions.None)
    If oMetaData(0) Is Nothing Then
        spPipe.Send("Error creating the required SqlMetaData object for response.")
        GoTo ret
    End If
    If strWSDL.Length < iMaxLength Then
        iMaxLength = strWSDL.Length
    End If
    Dim aoResponse(1) As Object
    aoResponse(0) = New Object
    If aoResponse(0) Is Nothing Then
        spPipe.Send("Error creating the object to hold the SqlDataRecord value.")
        GoTo ret
    End If
    aoResponse(0) = strWSDL.Substring(0, iMaxLength)
    Dim oRecord As SqlDataRecord = New SqlDataRecord(oMetaData)
    If oRecord Is Nothing Then
        spPipe.Send("Error creating SqlDataRecord.")
        GoTo ret
    End If
    spPipe.SendResultsStart(oRecord)
    Dim iccLeft As Integer = strWSDL.Length - iMaxLength
    Dim iLength As Integer = strWSDL.Length
    While iccLeft > 0
        If iccLeft > iMaxLength Then
            oRecord.SetString(0, strWSDL.Substring(iLength - iccLeft, iMaxLength))
            spPipe.SendResultsRow(oRecord)
            iccLeft = iccLeft - iMaxLength
        Else
            oRecord.SetString(0, strWSDL.Substring(iLength - iccLeft, iccLeft))
            spPipe.SendResultsRow(oRecord)
            iccLeft = 0
        End If
    End While
    spPipe.SendResultsEnd()
ret:
    Return
End Sub

Private Shared Function UpdateWsdlForVS2005(ByVal strWsdlOrg As String) As String
    Const strMaxOccurs As String = "maxOccurs=""unbounded"""
    Dim strReturn As String = strWsdlOrg
    If Nothing = strReturn Then
        GoTo ret
    End If
    Dim strTemp As String = ""
    Dim iIndex As Integer = strReturn.IndexOf("complexType name=""SqlRowSet""")
    If iIndex