English Pages 1286 Year 2001
Professional XML 2nd Edition
Mark Birbeck Jason Diamond Jon Duckett Oli Gauti Gudmundsson Pete Kobak Evan Lenz Steven Livingstone
Daniel Marcus Stephen Mohr Nikola Ozu
Jon Pinnock Keith Visco
Andrew Watt
Kevin Williams Zoran Zaev
Wrox Press Ltd.
Professional XML 2nd Edition © 2001 Wrox Press
All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embodied in critical articles or reviews. The author and publisher have made every effort in the preparation of this book to ensure the accuracy of the information. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, Wrox Press, nor its dealers or distributors will be held liable for any damages caused or alleged to be caused either directly or indirectly by this book.
Published by Wrox Press Ltd, Arden House, 1102 Warwick Road, Acocks Green, Birmingham, B27 6BH, UK
Printed in the United States
ISBN 1861005059
Trademark Acknowledgements
Wrox has endeavored to provide trademark information about all the companies and products mentioned in this book by the appropriate use of capitals. However, Wrox cannot guarantee the accuracy of this information.
Credits
Authors
Mark Birbeck, Jason Diamond, Jon Duckett, Oli Gauti Gudmundsson, Pete Kobak, Evan Lenz, Steven Livingstone, Daniel Marcus, Stephen Mohr, Nikola Ozu, Jon Pinnock, Keith Visco, Andrew Watt, Kevin Williams, Zoran Zaev
Category Managers
Dave Galloway, Sonia Mulineux
Technical Reviewers
Daniel Ayers, Martin Beaulieu, Arnaud Blandin, Maxime Bombadier, Joseph Bustos, David Carlisle, Pierre-Antoine Champin, Robert Chang, Michael Corning, Chris Crane, Steve Danielson, Chris Dix, Sébastien Gignoux, Tony Hong, Paul Houle, Craig McQueen, Thomas B. Passin, Dave Pawson, Gary L Peskin, Phil Powers DeGeorge, Eric Rajkovic, Gareth Reakes, Matthew Reynolds, David Schultz, Marc H. Simkin, Darshan Singh, Paul Warren, Karli Watson
Project Administrator
Beckie Stones
Production Co-ordinator
Pip Wonson
Author Agent
Marsha Collins
Indexers
Andrew Criddle, Bill Johncocks
Technical Architect
Timothy Briggs
Technical Editors
Phil Jackson, Simon Mackie, Chris Mills, Andrew Polshaw
Proof reader
Agnes Wiggers
Production Manager
Simon Hardware
Diagrams
Shabnam Hussain
Cover
Chris Morris
About the Authors
Mark Birbeck Mark Birbeck is Technical Director of Parliamentary Communications Ltd. where he has been responsible for the design and build of their political portal, ePolitix.com. He is also managing director of XML consultancy x-port.net Ltd., responsible for the publishing system behind spiked-online.com. Although involved in XML for a number of years, his special interests lie in metadata, and in particular the use of RDF. He particularly welcomes Wrox's initiative in trying to move these topics out of the shadows and into the mainstream.
Mark would particularly like to thank his long-suffering partner Jan for putting up with the constant smell of midnight oil being burned. He offers the consolation that at least he will already be up when their first child Louis demands attention during the small hours.
Jon Duckett Jon has been using and writing about XML since 1998, when he co-authored and edited Wrox's first XML publication. Having spent the past 3 years working for Wrox in the Birmingham UK offices, Jon is currently working from Sydney, so that he can get a different view out of the window while he is working and supping on a nice cup of tea...
Oli Gauti Gudmundsson Oli is working for SALT, acting as one of two Chief System Architects of the SALT Systems, and Development Director in New York. He is currently working on incorporating XML and XSL into SALT’s web authoring and content management systems. He has acted as an instructor in the Computer Science course (Java) at the University of Iceland, and Java is one of his greatest strengths (and pleasures!). As a hobby he is trying to finish his BS degree in Computer Engineering. His nationality is Icelandic, but he is currently situated in New York with his girlfriend Edda. He can be reached at [email protected].
Pete Kobak Pete Kobak built and programmed his first computer from a kit in 1978, which featured 256 bytes of RAM and a single LED output. After a fling as an electrical engineer for IBM, Pete gradually moved into software development to support mainframe manufacturing. He earned geek programmer status in the late '80s when he helped to improve Burroughs' Fortran compiler by introducing vectorization of DO loops. Justified by his desire to continue to pay his mortgage, Pete left Burroughs in 1991 to put lives in jeopardy by developing medical laboratory software in OS/2. In 1997, Pete somehow convinced The Vanguard Group to hire him to do Solaris web development, even though he could barely spell “Unix”. He has helped to add new features to their web site since then, specializing in secure web communication. Pete's current interest is in web application security, trying to find the right techniques to enforce the strong security needed by a serious financial institution while meeting their need to rapidly extend business relationships. Pete is thankful to be able to introduce interesting web technologies in the service of helping millions of people to reach for their financial dreams. He can be contacted at [email protected].
I'd like to dedicate my humble contribution to my wife Geraldine, and to my children Mary, John, and Patricia. They have sacrificed my time and attention for me to be able to complete this project. This chapter is a family effort.
Evan Lenz Evan Lenz currently works as a software engineer for XYZFind Corp. in Seattle, WA. His primary area of expertise is in XSLT, and he enjoys exploring new ways to utilize this technology for various projects. His work at XYZFind includes everything from XSLT and Java development to writing user's manuals, to designing the XML query language used in XYZFind's XML database software. Wielding a professional music degree and a philosophy major, he hopes to someday bring his varying interests together into one grand, masterful scheme. Thanks to my precious wife, Lisa, and my baby son, Samuel, for putting up with Daddy's long nights. And praise to my Lord and Savior, Jesus Christ, without whom none of this would be possible or meaningful.
Steven Livingstone Steven Livingstone is an IT Architect with IBM Global Services in Winnipeg, Canada. He has contributed to numerous Wrox books and magazine articles, on subjects ranging from XML to ECommerce. Steven’s current interests include E-Commerce, ebXML, .NET, and Enterprise Application Architectures. Steven would like to thank everyone at Wrox, especially for the understanding as he emigrated from Scotland to Canada (and that could be another book itself ;-) Most importantly he wants to thank Loretito for putting up with him whilst writing – gracias mi tesoro. Congratulations Celtic on winning the Treble :)
Daniel Marcus Dr. Marcus has twenty years of experience in software architecture and design. He is co-founder, President, and Chief Operating Officer at Speechwise Technologies, an applications software company at the intersection of speech, wireless, and Internet technologies. Prior to starting Speechwise, he was Director of E-Business Consulting at Xpedior, leading the strategy, architecture, and deployment of e-business applications for Global 2000 and dot-com clients. Dr. Marcus has been a Visiting Scholar at Princeton's Institute for Advanced Study, a research scientist at the Lawrence Livermore National Laboratory, and is the author of over twenty papers in computational science. He is a Sun-Certified Java Technology Architect and holds a Ph.D. in Mechanical Engineering from the University of California, Berkeley.
Stephen Mohr Stephen Mohr is a software systems architect with Omicron Consulting, Philadelphia, USA. He has more than ten years' experience working with a variety of platforms and component technologies. His research interests include distributed computing and artificial intelligence. Stephen holds BS and MS degrees in computer science from Rensselaer Polytechnic Institute. For my wife, Denise, and my sons James and Matthew.
Nikola Ozu Nikola Ozu is an independent systems architect who lives in Wyoming at the end of a few miles of dirt road – out where the virtual community is closer than town, but only flows at 24kb/s, and still does not deliver pizza. His current project involves bringing semantic databases, text searching, and multimedia components together with XML – on the road to Xanadu. Other recent work has included the usual web design consulting, some XML vocabularies, and an XML-based production and fulltext indexing system for a publisher of medical reference books and databases. In the early 90s, Nik designed and developed a hypertext database called Health Reference Center; followed by advanced versions of InfoTrac. Both of these were bibliographic and full-text databases, delivered as monthly multi-disc CD-ROM subscriptions. Given the large text databases involved, some involvement with SGML was unavoidable. His previous work has ranged from library systems on mainframes to embedded micro systems (telecom equipment, industrial robots, toys, arcade games, and videogame cartridges). In the early 70s, he was thrilled to learn programming using patch boards, punch cards, paper tape, and printouts (and Teletypes, too). When not surfing the 'net, he surfs crowds, the Tetons, and the Pacific; climbs wherever there is rock; and tries to get more than a day's walk from the nearest road now and then. He enjoys these even more when accompanied by his teenage son, who's old enough now to appreciate the joy of mosh pits and sk8ing the Mission District after midnight. To Noah: May we always think of the next (2^3 - 1) generations instead of just our own 2^0. My thanks to the editors and illustrators at Wrox and my friend Deanna Bauder for their help with this project. Also, thanks and apologies to my family and friends who endured my disappearances into the WriterZone for days on end.
Jonathan Pinnock Jonathan Pinnock started programming in Pal III assembler on his school's PDP 8/e, with a massive 4K of memory, back in the days before Moore's Law reached the statute books. These days he spends most of his time developing and extending the increasingly successful PlatformOne product set that his company, JPA, markets to the financial services community. He seems to spend the rest of his time writing for Wrox, although he occasionally surfaces to say hello to his long-suffering wife and two children. JPA’s home page is www.jpassoc.co.uk.
Keith Visco Keith Visco currently works for Intalio, Inc., the leader in Business Process Management, as a manager and project leader for XML based technologies. Keith is the project leader for the open source data-binding framework, Castor. He has been actively working on open source projects since 1998, including the Mozilla project where he is the original author of Mozilla's XSLT processor (donated by his previous employer, The MITRE Corporation) and is the current XSLT module owner. In all aspects of his life, Keith is most inspired after drinking a large Dunkin' Donuts Hazelnut Coffee. Keith relieves what little stress his life does encounter by playing guitar or keyboards (he apologizes to his neighbors). He is also a firm believer that life cannot exist without three basic elements: music, coffee, and Red Sox baseball.
I would like to acknowledge Intalio, Inc. and The Exolab Group for giving me the opportunity to work on many industry-leading technologies. I would like to thank my team at Intalio, specifically Arnaud Blandin and Sébastien Gignoux, for their hard work as well as their invaluable feedback on this chapter. I would also like to thank my family for their unconditional support and incessant input into all phases of my life. A special thanks to Cindy Iturbe, whose encouragement means so much to me and for teaching me that with a little patience and hard work all things are possible, no matter how distant things may seem.
Andrew Watt Andrew Watt is an independent consultant who enjoys few things more than exploring the technologies others have yet to sample. Since he wrote his first programs in 6502 Assembler and BBC Basic in the mid 1980s, he has sampled Pascal, Prolog, and C++, among others. More recently he has focused on the power of web-relevant technologies, including Lotus Domino, Java and HTML. His current interest is in the various applications of the Extensible Markup Meta Language, XMML, sometimes imprecisely and misleadingly called XML. The present glimpse he has of the future of SVG, XSL-FO, XSLT, CSS, XLink, XPointer, etc when they actually work properly together is an exciting, if daunting, prospect. He has just begun to dabble with XQuery. Such serial dabbling, so he is told, is called “life-long learning”. In his spare time he sometimes ponders the impact of Web technologies on real people. What will be the impact of a Semantic Web? How will those other than the knowledge-privileged fare? To the God of Heaven who gives human beings the capacity to see, think and feel. To my father who taught me much about life. My heartfelt thanks go to Gail, who first suggested getting into writing, and now suffers the consequences on a fairly regular basis, and to Mark and Rachel, who just suffer the consequences.
Kevin Williams Kevin’s first experience with computers was at the age of 10 (in 1980) when he took a BASIC class at a local community college on their PDP-9, and by the time he was 12, he stayed up for four days straight hand-assembling 6502 code on his Atari 400. His professional career has been focused on Windows development – first client-server, then onto Internet work. He’s done a little bit of everything, from VB to Powerbuilder to Delphi to C/C++ to MASM to ISAPI, CGI, ASP, HTML, XML, and any other acronym you might care to name; but these days, he’s focusing on XML work. Kevin is a Senior System Architect for Equient, an information management company located in Northern Virginia. He may be reached for comment at [email protected].
Zoran Zaev Zoran is a Sr. Web Solutions Architect with Hitachi Innovative Solutions, Corp. in the Washington DC area. He has worked in technology since the time when 1 MHz CPUs and 48Kb were considered 'significant power', in the now distant 1980s. In the mid 1990s, Zoran became involved in web applications development. Since then, he has helped large and small clients alike leverage the power of web applications. His more recent emphasis has been web applications and web services with XML, SOAP, and other related technologies. When he's not programming, you'll find him traveling and exploring new learning opportunities.
I would like to thank my wife, Angela, for her support and encouragement, as well as sharing some of her solid writing knowledge. And, you can never go wrong thanking your parents, so 'fala' to my mom, Jelica and dad, Vanco. On the professional side, I would like to thank Ellen Manetti for her strong project management example, and Pete Johnson, founder of Virtualogic, Inc., for his vision inspiring influence. Finally, thanks to Beckie and Marsha from Wrox for their always-timely assistance and to Jan from "Images by Jan". Zoran can be reached at [email protected]
Introduction
eXtensible Markup Language (XML) has emerged as nothing less than a phenomenon in computing. It is a concept, elegant in its simplicity, that is driving dramatic changes in the way Internet applications are written. This second edition revises the first to keep pace with this fast-changing field: many technologies have been superseded, and new ones have emerged.
What Does This Book Cover?
This book explains and demonstrates both the essential techniques for designing and using XML documents, and many of the related technologies that are important today. Almost everything in this book will be based around a specification provided by the World Wide Web Consortium (W3C). These specifications are at various levels of completion and some of the technologies are nascent, but we expect them to become very popular when their specifications are finalized because they are useful or essential. The wider XML community is increasingly jumping in and offering new XML-related ideas outside the control of the W3C, although the W3C is still central and important to the development of XML. The focus of this book is on learning how to use XML as an enabling technology in real-world applications. It presents good design techniques, and shows how to interface XML-enabled applications with web applications. Whether your requirements are oriented toward data exchange or presentation, this book will cover all the relevant techniques in the XML community. Most chapters contain a practical example (unless the technology is so new that there were no working implementations at the time of writing). As XML is a platform-neutral technology, the examples cover a variety of languages, parsers, and servers. All the techniques are relevant across all the platforms, so you can get valuable insight from the examples even if they are not implemented using your favorite platform.
Who Is This Book For?
This book is for the experienced developer who already has some basic knowledge of XML and wants to learn how to build effective applications using this exciting but simple technology. Web site developers can learn techniques, using XSLT stylesheets and other technologies, to take their sites to the next level of sophistication. Other developers can learn where and how XML fits into their existing systems and how they can use it to solve their application integration problems. XML applications can be distributed and are usually web-oriented. This book focuses on this kind of application, so we expect the reader to have some awareness of multi-tier architecture, preferably from a web perspective. In case some of the XML fundamentals have been missed in your experience, we will go back over XML itself, covering the full specification thoroughly and fairly quickly. A variety of programming languages will be used in this book, and we do not expect you to be proficient in them all. The techniques taught in this book can be transferred to other programming languages. As XML is a cross-platform technology, Java is used frequently in this book, especially because it has a wealth of tools to manipulate XML. Other languages covered include JavaScript, VBScript, VB, C#, and Perl. We expect the reader to be proficient in a programming language, but it does not matter which one.
How is this Book Structured?
Although many authors have contributed towards this book, we have tied the chapters together under unifying themes. As you will read below, the book has effectively been split into six sections. A standard example using a toy company has been used in chapters where possible, so you can see how different technologies can explain, describe, or transform the same data in different ways. A small number of the chapters, e.g. Chapter 23, rely heavily on a previous chapter, but this will be made clear. Most of the chapters will be relatively self-contained.
Learning Threads
XML is evolving into a large, wide-ranging field of related markup technologies. This growth is powering XML applications. With growth comes divergence. Different readers will come to this book with different expectations. XML is different things to different people.
Foundation
Chapter 1 introduces the XML world in general, discussing the technologies that are relevant today and may be relevant tomorrow, but with very little code. Chapters 2 (Basic XML Syntax) and 3 (Advanced XML Syntax) cover the fundamentals of XML 1.0. Chapter 2 gives you the basic syntax of an XML document, while Chapter 3 covers slightly more advanced issues like namespaces. These chapters form the irreducible minimum you need to understand XML and, depending on your experience, you may want to skip these introductory chapters. Chapter 4 teaches you about the Infoset, a standard way of describing XML, which provides an abstract representation for XML data. In Chapter 5, we cover document validation using DTDs. Although, as you will learn in the subsequent two chapters, other schema-based validation languages exist that supersede DTDs, DTDs are not quite dead: many more XML parsers validate with DTDs than with any other schema language, and DTDs are relatively simple. Following this, in Chapter 6, we cover XML Schema and show how to validate your XML documents using this new XML-based validation language specified by the W3C. Chapter 7 covers other schema-based validation languages, including James Clark's TREX proposal, and the Schematron.
In Chapter 8, we explain the XPath specification – a method of referring to specific fragments of XML that is relevant to and used by other XML technologies. These include XSLT, described in Chapter 9. Here we teach you how to transform your XML documents into anything else, based on certain stylesheet declarations. In Chapter 10, we show various linking technologies, such as XLink and XPointer, and describe the XML Fragment Interchange specification. These ten chapters are enough for you to learn about all of the immediately useful XML technologies – for those who just use XML. You may already have a lot of experience of XML and so some of these chapters will be re-treading well-walked ground, but everybody should be able to learn something new, especially because XML Schema acquired Proposed Recommendation status, the penultimate stage of the W3C specification process, just two months before this book was printed. Although a wealth of XML techniques lie ahead, you will have a firm foundation upon which to build. So the Foundation thread includes:
❑ Chapter 1: Introducing XML
❑ Chapter 2: Basic XML Syntax
❑ Chapter 3: Advanced XML Syntax
❑ Chapter 4: The XML Information Set
❑ Chapter 5: Validating XML: DTDs
❑ Chapter 6: Introducing XML Schema
❑ Chapter 7: XML Schema Alternatives
❑ Chapter 8: Navigating XML – XPath
❑ Chapter 9: Transforming XML
❑ Chapter 10: Fragments, XLink, and XPointer
XML Programming
XML is both machine and human readable and, not surprisingly, some standard APIs have been created to manipulate XML data. These APIs are implemented in JavaScript, Java, Visual Basic, C++, Perl, and many other languages. These provide a standard way of manipulating, and developing for, XML documents. In Chapter 11, we consider the first of these APIs, the DOM, which emerged from the HTML world. This has been released as a specification from the W3C, and Level 2 of this specification has recently been released. XML data can be thought of as hierarchical and object-oriented, and the DOM provides methods and properties for retrieving and manipulating XML nodes. Chapter 12 discusses SAX, a lightweight alternative to the DOM. With the DOM, the entire document has to be read into memory; a SAX parser, however, reads only as much data as is necessary to retrieve or manipulate a specific node. Chapter 13 is the last chapter in this section, and it covers Declarative Programming with XML. Most programmers use procedural languages, but XML and the XML specifications don't care about how a particular language or application performs a job, just that it does it according to the declarations made. This chapter explains how to use schemas to design your applications.
The Programming thread therefore includes:
❑ Chapter 11: The Document Object Model
❑ Chapter 12: SAX 2
❑ Chapter 13: Schema Based Programming
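The contrast between the DOM and SAX described in this thread can be sketched in a few lines. The example below is only an illustration, not code from the book: it uses Python's standard library rather than the languages used in the chapters themselves, and the catalog data and the function and class names (`toy_names_dom`, `ToyNameHandler`, `toy_names_sax`) are hypothetical. The DOM version builds the whole tree before querying it, while the SAX version reacts to events as the parser moves through the input.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample data, loosely echoing the book's toy-company theme.
CATALOG = """<catalog>
  <toy sku="T1"><name>Robot</name></toy>
  <toy sku="T2"><name>Kite</name></toy>
</catalog>"""


def toy_names_dom(xml_text):
    """DOM style: parse the whole document into a tree, then query it."""
    document = minidom.parseString(xml_text)
    names = []
    for name_node in document.getElementsByTagName("name"):
        # The element's text is held in a child text node.
        names.append(name_node.firstChild.data)
    return names


class ToyNameHandler(xml.sax.ContentHandler):
    """SAX style: respond to events as the parser streams the input."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._buffer = None  # collects text only while inside <name>

    def startElement(self, tag, attrs):
        if tag == "name":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, tag):
        if tag == "name":
            self.names.append("".join(self._buffer))
            self._buffer = None


def toy_names_sax(xml_text):
    """Run the event-driven handler over the same document."""
    handler = ToyNameHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.names
```

Calling `toy_names_dom(CATALOG)` and `toy_names_sax(CATALOG)` should give the same list of names. The difference is in how the work is done: the SAX handler never holds more than the current element's text, which is why it is usually the better fit for large documents.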
XML as Data
There are four chapters in this section, all targeted specifically at the storage, retrieval, and manipulation of data as it relates to XML. Chapter 14, Data Modeling, explains how to plan your project 'properly', modeling your XML on your data so that you build better applications. Chapter 15 extends this concept by covering the binding of the data to XML (and vice versa). Chapter 16, Querying XML, covers a nascent technology known as XML Query, which aims to provide the power of SQL in an XML format. This short chapter teaches you how to use the technology as it stands at the time of writing. The final chapter in this thread, Chapter 17, is a case study that describes how to relate your databases to your XML data and so integrate your XML and RDBMS in the best way possible. This means that the Data thread contains:
❑ Chapter 14: Data Modeling
❑ Chapter 15: Data Binding
❑ Chapter 16: Querying XML
❑ Chapter 17: Case Study: XML and Databases
Presentation of XML
Chapter 18 covers an XML technology called SVG – Scalable Vector Graphics. This XML technology, when coupled with an appropriate viewer (for example, Adobe SVG Viewer), allows quite detailed graphics files to be displayed and manipulated. In Chapter 19, we describe VoiceXML, an XML technology to allow voice recognition and processing on the Web. XML data can be converted to VoiceXML and, using the appropriate technology, can be spoken and interacted with over a telephone. Chapter 20 covers the final technology in this section, XSL-FO. This is an emerging technology that allows the layout of pages to be specified exactly, much in the same way as PDF does now. The main difference is that XSL-FO is itself XML, and so can be manipulated using the same XML tools you may be used to. Also, XSL-FO can be converted to PDF if necessary for users without XSL-FO viewers. In the Presentation thread, therefore, we cover:
❑ Chapter 18: Presenting XML Graphically
❑ Chapter 19: VoiceXML
❑ Chapter 20: XSL Formatting Objects: XSL-FO
XML as Metadata
In this thread, we discuss how XML can be used to represent metadata – that is, the meaning or semantics of data, rather than the data itself. In Chapter 21, we cover the setting up of an index of XML data. This chapter uses a Java indexing application, but the techniques are applicable to any indexing tool. Chapter 22 is where we really get to the meat of the topic, where we talk about RDF – a language to describe metadata. We cover the elements and syntax of this technology. In Chapter 23, we go over some practical examples of RDF technology, before describing RDDL – a method of bundling resources at the URL of a namespace, so that a RDDL-enabled application can learn what technology the namespace actually refers to, and can access schemas and standard transforms. In the Metadata thread, we cover:
❑ Chapter 21: Case Study: Generating a Site Index
❑ Chapter 22: RDF
❑ Chapter 23: RDF Code Samples and RDDL
XML used for B2B
The final section of this book describes what is quite possibly the most important use of XML – B2B and Web Services. In the past, the communication protocols for B2B (e.g. EDI) have been proprietary, and expensive – both in terms of cost, and processor power. Using XML vocabularies, an open and programmable model can be used for B2B transactions. In Chapter 24, we describe Simple Object Access Protocol. SOAP was a mostly Microsoft initiative (although the W3C are developing the XML Protocol specification, which should be very similar to SOAP), which allows two applications to specify services using XML. We cover the intricacies of this protocol, so that you can use it to web-enable any service you would care to mention. Chapter 25 covers Microsoft's BizTalk Server. This server can control all B2B transactions, using the open BizTalk framework. BizTalk is just one method of using SOAP to conduct business transactions, but it is Microsoft's and is very popular. In Chapter 26, we have a case study discussing E-Business integration using XML. There are a number of business standards for commerce, and this chapter explains how you can integrate all of the standards, without having to write code for every possible B2B transaction between competing standards. We end in Chapter 27, with a discussion of the Web Services Description Language, which allows us to formalize other XML vocabularies by defining services that a SOAP, or other client, can connect to. WSDL describes each service and what it does. In addition, in this chapter, we cover UDDI (Universal Description, Discovery, and Integration), which is a way of automating the discovery of, and transactions with, various services. In many cases, human interaction should not be necessary to find a service, and using public registration services, UDDI makes this possible. Both of these technologies are nascent but their importance will grow as more and more businesses make use of them.
In summary, the B2B thread consists of the following chapters:
❑ Chapter 24: SOAP
❑ Chapter 25: B2B with Microsoft BizTalk Server
❑ Chapter 26: E-Business Integration
❑ Chapter 27: B2B Futures: WSDL and UDDI
What You Need to Use this Book
The book assumes that you have some knowledge of HTML, of one or more procedural or object-oriented programming languages (e.g. Java, VB, C++), and some minimal XML knowledge. For some of the examples in this book, a Java Runtime Environment (http://java.sun.com/j2se/1.3/) will need to be installed on your system, and some other chapters require applications such as MS SQL Server, MS Index Server, and BizTalk. The complete source for larger portions of code from the book is available for download from: http://www.wrox.com/. More details are given in the section of this Introduction called "Support, Errata, and P2P".
Conventions
To help you get the most from the text and keep track of what's happening, we've used a number of conventions throughout the book. For instance:
These boxes hold important, not-to-be forgotten information, which is directly relevant to the surrounding text.
While this style is used for asides to the current discussion.
As for styles in the text:
When we introduce them, we highlight important words
We show keyboard strokes like this: Ctrl-A
We show filenames, and code within the text like so: doGet()
Text on user interfaces is shown as: File | Save
URLs are shown in a similar font, as so: http://www.w3c.org/
We present code in two different ways. Code that is important, and testable is shown as so:
In our code examples, the code foreground style shows new, important, pertinent code
Code that is an aside, shows examples of what not to do, or has been seen before is shown as so:
Code background shows code that's less important in the present context, or has been seen before.
In addition, when something is to be typed at a command line interface (e.g. a DOS prompt), then we use the following style to show what is typed, and what is output:
> java com.ibm.wsdl.Main -in Arithmetic.WSDL
>> Transforming WSDL to NASSL ..
>> Generating Schema to Java bindings ..
>> Generating serializers / deserializers ..
Interface 'wsdlns:ArithmeticSoapPort' not found.
Support, Errata, and P2P
The printing and selling of this book was just the start of our contact with you. If there are any problems whatsoever with the code or the explanation in this book, we welcome input from you. A mail to [email protected] should elicit a response within two to three days (depending on how busy the support team are). In addition to this, we also publish any errata online, so that if you have a problem, you can check on the Wrox web site first to see if we have updated the text at all. First, pay a visit to www.wrox.com, then click on the Books | By Title(Z-A), or Books | By ISBN link on the left hand side of the page.
Navigate to this book (this ISBN is 1861005059, if you choose to navigate this way) and then click on it. As well as giving some information about the book, it also provides options to download the code, view errata, and ask for support. Just click on the relevant link. All errata that we discover will be added to the site, and so information on changes to the code that have to be made for newer versions of software may also be included here – as well as corrections to any printing or code errors.
All of the code for this book can be downloaded from our site. It is included in a zip file, and all of the code samples in this book can be found within, referenced by chapter number. In addition, at p2p.wrox.com, we have our free "Programmer to Programmer" discussion lists. There are a few relevant to this book, and any questions you post will be answered by either someone at Wrox, or someone else in the developer community. Navigate to http://p2p.wrox.com/xml, and subscribe to a discussion list from there. All lists are moderated and so no fluff or spam should be received in your Inbox.
Tell Us What You Think
We've worked hard to make this book as useful to you as possible, so we'd like to know what you think. We're always keen to know what it is you want and need to know. We appreciate feedback on our efforts and take both criticism and praise on board in our future editorial efforts. If you've anything to say, let us know on: [email protected]
Or via the feedback links on: http://www.wrox.com
Introducing XML
In this chapter, we'll look at the origins of XML, the core technologies and specifications that are related to XML, and an overview of some current, and future applications of XML. The later sections of this introduction should also serve as something of a road map to the rest of the book.
Origins and Goals of XML
"XML", as we all know, is an acronym for Extensible Markup Language – but what is a markup language? What is the history of markup languages, what are the goals of XML, and how does it improve upon earlier markup?
Markup Languages
Ever since the invention of the printing press, writers have made notes on manuscripts to instruct the printers on matters such as typesetting and other production issues. These notes were called "markup". A collection of such notes that conform to a defined syntax and grammar can certainly be called a "language". Proofreaders use a hand-written symbolic markup language to communicate corrections to editors and printers. Even the modern use of punctuation is actually a form of markup that remains with the text to advise the reader how to interpret that text. These early markup languages use a distinct appearance to differentiate markup from the text to which it refers. For example, proofreaders' marks consist of a combination of cursive handwriting and special symbols to distinguish markup from the typeset text. Punctuation consists of special symbols that cannot be confused with the alphabet and numbers that represent the textual content. These symbols are so necessary to understanding printed English that they were included in the ASCII character set, and so have become the foundation of modern programming language syntax.
Chapter 1
The ASCII character set standard was the early basis for widespread data exchange between various hardware and software systems. Whatever the internal representation of characters, conversion to ASCII allowed these disparate systems to communicate with each other. In addition to text, ASCII also defined a set of symbols, the C0 control characters (using the hexadecimal values 00 to 1F), which were intended to be used to mark up the structure of data transmissions. Only a few of these symbols found widespread acceptance, and their use was often inconsistent. The most common example is the character(s) used to delimit the end of a line of text in a document. Teletype machines used the physical motion-based character pair CR-LF (carriage-return, line-feed). This was later used by both MS-DOS and MS-Windows; UNIX uses a single LF character; and the MacOS uses a single CR character. Due to conflicting and non-standard uses of C0 control characters, document interchange between different systems still often requires a translation step, since even a simple text file cannot be shared without conversion.
Various forms of delimiters have been used to define the boundaries of containers for content, special symbol glyphs, presentation style of the text, or other special features of a document. For example, the C and C++ programming languages use the braces {} to delimit units of data or code. A typesetting language, intended for manual human editing, might use strings that are more readable, like ".begin" and ".end".
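The translation step needed for line endings can be sketched in a few lines. This is only an illustration in Python; the function name normalize_newlines and the sample strings are our own assumptions, not part of any standard.

```python
def normalize_newlines(text: str) -> str:
    """Convert CR-LF (MS-DOS/Windows) and lone CR (MacOS) line ends to LF (UNIX)."""
    # Replace the two-character CR-LF pair first, so that it does not
    # end up as two separate line feeds.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Hypothetical sample data: the same two lines as saved on three systems.
dos_text = "line one\r\nline two\r\n"
mac_text = "line one\rline two\r"
unix_text = "line one\nline two\n"

print(normalize_newlines(dos_text) == unix_text)
print(normalize_newlines(mac_text) == unix_text)
```

The order of the two replacements matters: handling the lone CR first would turn every CR-LF pair into a blank line.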
Markup is a method of conveying metadata (information about another dataset). XML is a relatively new markup language, but it is a subset of, and is based upon, a mature markup language called Standard Generalized Markup Language (SGML). The WWW's Hypertext Markup Language (HTML) is also based upon SGML; indeed, it is an application of SGML. There is a new version of HTML 4 that is called Extensible Hypertext Markup Language (XHTML), which is similarly an application of XML. All of these markup languages are for metadata, but SGML and XML may be further considered meta-languages, since they can be used to create other metadata languages. Just as HTML was expressed in SGML, XHTML and others will use XML.
SGML-based markup languages all use literal strings of characters, called tags, to delimit the major components of the metadata, called elements. Tags represent object delimiters and other such markup, as opposed to the content (no matter whether it's simple text or text that is program code). Of course, there has often been conflict between different sets of tags and their interpretation. Without common delimiter vocabularies, or even common internal data formats, it has been very difficult to convert data from one format to another, or otherwise share data between applications and organizations. For example, the following two markup excerpts (Chapter_01_01.html & Chapter_01_01.xml) show familiar HTML and similar XML elements with their delimiting tags:
Product Catalog (Toysco-only)
Product Catalog (Internal-use only!)
Product Descriptions
Mega Wonder Widget
The Mega Wonder Widget is a popular toy with a 20 oz. capacity. It costs only $12.95 to make, whilst selling for $33.99 (plus $3.95 S&H).
Giga Wonder Widget
The Giga Wonder Widget is even more popular, because of its larger 55 oz. capacity. It has a similar profit margin (costs $19.95, sells for $49.99). ...
Updated: 2001-04-01 by Webmaster Will
This rather simplistic document uses the few structural tags that exist in HTML, such as , , , and for headers, and
for paragraphs. This structure is limited to a very basic presentation model of a document as a printed page. Other tags, such as and , are purely about the appearance of the data. Indeed, most HTML tags are now used to describe the presentation of data, interactive logic for user input and control, and external multimedia objects. These tags give us no idea what structured data (names, prices, etc.) might appear within the text, or where it might be in that text. On the other hand, XML allows us to create a structural model of the data within the text. Presentation cues can be embedded as with HTML tags, but the best XML practice is to separate the data structure from presentation. An external style sheet can be used with XML to describe the data's presentation model. So, we might convert – and extend – the above HTML example into the following XML data file (Chapter_01_01.xml):
> > >
> > > >
]>
Product Catalog 2001-04-01 Webmaster Will Toysco-only (TRADE SECRET)
Product Catalog Product Descriptions
&MWW;
The &MWW; is a popular toy with a 20 capacity. It costs only 12.95 to make, whilst selling for 33.99 (plus 3.95 S&amp;H).
&GWW;
The &GWW; is even more popular, because of its larger 55 capacity. It has a similar profit margin (costs 19.95, sells for 49.99).
...
The XML document looks very similar to the HTML version, with comparable text content, and some equivalent tags (as XHTML). XML goes far beyond HTML by allowing the use of custom tags (like or ) that preserve some structured data that is embedded within the text of the description. We can't do this in HTML, since its set of tags is more or less fixed, changing slowly as browser vendors embrace new features and markup. In contrast, anyone can add tags to their own XML data. The use of tags to describe data structure allows easy conversion of XML to an arbitrary DBMS format, or alternative presentations of the XML data such as in tabular form or via a voice synthesizer connected to a telephone. We have also assumed that we will use a stylesheet to format the XML data for presentation. Therefore, we are able to omit certain labels from our text (such as the $ sign in prices, and the "oz." after the capacity value). We will then rely upon the formatting process to insert them in the output, as appropriate. In a similar fashion, we have put the document update information in the header (where it can be argued that it logically belongs). When we transform the data for output, this data can be displayed as a footer with various string literals interspersed. In this way, it can appear to be identical to the HTML version. It should be obvious from the examples that HTML and XML are very similar in both overall structure and syntax. Let's look at their common ancestor, before we move on to the goals of XML.
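To make the idea of leaving labels to the formatting step more concrete, here is a minimal sketch in Python, using the standard library's xml.etree.ElementTree module in place of a real stylesheet. The element names (product, name, capacity, price) are hypothetical stand-ins chosen for this illustration, and the describe function is likewise our own invention.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names, used for illustration only.
SAMPLE = """
<product>
  <name>Mega Wonder Widget</name>
  <capacity>20</capacity>
  <cost>12.95</cost>
  <price>33.99</price>
</product>
"""


def describe(product_xml: str) -> str:
    """Format one product for display, adding the labels the data omits."""
    product = ET.fromstring(product_xml)
    name = product.findtext("name")
    capacity = product.findtext("capacity")
    price = product.findtext("price")
    # The "$" sign and the "oz." unit are inserted here, at presentation
    # time, rather than being stored in the XML data itself.
    return f"{name}: {capacity} oz. capacity, sells for ${price}"


print(describe(SAMPLE))
```

Because the numbers are stored without their labels, the same data could just as easily be loaded into a database column or read out by a speech synthesizer with different wording.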
SGML and Document Markup Languages
SGML is an acronym for Standard Generalized Markup Language, an older and much more complex markup language than XML. It has been codified as an international standard by the ISO (International Organization for Standardization) as ISO 8879 and WebSGML.
The ISO doesn't put very much of its standards information online, but they do maintain a website at http://www.iso.ch, and offer the paper version of ISO 8879 for sale at http://www.iso.ch/cate/d16387.html. General SGML information and links can be found at http://www.w3.org/MarkUp/SGML and http://xml.coverpages.org. WebSGML (ISO 8879:1986 TC2. Information technology – Document Description and Processing Languages) is described online at http://www.sgmlsource.com/8879rev/n0029.htm.
SGML has been widely used by the U.S. government and its contractors, large manufacturing companies, and publishers of technical information. Publishers often construct paper documents, such as books, reports, and reference manuals from SGML. Often, these SGML documents are then transformed into a presentation format such as PostScript, and sent to the typesetter and printer for output to paper. Technical specifications for manufacturing can also be exchanged via SGML documents. However, SGML's complexities and the high cost of its implementation have meant that most businesses and individuals have not been able to afford to embrace this powerful technology.
SGML History
In 1969, a person walked on the Moon for the first time. In the same year, Ed Mosher, Ray Lorie, and Charles F. Goldfarb of IBM Research invented the first modern markup language, Generalized Markup Language (GML). GML was a self-referential language for marking the structure of an arbitrary set of data, and was intended to be a meta-language – a language that could be used to describe other languages, their grammars and vocabularies. GML later became SGML. In 1986, SGML was adopted as an international data storage and exchange standard by the ISO. When Tim Berners-Lee developed HTML in the early 1990s, he made a point of maintaining HTML as an application of SGML.
With the major impact of the World Wide Web (WWW) upon commerce and communications, it could be argued that the quiet invention of GML was a more significant event in the history of technology than the high adventure of that first trip to another celestial body. GML led to SGML, the parent of both HTML and XML. The complexity of SGML and lack of content tagging in HTML led to the need for a new markup language for the WWW and beyond – XML.
Goals of XML
In 1996, the principal design organization for technologies related to the WWW, the World Wide Web Consortium (W3C), began the process of designing an extensible markup language that would combine the flexibility of SGML and the widespread acceptance of HTML. That language is XML.
The W3C home page is at http://www.w3.org, and its XML pages begin with an overview at http://www.w3.org/XML. Most technical documents can be found at http://www.w3.org/TR...
XML version 1.0 was defined in a February 1998 W3C Recommendation, which, like an Internet Request for Comments (RFC), is an informal "standard". Various minor errors in documentation and some changes in underlying standards led to the publication of a Second Edition in October 2000, which corrects and updates the documentation, but doesn't change XML itself.
The current XML 1.0 Recommendation (which we'll abbreviate as XML 1.0 REC) can be found at http://www.w3.org/TR/REC-xml. The W3C developed ten design goals for XML, to quote from the Recommendation:
The design goals for XML are:
1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs that process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.
Like all specifications intended to be standards, XML has been defined in a formal and concise manner, using a formal notation, Extended Backus-Naur Form (EBNF), that satisfies design goal 8. The other design goals have been met by several characteristics of XML 1.0 and its "normative" (pre-requisite) references to existing Internet standards. We can categorize these as:
❑ Extensibility and separation of semantics and presentation (an implicit goal)
❑ Simplicity (design goals 4, 5, 6, 7, and 10)
❑ Internationalization (1, 2, 6, and 9)
❑ Usable over the Internet (1 and 2)
❑ Interoperability with SGML (3)
We'll look at a few of these in slightly greater depth, and then show some additional resources for XML information, vocabularies, and software tools.
Extensibility
SGML is a highly extensible and open-ended markup language, and so is XML. A significant aspect of this was illustrated in the example in the Markup Languages section, where we were able to add our own tags to a document and thus include some structured data within the text. XML provides a basic syntax but does not define the actual tags; the tag set can be extended by anyone for their own purposes.
Separation of Semantics and Presentation
HTML is a markup language that describes data and its presentation. Despite the advent of external Cascading Style Sheets (CSS) to format HTML data, most web pages still use numerous presentation tags embedded within the data.
XML is all about the description of data, with nothing said about its presentation. HTML combines some rudimentary descriptive markup, plus a great deal of markup that describes the presentation of the data.
The XML specification not only describes the XML data format and grammar, it also specifies a two-tier client architecture for handling XML data. The first tier is the XML Processor (also known as the XML parser, which is the term we'll use in this book). The parser ensures that the presumed XML data is well-formed (has the correct structure and syntax), and may be used to check the validity of the user's data structure. The parser must comply with the XML specification, and pass the content and structure of the XML data to a second tier application (the XML Application) in a prescribed manner.
The XML parser can use a separate document, generically called a schema, to describe and validate that instance of XML data. One type of schema, called a Document Type Definition (DTD), is specified in the XML 1.0 REC. There are other forms of schemas in use and under development. We will look at several of these, including the W3C's XML Schema and XML-Data Reduced (XDR), currently used by Microsoft as its schema language (Microsoft intends to adopt XML Schema in the future, but had not done so at the time of writing).
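The first-tier check for well-formedness can be illustrated with a short sketch. We assume here the non-validating parser behind Python's standard xml.etree.ElementTree module; the helper name is_well_formed is our own, and the sketch says nothing about validity against a DTD or schema, which this parser does not check.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # The parser rejects any document that breaks the XML syntax rules.
        return False
    return True


print(is_well_formed("<price>33.99</price>"))   # matching start and end tags
print(is_well_formed("<price>33.99</cost>"))    # mismatched end tag
print(is_well_formed("<note>S&H extra</note>")) # bare ampersand in text
```

Only a document that passes this first check would be handed on to the second tier, the XML Application.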
The initial layer of XML processing is the XML parser, which can optionally use a DTD or schema to describe and validate the XML data. As we mentioned earlier, the presentation of XML data is also defined in a separate document, the style sheet, which uses the Extensible Stylesheet Language (XSL). XSL is to XML, as CSS is to HTML. XML can be transformed for presentation or just simple data conversion using an XSL Transformations (XSLT) tool, such as MSXML, Saxon, or XT. These tools can be used on server side XML data, transforming it to a combination of HTML and CSS, for display in existing web browsers. An XSL processor can also be implemented on the client side (such as MSXML in the IE5 browser), so that XML data is sent directly to the client, without requiring server side transformation.
The application layer of XML processing, such as a browser or editor, can use an XSL style sheet to describe the presentation of the XML data. One stylesheet can ensure a uniform presentation style for multiple XML documents. Contrariwise, a single XML document might be presented in a variety of ways, using multiple style sheets. The application layer can choose to present the XML data as synthetic speech over a telephone or radio using VoiceXML, or reformat it to fit a PDA display screen (using WML, for example). For that matter, there is no requirement that XML be presented as text, or displayed at all. XML data can also be used as the basis of a messaging system, either computer-mediated person-to-person messages like e-mail, or computer-to-computer messages such as Remote Procedure Calls (RPCs). We will look at some of these, like XML-RPC and SOAP, later in this chapter.
XML data can be used for computer-to-computer messages, as well as for human-readable documents.
This non-document use of XML is one of the most exciting applications of XML and its supporting tools and specifications. Just as XML might be used to present more sophisticated web pages to users and tagged data to search engines, XML may also serve in the underlying technical infrastructure. E-commerce applications may use XML to describe business rules and XML (as SOAP or XML-RPC) for distributed processing calls and messages. Financial transactions may be encoded in signed XML packets, and so on, right on down to the configuration and administration of the very computers that implement the world-wide e-commerce system.
Internationalization (I18N)
Although the WWW is already an international phenomenon, XML was designed for much better support of non-European languages and internationalization (also known as "i18n" or "I18N"). This is yet another shorthand notation, obviously from the minds of programmers, that is derived from the first and last letters of the word, and the count of 18 letters between them. XML is based upon several ISO standards, including the Universal Character Set (UCS), defined in the ISO/IEC 10646 character set standard (which is currently congruent with the somewhat better-known Unicode standard).
The current Unicode 3.0 specification can be found at http://www.unicode.org. ISO/IEC 10646 documentation can be ordered at http://www.iso.ch.
Like most aspects of XML, names have been extended beyond the old-fashioned Anglo-centric ASCII-only limitation to allow the use of almost any of the world's languages.
XML text and names may use any of the world's different alphabets, scripts, and writing systems as defined in the ISO/IEC 10646 and Unicode 3.0 standards. The value of this design goal extends far beyond merely presenting text in different human languages. The XML metadata can also be described in the local vernacular, and style. XML is the basis of a truly international Internet, accessible to people all over the world, in their native language.
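As a small illustration of markup written in the local vernacular, the following Python sketch parses a document whose element and attribute names are Russian words. The vocabulary (каталог for catalog, товар for product, цена for price) is invented for this example, and the standard xml.etree.ElementTree module is assumed as the parser.

```python
import xml.etree.ElementTree as ET

# Invented vocabulary: the names are ordinary Russian words, not a real schema.
DOC = "<каталог><товар цена='33.99'>Мега-виджет</товар></каталог>"

root = ET.fromstring(DOC)
item = root.find("товар")

print(root.tag)          # the document element's name
print(item.get("цена"))  # an attribute looked up by its non-ASCII name
print(item.text)         # the element's text content
```

The parser treats these names exactly as it would ASCII ones, so tools built on it need no special handling for non-English vocabularies.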
XML Works with the Internet
XML is based upon a simple text format. Even though this means Unicode text, not just simple ASCII text, it may be converted to the UTF-8 or ASCII encoding for reliable transmission over the oldest of Internet connections and hardware (Teletype, anyone?). This also eliminates some considerable issues related to the interpretation of binary data formats on different computer hardware and operating systems.
XML also uses existing Internet protocols, software, and specifications wherever possible, for easier data processing and transmission. These range from basic syntax, like Uniform Resource Identifiers (URIs), to directories of code numbers, like ISO Country Codes. We will look at the more important of these Internet specifications in some detail, in Chapter 3.
There is even a version of HTML represented in an XML-compatible form, called (rather verbosely): "XHTML 1.0: The Extensible Hypertext Markup Language – A Reformulation of HTML 4.0, in XML 1.0". This provides a migration path from millions of existing HTML pages to pure XML representation. Once XHTML support in browsers is widespread, further development of HTML and Dynamic HTML (DHTML) can leverage all the benefits of expression in XML syntax and common XML tools.
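The conversion of Unicode text to UTF-8 or plain ASCII can be sketched as follows. The example assumes Python's standard xml.etree.ElementTree module, whose serializer writes characters outside the target encoding as numeric character references; the sample product name is invented.

```python
import xml.etree.ElementTree as ET

# Invented sample: a name containing one non-ASCII character (e-acute).
root = ET.fromstring("<name>Café Wonder Widget</name>")

# Serialize as UTF-8: the e-acute becomes a two-byte sequence.
utf8_bytes = ET.tostring(root, encoding="utf-8")

# Serialize as 7-bit ASCII: the e-acute becomes a character reference.
ascii_bytes = ET.tostring(root, encoding="us-ascii")

print(utf8_bytes)
print(ascii_bytes)

# Either form parses back to the same Unicode text.
print(ET.fromstring(ascii_bytes).text == ET.fromstring(utf8_bytes).text)
```

The ASCII form is longer but survives any channel that can carry 7-bit text, which is the point made above about older connections and hardware.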
XML is a text format that is easily transmitted over the Internet and other communications links. XML works with basic WWW protocols, including HTTP or HTTPS. Like HTML, XML is often transmitted using the WWW's Hypertext Transfer Protocol (HTTP). This means that XML can be handled easily by existing web server software, and pass through corporate network firewalls. XHTML 1.0 is described at http://www.w3.org/MarkUp. The current HTTP specification can be found at http://www.w3.org/Protocols. Although XML is not a direct replacement for HTML, future versions of HTML will be expressed in XML syntax as XHTML. XML enables enhanced web architecture by moving more of the burden of presentation from the server to the client's browser or other application. XML provides a syntax that can be used for almost any data, its descriptive metadata, and even the message protocols used to move the XML data between server and clients.
XML will enable an enhanced WWW architecture. XML can also be used as a universal data exchange and long-term storage format, with or without the Internet. Improved searching is another benefit – instead of attempting to find a price buried within a lump of text, enclosed in HTML
tags, the price information can be found easily and reliably using explicitly tagged XML data. This same tagging will provide for vastly improved data exchange between a website and its users, between co-operating websites, and/or between software applications. XML will enable a much more powerful Web, and it will also empower most other computing applications.
XML is Simplified SGML
A major design goal for XML was ease-of-use, so the XML design team was able to use SGML as an already working starting point, and focus upon simplifying SGML. Due to its many optional features, SGML was so complex that it was difficult to write generic parsers, whereas XML parsers are much simpler to write. XML is also intended to be easy to read and write by developers using simple and commonly available tools.
XML is constrained by design to be interoperable with SGML. This design constraint allowed early adopters of XML to use SGML tools. However, it also means that there are some quirky constraints on XML data, declarations, and syntax necessary to maintain SGML compatibility. This is the downside of XML being a subset of SGML. At some point in the future, there may be a break between XML and SGML, but for some years to come, SGML-based XML 1.0 syntax is likely to be the norm.
Resources
The formal XML specifications, including grammars in EBNF notation, are readily available on the Web from the W3C. There is also an excellent annotated version of the XML 1.0 Recommendation by Tim Bray (one of the co-editors of the XML specification). These web pages are tremendous resources that also provide extensive links to various other topics related to XML.
The current XML 1.0 REC is at http://www.w3.org/TR/REC-xml. This is the Second Edition, and so there is a very useful color-coded version showing changes from the 1998 edition at http://www.w3.org/TR/2000/REC-xml-20001006-review.html. The first edition is at http://www.w3.org/TR/1998/REC-xml-19980210, with Bray's annotated version available at http://www.xml.com/axml/axml.html. There is an XML 1.0 FAQ (Frequently Asked Questions) website, maintained by Peter Flynn, et al. on behalf of the W3C's XML Special Interest Group. The XML 1.0 FAQ can be found at http://www.ucc.ie/xml. There are some other non-commercial resources that are very useful XML information sources, and serve as depositories for communally developed XML vocabularies, namespaces, DTDs, and schemas. There are also numerous e-mail lists devoted to various XML-related issues. The Organization for the Advancement of Structured Information Standards (OASIS) is a non-profit, international consortium that is devoted to accelerating the adoption of product-independent vocabularies based upon public standards, including SGML, HTML, and XML. This organization is working with the United Nations to specify a modular electronic business framework based on XML (ebXML), and with various other organizations to produce other XML vocabularies. OASIS hosts The XML Industry Portal for news and information of XML at XML.org, and The XML Cover Pages, one of the best websites for researching all aspects of XML, including current tools and vocabularies. OASIS has also become the host for XML-DEV, the XML developers mailing list. This list is primarily for the developers of XML parsers and other tools, so the technical level is quite high and focused upon some of the more esoteric issues of XML syntax, grammar, schemas, and specifications. 
Any questions concerning XML may be posted to this list, but browse the archives first for pertinent threads (and a sense of the list's scope) before posting any questions – this is not the list for simple XML questions. OASIS is at http://www.oasis-open.org. The XML Industry Portal is at http://www.xml.org, The XML Cover Pages are at http://www.oasis-open.org/cover/, and the XML-DEV home page and archives are at http://www.xml.org/xml-dev/index.shtml. The XML-L and dev-xml e-mail lists are much better choices than XML-DEV for basic questions, and for developers of XML applications. Questions about XSL should be posted to the xsl-list e-mail list, rather than posting to any of the more generic XML lists. In addition, cross-posting between these lists is strongly discouraged. The XML-L home page is at http://listserv.heanet.ie/xml-l.html, dev-xml is available at http://groups.yahoo.com/group/dev-xml, and xsl-list is at http://www.biglist.com/lists/xsllist/. All of these sites provide subscription information and list archives. There is also a USENET newsgroup for XML at comp.text.xml.
The Various Stages of W3C Specifications
Before we delve deeper into XML and all of the specifications of its related technologies, it would be a good idea to explain what each level of the specifications actually means. More detail than is given here can be found at the W3C at http://www.w3.org/Consortium/Process-20010208/tr.html, but we give a quick overview to help you understand how near completion the various standards are.
Once the W3C wants to publish a standard, the specification moves through five stages before reaching its final form. They are detailed below, from the first appearance at Working Draft, until it reaches the final Recommendation status. Every specification enters the W3C through a Note; it is then considered by a working group who will want to move it through the various stages so it can become a Recommendation. There are various processes that have to be performed and conditions to be satisfied before it can be moved up. A specification can be returned to an earlier stage at any time before it becomes a Recommendation, so its position in the different stages is no guarantee that it is any nearer completion.
Working Draft
At this stage, there is no guarantee as to the quality of the specification; it just means that a working group is working with the specification, redrafting it in association with external parties.
Last Call Working Draft
After a number of conditions have been met, the specification is put through to Last Call Working Draft. It generally remains at this stage for three weeks only. It can last longer if the "...technical report is complex or has significant external dependencies" and the length of this review period has to be specified at the start of this stage. During this stage, the working group must address all comments on the specification from all parties, including external agencies. If the Director is satisfied that all objections have been noted and all comments addressed, it may move up to Candidate or Proposed Recommendation status. Once it moves up from this stage, the technical report or specification will change very little, unless it is rejected further up the process and sent back to Working Draft status.
Candidate Recommendation
At this stage, the comments made during the Last Call have to have been addressed, and the Working Group has to attempt to implement all features of the technical report. The technical report can be updated during this stage for clarity, and the period lasts as long as it takes to implement all the details.
Proposed Recommendation
For the specification to reach this level, a working implementation of the technical report has to exist. All issues raised during the previous implementation period have to be addressed, possibly by amending the technical report. During this stage, the working group should address all "...informed and relevant issues..." raised by the public or other Working Groups. The specification must remain at this stage for at least four weeks before moving on. It can either move up to Recommendation status or move back down to Candidate Recommendation, or Working Draft, status.
Recommendation
This is the final stage of the process. The Director must be satisfied that there is significant support for the technical report before progressing to this stage. The W3C should make every effort to maintain the Recommendation, updating any errata and assisting in the creation of test bed software.
We will now move on to summarize the rest of XML, starting with the XML core.
The XML Core
The core of XML and its key components and extensions are:
❑ XML 1.0 syntax, including Document Type Definitions (DTDs)
❑ Namespaces in XML
❑ XML Schema (or one of its alternatives or supplemental validation tools: XDR, SOX, RELAX, TREX, and The Schematron)
These basic specifications describe the syntax of XML 1.0 and provide a standard validation grammar (DTDs). The extensions support multiple and shared vocabularies (Namespaces), and more rigorous and powerful validation (XML Schema et al.). In conjunction with the XML parser (which is also defined in the XML 1.0 REC), these comprise the first tier of XML processing.
Technology without application is useless, so several important (second tier) applications of XML are also becoming part of XML-based systems. These related specifications provide some of the key features that are commonly required in XML applications.
❑ Describing XML data structure: The XML Information Set (XML Infoset) and XML Path Language (XPath)
❑ Navigating & Linking: XML Linking (XLink), XML Pointer Language (XPointer), XML Inclusions (XInclude), XML Fragment Interchange (XFI), and XML Query Language (XQuery)
❑ Transforming & Presenting: XSLT and XSL-FO (XSL Formatting Objects)
We will look at the core syntax first, and then we'll look at the practical applications of these technologies and some widely shared XML vocabularies.
XML 1.0 Syntax
As we've seen before, the basic syntax of XML is described in a W3C recommendation called Extensible Markup Language (XML) 1.0 (Second Edition). This recent revision (2000-10-06) is strictly a documentation update, including some clarifications and minor code changes. There are no fundamental changes to XML as described in the original XML 1.0 recommendation (1998-02-10). Several facets of basic XML should be understood to fully appreciate and effectively use XML for markup.
Self-Describing Data
The tags that delimit the different components of an XML document can also be interpreted to provide some semantic information about the document content. The use of descriptive element and attribute names in a shared XML vocabulary allows software to extract structured data from XML documents. For instance, if we consider an excerpt from the example XML text earlier in this chapter, we can see how a product name, price, or other data can easily be extracted from the text by searching for the appropriate tag:
Introducing XML
floor(position() div 2)]"/>
As before, we are using list to render the boxes that are the visual representation of elements. There are too many elements to fit in a single row that would be visible without scrolling, but we do not anticipate so many that we need to do complex calculations to determine the number of rows needed. We assume that two rows will be good enough. The filter:
position() div 2 = floor(position() div 2)
gives us all the elements whose ordinal position in the node list is an even integer. Hence, our first call to the list template renders the even-numbered elements, while the second call renders the odd-numbered categories (for an odd position, the division leaves a fractional part that the floor function rounds down, so the two sides of the comparison are no longer equal). The next view on the page is the view for technology specifications. We name this specsView and assign it an event name of OnTechClick:
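Returning to the two-row filter described above, the following Python sketch mimics the XPath test outside XSLT. It is only an illustration: the category names are hypothetical, and the function assumes, as in XPath, that positions are counted from 1 and that div is a floating-point division.

```python
import math

def is_even_position(position: int) -> bool:
    """Mimic the XPath test: position() div 2 = floor(position() div 2)."""
    half = position / 2              # XPath div is floating-point division
    return half == math.floor(half)  # equal only when there is no fractional part

# Hypothetical category names, used only to show how the rows split.
categories = ["A", "B", "C", "D", "E"]

# XPath positions are 1-based, so enumerate from 1.
even_row = [c for pos, c in enumerate(categories, start=1) if is_even_position(pos)]
odd_row = [c for pos, c in enumerate(categories, start=1) if not is_even_position(pos)]
```

Every element lands in exactly one of the two rows, which is all the two calls to list need.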
Schema Based Programming
The task here is to obtain all the Technology elements belonging to the currently selected category. In general, there should be no selected category since this code is only executed on the initial request, but this will not be true when we implement the server-side solution. In order to share the same XSLT stylesheet between implementations, we have to have code that determines the selected category whenever there is a selected technology that does not belong to the core category. The variables selNode and event will be set up for the server-side solution. In the current case, these will always evaluate to false. When it is executed (in the server-side implementation), we determine the selected category by looking at the value of the category attribute if the selected node is a Technology element and by looking at the id attribute if the selected node is a Category element.
Similarly to the catView view, we use two calls to the list template to render our technologies in the view. In the initial pass of the client-side implementation, there is no selected category so no specifications are displayed. Finally, we throw in a horizontal rule to set off the publication's abstract, and then create the physical view, abstractView, for it. Since there is no selected category, there are no specifications. Without specifications, there can be no selected specification, and therefore there is nothing to display in this view initially. We shall study this part of the template in more detail when we look at the server-side solution. A similar body of XSLT will appear in the transformation associated with this view.
Chapter 13
An Event Handler Transformation
When LivingXML.xsl completes its processing, a finished HTML page with the initial representation of the views and the controller script code is returned to the requesting browser. At that point and for this implementation, LivingXML.xsl is finished. All further transformations come in response to user events and originate in the logical views embedded in the page. The transformation that handles the rendering of the specifications view is representative. It is found in the element whose id attribute is showSelSpecs. Each logical view contains exactly one child element, an XSLT xsl:transform element. This is synonymous with the xsl:stylesheet element and is a complete XSLT stylesheet:
The task in this view is to render boxes for all the Technology elements belonging to the selected category. This view may be asked to render in response to a number of events other than a change in category, so we have to provide for the case in which a specification is selected.
As with categories, we assume that two rows will fit everything in well enough. We begin with a root-matching template that we call list. This is where we run into our first problem. Since list is written in LivingXML.xsl and this element is in the body of the HTML page, we have to repeat list here. This is a problem with the client-side only implementation. We have no mechanism for reuse. We could use an xsl:include element or an xsl:import element, but that would violate the spirit of a self-contained entity. This area requires further study. For now, copying the list template and its dependencies is sufficient for our purposes.
We check the selectedCat attribute to determine which category is selected. This helps us with the XPath expression that selects the nodelist we pass to list.
We never studied the list template when we discussed LivingXML.xsl. Since the only thing left in this event handler transform is list and its dependencies, this is a good time to look into it. Here is the list template:
This template is an excellent example of some of the changes in technique that result from the shift to declarative programming. In a procedural language, you would normally iterate through the nodelist outputting boxes on the view. You would have some sort of explicit (as in a for-next loop) or implicit (as in a for-each loop) iteration variable. The value of this variable would change with each iteration. XSLT, however, lacks an assignment operator. Once created, a variable holds its value. We have need of two changing values in list: the current node ($firstnode) and the current x coordinate ($x). Recursion gets around this lack.
When we enter list, we test to see if a null list has been passed. This is because a selected category might be empty. In addition, the first call to update this view will occur when the page is loaded and no category is selected. The other views have no such problem and can guarantee that their nodelists will never be empty. If you examine the other copies of list in LivingXML.xml, you will find they do not perform this test.
If a non-empty list was passed, we immediately split the nodelist into two parts: the first node and the rest of the list. We call the named template output-box to render the first node. That template takes x and y coordinates, the node itself, and a Boolean value indicating whether the specification is currently selected. We determine the value of the Boolean to pass by comparing the first node's id attribute against the value of the selectedSpec attribute, which tells us the name of the currently selected specification.
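The recursive pattern that list relies on can be sketched in Python. This is not the book's template, only an illustration of recursing on the head and tail of a list instead of updating a loop variable. The function name, the tuple layout, and the sample ids are assumptions; the 4.5 centimeter offset is the one the chapter uses for the recursive call.

```python
X_OFFSET_CM = 4.5  # horizontal distance between boxes, as used by the recursive call

def layout(nodes, x, y, selected_id=None):
    """Return one (id, x, y, selected) tuple per node without reassigning any variable."""
    if not nodes:                       # empty node list: nothing to render
        return []
    first, rest = nodes[0], nodes[1:]   # split into the first node and the rest of the list
    box = (first, x, y, first == selected_id)
    # Each recursive call receives its own x; the caller's x is never changed.
    return [box] + layout(rest, x + X_OFFSET_CM, y, selected_id)

# Hypothetical specification ids.
boxes = layout(["xslt", "xpath", "xlink"], 0.5, 1.0, selected_id="xpath")
```

Each call sees a fixed x for its whole lifetime, which is how the template copes with XSLT variables that cannot be reassigned.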
Now we have one box on the view and we need to know what to do next. First we see if there is anything left to process. If the remainder of the nodelist, rest-of-list, is not null, there are nodes left to render. At this point, list recursively calls itself. Now the passed nodelist is rest-of-list, and we can pass in the current x coordinate plus a known offset of 4.5 centimeters as the new value of $x. The value of the $x variable has not changed: on the call stack for this call, $x has the value that was passed in. The next call to list will have the incremented value for its $x value, and it will never change this. Eventually we will make a call to list with a nodelist consisting of a single node. We will output that box, discover rest-of-list is null, and terminate processing. The output-box template is fairly simple. We will make fairly extensive use of DHTML and the style attribute in particular to create boxes with the interactive behavior we specified at the outset:
Each box consists of an HTML element. We establish a title attribute whose value is the Abstract element in our model – remember, this is the short, one line abstract we provided – to get the tool-tip feature of Internet Explorer to give us some information about the publication when we hover over it. The id of the box, needed to update our selection state when the user generates a click event, is set equal to the value of the id attribute for the Technology element. This allows us to tie events in the view to the corresponding element in the model.
cursor:help;border:1px solid black;position:absolute; top:cm; left:cm;
The style attribute of the box's element controls the appearance and placement of the box. In this case, we set the absolute position of that element on the page to match the x and y coordinates we passed into the template.
z-index:1;width:4cm;height:2cm; background:
yellow
;
The background of the box should be yellow if the specification is selected, or the proper color representing the publication's status in the W3C process otherwise. If the Boolean parameter $selected is true, we can generate the string yellow to control the background. Otherwise, we pass the value of the element's status attribute to the color-code template to get the proper color:
red blue #990000 #cc9966 #663300 white
That template outputs a named color or hexadecimal color code based on the value of the status attribute. The label placed in the box is going to be a hyperlink to the text of the publication on the W3C site. To accomplish this, we generate an anchor element:
color:black;text-decoration:none;font-family:verdana; font-weight:bold;font-size:10pt;position:relative; top:0.75cm;width:4cm;text-align:center;
new
Since we don't want to lose the application if the user follows the hyperlink, we set the target attribute of the anchor element to the value new so that the browser will open the publication in a new window. The text of the hyperlink is the ShortTitle element's value. You'll recall that we created this element precisely for this purpose because of concerns that the full title of the publication would not fit in our boxes. After that, we simply close all our open XSLT elements to complete the template. As you can see, there is quite a lot of source to the templates list, output-box, and color-code. In the client-side implementation, these are repeated three times, once for each view. Obviously, we need to devise some mechanism that will permit us to reference templates that are not physically contained within a given logical view. The server-side implementation also suffers from this problem because it uses the same XML document. Although it will not call the client-side views, the text must be passed to the requesting browser, and that will include all this dead code.
Server-Side
If you've mastered the client-side implementation, you will find the server-side implementation much easier. Even though the controller is split between the client and the server, it is simpler than its client-side counterpart. This is because there is no need to link events and views. All views are updated on each server round-trip. The full body of LivingXML.xsl is executed for each request.
In addition, much of the XSLT is similar to what you have seen in the client-side implementation. Where this is the case, we'll say so and move quickly over the common source with little comment.
Controller: HTML
We recall from our discussion of the client-side implementation that the templates that copy code into the HTML page use elements which are copied based on the mode parameter:
We looked at what is copied when the mode is XML. If you request the server-side version instead, that is, by calling for the ASP rather than the XML document, the ASP code, which we shall see in a bit, sets the mode parameter to ASP before performing the transformation. Here is the script code that is copied when the mode is ASP:
window.onload = initialize;

// Route clicks in each of the three views to the same handler
function initialize()
{
  catView.onclick = notify;
  specsView.onclick = notify;
  coreView.onclick = notify;
}

// Request the ASP page again, passing the event name and the id of the clicked box
function notify()
{
  if (event.srcElement.id != "")
    location.href = "LivingXML.asp?event=" + this.event +
                    "&selected=" + event.srcElement.id;
}
This makes up the entire client-side controller for our server-side implementation. As with the client-side implementation, we invoke initialize, but this version is a good deal simpler. There is no longer a need for transform, and there are no functions to perform the linkage between events and views. We simply pass the name of the event off to the server-side controller as a parameter in the request URL. In this version of the notify function, note that we are setting the current page's href property to be a URL to LivingXML.asp that includes the current selection and the name of the event. This triggers an HTTP request for the ASP, causing the server portion of the controller to take over.
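To make the request format concrete, here is a small Python sketch of the URL that notify assembles and of how the two parameters can be read back out of it. The function name and the sample id are hypothetical. Unlike the script above, urlencode also escapes special characters, which is an addition of this sketch rather than something the page script does.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_request_url(event_name, selected_id, page="LivingXML.asp"):
    """Build the URL requested when a box is clicked; None if the click had no id."""
    if not selected_id:          # mirrors the check on event.srcElement.id
        return None
    query = urlencode({"event": event_name, "selected": selected_id})
    return page + "?" + query

# "xslt" is a hypothetical box id; OnTechClick is the event name of specsView.
url = build_request_url("OnTechClick", "xslt")

# The receiving side can recover the two parameters from the query string.
params = parse_qs(urlsplit(url).query)
```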
Controller: ASP
The server-side portion of the controller consists of an ASP page whose only function is to set some parameters and perform an XSLT transformation using LivingXML.xml and LivingXML.xsl. The most efficient way to do this using MSXML 3.0 on the server is with two free-threaded instances of the parser, one for each document, and an XSLT template object. The latter object is Microsoft's optimization for server-side XSLT processing. It allows a stylesheet to be compiled and cached for use in subsequent requests. We create the three objects with OBJECT tags. Note the RUNAT attributes. The SERVER value tells the ASP processor that these objects are to be instantiated on the server and that the tags are not to be passed to the client:
Now we need to run some script on the server to perform the transformation and pass it back to the client. The first thing we need to do is get two parameters passed in the URL:
Once the parameters have been set, the processor is commanded to perform the transformation. When the script finishes, ASP streams the Response object, which now contains the HTML page with updated views, back to the client. Although the server-side controller is a good deal simpler than its client-side equivalent, this simplicity comes at the cost of the purity of the MVC implementation. The client-side implementation enforced the strict linkage between events and the views created by them. The server-side implementation blindly updates all views in response to each event.
View: Transforms
All the views are created physically by the stylesheet with each request to the server. Since the controller is simplified, there is no need for logical views. The remainder of the server-side implementation's behavior, therefore, is found in the XSLT stylesheet. Although much of the XSLT is similar to what you have seen before, we will be investigating some processing that we glossed over in our previous discussion. In the client-side case, the parameters did not exist, so certain paths through the XSLT were never executed. The server-side controller sets those parameters, so we need to look at what those elements do in our application. To refresh your memory of the stylesheet, here are the parameter declarations:
OnCatClick XML
Note that this time, event, mode, and possibly selected will be overridden with values passed into the stylesheet by the processor. In particular, mode will have the value ASP.
Core Material
Processing begins with the creation of the shell of an HTML document:
XML Standards
This is unchanged from the previous implementation, but since the mode parameter is now ASP, different elements are copied into the client page. This is the controller code you saw in the section on the client-side controller. Next, we enter the body of the page and create the element that contains the legend and the core XML technology boxes. This is unchanged from the preceding version, and processing is exactly the same:
Recommendation
. . .
As before, we rely on list to generate a row of boxes corresponding to the elements selected by matching their category attribute value against the string core. The list template is the same, but output-box is a bit different this time. Since it is called from a variety of places, we need to test what sort of node is being processed by looking at the element's name:
cursor:help;border:1px solid black;position:absolute; top:cm; left:cm; z-index:1;width:4cm;height:2cm; background:
yellow
;
color:black;text-decoration:none;font-family:verdana; font-weight:bold;font-size:10pt;position:relative; top:0.75cm;width:4cm;text-align:center;
new
Category:
cursor:help;border:1px solid black;position:absolute; top:cm; left:cm; z-index:1;width:4cm;height:2cm; background:
yellow
gray; ;
color:black;text-decoration:none;font-family:verdana; font-weight:bold;font-size:10pt;position:relative; top:0.75cm;width:4cm;text-align:center;
We use the XPath function name to give us the element name of the node being processed, then output the DHTML for a box with slight adjustments to what is contained therein based on whether we are processing a Category or Technology element.
Categories
Next, the view containing the category boxes is generated. This is fixed data and the server-side code merely calls the list template with fixed XPath expressions:
+
Although this is unchanged from what got executed in the client-side implementation's initial pass through the data, there is one slight difference in behavior. This gets executed every time a request is made. Unlike the initial pass, there may be a category selected. Since the controller does not modify the model document using DOM calls (and cannot, as the document is cached and may be shared by multiple clients), the selectedCat attribute is never given a non-empty value. The server-side implementation, consequently, does not keep track of the selected category when a specification is selected. The only time a category box will be drawn with a yellow background is when the category is initially selected and no specification selection has been made. Since the specifications themselves indicate which category is selected, this is a minor loss.
Selected Specifications
We are still inside the template that matches the document element. We now create the view for the specifications. The client-side implementation bypassed a lot of the processing since no category was selected. This is where processing begins to differ between the implementations. This template will be processed many times, so we need to see if a category has been selected and if a specification is selected:
If the user is clicking on one of the core specification boxes, the selection category is cleared and no specification boxes should be displayed. Otherwise, we proceed.
If we are processing a Technology element, the category attribute gives us the category ID. If it is a Category element, then the id attribute does the same thing. We process specifications in two rows, first selecting all elements whose position in the node list is an odd number, then all elements whose position is an even number.
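The decision just described can be sketched in Python. This is only an illustration of the logic, not the stylesheet itself: the dictionary shape used for a node is an assumption, the element names Technology and Category are taken from their use elsewhere in the chapter, and the sample ids are hypothetical.

```python
def selected_category(node):
    """Return the category id implied by the selected node, or None.

    node is a hypothetical stand-in for the selected element:
    {"name": <element name>, "attrs": {<attribute>: <value>}}.
    """
    if node is None:
        return None
    if node["name"] == "Technology":
        category = node["attrs"].get("category")
        # Clicking a core specification clears the category selection.
        return None if category == "core" else category
    if node["name"] == "Category":
        return node["attrs"].get("id")
    return None

spec = {"name": "Technology", "attrs": {"id": "xslt", "category": "style"}}
core_spec = {"name": "Technology", "attrs": {"id": "xml", "category": "core"}}
cat = {"name": "Category", "attrs": {"id": "linking"}}
```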
Abstract
Finally, we have to decide whether we need to display the full text of an abstract, which we have to do if a specification is selected. The client-side implementation did not deal with this in the stylesheet because nothing could possibly be selected on the initial pass. Since the stylesheet is always involved in the server-side implementation, this part of the template will get executed whenever a specification is selected.
Besides outputting the horizontal rule, we generate an element with the id abstractView. This is the view for the abstracts in our application:
The only case where we want to display an abstract is if a specification is selected. The first half of the XPath test expression tells us whether the selNode variable is empty. Since we may have selected a Category element, we have to use name to see if the selected node is a Technology element.
If a Technology element corresponds to the selected box in the user interface, we proceed to call the named template Abstract_TM_Statement. This template is responsible for displaying the correct copyright notice. Although the full abstract is stored in our model, it is excerpted from a W3C document and we must comply with their legal requirements. It is not possible to link to the copyright notice in the original document. Most W3C publications are published as HTML and therefore the copyright information is not consistently tagged. Also, some documents do not have a copyright notice, in which case we must display the default notice specified by the W3C. Since we did not discuss the template that generates the abstract, we shall do so here.
This DTD validates XML documents conforming to My Favorite Namespace.
This XML Schema augments the infoset of XML documents conforming to My Favorite Namespace.
element can appear. The content model for allows anything allowed by the element, including more elements. The ATTLIST declaration for elements looks like this:
#IMPLIED #IMPLIED #IMPLIED #FIXED "http://www.rddl.org/" #FIXED "simple" #IMPLIED "http://www.rddl.org/#resource" #IMPLIED #IMPLIED #FIXED "none" #FIXED "none"
The XLink attributes that will consistently have the same value have been declared as #FIXED. The standard id, xml:lang, and xml:base attributes are optional. It's usually a good idea to give your resources unique IDs so that they can be referenced directly. The xlink:title attribute is optional and can be used to provide a human-readable label for the resource. The remaining attributes, xlink:href, xlink:role, and xlink:arcrole, will be described shortly. With the exception of this new element, RDDL is the same well-formed XHTML we've come to know and love. We can enter text, links, tables, images, and even forms.
What a RDDL Resource Represents
So what does a rddl:resource element represent? It represents a resource, of course – hopefully, one that's somehow related to the namespace URI at which the document resides. The actual URI of the resource and its relationship to the namespace are identified by attributes from the XLink namespace:
❑ The resource's URI is indicated using an xlink:href attribute
❑ The xlink:role and xlink:arcrole attributes specify the nature and purpose of the related resource
Natures and purposes in the general sense aren't specific to RDDL but the terminology is. Basically, a resource's nature (specified using an xlink:role attribute) describes the resource's type. The word "type" is far too overloaded in developer circles, and so the synonym "nature" was adopted by RDDL in an attempt to be less ambiguous in our discourse. When describing a resource, we could ask, for example, if it's a DTD, an XSLT stylesheet, or an HTML document. These natures are identified by a URI. If the resource nature is already known by some commonly accepted URI (like a namespace URI), then it's preferred that that URI be used to identify the resource's nature. XML Schema documents, for example, should specify http://www.w3.org/2001/XMLSchema as their nature. RDF Schema documents would most likely use http://www.w3.org/2000/01/rdf-schema#.
Chapter 23
It is not a requirement that resources be XML documents, however. DTDs, legacy HTML documents, CSS stylesheets, plain text files, ZIP archives, and more, can all be described using RDDL. Since these types of resources don't have well-known namespace URIs, their MIME type can be used to create a suitable URI in its place. Practically everything has a MIME type. To create the URI, append the MIME type to http://www.isi.edu/in-notes/iana/assignments/media-types/. Try appending application/xml to that URI and resolving it and see why we do this. A list of common natures can be found at http://www.rddl.org/natures. Some of them are listed below:

Nature               URI
Legacy HTML          http://www.isi.edu/in-notes/iana/assignments/media-types/text/html
CSS Stylesheet       http://www.isi.edu/in-notes/iana/assignments/media-types/text/css
XHTML Document       http://www.w3.org/1999/xhtml
RELAX Grammar        http://www.xml.gr.jp/xmlns/relaxCore
Schematron Schema    http://www.ascc.net/xml/schematron
TREX Pattern         http://www.thaiopensource.com/trex
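The rule for choosing a nature URI can be sketched as a small Python helper. The function name and its parameters are hypothetical; only the base URI and the preference for an existing namespace URI come from the text above.

```python
MEDIA_TYPE_BASE = "http://www.isi.edu/in-notes/iana/assignments/media-types/"

def nature_uri(namespace_uri=None, mime_type=None):
    """Pick a nature URI: a well-known namespace URI if there is one, else one built from the MIME type."""
    if namespace_uri:
        return namespace_uri
    if mime_type:
        return MEDIA_TYPE_BASE + mime_type.strip().lower()
    raise ValueError("either a namespace URI or a MIME type is required")

css_nature = nature_uri(mime_type="text/css")
xhtml_nature = nature_uri(namespace_uri="http://www.w3.org/1999/xhtml")
```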
Like natures, purposes (specified using an xlink:arcrole attribute) also use URIs. But a resource's purpose identifies how it's meant to be used. The purpose of an XML Schema is most likely to validate an instance document. The URI we would use would probably be http://www.rddl.org/purposes#schema-validation. RELAX grammars can be used for the same purpose, though. And so can Schematron schemas or TREX patterns. It's the combination of nature and purpose that allows RDDL-aware processors (of which none currently exist) to select and use the validation technology that they prefer. A list of common purposes is found at http://www.rddl.org/purposes. We highlight some of them below:

Purpose                            URI
Parse-time validation (DTDs)       http://www.rddl.org/purposes#validation
Post-parse validation (Schemas)    http://www.rddl.org/purposes#schema-validation
Software Implementation            http://www.rddl.org/purposes#software-package
Documentation                      http://www.rddl.org/purposes#reference
Specification                      http://www.rddl.org/purposes#normative-reference
Representative image               http://www.rddl.org/purposes#icon
Arbitrary Transforms
One particularly interesting combination of natures and purposes occurs when relating an XSLT resource (using http://www.w3.org/1999/XSL/Transform as its nature URI) capable of transforming an instance document from this namespace into an instance of some other namespace. The purpose URI should be set to the nature URI of the result of the transform. This isn't easy to explain, so perhaps an example can help make it clearer:
RDF Code Samples and RDDL
...
My Favorite Namespace to XHTML Transform ...
My Favorite Namespace to RSS Transform ...
...
In a world where every namespace URI resolves to a RDDL document, it's conceivable that a RDDL-aware tool could attempt to transform a document conforming to some arbitrary namespace to a document conforming to some other arbitrary namespace without any special knowledge of either. Resolving the namespace URI of the source document and searching for a rddl:resource element with a nature equal to the XSLT namespace URI, and a purpose equal to the namespace URI of the desired result document, could yield a transform capable of performing this task (as indicated by that element's xlink:href attribute). This tool need not have any hard-coded knowledge of the namespaces it's working with. And that's the beauty of the whole concept behind RDDL. If a stylesheet exists with the appropriate nature and purpose values then it can be discovered and applied automatically. Of course, RDDL isn't restricted to stylesheets – any type of resource usable for just about any reason can be located via a RDDL document at the end of a namespace URI.
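The lookup described here can be sketched in Python with the standard library's ElementTree. This is a minimal sketch, not an existing RDDL tool: the sample document, the example.org URIs, and the function name are hypothetical, and the sketch reads the RDDL text from a string rather than resolving a namespace URI over the network.

```python
import xml.etree.ElementTree as ET

RDDL_NS = "http://www.rddl.org/"
XLINK_NS = "http://www.w3.org/1999/xlink"
XSLT_NATURE = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical RDDL directory with one transform and one DTD resource.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rddl="http://www.rddl.org/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <rddl:resource id="to-xhtml"
        xlink:role="http://www.w3.org/1999/XSL/Transform"
        xlink:arcrole="http://www.w3.org/1999/xhtml"
        xlink:href="http://example.org/to-xhtml.xsl"/>
    <rddl:resource id="dtd"
        xlink:role="http://www.isi.edu/in-notes/iana/assignments/media-types/application/xml-dtd"
        xlink:arcrole="http://www.rddl.org/purposes#validation"
        xlink:href="http://example.org/my.dtd"/>
  </body>
</html>"""

def find_transform(rddl_text, result_namespace):
    """Return the href of the first XSLT resource whose purpose is result_namespace, or None."""
    root = ET.fromstring(rddl_text)
    for res in root.iter("{%s}resource" % RDDL_NS):
        nature = res.get("{%s}role" % XLINK_NS)       # nature is carried by xlink:role
        purpose = res.get("{%s}arcrole" % XLINK_NS)   # purpose is carried by xlink:arcrole
        if nature == XSLT_NATURE and purpose == result_namespace:
            return res.get("{%s}href" % XLINK_NS)
    return None

stylesheet = find_transform(SAMPLE, "http://www.w3.org/1999/xhtml")
```

A caller that gets None back simply knows that the directory offers no transform to the requested namespace.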
RDDL Examples
Since XHTML and, therefore, RDDL are well-formed XML, we can use our favorite XML tools to process and discover resources within it.
Getting the RDDL DTD
When creating our own RDDL directories, it'd probably be a good idea to copy the RDDL DTD files to our local machine (and modify the DOCTYPE accordingly) to improve performance and reduce the load on www.rddl.org. Since RDDL only adds a single element to XHTML, the chances of it changing frequently are slim.
The files that make up the RDDL DTD are bundled into a convenient ZIP archive and available for download. To discover the current URI for this archive, try searching the RDDL document at http://www.rddl.org/ for a resource with an ID of "ZIP" (either by viewing the source, using some sort of RDDL-aware processor or a custom tool – we'll be making our own using XSLT later on). At the time of writing, the RDDL doc has the following RDDL resource to hold the link to the ZIP file containing the DTD.
The RDDL Report Transform
A simple XSLT transform like the following could generate a report containing all of the resources in a directory.
Resource:
Nature:
Purpose:
URI:
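For readers who want to try the same kind of report outside XSLT, the following Python sketch lists each resource with the four labels used above. It is an illustration only: the sample directory, the example.org URI, the function name, and the fallback strings are assumptions.

```python
import xml.etree.ElementTree as ET

NS = {"rddl": "http://www.rddl.org/"}
XLINK = "{http://www.w3.org/1999/xlink}"

# Hypothetical one-resource directory used only to exercise the function.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rddl="http://www.rddl.org/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <rddl:resource id="DTD" xlink:title="My DTD"
        xlink:role="http://www.isi.edu/in-notes/iana/assignments/media-types/application/xml-dtd"
        xlink:arcrole="http://www.rddl.org/purposes#validation"
        xlink:href="http://example.org/my.dtd"/>
  </body>
</html>"""

def rddl_report(rddl_text):
    """Return a plain-text report with four lines per rddl:resource element."""
    lines = []
    root = ET.fromstring(rddl_text)
    for res in root.iterfind(".//rddl:resource", NS):
        # Prefer the human-readable title, then the id.
        title = res.get(XLINK + "title") or res.get("id") or "(untitled)"
        lines.append("Resource: " + title)
        lines.append("Nature: " + res.get(XLINK + "role", "(none)"))
        lines.append("Purpose: " + res.get(XLINK + "arcrole", "(none)"))
        lines.append("URI: " + res.get(XLINK + "href", "(none)"))
    return "\n".join(lines)

report = rddl_report(DOC)
```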
When you look at the description of the operation (message tag section in the previous code), you can see that our web service has one input, the InvoiceAdd operation that is being invoked, and an output message – the InvoiceAddResponse. The input operation defines seven input parameters and the output of the service has one output parameter, that being Result. The data types of those parameters are specified in the previous XML code, as well. The order of the parameters is provided next, within the operation tag section in the following code. It is interesting to note that within the binding section, you can see the binding style specified as being rpc and the transport being specified as HTTP. Next, the operation line points to the entry point in this service, that being the http://localhost/soap/InvoiceSubmit.asp.
SOAP
In this last section of the WSDL, you can see that the service name is InvoiceSubmit. Finally, within the service section of the WSDL, we have the location attribute specifying the location of the service being an HTTP address to an ASP page. This is the actual endpoint or listener of our service.
At this point we are ready to look at the ASP that acts as the SOAP listener. The SOAP listener simply connects our COM object with the web server. This file is created automatically by the WSDL Generator utility used to create the WSDL file described earlier. When running the wizard, you can choose to create an ASP SOAP listener or an ISAPI (Internet Server Application Programming Interface) DLL file as the SOAP listener. The ASP version is slower, but it provides more flexibility, such as easy editing. The entire code for the ASP listener file is included in the code download from the Wrox web site, but the core section of this file is the following ASP/VBScript code:
0 and not OrderUpdate.[__Exists__]. Connect this rule to the Send OK Ack action shape and the Else to the Send Fail Ack action shape. Ensure that the join shapes read OR, if it hasn't been done previously. We can do this by double clicking on the shape and selecting the appropriate radio button in the Properties dialog.
B2B with Microsoft BizTalk Server
At this point, our business process diagram should look like it did in our original screenshot (see the "Business Process Diagram" section). All the action shapes should be bound to message implementations. We might think that the schedule is finished; however, we've not established a way to create the OrderUpdate or OrderAck messages. While a complicated, production-ready schedule might use a call to a COM+ component to do this, we can get by with creating them in the Data view. This will give us a chance to inspect that view and see what data it contains and what it can do for us.
Data Flow
Click on the tab at the bottom of the screen labeled Data. The page that appears is quite a bit different from the business process diagram. We will see a series of labeled tabular shapes. Each message we created in the message binding process has a table listing the system fields available for it as well as any fields we selected. These tables are what permit us to gain access to the field values passed in the messages at runtime and copy them to subsequent messages. There is also a table titled Port References with a list of the messaging implementation ports. The values in this table are COM+ monikers for the instantiated ports. Such monikers are strings that allow the COM+ runtimes to uniquely reference some entity.
We are going to use the Data view, as noted, to create the OrderAck and OrderUpdate messages. The basic procedure for this is to add an empty message shell to the data view as a constant, then fill in the values of the fields using the appropriate values passed in other messages, for example the request reference ID. To do this, go to the Constants table and right-click, then select the Properties… menu item from the context menu. In the Constants message properties dialog that appears, click the Add… button. In the Constant Properties dialog, enter the name OrderAckShell, ensure string is the data type, and type the following string into the value field:
This is the empty shell of an OrderAck message. We can obtain all the data we need for the fields except for the value for the ackType attribute. To provide this, we'll add two more string constants, OK, with the value "ok", and Error, with the value "error". Once we have added these, arrange the message tables so that they look something like the diagram overleaf. Ignore the connections for a moment.
Chapter 25
Now that we have things arranged, we are ready to link fields together to create a data flow. From the business process diagram, we know that we have two variations of the OrderAck message: OKAck and FailAck. Let's start with OKAck. The first step is to drag a connection from the right side (the output) of the OrderAckShell string constant to the Document field of the OKAck message. This provides the message with a shell. Next, since we want the ackType field to have the value "ok", drag a connection between the OK constant and the ackType field. At runtime, XLANG Scheduler will insert the shell, then fill in the ackType field with the constant we've provided. The Location, requestID, and Part fields must have values that reflect the actual LowInventory message received by the schedule. These values are available in the CommodityRequest message table as the Location, requestID, and partNumber fields, so drag connections from them to the OKAck message fields. Now that we've done this, XLANG Scheduler can generate a complete OKAck message. When the schedule reaches the Send OK Ack action, XLANG Scheduler will assemble a complete OrderAck message with the ackType field set to ok based on our data flow and pass this message to the messaging service for transmission. Now make similar connections between the string constants, the CommodityRequest message fields and the FailAck message fields. Remember to link the Error string constant to the ackType field in FailAck to properly set the ackType field and indicate a failed order request.
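The data flow described above amounts to taking a message shell and copying field values into it. The following JavaScript sketch illustrates only that idea; it is not how XLANG Scheduler works internally. The shell markup, the sample values, and the fillShell helper are all hypothetical, since the real OrderAck shell is the string entered in the Data view. It assumes a Node.js runtime.

```javascript
// Hypothetical OrderAck shell: the attribute layout is an assumption made
// for illustration, not the shell used in the chapter's schedule.
const orderAckShell =
  '<OrderAck ackType="" requestID="" Location="" Part=""/>';

// Copy each named value into the matching empty attribute of the shell.
function fillShell(shell, values) {
  let message = shell;
  for (const [name, value] of Object.entries(values)) {
    // Escape characters that may not appear raw inside an attribute value.
    const escaped = String(value)
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/"/g, '&quot;');
    // A replacer function avoids special "$" handling in the replacement.
    message = message.replace(` ${name}=""`, () => ` ${name}="${escaped}"`);
  }
  return message;
}

// Sample values standing in for the CommodityRequest message fields.
const commodityRequest = {
  requestID: 'REQ-1',
  Location: 'Site A',
  partNumber: 'P-100',
};

// Mirror the OKAck connections: a constant for ackType, the rest copied over.
const okAck = fillShell(orderAckShell, {
  ackType: 'ok',
  requestID: commodityRequest.requestID,
  Location: commodityRequest.Location,
  Part: commodityRequest.partNumber,
});

console.log(okAck);
```

Building the FailAck variant in this sketch would only require passing 'error' for ackType, which parallels linking the Error constant instead of the OK constant.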
Finally, we need to create the data flow for the OrderUpdate message. Return to the Constants table and add a string constant named OrderUpDateShell with this value (whitespace has been added for clarity on the page):
Since we are only using this message in this schedule to cancel an order, not modify it, we can get away with hardcoding the action attribute's value in the shell. Link this constant to the Document field in the OrderUpdate message. Link the requestID, partNumber, quantity, and Location fields to requestID, part, quantity, and Location in the OrderUpdate message, respectively. This completes the data flow for the sample schedule. We could also have created the CommodityRequest message this way, but it would be unrealistic to expect to do this in a production application. In addition, the Data view would become very difficult to read.
With the schedule now complete, save the file as InventoryOrderProcess.skv. This is a Visio file, however, not an XLANG document. To create the XLANG document that we need in order to run the schedule, select the File | Make XLANG InventoryOrderProcess.skv… item from the main menu to compile the schedule. Move the compiled file to the location provided in BizTalk Messaging Manager when we configured the port named RecvLowInventoryPort. All that stands between us and watching this schedule execute is a stub for the warning application and HTTP receive functions to accept the messages on behalf of the external vendors.
Putting the Schedule into Production
The focus of this sample is on BizTalk, not the applications it coordinates. We are only interested in the outside applications to the extent that we can get a LowInventory message into the system and watch other messages come out of it. We'll create a simplistic HTML page with some client-side JavaScript to put a prewritten and saved LowInventory message into the warning_adv queue, then create two ASP pages to act as HTTP receive functions. Both receive functions should write the messages they receive to a log file so that we can verify that messaging is, in fact, taking place. In real life, these pages would access the applications used by the vendors through whatever messaging or API the applications support. In addition to logging the messages it receives, the service technician's receive function ASP must also submit a SchedConfirmation message to the svc_sched queue to keep things going.
Simulating the Warning Application
The stub for the warning application is called stuff.html. This page consists of a single button whose onclick event is handled by the Stuff function:

function Stuff()
{
   var mqInfo, mqQueue, mqMessage, ofsFSO, mqTxDisp, mqTx;

   mqInfo = new ActiveXObject("MSMQ.MSMQQueueInfo");
   ofsFSO = new ActiveXObject("Scripting.FileSystemObject");
   mqTxDisp = new ActiveXObject("MSMQ.MSMQTransactionDispenser");
   if (mqInfo != null)
   {
      // Open the local private queue for sending
      mqInfo.FormatName = "DIRECT=OS:.\\private$\\warning_adv";
      mqQueue = mqInfo.Open(2, 0);
      mqQueue.Refresh;
      if (mqQueue != null && mqQueue.IsOpen == true)
      {
         mqMessage = new ActiveXObject("MSMQ.MSMQMessage");
         if (mqMessage != null)
         {
            // Read the saved sample message and use it as the message body
            mqMessage.Body = ofsFSO.OpenTextFile(
                             "c:\\temp\\lowinv_sample.xml", 1).ReadAll();
            mqMessage.Label = "LowInventory";
            // Send the message within a queue transaction
            mqTx = mqTxDisp.BeginTransaction();
            mqMessage.Send(mqQueue, mqTx);
            mqTx.Commit();
            mqMessage = null;
            ofsFSO = null;
         }
         mqQueue = null;
      }
      mqInfo = null;
   }
}
This is relatively straightforward MSMQ and FileSystemObject programming. Note that the FormatName string refers to a queue on the same machine that is running this code. You may wish to refer to Designing Distributed Applications (Wrox Press, 1999, ISBN 1-861002-27-0) for an introduction to MSMQ programming in JavaScript. The FileSystemObject programming model is documented for JavaScript on the Microsoft Scripting Technologies site at http://msdn.microsoft.com/scripting/jscript/doc/jsFSOTutor.htm.
In this script, we create MSMQQueueInfo and MSMQTransactionDispenser objects to handle the queue transactionally, and a FileSystemObject object to open a text file on disk that contains a sample LowInventory message. We've provided lowinv_sample.xml in the code download for this purpose. We access the queue by providing the format name DIRECT=OS:.\\private$\\warning_adv and opening the queue for write access. If this works, we open the text file and read it in its entirety. The call to ReadAll returns the contents of the file as a string, so the following line opens the file, reads it, and assigns the contents to the body of the message:

   mqMessage.Body = ofsFSO.OpenTextFile("c:\\temp\\lowinv_sample.xml", 1).ReadAll();
If you use another sample file or move it on your system, be sure to change the file path appropriately. The event handler function ends by committing the queue transaction and cleaning up its objects. Opening this page in a browser and clicking the button drops a LowInventory message into the warning_adv queue. Don't bother looking at the queue for the message, though. If our MSMQ receive function is working properly, it plucks the message from the queue and tries to submit it to the messaging service.
Commodity Vendor's HTTP Receive Function
The commodity vendor's receive function is an Active Server Page we've adapted from a sample in the BizTalk SDK samples. All it does is grab the text posted to it and, if all goes well, write it to the end of a text file named CommodityLog.txt on disk, returning an HTTP 200 response. The sample, which is found in the download as CommodityOrders.asp (or in the unmodified form as Sample3.asp in the Sample3 folder under the SDK's Messaging Samples folder in our BizTalk Server installation), checks the content headers and grabs the posted contents with the following line:

   EntityBody = Request.BinaryRead(Request.TotalBytes)
After converting to Unicode, it writes the contents to the disk file with the following lines:

   Set objFS = CreateObject("Scripting.FileSystemObject")
   Set objStream = objFS.OpenTextFile("c:\temp\CommodityLog.txt", 8, True)
   objStream.WriteLine "-------- Received at " & Now() & " ------------"
   objStream.WriteLine PostedDocument
   objStream.WriteLine "-------- End Received at " & Now() & " --------" & _
                       vbCrLf
   objStream.Close
   Set objStream = Nothing
   Set objFS = Nothing
Note that this sample is a minimal HTTP receive function, although it performs everything needed to process most simple messages. It is conspicuously lacking support for processing multipart MIME messages (although it does recognize them), so if our messaging applications use binary attachments to go with our text messages, we will need to extend the code found in this sample. Here is a sample of one such log entry using our sample message:

   -------- Received at 2/23/2001 10:06:34 -----------
   2001-0220WD40100
   -------- End Received at 2/23/2001 10:06:34 --------
Make a new virtual directory on your web server that matches the URL entered for the messaging port named CommodityOrderPort and you are ready to receive messages on behalf of the commodity vendor.
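For readers who want to experiment without IIS, the same receive-and-log behavior can be sketched in JavaScript. This is not the SDK sample or the chapter's ASP page: it assumes a Node.js runtime and its built-in http and fs modules, and the function names formatLogEntry and createReceiveServer are hypothetical. Like the ASP sample, it makes no attempt to handle multipart MIME messages.

```javascript
const fs = require('fs');
const http = require('http');

// Build one log entry in the same shape as the ASP sample's output:
// a header line, the posted document, a footer line, and a blank line.
function formatLogEntry(body, receivedAt) {
  return [
    `-------- Received at ${receivedAt} ------------`,
    body,
    `-------- End Received at ${receivedAt} --------`,
    '',
  ].join('\n') + '\n';
}

// Create (but do not start) a server that appends each POSTed body
// to the file at logPath and answers with HTTP 200 on success.
function createReceiveServer(logPath) {
  return http.createServer((req, res) => {
    if (req.method !== 'POST') {
      res.writeHead(405, { Allow: 'POST' });
      res.end();
      return;
    }
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => {
      const body = Buffer.concat(chunks).toString('utf8');
      const entry = formatLogEntry(body, new Date().toLocaleString());
      fs.appendFile(logPath, entry, (err) => {
        // Report a server error if the log could not be written.
        res.writeHead(err ? 500 : 200);
        res.end();
      });
    });
  });
}
```

Calling createReceiveServer('CommodityLog.txt').listen(8080) would start the sketch on a port of our choosing; the port and log file name here are placeholders, and the URL would still have to match the one configured for the messaging port.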
Service Technician's HTTP Receive Function
The service technician's receive function must perform the same functions as we just saw, so save the ASP as ServiceOrders.asp. Change the file-writing line as follows to reflect a different log file:

   Set objStream = objFS.OpenTextFile("c:\temp\ServiceLog.txt", 8, True)
This receive function must also send a SchedConfirmation message to a message queue to keep things going. After the lines that write the message body to the log file, insert the following lines. They constitute a VBScript translation of the MSMQ code we saw in stuff.html:
   Dim mqInfo, mqQueue, mqMessage, ofsFSO, mqTxDisp, mqTx

   Set mqInfo = CreateObject("MSMQ.MSMQQueueInfo")
   Set mqTxDisp = CreateObject("MSMQ.MSMQTransactionDispenser")
   If IsObject(mqInfo) Then
      mqInfo.FormatName = "DIRECT=OS:.\private$\svc_sched"
      Set mqQueue = mqInfo.Open(2, 0)
      If (mqQueue.IsOpen) Then
         Set mqMessage = CreateObject("MSMQ.MSMQMessage")
         If IsObject(mqMessage) Then
            mqMessage.Body = ""
            mqMessage.Label = "SchedConfirmation"
            Set mqTx = mqTxDisp.BeginTransaction()
            mqMessage.Send mqQueue, mqTx
            mqTx.Commit()
            Set mqMessage = Nothing
         End If
         Set mqQueue = Nothing
      End If
      Set mqInfo = Nothing
   End If
There are a few details to note. First, the FormatName property value reflects the name of the svc_sched queue. Next, the message label matches the document element name of our message. Finally, we've hardcoded the SchedConfirmation message body into the script. The reqID won't be set properly, of course, but we know that our sample will always be dealing with a single message at a time, so it isn't worth parsing the incoming message to retrieve a value that will never be used by the system.
After saving this page, place it in the folder that we configured as a virtual web directory in the preceding section. If we used a different URL when configuring the ServiceRequestPort messaging port in BizTalk, place the page in that folder instead. A sample log entry for this receive function is:

   -------- Received at 2/23/2001 10:06:34 -----------
   AXY1020(A8769-B9),Refinery Unit 7,2001-02-20,A-8917-01
   -------- End Received at 2/23/2001 10:06:34 --------
We may test the error-handling path of the BizTalk application by changing the value of the status field in the hardcoded message to zero. Since IIS may be configured to cache ASPs, we may have to stop and restart the web server to reflect the changed page.
Testing the Application
When all our receive functions and BizTalk configurations are ready, we may test the operation of our completed application by opening stuff.html and clicking the button. After a brief interval (longer the first time, as the messaging service has not yet run any messages), we should see the receive function message logs appear (if we did not create them ourselves) in the directory specified in the script. We can open them and verify that the vendors received their messages.
If we go into the Computer Management utility and inspect the queues, we will not see any messages in warning_adv because the MSMQ receive function removed them when it submitted the incoming message to the messaging service. The schedule, in turn, removed the SchedConfirmation message from the svc_sched queue when it arrived, so that queue should be empty as well. If we browse the warning_ack queue, however, we should see the OrderAck message the schedule sent to the warning application with the ok value set in the ackType field. Since we didn't bother to implement this portion of the warning application, there is nothing to consume this message. We must manually purge the queue to delete it.
To test the schedule's error handling, deny write access to the svc_sched queue to the Everyone account. Click on the button in stuff.html again, then look at the commodity log. We should see an OrderUpdate message canceling the order. Restore the queue privileges, then modify ServiceOrders.asp to send a value of zero in the status field. When the OrderAck message arrives in the warning_ack queue, view the message body. The ackType field should be error.
Summary
We've covered a lot of ground in this chapter. You were provided with a necessarily brief overview of the needs, requirements, and technology issues surrounding business-to-business applications. You were introduced to some messaging frameworks and the web sites that discuss them.
We reviewed the functional areas of BizTalk Server 2000, a newly released product from Microsoft implementing enterprise application integration and B2B messaging services. These include a messaging service for transmitting XML and flat-file messages over a variety of protocols and a process orchestration engine called the XLANG Scheduler. The messaging service offers the ability to dynamically translate message formats and change protocols. We set up a B2B messaging situation using configuration rather than extensive programming. XLANG Scheduler is a powerful way to implement complex business processes in a declarative fashion. We ended our review of the product with a quick tour of the design and administrative tools that are included in the package.
The second half of the chapter was given over to a sample B2B application implemented with BizTalk Server. We illustrated the main features of the product, including the following:
❑ Orchestration
❑ HTTP and MSMQ messaging using the messaging service
❑ Message format mapping
❑ HTTP receive functions
❑ MSMQ receive functions
❑ Launching a schedule in response to message arrival
This was a simplified example whose messages are not nearly as complicated as those we encounter in a real-world B2B environment, and we did not even begin to cover the tasks involved with programmatically accessing and extending BizTalk Server using the various COM+ interfaces provided. We hope, though, that this sample gave some indication of how XML is being used to implement B2B messaging over Internet protocols.
E-Business Integration
Introduction to E-Business
The purpose of this chapter is to illustrate the methods used to link diverse e-business standards and provide a degree of interoperability between standards. Some of the frameworks define not only the format of the standards, but also the implementation and routing details. We will not consider these latter aspects, such as exchange protocols, messaging, registry and repository in any depth, as they are extensive and out of this chapter's scope. However, we will look at how we can use the vocabularies of each to allow operation between the standards.
We will look first at the RosettaNet and xCBL standards, which have been around slightly longer than BizTalk, and ebXML, which is currently still being defined as a standard. Next we will look at the issues that need to be addressed in e-business integration, and at the solutions to the problems faced. Finally, we shall look at an actual implementation of these solutions: an application that generates a RosettaNet Purchase Order and submits it to a remote marketplace, where it is converted for use by a BizTalk-compliant system.
Chapter 26
You will find the sample code for this chapter in the book's code download, available from Wrox.com, arranged into 5 subfolders inside the chapter folder. Also, bear in mind that this case study (particularly the XSLT section) was developed with the Microsoft XML version 3.0 parser, freely downloadable at http://msdn.microsoft.com/xml. To view XML output with Internet Explorer (which you will do at the end of the chapter), you must also download the XMLInst.exe file (from the same location), which replaces the default IE parser.
RosettaNet
The RosettaNet specification can be found at http://www.rosettanet.org, and was designed to allow e-business system implementers and solution providers to create or implement interoperable software application components. Business processes such as catalogs, invoices, purchase orders etc. can be defined and exchanged in XML according to the RosettaNet XML standard DTD or XDR schema, and routed to software programs running on business partners' servers. RosettaNet defines itself as shown below:
"A self-funded, non-profit organization, RosettaNet is a consortium of major Information Technology, Electronic Component and Semiconductor manufacturing companies, working to create and implement industry-wide, open e-business process standards. These standards form a common e-business language, aligning processes between supply chain partners on a global basis."
The key to this is the RosettaNet standard set, which defines a set of business and technical specifications such as Partner Interface Processes (PIPs) for defining processes between trading partners and data dictionaries that define a common set of properties to be used by PIPs. There is also the RosettaNet Implementation Framework (RNIF) that defines exchange protocols allowing for quick and efficient implementation of PIPs. Finally, there is a series of codes for products and partners to align business processes and definitions.
PIPs are schema definitions for various business processes (such as a purchase order) and are the key components of the RosettaNet standards. They are based on XML vocabularies as defined in DTDs and schemas, and there are eight clusters (groups of business processes):
❑ RosettaNet Support: Administrative functionality.
❑ Partner, Product and Service Review: Allows information collection, maintenance and distribution for the development of trading-partner profiles and product-information subscriptions.
❑ Product Information: Enables distribution and periodic update of product and detailed design information, including product change notices and product technical specifications.
❑ Order Management: Allows partners to order catalog products, create custom solutions, manage distribution and deliveries, and support product returns and financial transactions.
❑ Inventory Management: Enables inventory management, including collaboration, replenishment, price protection, reporting and allocation of constrained product.
❑ Marketing Information Management: Enables communication of marketing information, including campaign plans, lead information and design registration.
❑ Service and Support: Provides post-sales technical support, service warranty and asset management capabilities.
❑ Manufacturing: Enables the exchange of design, configuration, process, quality and other manufacturing floor information to support the "Virtual Manufacturing" environment.
The RNIF can be very technical, and defines how to exchange this information between businesses and facilitate the execution of the business components. It is very extensive and out of scope for the case study. I suggest you have a look at it – the URL is given at the end of the chapter.
xCBL (CommerceOne)
The Common Business Library (xCBL) is defined by CommerceOne as:
A set of XML building blocks and a document framework that allows the creation of robust, reusable, XML documents for e-commerce. These elemental building blocks were defined based on extensive research and collaboration by CommerceOne and the leading XML industry initiatives. xCBL can help accelerate any trading partner's XML efforts by providing these building blocks and a document framework. Consistent with this purpose, xCBL is available free of charge in prominent XML repositories.
xCBL (the "x" states that there are multiple versions of the CBL, the most up-to-date being 3.0) is a single vocabulary and allows for data typing and validation between documents being exchanged in an e-commerce transaction. It allows you to integrate and create business processes such as the following:
❑ Invoice
❑ OrderRequest
❑ PaymentRequest
❑ ProductCatalog
❑ ShippingSchedule
❑ TimeSeries
❑ TradingPartnerUserInformation
There are many more document types that can be created from the extensive vocabulary. Furthermore, xCBL defines a very detailed list of common data types that can be used industry wide when defining a business document. Some examples are shown below (you should look at the CBL Reference Library for more information on these data types):
❑ CurrencyCode, e.g. "CLP" – Chilean Peso. So, a business could use CLP within its XML instance to state that a particular currency was Chilean Pesos (or one of the many other currencies defined in this data type).
❑ CountryCode, e.g. "CL" – Chile
❑ DateQualifierCode, e.g. "36030" – calculation is based on a year of 360 days and 30-day months
❑ LanguageCode, e.g. "en" – English
❑ RegionCode, e.g. "CANMB" – Canada, Manitoba
❑ ServiceCode, e.g. "AdvanceChargesHandling" – Advance Charges Handling
❑ UOMCode, e.g. "BX" – Box (for a box of something)
Again, the best way to understand the extensive e-business framework offered by xCBL is to work through an implementation (such as this case study) and extend that understanding to your own particular business needs.
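To make the role of these coded data types concrete, here is a small JavaScript sketch (assuming a Node.js runtime) that checks field values against code lists. It is only an illustration of the idea: the lists contain nothing but the handful of values quoted above, whereas the real xCBL code lists are far larger, and the findInvalidCodes helper is hypothetical rather than part of any xCBL toolkit.

```javascript
// Tiny excerpts of code lists, limited to the values quoted in the text.
// The real xCBL lists contain many more entries.
const codeLists = {
  CurrencyCode: new Set(['CLP']),
  CountryCode: new Set(['CL']),
  LanguageCode: new Set(['en']),
  RegionCode: new Set(['CANMB']),
  UOMCode: new Set(['BX']),
};

// Return the names of the fields whose value is not in the matching list
// (or for which no list is known at all).
function findInvalidCodes(fields) {
  const invalid = [];
  for (const [dataType, value] of Object.entries(fields)) {
    const allowed = codeLists[dataType];
    if (!allowed || !allowed.has(value)) {
      invalid.push(dataType);
    }
  }
  return invalid;
}

// "XX" is not in our CountryCode excerpt, so it is reported.
console.log(findInvalidCodes({ CurrencyCode: 'CLP', CountryCode: 'XX' }));
```

In practice this kind of check would come from validating an instance against the schema rather than from hand-written lookups; the sketch only shows why shared code lists make such validation possible.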
BizTalk (Microsoft)
http://www.BizTalk.org defines its site purpose as follows:
"… the goal of providing the software and business communities with resources for learning about and using XML for Enterprise Application Integration (EAI), and business-to-business (B2B) document exchange, both within the enterprise and over the Internet."
The BizTalk library is not a standards body, but rather a resource of XDR Schemas and vocabulary definitions created and entered by diverse organizations. It differs from xCBL and RosettaNet in that it lets the businesses define the vocabularies and allows users to understand those vocabularies and implement them. For example, there are a few Purchase Order definitions entered by companies that want solution providers to be able to interact directly with their e-commerce systems. When a party uses a schema, that use is registered, and the business owner can send the party update information. To be registered initially, a schema must pass an XDR schema verification test. It is then kept on the BizTalk site, which controls the versioning of the schema, and its creator can update it with a new version at any point.
BizTalk has identified the following benefits for working with BizTalk:
❑ Road map for consistent XML implementations. Core set of elements and attributes defined for creating effective business processes.
❑ Easier mapping across schemas. Allows software developers and ISVs (Independent Software Vendors) to map business processes between each other.
❑ Design target for software vendors. Allows software vendors to target consistently formatted schemas.
❑ Framework for standards bodies. Allows the migration of existing standards such as EDI to XML and schemas.
❑ Repository for BizTalk schemas. Allows industry groups and developers to publish their schemas with versioning, searching and specialization for support of the BizTalk Framework.
❑ Showcases for best practices in developing XML interchanges. Allows the same groups and developers to discover best practices for implementing their schemas.
The BizTalk Framework architecture defines rules for formatting, transmitting, receiving and processing a standard XML message. Each BizTalk message (such as a Purchase Order) is wrapped in an XML document, which defines routing and location information. It is a good idea to familiarize yourself with the BizTalk Framework although a detailed explanation is outside the scope of this study. You can find resources at the end of the chapter.
ebXML (Oasis/UN)
The ebXML initiative (http://www.ebxml.org) is currently still under development and there is a lot of activity surrounding the many areas that will eventually form the ebXML framework. It is similar to the other frameworks and is defined as the following:
A set of specifications that together enables a modular electronic business framework. The vision of ebXML is to enable a global electronic marketplace where enterprises of any size and in any geographical location can meet and conduct business with each other through the exchange of XML based messages. ebXML is a joint initiative of the United Nations (UN/CEFACT) and OASIS, developed with global participation for global usage.
There is some strong support behind ebXML. It is derived from the work of the following workgroups:
❑ ebXML requirements: Focuses on defining the requirements of an ebXML compliant application.
❑ Business Process Methodology: To accomplish cross-industry XML-based business process integration and to allow organizations to build business processes according to a specification while ensuring that they are understandable by other organizations by cross-process integration.
❑ Technical Architecture: The focus of this area is to define a technical architecture for the standard, allowing for high-level business views and improvement of business process integration with common standard intra-business standards.
❑ Core Components: Contains the context and Metamodel (linking ebXML metamodel with context classification schemes), Methodology (for capturing core components) and Analysis (semantic definitions and cross-industry analysis).
❑ Transport/Routing and Packaging: Focusing on defining the specification for a universal business-level messaging service for secure, reliable ebXML business document exchange.
❑ Registry and Repository: To define ebXML registry services, and an ebXML registry information model and repository.
❑ Quality Review: To review the ebXML deliverables for quality and consistency.
❑ Proof of concept: Reinforces the viability of the ebXML specifications by understanding how the initiative will be employed in a global market place.
❑ Trading Partners: To provide a Unified Modeling Language (UML) model for trading partner information as well as its markup and specification.
❑ Marketing Awareness: To stimulate, inform and communicate to the business community the opportunities, benefits and outcomes of ebXML.
This case study will not make much use of ebXML, as it is currently still being defined. However, it is worth checking out the ebXML site, as it is making fast progress and is expected to become a major player in global business e-commerce in the future.
Now that we have had an overview of each of the major technology initiatives, let's look at some of the current problems we run into when integrating these systems.
Integration Issues & Solutions
With the Business-to-Business solutions in place, one would imagine that communication between diverse business systems would be simple. Unfortunately, there are still integration and communication issues because of the differences in the schemas of these frameworks.
Each standard makes a unique contribution to e-business, and to fully exploit the ability to integrate your business processes with other companies, you must be able to use the distributed power of these efforts. RosettaNet is over two years old and is supported by over 200 of the industry-leading vendors (such as IBM and Oracle); xCBL also has industry support and works with large software vendors (such as Microsoft and SAP); BizTalk is gaining rapid ground and its growth will continue; and ebXML is expected to be hugely successful with the backing of many major corporations in the industry. Businesses may not want to support every e-business standard in the market, so they must be able to choose what they want to support and have methods to implement this support.
For example, the purchase order schemas for RosettaNet, xCBL, BizTalk and ebXML are completely different. Where the RosettaNet schemas may expect the name of the city to which the delivery is to be made, the BizTalk schemas may use the street address; and where the xCBL purchase order schema expects an element of one name, the RosettaNet PO schemas define the corresponding element under a different name, as illustrated below:
A RosettaNet Purchase Order section instance:
Walnut Creek
1600 Riviera Ave
Suite# 200
94596
CA
US
A similar CBL Purchase Order section instance:
Mr. Muljadi Sulistio Attention: Business Service Division 1600 Riviera Ave Suite# 200 Walnut Creek CA 94596 US
So, for this reason we would have to effectively "black-box" all of the transactional applications working to similar frameworks. The following diagram illustrates a potential e-business scenario in this case:
[Figure: two separate e-marketplaces. In the first, Company 1, Company 2 and Company 3 all use RosettaNet; in the second, Company 1, Company 2 and Company 3 all use xCBL. There is no interoperability between these marketplaces, or with other marketplaces.]
Although this allows tremendous opportunity for businesses to share with and purchase from others in their immediate marketplaces, we can clearly see defined borders and lost opportunity. An even worse scenario is that a business operating in one environment would not be able to work with an existing supplier because they had adopted different e-business frameworks. This scenario defeats the goal of all the e-business frameworks, which is to allow universal communication between distributed transactional applications. However, it is obvious that we are never going to have absolutely everyone talking the same language using the same e-business framework, so some solution must be reached. The following diagram represents a high-level (but simple) architecture that would allow interoperability among current and future e-business standard frameworks:
[Figure: Company 1 (BizTalk), Company 2 (RosettaNet), Company 3 (ebXML), Company 4 (xCBL) and Company 5 (Proprietary), all connected through a central Schema Translator.]
This allows the best of all worlds. It is similar to current market trading throughout the world. There are many varying languages, yet trading across borders is common practice today. The e-business translation world is a lot simpler, as there is a well-defined structure to each standard (the schema), and so we can understand the meaning and context of the instances a system may be presented with. Technically speaking, we can define the above architecture as follows:
A global market place based on diverse e-business XML standard vocabularies communicating with XSLT for schema-to-schema translation.
The location of the XSLT stylesheets brings up many issues that are outside the chapter scope, such as their versioning and maintenance, so we will assume that each stylesheet is stored with each company and can be applied to any XML file that is sent to that business and has to be transformed. To prove this solution, we are now going to work through an extensive example based on e-business purchase order exchanges.
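Before moving on to that example, the core idea of schema-to-schema translation can be sketched in a few lines. The chapter's actual mechanism is XSLT; the JavaScript below (assuming a Node.js runtime) only illustrates the concept of a stored mapping applied to incoming data. All field names on both sides are hypothetical stand-ins rather than real RosettaNet or xCBL element names, and the record is a plain object instead of an XML document. The sample values are the address used in this chapter.

```javascript
// Hypothetical field-name mappings from a source vocabulary to a target one.
// None of these names are taken from the real RosettaNet or xCBL schemas.
const addressMaps = {
  'rosettanet->xcbl': {
    cityName: 'City',
    addressLine1: 'Street',
    postalCode: 'PostalCode',
    regionName: 'Region',
    countryCode: 'Country',
  },
};

// Rename the fields of a record according to the stored mapping.
function translate(record, from, to) {
  const map = addressMaps[`${from}->${to}`];
  if (!map) {
    throw new Error(`No translation from ${from} to ${to}`);
  }
  const result = {};
  for (const [sourceName, targetName] of Object.entries(map)) {
    if (sourceName in record) {
      result[targetName] = record[sourceName];
    }
  }
  return result;
}

// Address values as they appear in this chapter's sample instances.
const rosettaNetAddress = {
  cityName: 'Walnut Creek',
  addressLine1: '1600 Riviera Ave',
  postalCode: '94596',
  regionName: 'CA',
  countryCode: 'US',
};

console.log(translate(rosettaNetAddress, 'rosettanet', 'xcbl'));
```

A real translation has to cope with structural differences as well as renamed elements, which is one reason a transformation language such as XSLT is a better fit than a flat lookup table.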
Integrated Purchase Order
As we stated above, we are going to use a fictitious example (but one very common in the real world) of business purchase orders to illustrate diverse e-business framework interoperation. The following diagram shows a typical process model for an e-commerce solution.
[Figure: process model with the steps Search Catalog, Online Order Entry, Check Inventory, Approve Order, Delivery, Generate Purchase Order and Invoice.]
This is a scenario that will become common throughout online business.
Scenario Overview
A high-level diagram of the scenario we will be working with is given below:
[Diagram labels: e-marketplace; internal; Client Browser; Internal Inventory Processing Application; Market Broker; Supplier 1; Supplier 2; Supplier 3; three Submit PO arrows]
When an employee requires some office items (such as pens or paper), the purchase order is initially submitted internally and processed by the Internal Inventory Processing Application. To avoid getting into the details of supply chain management and inventory control, we are going to assume a simplified situation. The internal application will only process orders for simple items, such as pens and pencils. When it receives an order for something more extensive, such as a bulk order for printing paper, it forwards the order to an e-market, which contains two supplier sites, supplier 1 and supplier 2. In turn, orders for heavy goods, such as tables and chairs, are forwarded initially to supplier site 2, which uses supplier site 3 to process the orders. In the real world, there would probably be a rather more detailed ordering process, but we are just exploring the important concepts of e-business communication rather than supply chain and inventory management issues.
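The routing rules in this scenario can be summarized in a few lines of code. This is only a sketch of the prose above: the category names "simple", "bulk" and "heavy" and the `routeOrder` function are hypothetical labels introduced for illustration, and the function returns the chain of handlers an order would pass through rather than submitting anything.

```javascript
// Hypothetical order categories; the chapter describes the routing only in prose.
function routeOrder(category) {
  const internal = "Internal Inventory Processing Application";
  switch (category) {
    case "simple": // pens, pencils: handled internally
      return [internal];
    case "bulk": // e.g. bulk printing paper: forwarded to the e-market
      return [internal, "e-market"];
    case "heavy": // tables, chairs: supplier 2 relies on supplier 3
      return [internal, "Supplier 2", "Supplier 3"];
    default:
      throw new Error("Unknown order category: " + category);
  }
}

console.log(routeOrder("heavy").join(" -> "));
```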
Purchase Order Schemas and Instances
An essential component of any e-business communication is a well-defined structure for the data. This is where we can make use of the various XML vocabularies defining purchase orders. Let's look in turn at each of the vocabularies we are working with and define the sample instances that will be communicated in the sample.
RosettaNet
The full RosettaNet XML standard is rather large, so instead let's look at some of the key elements in the XDR Schema that will be important to our sample. XDR is the type of schema that comes with the RosettaNet downloads. You can find the full XDR file in the code download. The root of the RosettaNet Purchase Order is an element called Pip3A4PurchaseOrderRequest, which contains an xmlns attribute that references the XDR Schema for validation. It has five child elements, of which the first three, PurchaseOrder, toRole and fromRole, play the major roles:
The PurchaseOrder element contains the delivery information, such as the name and address where the items ordered should be sent to.
The toRole element serves as an important part of the order, containing the PartnerRoleDescription element described below:
The PartnerRoleDescription is a collection of business properties that describes a business partner's role in a partner interface process. It provides a partner classification code and contact details (in ContactInformation) for the person from the company that is purchasing the goods, such as the name, telephone number and email address. The PartnerRoleDescription identifies business partners and their function in a supply chain.
Finally, the fromRole element can contain much of the same information as the toRole but is based on the business selling the item(s):
The following instance is based on the RosettaNet standard, which we are going to use in our sample because this is what the businesses' existing systems have been using for some years. You should look through it and understand its relationship to the RosettaNet standard, as well as the areas where you might add or remove details based on your own business experience. A section of the full file is shown here; you can find the complete file in the RosettaNet directory in the samples as RosettaNet_PO.xml.
Winnipeg
223746 Red Oak Drive
3rd Floor
R2J 29A
MB
CA
1234123411234
10 1 ...
US
19990809T14:30:00
EACH
...
Seller
[email protected]
Steven Livingstone
(0928) 839 7820
...
xCBL
Let's now look at the CBL XDR schema elements that will be important to our sample instance. The BillToParty element contains information on the person to whom the delivery will be made, such as the name and address information and the contact information:
It contains the Party element, which has various types of contact information and some other related information:
The OtherContacts element contains all the information to contact the person, whether they are the seller of the goods, or the buyer. The NameAddress collects postal information about this person. Details on how the product(s) are moved to the customer are contained in the Transport element (shown below), which contains an ID and shipment method (Air, for example):
...
The OrderDetail element details the item(s) in the PO, as well as the quantity and unit item cost.
Now that we have discussed some of the sections of the Purchase Order xCBL instance, let's look at the instance we are going to work with in our case study. You should examine this sample and look up the meanings of its elements in the xCBL reference, which is part of the reference library downloadable from the CBL documentation site at http://www.xcbl.org/xcbl30/xcbl30xdrdoc.html.
REF-002-99-0-3000
REF002-44556677
20010104T09:00:00
...
InformationOnly
ResponseExpected
CAD
en
...
MyOffice International Inc. Supplies H67 76W Winnipeg
MB
Steven Livingstone
DeliveryContact
[email protected]
EmailAddress
...
1000
Air
TransportCondition
AdvancePrepaid
...
...
BizTalk
The BizTalk schema is divided into the header section, contact information, shipping information, billing information and finally details of the goods ordered. However, unlike the previous schemas, this schema is based mainly on attributes, and it also uses the XML Data datatypes (see Chapter 6 for more information on these) to define valid types for these attributes. So, for example, consider the following statement.
This defines an attribute called qty, defining the quantity of goods ordered, and it can accept any number as defined below:
Decimal, with no limit on digits; can potentially have a leading sign, fractional digits, and, optionally, an exponent. Punctuation as in U.S. English. (Values have the same range as the most significant number, 1.7976931348623157E+308 to 2.2250738585072014E-308.)
Let's look at the BizTalk schema below:
Now that we have seen the schema used for a BizTalk document, here is an example of an instance that will be used in our sample application.
Schema Translation
As we discussed above, we must use some mechanism to allow purchase order instances derived from separate schemas to talk to each other. In other words, we must be able to send the BizTalk instance above to a business that works with RosettaNet-formatted instances, and have the order processed seamlessly and as expected. To do this, we need some way of mapping information within the BizTalk schema to information containers within the RosettaNet schema. So, for example, if we specify the contact information for the delivery of the items in the BizTalk schema, we must be able to specify that same information correctly within the RosettaNet schema. In other words, we must have some way of transforming the contents of the BizTalk POShipTo element to the equivalent within the ContactInformation element of the RosettaNet schema. This must be done for as much information as is needed to satisfy the requirements of each schema. Therefore, we must define some mapping between the elements and attributes of each schema; the diagram below illustrates this:
Now, let's look at mapping the elements of each schema so we can interoperate between schemas.
Mapping Schema Elements
Now we are going to look at how to map the elements and attributes of the Purchase Order schemas in our sample to each other. First we will look at RosettaNet to BizTalk, followed by BizTalk to ebXML and finally RosettaNet to xCBL.
XSLT Modules
To perform the translation from a given vocabulary (such as BizTalk) to another standard (such as xCBL) we are going to use XSLT. For each translation, a separate XSLT "module" will be created, which will contain the code needed to transform the schema. We can therefore build up a set of XSLT modules that can be applied to one XML instance based on a standard to get another instance based on another standard (for example, to get a BizTalk XML instance from an xCBL-defined instance).
Mapping the schema elements can be a difficult and time-consuming job, and understanding the relationships can also be difficult. The method I use for determining the mapping between elements involves creating a table using the instance we want to get to. This produces a structural hierarchy that helps us define the XPath queries we will need in the XSLT transform, by mapping the values of elements back to the source instance.
First you should create a table with two columns. The title of the left column should be the name of the originating schema (that is, the schema that you want to transform) and the title of the right column should be the name of the result schema (the schema you are transforming TO).
Now, take an instance of the schema you are mapping to (the result). Start from the root of the instance and work down each branch of the tree and, in the right column, write down the XML needed to get to each node that is either an attribute or an element and should have a value (very similar to how you would define an XPath query). So, you should start at the root; everything below it should be RELATIVE to the root UNTIL you reach the end of a particular branch, in which case you start at the root again (in my case, I mark each root with a grey cell background color). For those elements or attributes that should have a value, put an equals sign before them. The following diagram illustrates this concept.
. . .
RosettaNet Elements
=FreeFormText
=FreeFormText
Let's look at an example of what we have just done for the case of wanting to translate from an xCBL PO schema to a RosettaNet PO schema:
xCBL Elements
RosettaNet Elements
=FreeFormText
=FreeFormText
=GlobalLocationIdentifier
So, in the first case, we have the XML from the root node
and in the following cell, we have
=FreeFormText
which is relative to the element in the first grey cell above and actually refers to the FreeFormText element in the Pip3A4PurchaseOrderRequest schema as below:
Glasgow ...
Therefore, the text in the cell at the bottom is relative to the node in the first grey cell above it and so the full XML is:
82823 ...
This works the same way for the entire column, and if you write out each column at the end, it should match your XDR schema. Once the right-hand column has been completed, it is necessary to enter some information in the left-hand column (the source schema) to describe which element(s) map to the element in the result schema. This works in exactly the same way as described above, except the source schema does NOT have grey cells highlighted. So, the above schema mapping would look like this:
xCBL Elements
RosettaNet Elements
=City
=FreeFormText
=StreetSuppliment1
=FreeFormText
=Ident
=GlobalLocationIdentifier
Finally, there are some cases where there is no mapping and a default value is provided for the field. To do this, enclose the value in square brackets, such as [Scotland]. You may have another method for doing this work and there are some tools coming out that help you with the process, such as the Microsoft BizTalk Mapper tool, which is a graphical editor that loads the two XML specifications and allows the programmer to specify how the records and fields map to one another. The following tables show how this technique is used to map several schemas that will be used in our sample. Don't rush through them, but work through them and see how they map to the actual schemas we have defined above. You should pick up the technique fairly quickly, and get great benefit from it.
xCBL to RosettaNet
xCBL Elements
RosettaNet Elements
=City
=FreeFormText
=StreetSuppliment1
=FreeFormText
=StreetSuppliment2
=FreeFormText
=PostalCode
=NationalPostalCode
=RegionCoded
=FreeFormText
=Country
=GlobalCountryCode
=Ident
=GlobalLocationIdentifier
=QuantityValue
=ProductQuantity
=LineNumber
=BuyerLineItemNum
=PartID
=ProprietaryProductIdentifier =GlobalPartnerClassificationCode
[shopper]
=CountryCoded
=GlobalCountryCode
=StartDate
= UOMCoded
=GlobalProductUnitOfMeasureCode
=TransportModeCoded
=FreeFormText
= UnitPriceValue
< FinancialAmount>
=CurrencyCoded
< FinancialAmount> =GlobalCurrencyCode
=DateStamp
=MonetaryAmount
=ShippingMethodOfPaymentCoded
=GlobalShipmentTermsCode
[1]
=RevisionNumber
= GlobalFinanceTermsCode
=ContractTypeEncoded []
=PartnerDescription
[Dropship]
=GlobalPurchaseOrderTypeCode
[Seller] (is defined as this in document)
=GlobalPartnerRoleClassificationCode
=GlobalPartnerRoleClassificationCode
[Buyer]
=ContactNumberValue
=EmailAddress
=ContactName
=FreeFormText
= ContactNumberValue
=CommunicationsNumber
=OrderRequestIssueDate
=SellerOrderRequestNumber
=ProprietaryDocumentIdentifier
[Request]
=GlobalDocumentFunctionCode
=DateTimeStamp
RosettaNet to BizTalk
RosettaNet Elements
BizTalk Elements
[0]
=refPromise
[]
=description
= GlobalFinanceTermsCode
=paymentType
=FreeFormText
=shipType
=GlobalLocationIdentifier
=fromCust
=ProprietaryDocumentIdentifier
=PoNumber
=contactName =EmailAddress =telephoneNumber
=contactName =contactEmail =contactPhone
=city
=FreeFormText
=contactName
=attn
=GlobalCountryCode
=country
=startProvince
=FreeFormText
=street2
=FreeFormText
=street1
=FreeFormText =NationalPostalCode
=postalCode
[1]
=startAt
[1]
=count
=GlobalProductUnitOfMeasureCode
=uom
=MonetaryAmount
=unitPrice
=ProductQuantity
=qty
=partno
=ProprietaryProductIdentifier
=DateStamp
=needAfter
[0]
=discount
=LineNumber
=line
=DateStamp
=needBefore
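The mapping tables above can also be held as data and applied mechanically, which is one way to check a table before writing the corresponding XSLT. The sketch below is an illustration under stated assumptions, not the chapter's method: it uses a flat object in place of a nested XML instance, takes a handful of xCBL to RosettaNet rows from the tables, and introduces the hypothetical names `mapping`, `fixed` and `applyMapping`. A square-bracket default such as [Request] becomes a `fixed` value.

```javascript
// Each row names a target field and either a source field or a fixed default.
// Rows are taken from the xCBL to RosettaNet table; the flat shape is an assumption.
const mapping = [
  { to: "NationalPostalCode", from: "PostalCode" },
  { to: "GlobalCountryCode", from: "Country" },
  { to: "GlobalLocationIdentifier", from: "Ident" },
  { to: "GlobalDocumentFunctionCode", fixed: "Request" } // [Request] in the table
];

// Builds the result record row by row, in the order of the right-hand column.
function applyMapping(source, rows) {
  const result = {};
  for (const row of rows) {
    if ("fixed" in row) {
      result[row.to] = row.fixed;
    } else if (row.from in source) {
      result[row.to] = source[row.from];
    } else {
      throw new Error("Source instance is missing " + row.from);
    }
  }
  return result;
}

// Sample values only, loosely based on the instance shown earlier.
const po = { PostalCode: "R2J 29A", Country: "CA", Ident: "1234123411234" };
console.log(applyMapping(po, mapping));
```

A real instance is hierarchical, so a production version would resolve each row with an XPath expression rather than a property name.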
Business Rules in Translation
When looking at the schema translation methodology above, you probably noted that the format of some values in the xCBL schema is slightly or completely different from the format of the corresponding values in the RosettaNet schema. The same applies to conversions to all of the other schema types as well. Let's look at how we can approach and solve these challenges.
The Need for Business Rules Between Schemas
As we previously mentioned, there are many cases when mapping between schemas in which the data types of mapped elements differ. This is best illustrated with an example. Let's consider the element of the RosettaNet schema and the element of the BizTalk schema.
The RosettaNet schema specifies that the format of the date within the element is:
Based on the ISO 8601 specification. The "Z" following the day identifier (DD) is used to indicate Coordinated Universal Time. Informal format: CCYYMMDDThh:mm:ssZ
So, in other words, we can specify 10 March 2001 at 2:30 PM US Eastern Standard Time as
20010310T14:30:00Z-05:00
So CC is the century (20), YY is the year (01), MM is the month (03) and DD is the day (10). The T introduces the time in hours, minutes and seconds, and the sample appends the time zone offset, which is -5 hours in our example. (Strictly speaking, ISO 8601 uses either Z, meaning UTC, or a numeric offset such as -05:00, not both together.) However, the format of the BizTalk date is
CCYY-MM-DD
Or, as in our example,
2001-03-10
This illustrates a key point: sometimes information is discarded in translation (here, the time is lost in the conversion). It also shows that some manipulation is sometimes required to reformat a particular datatype for a given schema. We are going to look at how to work with the date type and unit of measurement codes for the different schemas we have been using. We are NOT going to look at others, as converters for every type are out of scope for this chapter; however, with this understanding and some simple scripting, you should be able to write rules for all of your XML business schema needs.
In the case study, we use JavaScript (because it's simple and most script authors are familiar with it), although you could use any other scripting language of your choice, and depending on your choice of XSLT processor (we work with the Microsoft XML Parser) you could use COM objects or Java objects. Furthermore, using this technique, you can create business objects containing all the rules and reuse the objects within the relevant schemas (for example, a BizTalk object may convert other formats to the BizTalk format).
Many of the examples you will come across could also be done in pure XSLT, but these are simple examples. Often you will have very complex rules when translating, and in those cases it would be better to call out to a business object. For instance, some of the standards use very large enumerations, and rather than having 1000 xsl:if comparisons, you can call out to a business object that may use an external source (such as a database) to get the result. Remember this as you look through the examples.
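The idea of replacing long chains of comparisons with a lookup can be sketched as follows. This is an assumption-laden illustration rather than the chapter's business object: an in-memory `Map` stands in for the external source, the names `uomTable` and `lookupUom` are hypothetical, and only the BX to EACH pair is taken from the chapter's own example.

```javascript
// In-memory stand-in for an external source such as a database table.
// Only the BX -> EACH pair comes from the chapter; a real table would be far larger.
const uomTable = new Map([
  ["BX", "EACH"]
]);

// Looks up the target code; unknown codes are reported rather than guessed.
function lookupUom(code) {
  if (!uomTable.has(code)) {
    throw new Error("No unit of measurement mapping for " + code);
  }
  return uomTable.get(code);
}

console.log(lookupUom("BX")); // EACH
```

Adding a new code then means adding a row to the table, with no change to the transformation logic.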
The following table summarizes the uses and relationships of these types within our schemas.
                 xCBL                         RosettaNet          BizTalk
Date format      CCYYMMDDZ                    CCYYMMDDZ           CCYY-MM-DD
Date example     20010310                     20010310            2001-03-10
Unit format      BX (box), DZ (Dozen) etc.    Each, Dozen etc.    PC or UNIT
Unit example     BX                           EACH                PC
The JavaScript code below shows how to modify the datatype for the xCBL to RosettaNet transformation. Note that it illustrates the concepts as outlined in this case study and does not cover all the types. This function converts a unit of measurement from its xCBL definition to its RosettaNet definition. If you look at these definitions in the relevant specifications, you will see they differ greatly and the lists are very extensive. For the purposes of this application, we map only the "BX" (box) unit and treat every other unit as "Dozen". If you look at the code, you can see that when the measurement unit is "BX", the return value is "EACH", which is the equivalent in the RosettaNet world.
function xCBL2RosettaNet_UOM(str_UOM) {
  // You should extend this to cover every type
  var str_BizUom = "";
  if (str_UOM == "BX")
    str_BizUom = "EACH";
  else
    str_BizUom = "Dozen";
  return str_BizUom;
}
You should enhance this code for your own particular business application to cover the many other measurement types it uses. Similar code is used for the RosettaNet to BizTalk conversion, with the addition of the date format function:
function RosettaNet2BizTalk_Date(d_RNet) {
  // Returns the date in CCYY-MM-DD format, e.g. 2001-03-10
  // The input is treated as a string (CCYYMMDD...), not parsed as a Date
  var str_RNet = String(d_RNet);
  var d_year = str_RNet.substr(0, 4);
  var d_month = str_RNet.substr(4, 2);
  var d_day = str_RNet.substr(6, 2);
  return d_year + "-" + d_month + "-" + d_day;
}
function RosettaNet2BizTalk_UOM(str_UOM) {
  // BizTalk says PC or UNIT
  // You should extend this to cover every type
  var str_BizUom = "";
  if (str_UOM == "EACH")
    str_BizUom = "PC";
  else
    str_BizUom = "UNIT";
  return str_BizUom;
}
The date format function simply takes in the date in the RosettaNet format and splits it up and formats it as a BizTalk expected date format.
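Because the conversion works purely by character position, malformed input would pass through silently. The following variant is a sketch, not part of the case study code: the name `toBizTalkDate` is hypothetical, and it assumes the incoming value starts with eight digits in CCYYMMDD order, optionally followed by a time part that is discarded.

```javascript
// Converts CCYYMMDD (optionally followed by a time part) to CCYY-MM-DD.
// Rejects input that does not start with eight digits or has an impossible
// month or day number; it does not check days per month.
function toBizTalkDate(rnetDate) {
  const match = /^(\d{4})(\d{2})(\d{2})/.exec(String(rnetDate));
  if (!match) {
    throw new Error("Unexpected date format: " + rnetDate);
  }
  const year = match[1];
  const month = match[2];
  const day = match[3];
  if (Number(month) < 1 || Number(month) > 12 || Number(day) < 1 || Number(day) > 31) {
    throw new Error("Date out of range: " + rnetDate);
  }
  return year + "-" + month + "-" + day;
}

console.log(toBizTalkDate("20010310T14:30:00Z")); // 2001-03-10
```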
Solution Architecture
In addition to the downloads cited at the start of the chapter, you should download the useful IE XML Validation Tool from http://msdn.microsoft.com/xml, which allows you to view the XSL Output from an XSLT transform. Now that we understand the business processes behind integrating diverse e-business systems, we are going to look at how we actually implement this and the architecture of the solution we employ. We will see how we can effectively employ XSL transforms and combine this with the mapping procedure and business logic we worked on in the two sections above. The following diagram demonstrates the complex marketplace we have in our scenario.
[Diagram: the e-marketplace scenario. Labels: internal Client Browser (xCBL); XSLT (xCBL to RosettaNet); RosettaNet Market Broker; BizTalk; Submit PO (HTTP POST); RosettaNet; xCBL Processing Application; Submit PO (when out of stock), twice; ebXML]
The internal businesses involved have a set of internal processes, which work based on the xCBL standard. However, when they are out of stock of a particular item, they work directly with their online marketplace, which happens to work to the RosettaNet standard. Hence, any purchase orders sent to this marketplace must be converted using one of the XSLT stylesheets we will define later. This marketplace works mainly with RosettaNet-based businesses, but in order to support as wide a range of businesses as possible, it also supports the BizTalk standard. POs are therefore converted to the BizTalk standard when necessary, and posted to the relevant recipients. Finally, there is one business in the marketplace that has a backup supplier, which operates outside the marketplace using the ebXML standards. POs to this business must be converted accordingly during processing. Now we must look at these XML transformation scripts and how we define them.
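Since an order may pass from xCBL to RosettaNet and then on to BizTalk, conversions end up being chained. The sketch below illustrates this with the two unit of measurement rules from the case study, restated compactly; the `chain` helper and the combined `xCBL2BizTalk_UOM` name are hypothetical additions for illustration.

```javascript
// The two single-hop rules from the case study, restated compactly.
function xCBL2RosettaNet_UOM(uom) {
  return uom === "BX" ? "EACH" : "Dozen";
}
function RosettaNet2BizTalk_UOM(uom) {
  return uom === "EACH" ? "PC" : "UNIT";
}

// Composes single-hop converters, applied left to right, into one converter.
function chain(...steps) {
  return value => steps.reduce((current, step) => step(current), value);
}

const xCBL2BizTalk_UOM = chain(xCBL2RosettaNet_UOM, RosettaNet2BizTalk_UOM);
console.log(xCBL2BizTalk_UOM("BX")); // PC
```

Routing every conversion through one hub vocabulary keeps the number of rules proportional to the number of standards rather than to the number of pairs, at the cost of losing any detail the hub format cannot express.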
Integration Practice
You will recall that we have defined all the mappings between the elements in the different schemas above. As a result, we can now easily create our own XSLT stylesheets that will perform the actual transforms. This is because in the right-hand column of the mapping tables we defined earlier, we followed the branching of the XML instance for the result schema (RosettaNet in the first case and BizTalk in the second case). As we move through the schema (and hence down the right-hand side of the table), we can quickly look at the corresponding mapping entry on the left of the table and get the information needed to perform successful transformations. This is illustrated below.
xCBL Elements
At the supplier location, they will want to view the information and see what the request entails. To do this, ShowPurchaseOrder.asp is used; the code is shown below. Initially the order information stored in the XML in our Orders directory is loaded. Here we know the order id we have been working with, but a production version would likely either iterate through the Orders directory to show all orders that have been sent and allow the user to choose one, or use an advanced queuing system (such as MSMQ or IBM MQSeries) to allow the user to view each order on the queue.
Finally, an order page is displayed to the user using the variable values we defined above.
Internal Inventory Purchase Order Request
Order # Contact Name :
Email :
Tel :
Shipping/Billing Information Attn :
Street 1 :
street 2 :
Postal Code :
City :
State/Province:
Country :
Shipped by
Payment Method : Order Details Part :
Quantity :
Price : Delivery Date :
The final purchase order screen looks like this:
This sample illustrates only one small part of the potentially huge set of sub-processes involved in the process flow of a business transaction. This sample can be built upon to provide advanced functionality such as business-to-business invoice and shipping processes.
Summary
This chapter introduced you to some of the e-business standards that are either being used now, or being created, on the Internet. It was not intended to give you detailed information on the framework implementations themselves, as there are many resources you can use to understand this (see "Important URLs"). Rather, it was to illustrate how these diverse and changing frameworks can be brought together to enable businesses to integrate their e-business processes.
Using the methods explored in this chapter, you should be able to develop an understanding of how to integrate your e-business systems with your partners' systems and access the full potential of electronic marketplaces.
Important URLs
What follows is a list of useful online resources for those wishing to find out more about the different standards discussed in this chapter, or download some of the examples we mentioned.
RosettaNet
Homepage: http://www.rosettanet.org
Manage Purchase Order Downloads: http://www.rosettanet.org/rosettanet/rooms/displaypages/layoutinitial?container=com.webridge.entity.Entity[OID[6AB37A9DA92DD411842300C04F689339]]
RosettaNet Implementation Framework: http://www.rosettanet.org/rosettanet/Rooms/DisplayPages/LayoutInitial?Container=com.webridge.entity.Entity[OID[AE9C86B8022CD411841F00C04F689339]]
xCBL
Homepage: http://www.xcbl.org
xCBL 3.0 Download: http://www.xcbl.org/xcbl30/xcbl30.html
BizTalk
Homepage: http://www.biztalk.org
BizTalk PO download: http://www.biztalk.org/Library/library.asp
BizTalk Framework Resources: http://www.biztalk.org/Resources/resources.asp
ebXML
Homepage: http://www.ebxml.org
ebXML Message Specification: http://www.ebxml.org/
ebXML Technical Specification: http://www.ebxml.org/
B2B Futures: WSDL and UDDI
Let's Work Together
It's become a commonplace to say that the Internet has changed everything. HTML and the World Wide Web have completely revolutionized the way in which many of us live, work and organize our lives. The way in which the Web has been used so far has been mainly for interactions between users on the one side and computers on the other. The next step, however, is to get those computers to talk to each other in a sensible, structured way, so that program can talk to program, application to application, and, finally, business to business, so that we really do ultimately achieve "Business at the Speed of Thought".
This chapter is about two emerging technologies that are looking to make all this possible. The first of these is the Web Services Description Language, or WSDL. This is a way of formalizing the description of services provided by a system so that another remote system can use those services. The second of these is Universal Description, Discovery and Integration, or UDDI. This provides a framework for the registration and discovery of these services. Both of these initiatives have the full backing of IBM, Microsoft, and e-commerce specialists Ariba, among others, and in this chapter we're going to see completely independent sets of software from both IBM and Microsoft interoperating with each other.
We can't stress enough how significant all of this is for the future of systems development. If all goes according to plan, this may well constitute the second phase of development of the Web: the semantic web, as Tim Berners-Lee likes to call it.
This chapter inevitably involves a certain amount of programming; in particular, we're going to be using Microsoft Visual Basic version 6, and Java (JDK version 1.3). Familiarity with one or other of these is therefore an essential requisite for this chapter.
Chapter 27
WSDL
We're going to start by looking at how WSDL works. For the purposes of this section, we're going to need access to Microsoft's SOAP Toolkit Version 2.0 and IBM's Web Services Toolkit. The former is available as a free download from http://msdn.microsoft.com/downloads/default.asp?URL=/code/sample.asp?url=/MSDNFILES/027/001/580/msdncompositedoc.xml, while the latter can be downloaded from http://www.alphaworks.ibm.com/tech/webservicestoolkit (incidentally, it's better to download this in preference to the WSDL toolkit, as this package contains the WSDL toolkit plus a whole lot else; however, you should be aware that there are 15 MBytes of it). The Microsoft toolkit can only be run on Windows systems, while the IBM system is Java based, and so can, in theory, be executed anywhere. Also, note that the Visual Basic Runtime is required if Visual Basic 6 is not installed.
There is one final word of warning before we get started. At the time of writing, this is all seriously cutting edge stuff. The examples we see here were put together using Beta 2 of the Microsoft SOAP Toolkit and version 2.2 of the IBM Web Services Toolkit. By the time this book is printed, it is quite possible that what we see when we run the various tools will be different from the illustrations in this chapter. Some of the workarounds described in the text may no longer be necessary. It is even possible that some new ones will be required instead. However, the basic principles will remain unchanged.
WSDL: The Theory
There are two ways of going about introducing WSDL. There is the theoretical approach, where we discuss the entire language in abstract, before launching into an example, or there is the pragmatic approach, where we cover as little of the theory as we can get away with before launching headlong into the practical stuff. The author's preference is for the latter, and it's especially appropriate in this case, because the real meaning of WSDL doesn't really become apparent until we begin to play with it in earnest. However, much as we'd like not to, we need to go over at least some of the theory. This short section covers as much as we need to get started.
WSDL is an XML format for describing web services. Version 1.0 of the protocol was published on September 25, 2000. WSDL describes which operations can be carried out, and what the messages used should look like. Operations and messages are described in abstract terms, and are then implemented by binding to a suitable network protocol and message format. As of version 1.0, bindings for SOAP 1.1, HTTP GET/POST, and MIME have been defined. In this chapter, we will be concentrating on SOAP, as this binding is available in both IBM and Microsoft implementations. This is very much an evolving technology, and future versions are expected to define frameworks for composing services and describing the behavior of services.
If you're familiar with the Interface Description Language (IDL) used by the likes of COM and CORBA, then you can think of WSDL as like a highly generic IDL, rewritten in XML.
Each WSDL file describes one or more services. Each service consists of a group of ports, where each port defines an endpoint that can be accessed by a remote system. Readers who are familiar with TCP/IP sockets will recognize the use of the term "port" for a network endpoint, although here we are referring to more than a simple numeric identifier.
A port belongs to a particular port type, and has a particular binding (SOAP or another type), which specifies the address that the remote system must use to access the port. Each port type defines a number of operations that may be carried out, each of which may have an input message, an output message, or both, depending on the type of the operation. Each message is defined in terms of a number of parts. Each part is defined in terms of a name and a type. All types are defined, either by reference to a particular schema, or by local definition.
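The relationships just described (service, port, port type, operation, message, part) can be pictured as a small object model. The sketch below is purely illustrative and is not WSDL syntax or tool output: every name in it, including `ArithService`, `ArithmeticPort`, the message names and the example.com address, is an assumption made for the example, and the address is attached to the port for simplicity.

```javascript
// Plain-object model of the WSDL concepts described above.
// Every name and the address below are illustrative assumptions.
const messages = {
  SquareRequest: { parts: [{ name: "InValue", type: "double" }] },
  SquareResponse: { parts: [{ name: "Result", type: "double" }] }
};

const portTypes = {
  ArithmeticPortType: {
    operations: {
      Square: { input: "SquareRequest", output: "SquareResponse" }
    }
  }
};

const service = {
  name: "ArithService",
  ports: [{
    name: "ArithmeticPort",
    portType: "ArithmeticPortType",
    binding: "SOAP",
    address: "http://example.com/arith"
  }]
};

// Formats the parts of a named message as "name: type" pairs.
function formatParts(messageName) {
  return messages[messageName].parts
    .map(part => part.name + ": " + part.type)
    .join(", ");
}

// Walks service -> port -> port type -> operation -> message -> part.
function describeService(svc) {
  const lines = [];
  for (const port of svc.ports) {
    const operations = portTypes[port.portType].operations;
    for (const [opName, op] of Object.entries(operations)) {
      const input = op.input ? formatParts(op.input) : "";
      const output = op.output ? formatParts(op.output) : "nothing";
      lines.push(port.name + " at " + port.address + ": " +
        opName + "(" + input + ") returns " + output);
    }
  }
  return lines;
}

console.log(describeService(service).join("\n"));
```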
So we can summarize the function of the WSDL file by saying that it defines:
❑ Where the service is implemented
❑ What operations the service supports
❑ What messages need to be passed between the client and server for those operations
❑ How the parameters for that message are to be encoded
This is how it looks in practice:
Each of the WSDL elements is defined in a section of the WSDL file, and we'll be looking at an example shortly. However, now it's time for a practical demonstration.
Generating WSDL
In practice, very few people will want to hand craft WSDL. The most likely approach to developing a web service is that a developer will put together a web service component using one of the standard techniques (COM, Java, and of course, .NET), and then use a tool to generate the associated WSDL; and that's exactly what we're going to do. In fact, we're going to do it twice: once the Microsoft way (COM), and once the IBM way (Java). There are actually plenty of good reasons why we might want to start with the WSDL, but we'll look at that a little later on in the chapter, in the section titled "Chickens and Eggs".
WSDL from COM
Before we get started, we're going to need a COM component, written in Microsoft Visual Basic 6. If Visual Basic isn't available, don't worry, because the COM DLL that we're going to develop is part of the code download from the Wrox site. However, we will need to register it, like so:

    > regsvr32 ArithServer.dll

(Incidentally, on Windows 98, regsvr32.exe is located in c:\windows\system, in case the path doesn't include this.) Having done that, we can skip the next section, where we discuss how the COM object is developed.
Chapter 27
Developing COM objects under Visual Basic is relatively straightforward. We'll start by creating a new Visual Basic project, of type "ActiveX DLL". Save this project as ArithServer.vbp. Now create a new class module in the project, called Arithmetic.cls. Add the following code to this class module:

    Option Explicit

    ' Returns InValue multiplied by itself.
    Function Square(ByVal InValue As Double) As Double
        Square = InValue * InValue
    End Function

    ' Returns the square root of InValue.
    Function SquareRoot(ByVal InValue As Double) As Double
        SquareRoot = Sqr(InValue)
    End Function

    ' Returns InString converted to upper case.
    Function Capitalize(ByVal InString As String) As String
        Capitalize = UCase(InString)
    End Function
In the grand tradition of these things, we have chosen a somewhat simple example: an arithmetic server, which offers the calling application three functions:
❑ Square – which takes a single double value as its input, squares it and returns the result as its double output
❑ SquareRoot – which takes a single double value as its input, takes its square root and returns the result as its double output
❑ Capitalize – which takes a string as its input, capitalizes it and returns the capitalized string as its output.
There is a reason for keeping the server simple, by the way. It will be tricky enough as it is to see the wood for the trees once the WSDL starts to emerge, and the simpler our basic application is, the better. Once we have set up our code, all we need to do is compile it, and we have our ArithServer COM object ready for web delivery.
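For readers who think more readily in Java than in Visual Basic, the three operations described above can be sketched as plain static methods. This is only an illustrative equivalent under our own naming (the class name ArithmeticSketch is hypothetical); it is not part of the code download and plays no part in the COM server.

```java
import java.util.Locale;

class ArithmeticSketch {
    // Mirrors Square: multiplies the input by itself.
    static double square(double inValue) {
        return inValue * inValue;
    }

    // Mirrors SquareRoot. Note that Math.sqrt returns NaN for negative input.
    static double squareRoot(double inValue) {
        return Math.sqrt(inValue);
    }

    // Mirrors Capitalize: converts the whole string to upper case.
    static String capitalize(String inString) {
        return inString.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(capitalize("Hello world")); // HELLO WORLD
        System.out.println(square(8));                 // 64.0
        System.out.println(squareRoot(4));             // 2.0
    }
}
```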
wsdlgen
The Microsoft WSDL generation tool provided with Version 2 of the SOAP Toolkit is called wsdlgen. Let's try it out on ArithServer.dll. This is what we see when we execute it:
If you don't have Visual Basic installed, this command won't work. The wsdlstb.exe command line utility will have to be invoked instead. Moving on, in the next dialog, we need to choose a name for our web service (we've chosen "Arithmetic"), and then browse to locate our COM DLL:
Next, we choose which of the services we wish to expose. We expose all of them:
Next, we select where the SOAP listener is going to be located (see Chapter 24). Incidentally, the Microsoft documentation tends to refer to IIS (Internet Information Server) throughout; however, be assured that Personal Web Server works quite satisfactorily as well, provided that we stick to using ASP rather than ISAPI for the listener type:
Finally, all we need to do is decide where our generated files are going to be located. For the time being, we'll co-locate them with the server DLL. However, we may subsequently move them around.
Notice that it says "generated files" in the above paragraph. This is because Microsoft have introduced a little extra something themselves, called WSML. We'll take a look at what that does in a little while.
And we're done:
So What Did wsdlgen Do For Us?
Let's look at what we've got. There are basically three things that have been generated:
❑ An Active Server Page file, Arithmetic.asp
❑ A WSDL file, Arithmetic.wsdl
❑ A WSML file, Arithmetic.wsml
We'll deal with each one in turn.
The ASP File
Arithmetic.asp is the Active Server Page that will drive our SOAP server. All SOAP requests for Arithmetic will be directed towards this. Most of the code is to do with setting up and trapping errors, so we've highlighted the lines that do the real work:
    java com.ibm.wsdl.Main -in Arithmetic.WSDL

This is the point at which we encounter our first problem:

    >> Transforming WSDL to NASSL ..
    >> Generating Schema to Java bindings ..
    >> Generating serializers / deserializers ..
    Interface 'wsdlns:ArithmeticSoapPort' not found.

The IBM generator does not like the wsdlns namespace prefix in front of the port name in our WSDL file. However, it's not necessary in this particular case, so we can safely remove it:
Our next attempt gets a little further:

    >> Transforming WSDL to NASSL ..
    >> Generating Schema to Java bindings ..
    >> Generating serializers / deserializers ..
    >> Generating proxy ..
    Created file E:\JPA\Books\Pro XML 2e\Java ArithClient\ArithmeticSoapPortProxy.java
    Call to extension function failed: method call/new failed: java.lang.reflect.InvocationTargetException target exception: java.lang.IllegalArgumentException: The name attribute of all type elements must be namespace-qualified.
    I was unable to convert an object from java.lang.Object to null.

And so on. Confusingly, what we now have to do is remove that wsdlns namespace prefix from the messages as well, therefore:
Now we get this:

    >> Transforming WSDL to NASSL ..
    >> Generating Schema to Java bindings ..
    >> Generating serializers / deserializers ..
    >> Generating proxy ..
    Created file E:\JPA\Books\Pro XML 2e\Java ArithClient\ArithmeticSoapPortProxy.java
    No mapping was found for 'http://www.w3.org/2000/10/XMLSchema:string'.
    No mapping was found for 'http://www.w3.org/2000/10/XMLSchema:string'.
    No mapping was found for 'http://www.w3.org/2000/10/XMLSchema:string'.
    I was unable to convert an object from null to java.lang.Object.
    I was unable to convert an object from java.lang.Object to null.

And so on. The problem now is that the IBM generator doesn't recognize the 2000 schema namespace, so we need to revert to the 1999 one:

    xmlns:xsd="http://www.w3.org/1999/XMLSchema"
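The hand edits described above are mechanical enough to script. The following Java sketch shows one way this might be done as plain text substitution; the class and method names are hypothetical, and the prefix-stripping helper assumes that every attribute value beginning with the prefix can safely lose it, which the text only establishes for the port name and the messages in this particular file.

```java
import java.util.regex.Pattern;

class WsdlTextFixes {
    static final String SCHEMA_2000 = "http://www.w3.org/2000/10/XMLSchema";
    static final String SCHEMA_1999 = "http://www.w3.org/1999/XMLSchema";

    // Replaces every occurrence of the 2000/10 schema namespace URI with the 1999 one.
    static String useSchema1999(String wsdl) {
        return wsdl.replace(SCHEMA_2000, SCHEMA_1999);
    }

    // Removes "prefix:" where it starts an attribute value, e.g. message='wsdlns:X'.
    // The xmlns:prefix declaration itself is left alone, because there the prefix
    // appears before the equals sign rather than at the start of the value.
    static String stripValuePrefix(String wsdl, String prefix) {
        String regex = "(=\\s*[\"'])" + Pattern.quote(prefix + ":");
        return wsdl.replaceAll(regex, "$1");
    }

    public static void main(String[] args) {
        String fragment = "<input message='wsdlns:Arithmetic.Square'/>";
        System.out.println(stripValuePrefix(fragment, "wsdlns"));
        System.out.println(useSchema1999("xmlns:xsd='" + SCHEMA_2000 + "'"));
    }
}
```

Working on the text rather than a parsed document keeps the sketch short, at the cost of being blind to context; for anything beyond a one-off fix, editing the file by hand and checking the result is the safer course.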
This time, it works, and we have a Java proxy class, ArithmeticSoapPortProxy.java. Not only that, but the IBM generator will have compiled it for us, too. Now we need a client. As with our VBScript client, we are going to make it very simple. Here is our Java client, ArithClient.java:

    // Importing a class from the default package was accepted by compilers of
    // the time; JDK 1.4 and later reject it, so drop this line there.
    import ArithmeticSoapPortProxy;

    public class ArithClient {
        public static void main (String[] args) throws Exception {
            ArithmeticSoapPortProxy arith = new ArithmeticSoapPortProxy ();
            System.out.println (arith.Capitalize ("Hello world"));
            System.out.println (arith.Square (8));
            System.out.println (arith.SquareRoot (4));
        }
    }
They don't come much simpler than that, really. This is how we compile it:

    > javac ArithClient.java

So let's see what happens when we run it:
    Exception in thread "main" [SOAPException: faultCode=SOAP-ENV:Client; msg=No Deserializer found to deserialize a ':Result' using encoding style 'http://schemas.xmlsoap.org/soap/encoding/'.; targetException=java.lang.IllegalArgumentException: No Deserializer found to deserialize a ':Result' using encoding style 'http://schemas.xmlsoap.org/soap/encoding/'.]
        at org.apache.soap.rpc.Call.invoke(Call.java:244)
        at ArithmeticSoapPortProxy.Capitalize(ArithmeticSoapPortProxy.java:50)
        at ArithClient.main(ArithClient.java:8)

We encountered this before in Chapter 24, when we first looked at SOAP, so we will not go into it in any detail here. Suffice to say that we need to make the following couple of small changes to the generated Java class:

    import java.net.*;
    import java.util.*;
    import org.apache.soap.*;
    import org.apache.soap.encoding.*;
    import org.apache.soap.rpc.*;
    import org.apache.soap.util.xml.*;
    import org.apache.soap.encoding.soapenc.StringDeserializer;
And:

    public ArithmeticSoapPortProxy() throws MalformedURLException {
        call.setTargetObjectURI("http://tempuri.org/message/");
        call.setEncodingStyleURI("http://schemas.xmlsoap.org/soap/encoding/");
        this.url = new URL("http://laa-laa/soaplisten/Arithmetic.ASP");
        this.SOAPActionURI = "http://tempuri.org/action/Arithmetic.Capitalize";

        // Tell Apache SOAP how to deserialize the unqualified Result element.
        StringDeserializer oStrDeserializer = new StringDeserializer ();
        SOAPMappingRegistry smr = new SOAPMappingRegistry ();
        smr.mapTypes ("http://schemas.xmlsoap.org/soap/encoding/",
                      new QName ("", "Result"), null, null, oStrDeserializer);
        call.setSOAPMappingRegistry (smr);
    }
We compile this as follows:

    > javac ArithmeticSoapPortProxy.java

This is what we see when we run our client:

    HELLO WORLD
    Exception in thread "main" [SOAPException: faultCode=SOAP-ENV:Server; msg=WSDLReader: The operation requested in the Soap message isn't defined in the WSDL file. This may be because it is in the wrong namespace or has incorrect case]
        at ArithmeticSoapPortProxy.Square(ArithmeticSoapPortProxy.java:126)
        at ArithClient.main(ArithClient.java:10)

Well, the first line is looking good. However, this is the point at which we realize that our interoperability is currently limited to services containing a single method, unless we want to do some severe hacking of our WSDL file. Look at this unusual line in the generated Java class:

    this.SOAPActionURI = "http://tempuri.org/action/Arithmetic.Capitalize";
This effectively pins us down to Capitalize for the time being. Moreover, if we want to change to Square, for instance, we also need to change our deserializer, like this:

    //import org.apache.soap.encoding.soapenc.StringDeserializer;
    import org.apache.soap.encoding.soapenc.DoubleDeserializer;

And:

    public ArithmeticSoapPortProxy() throws MalformedURLException {
        call.setTargetObjectURI("http://tempuri.org/message/");
        call.setEncodingStyleURI("http://schemas.xmlsoap.org/soap/encoding/");
        this.url = new URL("http://laa-laa/soaplisten/Arithmetic.ASP");
        // this.SOAPActionURI="http://tempuri.org/action/Arithmetic.Capitalize";
        this.SOAPActionURI = "http://tempuri.org/action/Arithmetic.Square";

        // StringDeserializer oStrDeserializer = new StringDeserializer ();
        DoubleDeserializer oDblDeserializer = new DoubleDeserializer ();
        SOAPMappingRegistry smr = new SOAPMappingRegistry ();
        // smr.mapTypes ("http://schemas.xmlsoap.org/soap/encoding/",
        //               new QName ("", "Result"), null, null, oStrDeserializer);
        smr.mapTypes ("http://schemas.xmlsoap.org/soap/encoding/",
                      new QName ("", "Result"), null, null, oDblDeserializer);
        call.setSOAPMappingRegistry (smr);
    }

We had also better change our client:

    import ArithmeticSoapPortProxy;

    public class ArithClient {
        public static void main (String[] args) throws Exception {
            ArithmeticSoapPortProxy arith = new ArithmeticSoapPortProxy ();
            // System.out.println (arith.Capitalize ("Hello world"));
            System.out.println (arith.Square (8));
            // System.out.println (arith.SquareRoot (4));
        }
    }
When we run it, we get this result:

    64.0

Therefore, we have at least established that we can handle doubles with WSDL.
Let's just pause here for a moment and consider what we've achieved. We have managed to get two completely disparate systems, without a single line of common code, to talk to each other as if they were working as a single program, simply by formalizing the interactions between them in a standard, common, WSDL file. I think that is impressive, by anyone's standards. Also, it must be noted that once we are past the beta stage in these technologies, all of these little glitches should be ironed out and it will work seamlessly. As the chapter title suggests, this gives you a glimpse of the future.
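The single-method limitation discussed above comes from fixing one SOAPAction URI and one result deserializer in the constructor. One possible way around it, sketched here in plain Java with no Apache SOAP calls, is a per-operation lookup table that a hand-modified proxy could consult before each call. The class is hypothetical, and the URI for SquareRoot is an assumption that simply follows the pattern of the two URIs shown in the text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OperationTable {
    // Prefix shared by the two SOAPAction URIs shown in the chapter.
    static final String ACTION_BASE = "http://tempuri.org/action/Arithmetic.";

    // What a proxy needs to know per operation: the SOAPAction and the result type.
    static final class Entry {
        final String soapAction;
        final Class<?> resultType;

        Entry(String soapAction, Class<?> resultType) {
            this.soapAction = soapAction;
            this.resultType = resultType;
        }
    }

    private static final Map<String, Entry> ENTRIES = new LinkedHashMap<>();
    static {
        ENTRIES.put("Capitalize", new Entry(ACTION_BASE + "Capitalize", String.class));
        ENTRIES.put("Square", new Entry(ACTION_BASE + "Square", Double.class));
        // Assumed by analogy with the two URIs above; not shown in the chapter.
        ENTRIES.put("SquareRoot", new Entry(ACTION_BASE + "SquareRoot", Double.class));
    }

    private static Entry lookup(String operation) {
        Entry entry = ENTRIES.get(operation);
        if (entry == null) {
            throw new IllegalArgumentException("Unknown operation: " + operation);
        }
        return entry;
    }

    static String soapActionFor(String operation) {
        return lookup(operation).soapAction;
    }

    static Class<?> resultTypeFor(String operation) {
        return lookup(operation).resultType;
    }

    public static void main(String[] args) {
        System.out.println(soapActionFor("Square"));
        System.out.println(resultTypeFor("Capitalize").getSimpleName());
    }
}
```

A proxy using such a table would set the SOAPAction and register the matching deserializer (StringDeserializer or DoubleDeserializer) inside each method rather than once in the constructor.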
Chickens and Eggs
Here is an interesting question. Which came first, the WSDL or the code? Up until this point, we have taken a highly pragmatic approach to WSDL. We took an operational server, derived a WSDL file from it, and used that file to generate a suitable client. This isn't the only way to do it, however. There is another school of thought (the one that prefers not to believe in the concept of the executable specification) that says, actually, WSDL is a neat design tool. It is highly rigorous, platform-independent, concise, and defined using XML, so there are plenty of tools available for manipulating it as well.
The feeling is that for the present, this is not a viable option, because there are a couple of factors against it. Firstly, there currently aren't any commercially available graphical tools for building definitions, and handcrafting WSDL is just too tedious and time-consuming. Secondly, there is currently no Microsoft tool to generate code from WSDL. However, it is unlikely that either of these will remain true for very long, and perhaps future developments will be WSDL-driven, rather than code-driven.
UDDI
OK, so we know how to talk to each other. However, that's not an enormous amount of help if we don't actually know of each other's existence. Now we could try a web search to find out the kind of service that we're looking for, and then hunt around on their web site for a suitable WSDL file. What we really need is a kind of dating agency, where we can enter the details of our required business partner, run a structured search, and come up with the details of what they do and who to contact there. If we're really lucky, we might even get hold of their WSDL file without having to search for it.
It so happens that such an agency is being built, although it goes under the more sober title of the Universal Description, Discovery and Integration Registry. The parties involved are Ariba Inc., IBM, and Microsoft. All three have their own trial registry implementations, available on the World Wide Web, while the latter two provide toolkits to access the registries. In this half of the chapter, we are going to try out both APIs on all three registries. We're going to start by making inquiries on registries and then we're going to publish to the registries ourselves.
The UDDI protocol is SOAP 1.1 based. However, we have no need to understand or even care about the underlying details, as the APIs nicely wrap all of this up for us. Before we get programming, though, we'd better get to grips with some of the concepts involved.
UDDI Concepts
The UDDI business registry is described as being a cloud, wherein a business can be registered once, but published everywhere. What kind of information is held in this registry? The core element of information is the businessEntity. This contains general information about the business (contact names and addresses and so on), as well as businessService elements. Each business service contains descriptive data as well as a number of technical web service descriptions, or bindingTemplate elements. The binding templates contain the information needed to actually invoke a web service. The key part of this is called a tModel; this is metadata about a specification, containing its name, details of who published it and URL pointers to the actual specification. This specification could, of course, be in the form we have been discussing in the first part of this chapter: a WSDL file.
This diagram shows how these elements are related:
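[Figure: a businessEntity contains businessService elements; each businessService contains bindingTemplate elements; each bindingTemplate refers to a tModel.]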
The rest of this chapter is going to look at how we can use the UDDI APIs to make simple inquiries of, and publish to, the various publicly available registries. We're not actually going to touch on many of the elements involved in a registry entry; once we've established how to access and update a simple part of the business entity (its description), we can extend that process to any other part of the structure.
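The containment relationships described in this section can be sketched as a handful of small Java classes. This is a simplified illustration only: the class and field names are hypothetical, the example.org values are made up, and a real registry entry carries keys, contacts, categories and much more, with binding templates referring to tModels by key rather than embedding them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UddiModelSketch {
    // Metadata about a specification, including a pointer to the specification itself.
    static final class TModel {
        final String name;
        final String specificationUrl;
        TModel(String name, String specificationUrl) {
            this.name = name;
            this.specificationUrl = specificationUrl;
        }
    }

    // The technical information needed to invoke one web service.
    static final class BindingTemplate {
        final String accessPoint;
        final List<TModel> tModels;
        BindingTemplate(String accessPoint, TModel... tModels) {
            this.accessPoint = accessPoint;
            this.tModels = Arrays.asList(tModels);
        }
    }

    // A service offered by a business, with its technical descriptions.
    static final class BusinessService {
        final String name;
        final List<BindingTemplate> bindingTemplates;
        BusinessService(String name, BindingTemplate... bindingTemplates) {
            this.name = name;
            this.bindingTemplates = Arrays.asList(bindingTemplates);
        }
    }

    // The core element: general business information plus its services.
    static final class BusinessEntity {
        final String name;
        final String description;
        final List<BusinessService> services;
        BusinessEntity(String name, String description, BusinessService... services) {
            this.name = name;
            this.description = description;
            this.services = Arrays.asList(services);
        }
    }

    // Walks entity -> service -> binding template -> tModel and collects the URLs.
    static List<String> specificationUrls(BusinessEntity entity) {
        List<String> urls = new ArrayList<>();
        for (BusinessService service : entity.services) {
            for (BindingTemplate template : service.bindingTemplates) {
                for (TModel tModel : template.tModels) {
                    urls.add(tModel.specificationUrl);
                }
            }
        }
        return urls;
    }

    // Made-up sample data for illustration.
    static BusinessEntity sample() {
        TModel wsdl = new TModel("Arithmetic WSDL", "http://example.org/Arithmetic.wsdl");
        BindingTemplate template = new BindingTemplate("http://example.org/soap", wsdl);
        BusinessService service = new BusinessService("Arithmetic", template);
        return new BusinessEntity("WroxDemo", "Wrox Demonstration Business", service);
    }

    public static void main(String[] args) {
        System.out.println(specificationUrls(sample()));
    }
}
```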
UDDI Inquiries
Before we can make any inquiries on any of the registries, we need to set up some test data. Since we need to be registered as publishers for the next section (because we're going to be publishing business information as well as making inquiries), we might as well do that now. This is quite a straightforward process, and all we need to do is go to each web site in turn and follow all the steps to register ourselves. The steps involved are different in each case, but they are quite self-explanatory, so we won't waste space by discussing them further here. The sites in question are:
❑ Ariba: http://uddi.ariba.com
❑ IBM: http://www.ibm.com/services/uddi
❑ Microsoft: http://uddi.microsoft.com
Once we've registered ourselves, then we can go on and add a business. In the case of IBM, use the test registry, rather than the live one; in the other two cases, we don't get a choice. For the time being, just create a business with a name and a simple description. For the purposes of this chapter, we created a business called "WroxDemo". We will also need the APIs. The IBM UDDI API is called uddi4j (UDDI for Java), and this is included as part of the Web Services Toolkit that we loaded in the first part of this chapter. You should make sure that the uddi4j.jar archive is in the Java CLASSPATH. The Microsoft API can be found at http://msdn.microsoft.com/downloads/default.asp?URL=/code/sample.asp?url=/msdnfiles/027/001/527/msdncompositedoc.xml. Both APIs are essentially the same, although implemented using different technologies (Java for IBM, COM for Microsoft). The Microsoft version, as we shall see, exposes a little more of the underlying SOAP technology than IBM; however, most of the calls are functionally identical.
Inquiries the IBM Way
We will start with a very simple application to get a feel for what we're doing. We'll code it up as a single Java class using command line I/O, called UDDIClient.java. Here is how the code starts, importing the necessary UDDI Java classes:

    import com.ibm.uddi.client.UDDIProxy;
    import com.ibm.uddi.response.BusinessList;
    import com.ibm.uddi.response.BusinessInfos;
    import com.ibm.uddi.response.BusinessInfo;
    import java.util.Vector;
Here's the main() method:

    public class UDDIClient {
        public static void main (String[] args) throws Exception {
            byte input[] = new byte[128];
            System.out.print ("Search for? ");
            int nRead = System.in.read (input, 0, 128);
            // nRead - 2 drops the trailing CR/LF pair (assumes Windows line endings)
            String strSearch = new String (input, 0, nRead - 2);
            System.out.print ("Registry? ");
            nRead = System.in.read (input, 0, 128);
All we're doing here is getting a company name to search for, plus an identifier for the registry that we're going to search in (A for Ariba, I for IBM, and M for Microsoft). Let us look a bit further:

            String strRegistry;
            String strURL;
            switch (input[0]) {
                case 'A':
                    strRegistry = "Ariba";
                    strURL = new String("http://uddi.ariba.com/UDDIProcessor.aw/ad/process");
                    break;
                case 'I':
                    strRegistry = "IBM";
                    strURL = new String(
                        "http://www-3.ibm.com/services/uddi/testregistry/inquiryapi");
                    break;
                case 'M':
                    strRegistry = "Microsoft";
                    strURL = new String("http://test.uddi.microsoft.com/inquire");
                    break;
                default:
                    System.out.println ("Invalid registry specified - exiting");
                    return;
            }
            System.out.println ("Searching for at " + strRegistry + " ...");
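The input handling above subtracts two from the byte count, which only gives the right answer when each line ends in a carriage return and line feed pair, as it does at a Windows console. As an alternative, here is a minimal sketch of the same two steps (reading an answer, choosing a registry URL) that does not depend on the line terminator. The class and method names are hypothetical; the URLs are the inquiry URLs from the listing above. Unlike the original switch, this version also accepts lower-case letters.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

class RegistryChooser {
    // Inquiry URLs keyed by the single-letter registry identifier.
    private static final Map<Character, String> INQUIRY_URLS = new LinkedHashMap<>();
    static {
        INQUIRY_URLS.put('A', "http://uddi.ariba.com/UDDIProcessor.aw/ad/process");
        INQUIRY_URLS.put('I', "http://www-3.ibm.com/services/uddi/testregistry/inquiryapi");
        INQUIRY_URLS.put('M', "http://test.uddi.microsoft.com/inquire");
    }

    // Reads one line and trims it; readLine() already removes CR, LF or CR/LF.
    static String readAnswer(BufferedReader in) {
        try {
            String line = in.readLine();
            return line == null ? "" : line.trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the inquiry URL for the answer's first letter, or null if unknown.
    static String inquiryUrlFor(String answer) {
        if (answer == null || answer.isEmpty()) {
            return null;
        }
        return INQUIRY_URLS.get(Character.toUpperCase(answer.charAt(0)));
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        System.out.print("Search for? ");
        String search = readAnswer(in);
        System.out.print("Registry? ");
        String url = inquiryUrlFor(readAnswer(in));
        if (url == null) {
            System.out.println("Invalid registry specified - exiting");
        } else {
            System.out.println("Would search for '" + search + "' at " + url);
        }
    }
}
```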
Here, we're setting up the URL of the registry that we're going for. Notice that in the case of IBM, we have specified the test registry (http://www-3.ibm.com/services/uddi/testregistry/inquiryapi) rather than the live registry (http://www-3.ibm.com/services/uddi/inquiryapi). Now comes our first glimpse into the API:

            UDDIProxy proxy = new UDDIProxy();
            proxy.setInquiryURL (strURL);

The UDDIProxy object is our way of talking to the registry. Every interaction with the registry is done via this object. It needs one URL for inquiries, and another one for publishing; however, we won't need the second one yet. It's time to carry out our search:

            BusinessList list = proxy.find_business (strSearch, null, 10);

At this stage, I've chosen a very simple partial string match search, and I've limited the number of hits to 10, as experience has shown that specifying an unlimited number of hits (-1) can result in an indefinite wait. Once we have a response, we can extract the information that we need:

            BusinessInfos infos = list.getBusinessInfos ();
            Vector vInfo = infos.getBusinessInfoVector ();
            int nInfo = vInfo.size();
            System.out.println (nInfo + " items found:");
            for (int iInfo = 0; iInfo < nInfo; iInfo++) {
                BusinessInfo info = (BusinessInfo) vInfo.elementAt (iInfo);
                System.out.println (info.getNameString ());
                Vector vDesc = info.getDescriptionStrings ();
                int nDesc = vDesc.size ();
                for (int iDesc = 0; iDesc < nDesc; iDesc++)
                    System.out.println ((String) vDesc.elementAt (iDesc));
            }
        }
    }
A BusinessInfo object contains all the information for a single business, so all we're doing here is extracting the name and description for each of the businesses found by our search. In practice, it is only possible to enter a single description via any of the UDDI web sites, so we will only manage to extract one. Let us see what happens when we try out an extraction from Ariba:

    Search for? Wrox
    Registry? A
    Searching for at Ariba ...
    1 items found:
    WroxDemo
    Wrox Demonstration Business (Ariba version)

How about IBM:
    Search for? Wrox
    Registry? I
    Searching for at IBM ...
    1 items found:
    WroxDemo
    Wrox Demonstration Business (IBM version)

Finally, we will try Microsoft:

    Search for? Wrox
    Registry? M
    Searching for at Microsoft ...
    Exception in thread "main" [SOAPException: faultCode=SOAP-ENV:Protocol; msg=Missing content type.]
        at org.apache.soap.transport.TransportMessage.read(TransportMessage.java:214)
        at org.apache.soap.util.net.HTTPUtils.post(HTTPUtils.java:296)
        at org.apache.soap.transport.http.SOAPHTTPConnection.send(SOAPHTTPConnection.java:208)
        at org.apache.soap.messaging.Message.send(Message.java:120)
        at com.ibm.uddi.client.UDDIProxy.send(UDDIProxy.java:1215)
        at com.ibm.uddi.client.UDDIProxy.send(UDDIProxy.java:1187)
        at com.ibm.uddi.client.UDDIProxy.find_business(UDDIProxy.java:192)
        at UDDIClient.main(UDDIClient.java:52)

Now that doesn't look quite right. This problem arose during the period when we were working on this book, and is apparently due to the Microsoft registry returning a content type of "text/xml;", which causes Apache SOAP to enter a loop. With any luck, this will all have been resolved between the various parties by the time this book goes on sale.
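To illustrate why a stray trailing semicolon in a header value need not be fatal, here is a small sketch of a tolerant way to pull the media type out of a Content-Type value. It is purely illustrative: the class is hypothetical, it is not how Apache SOAP parses the header, and it makes no attempt to interpret parameters such as charset.

```java
import java.util.Locale;

class ContentTypeSketch {
    // Returns the media type portion of a Content-Type value, lower-cased,
    // ignoring any parameters (or an empty parameter list) after the first ';'.
    static String mediaType(String headerValue) {
        if (headerValue == null) {
            return "";
        }
        int semicolon = headerValue.indexOf(';');
        String type = semicolon >= 0 ? headerValue.substring(0, semicolon) : headerValue;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(mediaType("text/xml;"));
        System.out.println(mediaType("text/xml; charset=utf-8"));
    }
}
```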
Inquiries the Microsoft Way
Let's see how Microsoft does it. There is a moderately sophisticated example provided with the UDDI API, but for clarity's sake, we're going to build something somewhat less ambitious, along very similar lines to the Java one. We'll use Microsoft Visual Basic version 6. Once we've created a new project, we'll need to add references to the two UDDI COM DLLs via Project | References:
Next, we'll create our form:
The three text boxes are as follows: txtSearch, txtName and txtDescription. The combo box is cmbRegistry, and the command button is cmdSearch. We only need to attach code to two events. In Form_Load(), all we need to do is set up the registry selection combo box:

    Private Sub Form_Load()
        cmbRegistry.AddItem ("Ariba")
        cmbRegistry.AddItem ("IBM")
        cmbRegistry.AddItem ("Microsoft")
        cmbRegistry.ListIndex = 0
    End Sub

In cmdSearch_Click(), we start off by declaring a number of UDDI-related variables, which we'll discuss as we go on. Then we set up the first of these, the UDDI request manager. This is equivalent to the Java UDDIProxy object, in that it's the object that every UDDI interaction passes through:

    Private Sub cmdSearch_Click()
        Dim strSearch As String
        Dim strRegistry As String
        Dim req As UDDIEnv.RequestManager
        Dim outEnv As UDDIEnv.Envelope
        Dim findBiz As UDDI10.find_business
        Dim inEnv As UDDIEnv.Envelope
        Dim bizList As UDDI10.businessList
        Dim bizInfo As UDDI10.businessInfo

        strSearch = txtSearch.Text
        strRegistry = cmbRegistry.Text

        Set req = New RequestManager
        If (strRegistry = "Ariba") Then
            req.UDDI_Address = "http://uddi.ariba.com/UDDIProcessor.aw/ad/process"
        ElseIf (strRegistry = "IBM") Then
            req.UDDI_Address = _
                "http://www-3.ibm.com/services/uddi/testregistry/inquiryapi"
        Else
            req.UDDI_Address = "http://uddi.microsoft.com/inquire"
        End If
Next, we need to set up our find_business object. This object defines the parameters for our search:

        Set findBiz = New find_business

Then we have to create a UDDI SOAP envelope, and insert our find_business object into it. Once we've done that, we can safely populate it:

        Set outEnv = New Envelope
        Set outEnv.Plugin = findBiz
        findBiz.Name = strSearch
        findBiz.maxRows = 10

This exposes a little more of the underlying architecture than the IBM Java implementation (remember all we did last time was invoke the find_business method on the UDDIProxy object?). Now we can send off our request and get a response. The request goes off in one envelope and comes back in another. Strictly speaking, we could re-use the same envelope; however, for the sake of clarity, I've used a separate one:

        Set inEnv = req.UDDIRequest(outEnv)

If all goes well, we can extract a list of businesses from the incoming envelope:

        Set bizList = New businessList
        Set inEnv.Plugin = bizList

Having done that, we can simply iterate through the list, displaying the name and description (although we're actually only set up to display one at a time):

        For Each bizInfo In bizList.businessInfos
            txtName.Text = bizInfo.Name
            txtDescription.Text = bizInfo.Description(1)
        Next bizInfo
    End Sub
Let's try it out on Ariba:
IBM:
And, finally, Microsoft:
Publishing to UDDI
The majority of interactions with UDDI are likely to be inquiries. However, it's also unlikely that corporations in the future will want to be saddled with the hassle of using a web-style user interface every time they want to update their details. In this section, we look at how we can extend the way in which we interact with UDDI to include amendments as well as simple retrievals. Amending UDDI information involves several additional steps, and we are going to amend both of our implementations accordingly. All we are actually going to do is change the description field. However, once we can do that successfully, we can effectively amend anything.
Publishing the IBM Way
As before, we will start off with the IBM Java implementation. We start by copying our previous class to UDDIClient2.java. Here's how it starts now, with a few additional imports (we'll see what these are doing as we go through the rest of the code):

    import com.ibm.uddi.client.UDDIProxy;
    import com.ibm.uddi.response.BusinessList;
    import com.ibm.uddi.response.BusinessInfos;
    import com.ibm.uddi.response.BusinessInfo;
    import com.ibm.uddi.response.BusinessDetail;
    import com.ibm.uddi.datatype.business.BusinessEntity;
    import com.ibm.uddi.response.AuthToken;
    import java.util.Vector;
    import java.security.Security;
    import java.util.Properties;
In the new version, we are going to have to access the publishing URL for each of the registries, using a secure connection. Therefore, we are going to have to enable secure sockets:

    public class UDDIClient2 {
        public static void main (String[] args) throws Exception {
            Properties props = System.getProperties ();
            props.put ("java.protocol.handler.pkgs",
                       "com.ibm.net.ssl.internal.www.protocol");
            System.setProperties (props);
            Security.addProvider(new com.ibm.jsse.JSSEProvider());

The next section remains unchanged:

            byte input[] = new byte[128];
            System.out.print ("Search for? ");
            int nRead = System.in.read (input, 0, 128);
            String strSearch = new String (input, 0, nRead - 2);
            System.out.print ("Registry? ");
            nRead = System.in.read (input, 0, 128);
We now have to set up a URL for publishing as well as inquiries:

            String strRegistry;
            String strInquiryURL;
            String strPublishURL;
            switch (input[0]) {
                case 'A':
                    strRegistry = "Ariba";
                    strInquiryURL = new String(
                        "http://uddi.ariba.com/UDDIProcessor.aw/ad/process");
                    strPublishURL = new String(
                        "https://uddi.ariba.com/UDDIProcessor.aw/ad/process");
                    break;
                case 'I':
                    strRegistry = "IBM";
                    strInquiryURL = new String(
                        "http://www-3.ibm.com/services/uddi/testregistry/inquiryapi");
                    strPublishURL = new String(
                        "https://www-3.ibm.com/services/uddi/"
                        + "testregistry/protect/publishapi");
                    break;
                case 'M':
                    strRegistry = "Microsoft";
                    strInquiryURL = new String("http://uddi.microsoft.com/inquire");
                    strPublishURL = new String("https://uddi.microsoft.com/publish");
                    break;
                default:
                    System.out.println ("Invalid registry specified - exiting");
                    return;
            }
            System.out.println ("Searching for at " + strRegistry + " ...");
            UDDIProxy proxy = new UDDIProxy();
            proxy.setInquiryURL (strInquiryURL);
            proxy.setPublishURL (strPublishURL);
The next section, where we are setting up the initial inquiry, remains unchanged:

            BusinessList list = proxy.find_business (strSearch, null, 10);
            BusinessInfos infos = list.getBusinessInfos ();
            Vector vInfo = infos.getBusinessInfoVector ();

However, from this point on, we are in new territory. The first difference is that this time around, we're just going to focus on the first item found (if any). The second difference is that we're going to have to get access to the business entity, because that's what we're going to be changing. The first step to doing this is to get hold of the unique key for the record; the registry allocated this when we first registered the business:

            int nInfo = vInfo.size();
            if (nInfo == 0) {
                System.out.println ("None found");
                return;
            }
            BusinessInfo info = (BusinessInfo) vInfo.elementAt (0);
            String strKey = info.getBusinessKey ();
Now that we have that, we make a request via the proxy for the business details relating to this key, and from that we can extract the business entity:

            System.out.println ("Getting business entity for key " + strKey + " ...");
            BusinessDetail detail = proxy.get_businessDetail (strKey);
            Vector vEntity = detail.getBusinessEntityVector ();
            int nEntity = vEntity.size ();
            if (nEntity == 0) {
                System.out.println ("Failed to retrieve business entity");
                return;
            }
            BusinessEntity entity = (BusinessEntity) vEntity.elementAt (0);
As a check that we have the right entity, we output the name and description from this object, rather than the business info object:

            String strName = entity.getNameString ();
            System.out.println ("Name = " + strName);
            Vector vDesc = entity.getDescriptionStrings ();
            int nDesc = vDesc.size ();
            if (nDesc > 0)
                System.out.println ("Description = " + (String) vDesc.elementAt (0));

Now we ask the user for a new description, and put it into the entity, as the first description. Then we load the entity back into the entity vector:

            System.out.print ("New description? ");
            nRead = System.in.read (input, 0, 128);
            String strDescription = new String (input, 0, nRead - 2);
            if (nDesc > 0)
                vDesc.setElementAt (strDescription, 0);
            else
                vDesc.addElement (strDescription);
            entity.setDescriptionStrings (vDesc);
            vEntity.setElementAt (entity, 0);
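The replace-or-append step on the description vector is easy to get wrong when the vector is empty, so here it is in isolation as a small, self-contained sketch. The helper name is hypothetical and the method works on a copy rather than in place, which is a design choice of this sketch and not something the listing above does.

```java
import java.util.Vector;

class DescriptionUpdate {
    // Returns a copy of the descriptions with the first entry replaced by text,
    // or with text appended if there was no first entry to replace.
    static Vector<String> withFirstDescription(Vector<String> descriptions, String text) {
        Vector<String> copy = new Vector<>(descriptions);
        if (copy.isEmpty()) {
            copy.addElement(text);
        } else {
            copy.setElementAt(text, 0);
        }
        return copy;
    }

    public static void main(String[] args) {
        Vector<String> existing = new Vector<>();
        existing.addElement("Wrox Demonstration Business (Ariba version)");
        System.out.println(withFirstDescription(existing, "Ariba Wrox Demo"));
        System.out.println(withFirstDescription(new Vector<String>(), "Ariba Wrox Demo"));
    }
}
```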
In order to publish, we need to give the registry the user name and password that we established when we registered ourselves with UDDI in the first place. We use these to get an authentication token, which we can use for subsequent publishing operations:

            System.out.print ("User name? ");
            nRead = System.in.read (input, 0, 128);
            String strUser = new String (input, 0, nRead - 2);
            System.out.print ("Password? ");
            nRead = System.in.read (input, 0, 128);
            String strPassword = new String (input, 0, nRead - 2);
            AuthToken auth = proxy.get_authToken (strUser, strPassword);
Having got that, we can now save the business entity with its new description:

            proxy.save_business (auth.getAuthInfoString (), vEntity);
        }
    }
This is what happens when we run it; the user name and password are the ones we chose when we first registered with the registry at the start of this section:

    Search for? Wrox
    Registry? A
    Searching for at Ariba ...
    Getting business entity for key 3d528bf1-00e5-ec77-f22c-cefb19a7aa77 ...
    Name = WroxDemo
    Description = Wrox Demonstration Business (Ariba version)
    New description? Ariba Wrox Demo
    User name? …
    Password? …

If we run it again, we can see that the description has indeed been changed. The IBM registry behaves similarly, but once again, the Microsoft registry is currently inaccessible from the IBM API.
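The publishing sequence walked through above (find the business, take the first key, obtain an authentication token, save the change) can be summarized in a sketch that runs without any network access. FakeRegistry below is a hypothetical in-memory stand-in written for this illustration; it is not the uddi4j API, it matches names by prefix as an assumption, and it collapses the get-detail and save steps into simple map operations.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PublishFlowSketch {
    /** Hypothetical in-memory stand-in for a registry; not the uddi4j API. */
    static final class FakeRegistry {
        // key -> {name, description}
        private final Map<String, String[]> businesses = new LinkedHashMap<>();
        private final String user;
        private final String password;
        private String issuedToken;
        private int nextKey = 1;

        FakeRegistry(String user, String password) {
            this.user = user;
            this.password = password;
        }

        String register(String name, String description) {
            String key = "key-" + nextKey++;
            businesses.put(key, new String[] {name, description});
            return key;
        }

        // Assumption: names are matched by prefix, and at most maxRows keys are returned.
        List<String> findBusiness(String namePrefix, int maxRows) {
            List<String> keys = new ArrayList<>();
            for (Map.Entry<String, String[]> e : businesses.entrySet()) {
                if (keys.size() < maxRows && e.getValue()[0].startsWith(namePrefix)) {
                    keys.add(e.getKey());
                }
            }
            return keys;
        }

        String getDescription(String key) {
            return businesses.get(key)[1];
        }

        String getAuthToken(String user, String password) {
            if (!this.user.equals(user) || !this.password.equals(password)) {
                throw new IllegalArgumentException("unknown publisher");
            }
            issuedToken = "token-" + System.nanoTime();
            return issuedToken;
        }

        void saveDescription(String authToken, String key, String description) {
            if (issuedToken == null || !issuedToken.equals(authToken)) {
                throw new IllegalStateException("not authenticated");
            }
            businesses.get(key)[1] = description;
        }
    }

    // Inquiry first, then authenticate, then publish; returns the key that was
    // updated, or null if the search found nothing.
    static String updateFirstMatch(FakeRegistry registry, String search,
                                   String newDescription, String user, String password) {
        List<String> keys = registry.findBusiness(search, 10);
        if (keys.isEmpty()) {
            return null;
        }
        String key = keys.get(0);
        String token = registry.getAuthToken(user, password);
        registry.saveDescription(token, key, newDescription);
        return key;
    }

    public static void main(String[] args) {
        FakeRegistry registry = new FakeRegistry("user", "secret");
        registry.register("WroxDemo", "Wrox Demonstration Business");
        String key = updateFirstMatch(registry, "Wrox", "Ariba Wrox Demo", "user", "secret");
        System.out.println(registry.getDescription(key));
    }
}
```

The point of the sketch is the ordering: the inquiry needs no credentials, whereas nothing can be saved until a token has been obtained.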
Publishing the Microsoft Way
Let's complete the set now, and extend our VB application to publish a new description. Copy the project and form to UDDIClient2.vbp and UDDIClient2.frm, respectively, and amend the form as follows:
The new text boxes are called txtUser and txtPassword, respectively. The new command button is called cmdAmend. Seeing as we're now going to be holding information between the Search and Amend commands, we'll need to add a couple of global variables:

    Dim mReq As UDDIEnv.RequestManager
    Dim mEntity As UDDI10.businessEntity
Form_Load() remains unchanged, but we've also got to add one or two new variables to cmdSearch_Click():
B2B Futures: WSDL and UDDI
    Private Sub cmdSearch_Click()
        Dim strSearch As String
        Dim strRegistry As String
        Dim outEnv As UDDIEnv.Envelope
        Dim findBiz As UDDI10.find_business
        Dim getBiz As UDDI10.get_businessDetail
        Dim inEnv As UDDIEnv.Envelope
        Dim bizList As UDDI10.businessList
        Dim bizInfo As UDDI10.businessInfo
        Dim bizDetail As UDDI10.businessDetail
        Dim strKey As String
As with the Java implementation, we need to set up both the inquiry address and the publishing (or secure, in Microsoft terms) address. We also need to use the global request variable now:

        strSearch = txtSearch.Text
        strRegistry = cmbRegistry.Text

        Set mReq = New RequestManager

        If (strRegistry = "Ariba") Then
            mReq.UDDI_Address = "http://uddi.ariba.com/UDDIProcessor.aw/ad/process"
            mReq.UDDI_SecureAddress = "http://uddi.ariba.com/UDDIProcessor.aw/ad/process"
        ElseIf (strRegistry = "IBM") Then
            mReq.UDDI_Address = "http://www-3.ibm.com/services/uddi/testregistry/inquiryapi"
            mReq.UDDI_SecureAddress = "https://www-3.ibm.com/services/uddi" & _
                                      "/testregistry/protect/publishapi"
        Else
            mReq.UDDI_Address = "http://uddi.microsoft.com/inquire"
            mReq.UDDI_SecureAddress = "https://uddi.microsoft.com/publish"
        End If
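Since the Java implementation needs the same pairing of registry name to inquiry and publishing address, the mapping can also be held in a lookup table rather than a chain of conditions. The following is a sketch only: `RegistryEndpoints` and `forName` are hypothetical names, and the URLs are simply those quoted in the listing above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lookup table; the URLs are the ones quoted in this chapter.
class RegistryEndpoints {
    final String inquiryUrl;
    final String publishUrl;

    RegistryEndpoints(String inquiryUrl, String publishUrl) {
        this.inquiryUrl = inquiryUrl;
        this.publishUrl = publishUrl;
    }

    private static final Map<String, RegistryEndpoints> BY_NAME =
        new LinkedHashMap<String, RegistryEndpoints>();

    static {
        BY_NAME.put("Ariba", new RegistryEndpoints(
            "http://uddi.ariba.com/UDDIProcessor.aw/ad/process",
            "http://uddi.ariba.com/UDDIProcessor.aw/ad/process"));
        BY_NAME.put("IBM", new RegistryEndpoints(
            "http://www-3.ibm.com/services/uddi/testregistry/inquiryapi",
            "https://www-3.ibm.com/services/uddi/testregistry/protect/publishapi"));
        BY_NAME.put("Microsoft", new RegistryEndpoints(
            "http://uddi.microsoft.com/inquire",
            "https://uddi.microsoft.com/publish"));
    }

    // Mirrors the If/ElseIf/Else chain: any unrecognized name falls back to Microsoft.
    static RegistryEndpoints forName(String name) {
        RegistryEndpoints endpoints = BY_NAME.get(name);
        return endpoints != null ? endpoints : BY_NAME.get("Microsoft");
    }

    public static void main(String[] args) {
        RegistryEndpoints ibm = forName("IBM");
        System.out.println("Inquiry: " + ibm.inquiryUrl);
        System.out.println("Publish: " + ibm.publishUrl);
    }
}
```

Adding a further registry then means adding one table entry, with no change to the code that uses it.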
The next section, which retrieves the business information from UDDI, remains unchanged:

        Set findBiz = New find_business
        Set outEnv = New Envelope
        Set outEnv.Plugin = findBiz

        findBiz.Name = strSearch
        findBiz.maxRows = 10

        Set inEnv = mReq.UDDIRequest(outEnv)

        Set bizList = New businessList
        Set inEnv.Plugin = bizList
However, as before, from here on in we're in uncharted waters. The first thing we need to do is extract the business key:

        If (bizList.businessInfos.Count = 0) Then
            MsgBox ("None found")
            Exit Sub
        End If

        Set bizInfo = bizList.businessInfos(1)
        strKey = bizInfo.businessKey
Now we get the business details, and extract the entity from it:

        Set getBiz = New get_businessDetail
        Set outEnv = New Envelope
        Set outEnv.Plugin = getBiz

        getBiz.AddbusinessKey = strKey

        Set inEnv = mReq.UDDIRequest(outEnv)

        Set bizDetail = New businessDetail
        Set inEnv.Plugin = bizDetail

        If (bizDetail.Count = 0) Then
            MsgBox ("Failed to retrieve business entity")
            Exit Sub
        End If

        Set mEntity = bizDetail.businessEntity(1)
Again, we use this to populate the name and description fields:

        txtName.Text = mEntity.Name
        txtDescription.Text = mEntity.Description(1)
    End Sub
Now let's look at what happens when we click on the Amend button:

    Private Sub cmdAmend_Click()
        Dim save As UDDI10.save_business
        Dim outEnv As UDDIEnv.Envelope
        Dim inEnv As UDDIEnv.Envelope
The first thing that we have to do is get our authentication token. The slight difference here is that the actual token is stored within the request object, so we don't actually need to deal with it explicitly:

        mReq.Authenticate txtUser.Text, txtPassword.Text
Now we can amend our entity, and we're nearly home and dry:

        mEntity.Description(1) = txtDescription.Text
At this point, we encounter a problem. Because the Microsoft implementation is wrapped in COM objects, there is no easy way to copy across an entire entity into an envelope without copying each element in turn. So that's what we're going to have to do:
        Set save = New save_business
        Set outEnv = New Envelope
        Set outEnv.Plugin = save

        With save.AddbusinessEntity
            .Adddescription = mEntity.Description(1)
            .businessKey = mEntity.businessKey
            .Name = mEntity.Name
        End With
In fact, in real life, there would be a whole lot more! However, we are nearly finished, and we can send off our save to the registry:

        Set inEnv = mReq.UDDIRequest(outEnv)

        MsgBox ("Done")
    End Sub
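To give a feel for what "a whole lot more" involves, here is a sketch of copying an entity one element at a time, including every description rather than just the first. The `EntitySketch` class is a hypothetical, much-reduced model invented for this illustration; a real business entity carries many more elements than the three shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, much-reduced model of a business entity.
class EntitySketch {
    String businessKey;
    String name;
    final List<String> descriptions = new ArrayList<String>();

    // Copies each element in turn. The description list is copied element by
    // element too, so later changes to the copy do not affect the source.
    static EntitySketch copyOf(EntitySketch source) {
        EntitySketch target = new EntitySketch();
        target.businessKey = source.businessKey;
        target.name = source.name;
        for (String description : source.descriptions) {
            target.descriptions.add(description);
        }
        return target;
    }

    public static void main(String[] args) {
        EntitySketch original = new EntitySketch();
        original.businessKey = "3d528bf1-00e5-ec77-f22c-cefb19a7aa77";
        original.name = "WroxDemo";
        original.descriptions.add("Wrox Demonstration Business (Ariba version)");

        EntitySketch copy = copyOf(original);
        copy.descriptions.set(0, "Ariba Wrox Demo");

        System.out.println("Original: " + original.descriptions.get(0));
        System.out.println("Copy:     " + copy.descriptions.get(0));
    }
}
```

Every element that the copy routine omits is an element missing from the entity that gets saved, which is why a complete implementation has to cover them all.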
Let's try the application out and see if it works. Here is the result of our search on Ariba:
Let's change that description back to what it was before:
If we now clear that description and issue the search again, the new value will reappear. And the really good news is that this works with the IBM and Microsoft registries as well.
Summary

In this chapter, we have looked a little into the future. In particular, we have:

❑ Looked at how our systems are going to be talking to each other, using WSDL
❑ Succeeded in getting two completely unrelated systems to work together in a reasonably satisfactory manner
❑ Discussed what impact this might have on future software development processes

We have also looked at how:

❑ Our systems are going to find compatible partners, via the UDDI registry
❑ We can search this registry to find the information we need
❑ We can publish data to it
Again, we have done this with two entirely different technologies.