Grassroots Series
Alison Cawsey & Rick Dewar
Internet Technology and e-Commerce
Internet Technology and e-Commerce
Alison Cawsey and Rick Dewar
palgrave macmillan
© Alison Cawsey and Rick Dewar 2004

All rights reserved. No reproduction, copy or transmission of this publication may be made without written permission. No paragraph of this publication may be reproduced, copied or transmitted save with written permission or in accordance with the provisions of the Copyright, Designs and Patents Act 1988, or under the terms of any licence permitting limited copying issued by the Copyright Licensing Agency, 90 Tottenham Court Road, London W1P 9HE.

Any person who does any unauthorised act in relation to this publication may be liable to criminal prosecution and civil claims for damages.

The authors have asserted their rights to be identified as the authors of this work in accordance with the Copyright, Designs and Patents Act 1988.

First published 2004 by
PALGRAVE MACMILLAN
Houndmills, Basingstoke, Hampshire RG21 6XS and 175 Fifth Avenue, New York, N.Y. 10010
Companies and representatives throughout the world

PALGRAVE MACMILLAN is the global academic imprint of the Palgrave Macmillan division of St. Martin's Press, LLC and of Palgrave Macmillan Ltd. Macmillan® is a registered trademark in the United States, United Kingdom and other countries. Palgrave is a registered trademark in the European Union and other countries.

ISBN 0-333-98999-6

This book is printed on paper suitable for recycling and made from fully managed and sustained forest sources.

A catalogue record for this book is available from the British Library.
Printed in China
Dedication
For Sophie and Andrew (AJC)
For Finlay (RGD)
Between them they kept this book short.
Contents

Preface

Chapter 1  Introduction
  Chapter Overview
  1.1 Introduction
  1.2 The e-Commerce Transaction
  Further Reading

Chapter 2  HTML
  Chapter Overview
  2.1 Introduction to Markup Languages and HTML
  2.2 Managing the Presentation: Stylesheets
  2.3 Managing Interaction with the User: Introduction to Forms
  2.4 Getting it Wrong: HTML Markup Pitfalls
  Chapter Summary
  Exercises
  Further Reading

Chapter 3  Web Site Design and Accessibility
  Chapter Overview
  3.1 Introduction
  3.2 Design Considerations
  3.3 Design Communicates Trustworthiness
  3.4 Accessibility
  3.5 Generating the Pages
  3.6 Validation and Testing
  Chapter Summary
  Exercises
  Further Reading

Chapter 4  Internet Architecture and Protocols
  Chapter Overview
  4.1 Introduction
  4.2 Network Architectures
  4.3 Web Site Meta-architecture
  4.4 Protocols
  4.5 Capacity Planning
  4.6 Beyond the Client-Server Paradigm
  Chapter Summary
  Exercises
  Further Reading

Chapter 5  Languages of the Internet
  Chapter Overview
  5.1 Introduction
  5.2 The Client and Server Sides of Web Programming
  5.3 Server-Side Scripting Languages
  5.4 Client-Side Scripting Languages
  5.5 Java
  5.6 Flash
  5.7 Cookies
  5.8 XML
  5.9 Hybrid Approaches
  Chapter Summary
  Exercises
  Further Reading

Chapter 6  Client-Side Scripting in JavaScript
  Chapter Overview
  6.1 Introduction to JavaScript
  6.2 Document Objects
  6.3 Event Handlers
  6.4 Animating your Documents: DHTML
  6.5 Cookies in JavaScript
  Chapter Summary
  Exercises
  Further Reading

Chapter 7  Server-Side Scripting in Perl
  Chapter Overview
  7.1 Introduction
  7.2 Getting Started
  7.3 Variables and Arrays
  7.4 Controlling the Script's Execution
  7.5 File Handling
  7.6 String Operations
  7.7 Built-in Perl Functions
  7.8 A Simple Script for the Web
  7.9 Passing Values to the Script using CGI.pm
  7.10 Cookies in Perl
  Chapter Summary
  Exercises
  Further Reading

Chapter 8  XML: eXtensible Markup Language
  Chapter Overview
  8.1 Introduction to XML
  8.2 Defining Document Types
  8.3 Displaying Documents: Stylesheets
  8.4 Substituting and Inserting Text
  8.5 Manipulating XML Documents: JavaScript and the Document Object Model
  Chapter Summary
  Exercises
  Further Reading

Chapter 9  Case-Study: A Simple Shopping Site
  Chapter Overview
  9.1 Introduction
  9.2 Design of the Site
  9.3 Using JavaScript to Check Form Submissions
  9.4 Using Perl to Handle Requests
  9.5 Using Cookies to Keep Track of Purchases
  9.6 Using XML for Product Lists
  9.7 Business and Practical Issues
  Chapter Summary
  Exercises

Chapter 10  e-Business Models
  Chapter Overview
  10.1 Introduction
  10.2 Storefronts and Malls
  10.3 Auctions
  10.4 Portals and Community Sites
  10.5 Dynamic Pricing
  10.6 Barter
  10.7 The Virtual Organisation
  10.8 Application Service Providers
  10.9 Extranets
  10.10 Industry Consortia
  10.11 B2B Trading Hubs
  10.12 Disintermediation and Reintermediation
  Chapter Summary
  Exercises
  Further Reading

Chapter 11  e-Marketing
  Chapter Overview
  11.1 Introduction
  11.2 The 4 Ps of Marketing
  11.3 So What is e-Marketing?
  11.4 Doing e-Marketing
  11.5 Globalisation
  11.6 Sticky Sites
  Chapter Summary
  Exercises
  Further Reading

Chapter 12  e-Payment
  Chapter Overview
  12.1 Introduction
  12.2 Offline and Online Payment
  12.3 The Online Credit Card Process
  12.4 A Word on EDI and EFT
  12.5 The Debit Card Process
  12.6 Alternative Payment Methods
  Chapter Summary
  Exercises
  Further Reading

Chapter 13  e-Security
  Chapter Overview
  13.1 Introduction
  13.2 Security per se
  13.3 The e-Security Issues
  13.4 Encryption
  13.5 Cryptography
  13.6 Key Protocols
  13.7 Digital Signatures
  13.8 Timestamping
  13.9 Digital Certificates and Certification Authorities
  13.10 Public-Key Infrastructure
  13.11 Security Protocols
  13.12 Network Security with Firewalls
  13.13 Directory Access Control
  Chapter Summary
  Exercises
  Further Reading

Chapter 14  e-Law and e-Ethics
  Chapter Overview
  14.1 Introduction
  14.2 e-Law
  14.3 e-Ethics
  Chapter Summary
  Exercises
  Further Reading

Chapter 15  e-Methodology
  Chapter Overview
  15.1 Introduction
  15.2 The System Development Life Cycle
  15.3 Methodologies
  Chapter Summary
  Exercises
  Further Reading

Glossary

Index
Preface
This book provides an overview of key technologies for the World Wide Web and important issues in e-Commerce. It is intended to get you started, so that you have the confidence to try things out, and to give you an awareness of what can be done and what must be taken into account when designing a site. We don’t introduce the latest Web development tools. To do so would guarantee that this book would be out of date in two years. Our aim instead is to provide a base from which you will be able to understand new tools and techniques as they emerge. By developing an awareness of all the issues surrounding e-Commerce you should also be able to use new tools and technology appropriately. The book is based on the authors’ experience in teaching introductory courses on e-Commerce and Internet technology at Heriot-Watt University. We hope that it will be useful to students on similar courses, and also to anyone wanting a fairly quick overview of some of the key technologies and issues. To learn as much as possible from the book it is important that you try
things out. You also need to be able to find up-to-date sources of more detailed information. We have therefore included a variety of exercises in the book: some practical exercises on aspects of Web site development, and some exercises asking you to find out about current Web-based resources. By researching these for yourself you will be able to find and evaluate Web sites that are interesting to you or culturally relevant. We do make some suggestions for further reading in each chapter, including some key Web resources. However, you should be aware that to develop professional skills in this area you should be consulting the latest standards and the texts that support these. The book is split loosely into three sections. Chapters 2—4 introduce HTML, Web site design, and the architectures and protocols of the Internet. These chapters provide basic background material, and should also enable you to get quickly up to speed in developing simple static
Web sites. Chapters 5-9 build on from this, looking at the languages associated with the development of dynamic, interactive Web sites. Chapter 5 introduces a range of Web programming and scripting languages and how
they are used. Chapters 6-8 build on these and provide more detail on some basic technologies: client-side scripting in JavaScript; server-side
scripting in Perl; and XML (eXtensible Markup Language). Practical experience in these languages should provide you with a much better understanding of how dynamic Web sites really work.
Some of the technologies discussed in Chapters 5-8 are revisited in
Chapter 9, which presents a simple e-Commerce case-study. A shopping
site is developed using client- and server-side scripting. In terms of the technologies used it may seem old fashioned, but we aim in this chapter to show you how we can bring together different technologies in the development of an actual site. By seeing how different client and server-side technologies work together you should be in a strong position to learn and use appropriate new tools and technology as they emerge. Chapters 10-15 discuss key issues in e-Commerce. Chapter 10 (e-Business Models) looks at different ways in which e-Businesses can
arrange themselves and some of the options available to them. Chapter 11 (e-Marketing) considers basic marketing in the real and virtual worlds, but goes on to cover some of the marketing initiatives that are specific to e-Commerce. Chapter 12 (e-Payment) covers a selection of online payment options that are available as alternatives to the ubiquitous credit card, and explains the difference between B2B and B2C payment. Chapter 13 (e-Security) begins by considering the wider context of security, then looks at cryptography in some detail. Issues of security protocols and architecture are also discussed. Chapter 14 (e-Law and e-Ethics) considers the legal liabilities of the e-Commerce company and looks at the role of the contract in this context. The chapter also covers some of the ethical dimensions of e-Commerce. Chapter 15 (e-Methodology) attempts to pull together issues presented in other chapters and discusses the likely methodologies for developing online
applications with reference to development life cycle models. A short glossary is included at the end of the book, covering some, but not all, of the many technical terms and acronyms that you will encounter. We indicate terms that occur in the glossary by putting them in bold when they first occur in the text. All the chapters include exercises. Working through these exercises will test and broaden your understanding. In general the exercises in Chapters 5-9 are very practical, developing basic Web programming skills, while the exercises in Chapters 10-15 involve doing some research on the Web to broaden your knowledge of the area. There are a number of possible routes through the book, perhaps emphasising technical or e-Commerce issues. However, Chapters 2, 6, 7, 8
and 9 should be tackled in the order given. We have kept to a minimum the special purpose languages and tools
required for the exercises and examples. Most of the examples simply require a text editor (e.g. emacs or Notepad) and a fairly recent browser (e.g. Internet Explorer 6 or later, Netscape 6, Mozilla 1.1 or later). It is
better when starting out NOT to use specialised editors (e.g. HTML or XML editors) as these tend to hide part of what you are meant to be learning. Just type examples into your text editor. Some of the server-side
scripting examples require you to have access to a Web server, and permissions to place files in an appropriate cgi-bin directory. They also require a Perl interpreter — but these are free and widely available for most platforms. Occasionally we make assumptions about the particular browser or platform you are using, and of course new browsers are always emerging. We will aim where possible to provide alternative versions of important examples on the book Web site. This book assumes relatively little prior knowledge. We assume that you are familiar with browsing the Web, and know how to use search engines to find information. We also assume in certain parts of the book that you have done some programming before, although we make no assumptions about the language. We hope that by working through this book you will reach a point where you have enough practice in core Internet technologies, and enough understanding of the issues in modern e-Commerce, that you will then be
able to rapidly pick up the more specialised skills that may be required for a commercial project. There is a book Web site at: http://www.macs.hw.ac.uk/~alison/netbook/
This contains the code examples from the book, links to key Web resources, and any errata. Please feel free to contact the authors with any suggestions or comments.
Chapter 1
Introduction

CHAPTER OVERVIEW

This book introduces key issues in e-Commerce, and some of the basic Internet technologies supporting e-Commerce developments. In this first chapter we set this material in context and:

- Provide a little historical background.
- Discuss what is involved in an e-Commerce transaction.

1.1 Introduction

e-Commerce describes the buying and selling of products, services, and information via the Internet. It's not just about online shopping, where the customer carries out a transaction with a business — so-called Business to Consumer (B2C) trade. It is also about transactions between businesses.
These are commonly referred to as Business to Business (B2B) exchanges. e-Commerce has been around a long time. In the early 1970s businesses were already exchanging data using computer networks — this was referred to as Electronic Data Interchange, or EDI. Initially private (and expensive) networks were used, but in the early 1990s the Internet started taking their
place, opening up EDI to smaller businesses and individuals. The development of the World Wide Web (WWW) in the early 1990s, with the release of the first Web browser (Mosaic) in 1993, led to the
beginnings of what we now think of as e-Commerce. By 1994 you could order pizzas from Pizza Hut online (if you lived in the right area!). e-Commerce boomed from 1995 to 2000. It allowed even small suppliers to reach a potentially enormous market, with low startup and operating costs. Money was thrown at entrepreneurs with new ideas (some with little business experience). But many businesses failed to break
even, and were living on promises. The crash came in 2000, with large businesses folding. Boo.com, for example, took hundreds of millions of investment money with it.
Despite the crash, e-Commerce continued to grow in 2001. More and more organisations are expanding into e-Commerce, and more people are buying products over the Web. Although the future is still not clear, it does seem that we have learned from the heady days of the late 1990s, and are now beginning to take a more thoughtful approach to new developments. We're therefore now ready to take a proper look at the issues. e-Commerce is an area where new ideas and new technology are emerging all the time. It may be tempting to jump on the latest buzzwords, the latest tools, and try to rush in and learn the technology of the moment. However, we believe that it is better to start with a
grounding in the basics, and develop a basic competence in the core Internet technologies and business ideas that may be used in e-Commerce. Understanding e-Commerce includes consideration of the supply chain, customer service, procurement and intrabusiness tasks. It also
involves developing an understanding of and competence in the underlying technology. In this book we aim to introduce both aspects.
1.2 The e-Commerce Transaction

We will now look in more detail at the e-Commerce transaction, so you can get a feel for some of what is involved. Consider a simple transaction where a list of products is displayed, and the user selects one item for purchase, then enters payment details. We can look at the transaction from a technical and from a user-centered point of view.

From a technical perspective:

- The user requesting the Web page from his browser results in a request being sent to the server machine. The server locates the relevant page and sends it back to the user (client) machine. In general the server's job is just to accept requests from the client, deal with them appropriately, and send responses.
- That page is (probably) written in HTML. The browser works out how to display the HTML appropriately, given the browser size and settings.
- The page will be displayed with product information, and (perhaps) checkboxes for the user to select items of interest. The user selects one or more items and submits his order.
- Now another request is sent to the server. This time the request includes some data (the user's selections). The server may run a program (e.g. a Perl program) to process the data, perhaps recording the order. It then sends another response. This time the response might consist of some HTML containing a form allowing the user to enter his payment details, which, when submitted, results in another request to the server.
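Underneath, each of those requests and responses is a small text message following the HTTP protocol (protocols are the subject of Chapter 4). A rough sketch of the first exchange is shown below; the host and page name are invented for illustration, and most headers are omitted:

```http
GET /products.html HTTP/1.1
Host: www.example-shop.com

HTTP/1.1 200 OK
Content-Type: text/html

<html> ... the HTML for the product page ... </html>
```

The browser sends the request (top), and the server replies with a status line, some headers, and the HTML document itself.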
From a more user-centered perspective, the transaction involves an exchange between a user and the organisation selling the products. Site designers must consider this transaction as well as the technical one! They must take into account:

- How to design the site so it is attractive, and easy to navigate.
- How to market the site, and manage payments.
- How to ensure that the transactions are secure, and can't be intercepted by a third party.
- How to ensure that the transaction with the user is acceptable from a legal and ethical perspective, and, related to this, how to engender trust in the user, so that he or she comes back for more.

We hope that this book will enable you to understand both some of the technologies involved in e-Commerce transactions, and the more user-centered aspects.
FURTHER READING

Bates, C (2001) Web Programming: Building Internet Applications, John Wiley.
Davis, WS and Benamati, J (2002) E-Commerce Basics, Addison-Wesley.
Deitel, HM, Deitel, PJ and Nieto, TR (2001) Internet and the World Wide Web: How to Program, 2nd edition, Deitel.
Turban, E, King, D, Lee, J, Warkentin, M and Chung, HM (2002) Electronic Commerce 2002: A Managerial Perspective, Prentice Hall.
Chapter 2
HTML

CHAPTER OVERVIEW

The basic foundation for developing Web-based applications is the proper use of HTML. In this chapter we discuss:

- What we mean by a markup language.
- The various HTML standards, including XHTML, and the importance of validation.
- The use of stylesheets to control presentation, and forms to allow user input.
2.1 Introduction to Markup Languages and HTML

In this chapter you will learn about HTML: a markup language for the World Wide Web. Markup languages have been around for a very long time. A markup language is a language for annotating some text with additional information, such as how the text should be displayed. Copyeditors have long used their own markup language to annotate books and articles prior to typesetting, while text formatting languages are based on adding additional markup to a text to indicate how it should be displayed. Code 2.1 shows how we indicate that some text is to be displayed in a bold typeface in the LaTeX text formatting language:

CODE 2.1 LaTeX markup example
Here is an example of some {\bf bold} type.

HTML, or Hypertext Markup Language, is a markup language geared to the needs of documents which are to be displayed on the World Wide Web. As well as having special markup instructions which are concerned with how a document is to be displayed, it has markup which allows the author of the document to specify how the document is linked to others, thereby allowing the reader to navigate from one document to another.
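That linking markup is the anchor (a) element, which is covered properly later in the chapter; as a taste, it wraps some text and names the destination document in its href attribute. A minimal sketch (any URL would do in place of the W3C's):

```html
<p>HTML is standardised by the
<a href="http://www.w3.org/">World Wide Web Consortium</a>.</p>
```

In a browser the wrapped text becomes a clickable link that fetches the named document.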
HTML is based on SGML (Standard Generalised Markup Language). SGML provides a standard metalanguage on which a great variety of markup languages may be based. In SGML text is marked up by adding
start and end tags to label particular elements in the text. For example, the following code uses the start tag <author> and end tag </author> to label 'Joe Bloggs' as an author element:

CODE 2.2 An SGML element

<author>Joe Bloggs</author>

SGML provides a way of defining a particular markup language, but it does not specify, for example, what elements are allowed and how they may be nested. A markup language based on SGML provides a set of allowed elements (and allowed element nestings) appropriate for a particular application. A publishing application might, for example, allow an author element, as illustrated above. The World Wide Web is an application with its own requirements. HTML is therefore defined to have elements appropriate for marking up documents for the World Wide
Web. HTML as a language has evolved as technology has developed over the last decade. HTML was invented in 1989 by Tim Berners-Lee. The original HTML was very simple (with a small set of allowed elements), but Web
users and developers rapidly started adding to it, with new elements for a range of purposes. There was a need for a standard version, encompassing new developments, but which all users could accept. HTML version 2.0 was the first attempt at this, followed by version 3.2 and then HTML 4.0. Version 4.01 (a minor update on 4.0) is now widely used and supported. However the latest is XHTML (discussed later in the chapter). The examples in this book are either in HTML 4.01 or, once we have discussed it, XHTML. These standards developments are coordinated by the W3C (World Wide Web Consortium) — an international consortium that promotes and coordinates the development of the Web. The rest of this section will briefly introduce some of the most
important elements used in HTML so you can develop simple Web pages. Much fancier looking Web pages may be developed in shorter time using
the many authoring tools now available. However, it is important to start by using a text editor and write your HTML by just typing it in. That way you can gain a proper understanding of the language, and go on to
understand and use the more advanced Web technologies.
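The standards discussion above can be made concrete: a page declares which HTML version it is written to in a DOCTYPE line at the very top of the file, and validators use this to decide which standard to check against. These are the W3C's standard declarations for HTML 4.01 Strict and XHTML 1.0 Strict:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">

<!-- or, for XHTML 1.0: -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```

A page uses one or the other (not both); the examples in this chapter will still display without a DOCTYPE, but validation needs one.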
2.1.1 Authoring Simple HTML

HTML uses markup tags to indicate how a document is structured, and how elements within it should link to other files on the Web. The structure of the document in turn determines how it will be displayed in a
browser. An HTML file is a plain text file, and can be created using a simple text editor (such as Notepad under Microsoft Windows).
Code 2.3 is a very simple piece of HTML, which you can copy into a text editor and view in your browser. Your file can be called whatever you like, but the name must end with .html or .htm.
CODE 2.3 HTML Example 1

<html>
<head>
<title>My Page</title>
</head>
<body>
<h1>My Page</h1>
<p>Welcome to my very simple Web page.</p>
</body>
</html>
Code 2.3 appears as shown in Figure 2.1 when viewed from a Web browser (in this case Netscape).
The example shows how HTML tags are used to mark up the
document. Tags always start and end with angle brackets. A piece of text (or a section of HTML) is surrounded by a start tag and an end tag with the same name (e.g. <title>My Page</title>). End tags are indicated by the additional slash (/). The two tags are used to define elements. For example, the whole section surrounded by <body> and </body> is a body element, containing the main content to be displayed on the page.

Figure 2.1 Example 1 displayed in browser
Altogether there are six element types in the example:

html
head
title
body
h1
p

The document starts with <html> and ends with </html>. These tags define the top-level html element. We then have a head and a body element. The head element contains some information about the HTML file (including its title, which is a required element). The main content to be displayed is in the body. Within the body, h1 defines a heading element and p is used to indicate paragraphs.

There are of course many other elements defined for HTML. There are elements for defining lists, tables, forms, and for incorporating various media elements in a page. Codes 2.4 and 2.5 show how an ordered list of items, and a simple 2 x 2 table, are defined. Note how a table consists of a number of table rows (tr), which in turn consist of a number of table data cells (td).

CODE 2.4 HTML ordered list

<ol>
<li>First item</li>
<li>Second item</li>
</ol>

CODE 2.5 HTML table

<table border="1">
<tr> <td>Item 1</td> <td>Item 2</td> </tr>
<tr> <td>Item 3</td> <td>Item 4</td> </tr>
</table>
Links to other pages are created with the a (anchor) element, and images are included with the img element. The following fragment produces a link, a small table and an image; here the files aboutme.html and picture.gif are assumed to exist alongside the page:

<p>You can find out more about me
<a href="aboutme.html">here</a>, or look at this table:</p>

<table border="1">
<tr> <td>Name</td> <td>Alison</td> </tr>
<tr> <td>Age</td> <td>38</td> </tr>
</table>

<p>Here's a picture that I drew.
<img src="picture.gif" alt="My drawing"/></p>
[Stylesheet examples: a plain document ("The first paragraph. Some items: ..."), the same document styled ("Isn't it a nice page."), and a paragraph coloured red ("A red paragraph.").]
CODE 2.24 A simple form

<form action="">
Name: <input type="text" name="username"/> <br/>
Age: <input type="text" name="age"/> <br/>
<input type="submit" value="Submit"/>
</form>
Note how input elements have the attributes type and name. These attributes allow you to specify the kind of input (e.g. text or checkbox), and give the input a name that allows it to be referenced by whatever application uses this input. The form itself has an action attribute (which for now we leave empty). This specifies what is to be done with the form's contents. Figure 2.4 shows how Code 2.24 is displayed in a Web browser.

Figure 2.4 Simple form
There are a range of different types of input element that can be used, apart from text. The other main kinds are radio buttons, checkboxes and buttons.
» Radio buttons allow users to choose one from a range of mutually exclusive alternatives:

CODE 2.25 Radio buttons

Are you:
<input type="radio" name="agegroup" value="0-30"/> 0-30
<input type="radio" name="agegroup" value="31-60"/> 31-60
<input type="radio" name="agegroup" value="over60"/> over 60

Chapter 4
Internet Architecture and Protocols

CHAPTER OVERVIEW
In order to make effective use of Internet technology for e-Commerce applications you need a basic understanding of the architecture and protocols associated with the Internet. This will allow you in turn to understand the design of complex Internet applications. In this chapter we:

- Discuss the role of the e-Architect.
- Explain the components in, and rationale for having, a tiered architecture.
- Describe the various components that make up a Web site's meta-architecture.
- Discuss the role and selection of the Web server.
- Explain the purpose of network protocols.
- Describe the components and operation of the TCP/IP protocol suite.
- Consider capacity planning issues.
4.1 Introduction

Let us begin our investigation into the architectural issues of e-Commerce by defining what we mean by architecture, before we look at e-Architecture. By architecture, we mean the way that the components of any design (e.g. a car, a building or computer system) fit together. The architecture might include consideration of what the components are, where they exist in time or space, how components communicate or are held together and the performance expected of them or the system as a whole. For instance, the architect for a building might consider the specification:
- to be located at 35 Anywhere Street;
- planned to be complete by next Spring;
- access from the ground floor on its east and south sides;
- constructed using a bolted steel frame with a sandstone outer skin;
- to comprise 4 floors connected by 1 stairwell and 2 elevators;
- to cover 2000 square metres of land;
- to be used for office accommodation;
- should not require major maintenance work for 20 years;
- to have electrically powered heating and air conditioning;
- will consume a maximum of 2MW of electrical power under extreme conditions.

Looking at this example for a building, we notice two things. Firstly, these considerations look very much like customer requirements. Indeed, the role of an architect (for a building or an information system) is often to spend a great deal of time making sure their design meets the customer's expectations (and by customers we can mean those inside and outside our organisation). Secondly, the concerns of architects from whatever domain are, at some level of abstraction, universal. Architects
are worried about time, cost, quality, risk, scalability, maintainability, reliability and so on. They also need to be aware of the people who build, use and maintain the system. Finally, they need to be aware of the latest technology, trends and opportunities that might affect the decisions they make. So, architecture is not just a plan for how something works or how it looks, it is also a role that someone needs to undertake.
Architects are also often concerned about managing complexity. We as humans have powerful, yet limited, abilities to imagine very complex systems in their entirety. Yet very complex systems exist — spacecraft, software applications, bridges, etc. The way we manage this complexity in order to build the system, or understand how something works, is to break it down into smaller, more manageable parts — e.g. a car is made up of sub-systems like transmission, engine, chassis, steering, etc.; in turn these
are themselves made up of sub-systems. Since we can understand the smaller parts, we can build them. Once built, all we need to worry about is
their interface with neighbouring parts — in other words, we can treat them
like black boxes that hide their own internal complexity. Now, as architects, we can plug these parts together safe in the knowledge that the whole system should work. Sounds simple, but things often go wrong — particularly at the interfaces between parts when the communication and interaction mechanisms are not fully understood. We can even get it wrong if we consider the wrong collection of parts to be a sub-system. We shall continue this discussion when we look at protocols later in this chapter. So what is the objective of this chapter? Well, firstly we want to start thinking about the components you will need to build a site. By components we are not talking about HTML and graphics files; we shall consider components at a slightly higher level of abstraction. So we are interested in the hardware, the databases and the functionality of certain key software elements. Once you, as an architect, have decided on these,
then you can start worrying about the Web site’s layout, content, navigation and functionality. Although such Web site design concerns constitute a creative act and rightly belong under the general banner of architecture, for convenience (and by virtue of imposing our arbitrary level of abstraction) we considered them in Chapter 3 on Web Site Design. Having looked at the components, and thereby satisfied our role as managers of complexity, we need to stick/glue these parts together. In the physical world of the computer network, such connections are predominantly cables. However, the data that is transmitted along these cables by the sending component needs to be addressed to, and understood by, the recipient component. Therefore, we shall also consider the network protocols and tools that allow us to stick the constituent parts together.
4.2
Network Architectures
In the past, the dominant network architecture in an organisation would have been the mainframe with a number of satellite dumb terminals on people’s desks. Although no longer dominant, mainframes are still commonplace. These dumb terminals have keyboards and sometimes mice, and connect directly to the mainframe computer. As such they are little more than windows on the data and processes that the mainframe hosts. All the data live on the mainframe and all the processing is done there as well. The disadvantages of this architecture are related to its reliance on the mainframe. If the mainframe goes down, you are unable to carry out your work or communicate. Mainframes can also be susceptible to load surges. For instance, if everyone is processing data on the server at the same time, the response you get at your dumb terminal degrades to an unacceptable level. Subsequently, we saw a move to a client-server architecture (sometimes called a two-tier architecture) where the server
took the place of the mainframe and the clients (typically PCs as we know them today) took over the roles of the dumb terminals on people’s desks. The difference was that some power and responsibility was devolved from the server to the client — the client was no longer dumb. The servers became responsible for some storage, communication and administration duties. For example, you might have your own local hard disk in the PC
that sits on your desk, but you can also choose to put your files on a networked drive that lives on the server. Also, the server can allow or deny your request to log on to the network from your client machine. However, this trend for decentralising power and responsibility can go
too far, whereby each client becomes fat. There might be different and incompatible applications on each client. This may be problematic since it compromises communication; the users of the clients may not actively back-up their local files making them vulnerable to losing their important and valuable data; licensing diverse suites of applications becomes costly and difficult; more expensive machines are needed to store and run
Internet Technology and e-Commerce
applications on fat clients; and so on. At least with the mainframe approach, there was a central, controllable, editable, supportable environment. In an attempt to control such chaotic diversity, some
organisations limit user access to local client storage and centralise applications. For instance, when a user wants to use the standard word processor and clicks on the relevant icon on their PC desktop, a copy is downloaded from the server to the client machine and run locally from memory. At the same time, there is a tentative move away from fat clients to
so-called thin client-servers. A thin client-server architecture within an organisation hosts the minimum amount of data and number of applications that are needed; the rest is outsourced to Application Service Providers (ASPs). An example would be that we no longer necessarily need any email client software on our PC, or even on our server, since we
can access remote accounts via our Web browser. The flip-side of this ‘diet’ is, however, that problems can be more acute with respect to network bandwidth (how much data can be transmitted), latency (the
related issue of how long data take to travel down the line) and connection stability. If performance is too low, the ASP option becomes unattractive. The notion of ASPs is discussed further in Chapter 10 on e-Business Models and Chapter 14 on e-Law and e-Ethics.
4.2.1
The 3-Tier Architecture
Having spoken about the architecture of our networks, let us now turn to that of our applications. A common approach is to use (or at least think in terms of) a 3-Tier Architecture. This is a model in which the user interface,
functional process logic (‘business rules’) and data storage are developed and maintained as independent modules, often on separate platforms (see
Figure 4.1). The 3-tier architecture is intended to allow any of the three tiers to be upgraded or replaced independently as requirements or technology change — as long as the messages sent and protocols used to communicate between the tiers are consistent and understood. The middle tier can also take on the responsibility of load balancing so that, for instance, if too many clients are connecting to the application server, which in turn is swamping the database with requests, the application server can use other database servers in order to spread or balance the load. Compare this model to the mainframe approach where data, application and interface code were often tightly coupled, making maintenance complex and expensive. Typically, the user interface runs on a desktop PC and uses a standard
Graphical User Interface (GUI); functional process logic may consist of one or more separate modules running on an application server; and a Database Management System (DBMS) on a database server or
mainframe contains the data storage logic. The middle tier may be multi-tiered itself (in which case the overall architecture is called an ‘n-tier’ architecture).
Figure 4.1 The 3-tier architecture: User Interface (Client), Application Server, Database Server.
Taking this approach outside the organisation it is possible to imagine that these three components do not even exist on the same local area network; they may be dispersed over the Internet. Consider the example of obtaining live, online share prices where the user interface is provided by a browser on the client machine, the application logic resides on the Web server of some e-portal company and the database lives at the stock exchange.
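To make this separation concrete, the three tiers can be sketched as a toy Python program. Every name, symbol and price below is invented purely for illustration; in a real deployment each tier would typically run on its own platform and communicate over the network.

```python
# Tier 3: data storage — an in-memory dict standing in for a DBMS.
SHARE_PRICES = {"ACME": 12.50, "GLOBEX": 7.25}

def fetch_price(symbol):
    """Data tier: the only code that touches the 'database'."""
    return SHARE_PRICES.get(symbol)

# Tier 2: functional process logic on the application server.
def quote(symbol):
    """Logic tier: applies the business rules (validation, formatting)."""
    price = fetch_price(symbol.upper())
    if price is None:
        return "Unknown symbol"
    return f"{symbol.upper()}: {price:.2f}"

# Tier 1: user interface — a console stand-in for the browser.
print(quote("acme"))  # ACME: 12.50 — the UI never touches SHARE_PRICES directly
```

The point of the layering is visible in the call chain: the interface calls only the logic tier, and the logic tier calls only the data tier, so any one tier could be replaced without disturbing the others.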
4.3
Web Site Meta-architecture
As we have already said, this chapter will not concern itself with the appearance and functionality of the site. Instead we will stand back a little and consider the infrastructure that supports that site architecture — the architecture of the architecture, or meta-architecture, if you like. So, as
Web site architects, we exist within a larger system, namely the Internet, with all its protocols and client machines ready to access our site. Indeed, our site’s meta-architecture is a sub-system of the Internet and we need to start selecting the sub-sub-systems to build it with. Please note, however,
that we are assuming here that you are about to build this meta-architecture and not, as so many online companies do, buy or outsource the facility from another company. This is the classic Build or Buy? dilemma that many face when they are considering developing a site. We discuss this issue elsewhere in Chapter 15 on e-Methodologies and Chapter 10 on e-Business Models.
We need to decide on the hardware, software and functionality of our Web server, firewall, database, applications (client- and/or server-side? — see Chapter 5 on Languages of the Internet for a more comprehensive treatment) and communication protocols. We could begin by considering the topology of the system (see Figure 4.2). Here we show servers as existing on separate computers, but be aware that they could be, and often
are, housed on the same machine. In our Figure 4.2 example, the organisation’s internal network of servers and clients is denoted by LAN (Local Area Network). Different people with different authorisations will be able to access the other components we can see. Indeed the LAN will probably be able to access the Internet via the Gateway Server. These components are the parts of the architecture that are of interest to us in order to get our Web site up and running. Let us begin in the bottom right of Figure 4.2. Here we can see a Legacy System. If your organisation has been around for a number of years there is a fair chance that there will be old computer systems, e.g. mainframes,
that were built some time ago but are still being used — these are your legacy (or heritage) systems. Although they are perhaps difficult to maintain and do not readily interoperate with other systems, they are a vital and valuable part of the business and their replacement would be risky and costly and is certainly not planned in the near future. They may hold your inventory, accounting software, customer records, etc. As well as the data they possess, they will probably encapsulate a great deal of organisational knowledge in terms of the programs that have been built and evolved over the years to cater for your own circumstances and processes. In fact, the new business you get from the Web may eventually end up living on the legacy. For this reason, we need to be able to get our new Web architecture interoperating with the legacy.
Figure 4.2 Hardware architecture: Firewall, Gateway Server, Middleware Server and Database Server (together with the Web Server, LAN and Legacy System described in the text).
A common means of enabling this interoperability is through the use of middleware products. These software applications are specially designed to help heterogeneous computer systems communicate and are sometimes referred to as the glue or the plumbing between components. In our example we have installed these on the Middleware Server. There are a variety of middleware products and genres. For instance, CORBA, DCOM and Enterprise Java Beans provide a distributed object infrastructure — in effect, an application makes available a service to other components in the form of an object and hides the implementation of that object. Alternatively, message-oriented middleware can marshal predefined requests and responses into messages from and to applications to assist interoperability. Since both the Web Server and the Database Server may need to access the Legacy System’s logic and data, they can use the middleware applications to do so. Assuming the Legacy System is being used as the prime repository for operational data, the Database Server could act as a back-up for the legacy data, or store Internet specific data that the legacy was not designed to accommodate. As we mention elsewhere (specifically in Chapter 13 on e-Security), we would ideally want a well configured firewall between us and the Internet as a security measure. Therefore, the Web Server and the Gateway Server are both connected to the firewall. At this point, we should emphasise that Figure 4.2 constitutes only one of many possible configurations. Some components may live on the same physical machine and other components not mentioned here could come into play, such as an FTP, email or proxy server. Furthermore, you may have functionality requirements such as software for Customer Relationship Management. You may even be servicing multiple access channels like the World Wide Web, WebTV, wireless clients, intranets and extranets or dial-in services.
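A minimal sketch of the message-oriented style might look like the following. The message format and the operation name are our own invention for illustration — they are not the API of any particular middleware product — but they show the basic idea: requests are marshalled into a neutral message format that both sides agree on, hiding each system's internals.

```python
import json

def marshal(operation, **params):
    """Package a predefined request into a neutral message for transport
    between heterogeneous systems (e.g. Web server to legacy wrapper)."""
    return json.dumps({"op": operation, "params": params})

def unmarshal(message):
    """Decode a received message back into a request the receiver can act on."""
    data = json.loads(message)
    return data["op"], data["params"]

# A hypothetical stock-level query sent towards the legacy system:
msg = marshal("stock_level", item_code="AB123")
print(unmarshal(msg))  # ('stock_level', {'item_code': 'AB123'})
```

Because the sender and receiver agree only on the message format, either side can be replaced or upgraded independently — the same motivation we saw for the 3-tier architecture.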
4.3.1
The Web Server
Turning now to the Web server itself and considering Figure 4.3, perhaps the most important component is the server software. It is responsible for access control, running scripts and programs, enabling administration of the server configuration and (to some extent) the contents of the Web site
and logging transactions. When deciding on particular Web server software, considerations may include:
» Cost — the most popular packages are free, although free may mean they come as part of a non-free suite of software like an operating system. Cost need not just include the purchase price of the server software; it could also mean incurring extra training costs for personnel if they are not familiar with what is being purchased, or indeed what it is being installed on.
Figure 4.3 Web server architecture: the Web server machine runs an operating system and the Web server software, which serves HTML, other media and applets/servlets.
» Platform — some are specific to Windows and/or Unix.
» Performance — the number of requests you expect your site to handle and the speed with which they are processed may be a deciding factor.
» Security — the security protocols, access controls and filtering that are supported.
» Added functionality — some server software claims to possess extra features, like built-in e-Commerce functions.
In fact, these considerations apply to almost all of the components you need to select in the meta-architecture of the site. There are many Web servers on the market but the big three are those from Apache, Microsoft and Netscape. Anecdotally, their notable quirks are that Apache is solid and comprehensive but more difficult to administer; Microsoft’s offerings are specific to Microsoft operating systems but have good performance; and Netscape servers can handle
large volumes of traffic and are straightforward to administer. Just as the server software may be the most important component, your choice of server may also be the most important decision that you, as an architect, will make. Whilst following the herd is not always advisable, it may help to know that from recent surveys, almost two-thirds of the Web servers in
the world were running Apache, with around one-quarter using Microsoft (Internet Information Server, Personal Web Server) and less than one-twentieth were from iPlanet — the stable that includes the Netscape
Enterprise Server. In terms of operating systems (OSs), the distinctions are less clear. Very
approximately, Unix-like OSs are run on half the world’s Web servers (Linux appears to be the most popular of this genre) and Microsoft OSs are run on the other half. With the popularity of open-source products for supporting the world’s Web sites, there is a neat acronym that you may
hear in connection with an approach to Web architecture — LAMP. LAMP stands for Linux, Apache, MySQL and the ‘P’ represents either Perl, PHP or Python. With the LAMP approach we cover the essential Web-enabling
technologies of the operating system, the Web server, the database query language and finally the application scripting language, respectively.
4.3.2
The Proxy Server
We should also mention proxy servers here. For the context we are interested in, a proxy server (sometimes called a cache server) lives behind the firewall. It stores the result of requests made by clients and servers. Subsequently it intercepts other requests to determine if it can fulfil them itself. If not, it forwards the request to the appropriate server (remote or local) for processing. Proxy servers have two main purposes:
» They can improve performance because they save (cache) the results of all requests for a certain amount of time. Consider the case where both user A and user B access the Web through a proxy server. User A requests a certain Web page on a remote Web server that is not in the cache. The proxy server notices this and saves a local copy of the page that is returned. Some time later, user B requests the same page, but
instead of forwarding the request to the remote Web server, which takes time and costs money, the proxy server simply returns the page that it has already fetched for user A. This is fine for static pages that change little over time (particularly since the cached version expires after some period of time), but if the author of the page changes the content between users A and B requesting the page, user B will end up looking at out-of-date content. Most browsers allow users to select the degree to which they rely on previously visited material and even cache pages themselves on local hard disks to supplement the contents of the central organisational proxy server.
» They can also be used to filter requests. For instance, a company might use a proxy server to prevent its employees from accessing certain sites. In fact, this aspect of the proxy server is an important component of the firewall concept we mention in Chapter 13 on e-Security.
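The caching behaviour of users A and B can be sketched as a toy model. The class, its parameters and the example URL below are all invented for illustration; a real proxy also handles filtering, expiry headers and much more.

```python
import time

class ProxyCache:
    """Toy cache-server logic: serve a stored copy if it is still fresh,
    otherwise forward the request and cache the result."""
    def __init__(self, fetch, ttl=60):
        self.fetch = fetch   # function that performs the real remote request
        self.ttl = ttl       # seconds before a cached page expires
        self.cache = {}      # url -> (page, time fetched)

    def get(self, url):
        entry = self.cache.get(url)
        if entry and time.time() - entry[1] < self.ttl:
            return entry[0]              # cache hit: user B gets user A's copy
        page = self.fetch(url)           # miss or expired: go to the remote server
        self.cache[url] = (page, time.time())
        return page

remote_fetches = []
def slow_remote_fetch(url):
    remote_fetches.append(url)           # stands in for a costly network trip
    return f"<html>{url}</html>"

proxy = ProxyCache(slow_remote_fetch, ttl=60)
proxy.get("http://example.org/")         # user A: fetched remotely
proxy.get("http://example.org/")         # user B: served from the cache
print(len(remote_fetches))               # 1 - only one remote request was made
```

Note how the time-to-live (`ttl`) captures the trade-off described above: a long ttl saves bandwidth but risks serving out-of-date content to user B.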
4.4
Protocols
We have already mentioned that one aspect of architecture is deciding on the constituent parts of a system, whether that be a building or a digital technology. However, in order for us all to agree on how such components communicate/interoperate effectively, we need protocols.
Returning to our building analogy, imagine the architect had decided on a 100 mm mains water pipe coming into the building, but the water company supplying that part of town had run a 300 mm pipe up to the
boundary of the building site after the 100 mm pipes had been installed. On discovering the mismatch, the site manager would have to order a reducer (incurring extra cost and time) and the building may be left without enough water volume and/or pressure as a result. In other words, the architect and the water company needed a protocol (an understanding) to help them make design decisions about the interface between the two systems. In the context of computer networks, a protocol is a set of formal rules describing how to transmit data between two or more communications devices or computers. In practice, we often consider network protocols to exist in a hierarchy and each level in this hierarchy we call a layer. Low-level protocols in the hierarchy define the electrical, optical and physical standards to be observed, whilst high-level protocols deal with data formatting, the syntax of messages, character sets and the sequencing of messages. There are a multitude of different network protocols and we do not intend to deal with this large subject area in any great detail. Instead, we shall look at those commonly found in e-Commerce.
4.4.1
The TCP/IP Protocol Suite
TCP/IP
TCP (Transmission Control Protocol) and IP (Internet Protocol) were
originally envisioned as a single protocol, but they are in fact two different protocols that are often used together on the Internet — hence TCP/IP or ‘TCP over IP’, referring to the aforementioned hierarchy where TCP exists at a higher layer than IP. In practice, and rather confusingly, the TCP/IP protocol suite actually refers to a large collection of protocols and applications, which is usually called simply TCP/IP.
HTTP
The Hypertext Transfer Protocol is the basis for exchange of information over the World Wide Web and exists at the top level of the TCP/IP protocol suite as a so-called application layer protocol. You may recognise
this acronym from your Web browser as it is pre-pended to the URLs you request. Other protocols at this level are FTP, Telnet, SMTP and POP, to
name but a few. There are also security protocols (e.g. SET and SSL) which we will mention in Chapter 13 on e-Security. These live in an optional layer above TCP and below HTTP. Let us consider a (highly simplified) scenario involving two computers communicating on a network (or the Internet) — see Figure 4.4. Computer A sends an HTTP request to computer B (e.g. the user selects a hyperlink on a Web page using a browser — the browser is in fact an HTTP client). Before the request is sent, TCP on A takes this message, breaks it up into
Figure 4.4 The TCP/IP stack: the HTTP client (browser) on Computer A exchanges a request and reply with the HTTP server on Computer B; TCP breaks each message into segments and the resulting packets are transferred over the physical connection.
ordered segments and adds its own headers (binary codes that tell B various things such as the order this segment comes in the message, etc.) around each segment. Then IP on A adds its own headers around the segments, turning them into packets. Each packet is sent down the wire to B where IP on B strips off the IP headers. One of these tells it to send the resulting segments to TCP. TCP now strips off its headers. One of these headers tells it that it needs to send the bare message portions in a prescribed order. Another header says that the destination of this ordered stream of bits is the HTTP server — namely the Web server. The HTTP server then sends back its response with the requested data (i.e. the HTML code for a Web page). This descends the protocol stack, being split up and having the TCP then the IP headers added. It arrives at A to have the headers stripped as it ascends the stack and eventually arrives at the HTTP client. The browser interprets the HTML code and displays the page. PHEW!! Did you ever imagine there was so much involved in clicking on a hyperlink? The amazing thing is that this happens over vast distances, involves large amounts of data and often appears to happen instantaneously and without mishap. Please note that this description is highly simplified, but should give you a feel for the issues. In fact the TCP/IP protocol stack is usually a four-layer model. The physical connection is the fourth layer and its protocols include Ethernet, token ring and ISDN. The advantage of using a suite of protocols in this manner is analogous
to the use of tiered architectures — so that we can provide modularity in case we need to change one aspect in the hierarchy.
4.4.2
IP Addresses
Having mentioned the IP protocol, we should mention one of its most important header types — the IP Address. Contained in the headers are the
source and destination addresses, typically written as a sequence of four numbers. Since the values are separated by periods, the notation is called dotted decimal.
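To make these headers concrete, here is a toy sketch of the encapsulation and reassembly from Section 4.4.1, with a source and destination address carried in each packet. All header fields are invented for illustration (real TCP and IP headers are binary and far richer), the client address is hypothetical, and the destination is the sample Heriot-Watt server address discussed in this section.

```python
def tcp_segments(message, size=8):
    """TCP on A: split the message and tag each piece with a sequence number."""
    pieces = [message[i:i + size] for i in range(0, len(message), size)]
    return [{"seq": n, "data": p} for n, p in enumerate(pieces)]

def ip_packet(segment, src, dst):
    """IP on A: wrap a segment with source and destination addresses."""
    return {"src": src, "dst": dst, "payload": segment}

def reassemble(packets):
    """B's side: strip the IP headers, restore TCP order, rejoin the message."""
    segments = sorted((p["payload"] for p in packets), key=lambda s: s["seq"])
    return "".join(s["data"] for s in segments)

request = "GET /index.html"
packets = [ip_packet(s, "192.0.2.1", "137.195.150.50")
           for s in tcp_segments(request)]
packets.reverse()           # packets may arrive out of order...
print(reassemble(packets))  # ...but TCP's sequence numbers restore the message
```

Deliberately shuffling the packets before reassembly shows why TCP's sequence-number header matters: IP alone makes no promise about the order in which packets arrive.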
A sample IP address is 137.195.150.50. This address belongs to the main Web server at Heriot-Watt University in Scotland. As it happens, a block of such numbers beginning 137.195.xxx.xxx has been allocated to Heriot-Watt. Within Heriot-Watt there are a number of sub-networks. One such sub-network, owned by an academic department, has its own block of numbers starting 137.195.52.xxx. Some of the remaining numbers in this sub-network refer to individual machines in the department.
4.4.3
The Domain Name System
While IP addresses are wholly numeric, most users do not memorise the numeric addresses of the hosts to which they attach; instead people are more comfortable with host names. Most IP hosts, then, have both a numeric IP address and a name. For instance, the IP address we
mentioned previously was 137.195.150.50, which also has the host name
www.hw.ac.uk. While this is convenient for people, the name must be translated back to a numeric address for routing the packets of data around the Internet. To handle these names, the Domain Name System (DNS) was created.
The DNS is a distributed database containing host name and IP address information for all domains on the Internet. There is a single authoritative name server for every domain. This contains all DNS-related information about the domain; each domain also has at least one secondary name server that also contains a copy of this information. Several root servers around the world maintain a list of all of these authoritative name servers. When a host on the Internet needs to obtain another host’s IP address based upon its name, a DNS request is made by the initial host to a local name server. The local name server may be able to respond to the request with information that is either configured or cached at the name server. If the necessary information is not available, the local name server forwards
the request to one of the root servers. The root server then determines an appropriate name server for the target host and the DNS request will be forwarded to the domain’s name server. Once the numeric IP address is known and the packets have been sent out onto the Internet, various pieces of hardware, called routers, handle
the paths that the packets take based on their knowledge of the Internet. Since there is generally more than one route between any two IP addresses, the Internet can continue to operate even if some parts fail.
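Python's standard library exposes both sides of this machinery: the `socket` module asks the operating system's configured resolver to perform the DNS request described above, and the `ipaddress` module understands dotted-decimal addresses and network blocks. The Heriot-Watt address is the sample from Section 4.4.2; whether a live lookup succeeds depends on your network connection.

```python
import ipaddress
import socket

# DNS: translate a host name to its numeric IP address via the OS resolver.
def lookup(host):
    return socket.gethostbyname(host)

# e.g. lookup("www.hw.ac.uk") — at the time the book was written,
# this name mapped to 137.195.150.50.

# IP addresses: check which allocated blocks the sample address falls in.
addr = ipaddress.ip_address("137.195.150.50")
campus = ipaddress.ip_network("137.195.0.0/16")       # 137.195.xxx.xxx
department = ipaddress.ip_network("137.195.52.0/24")  # 137.195.52.xxx

print(addr in campus)       # True  - inside the university's block
print(addr in department)   # False - not in that department's sub-network
```

The membership tests mirror how routers think about addresses: not as individual numbers but as prefixes that identify ever smaller networks.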
4.5
Capacity Planning
So far, you might be under the impression that all we have to do to get your site off the ground is buy some hardware, install and build some software and wait for the orders to roll in. If only it were that simple! In developing a site you are exposing your hardware, networks and
application to millions, even billions of people. What would happen if
they all wanted to use your services at the same time? The answer is almost certainly that your architecture would slow down and probably fall over (i.e. break). From a commercial perspective, your customers will lose trust in you if the site’s performance is poor or services become inaccessible. If they lose trust, you lose business. A useful analogy might be to imagine a large brick-and-mortar supermarket during a busy period with only one checkout open. The queues and frustrations become unbearable. Therefore, we need to think about planning the capacity of our architecture so that it can handle the number of users that we expect without breaking or slowing down too much. But how? Firstly, let us consider the notion of redundancy, which occurs everywhere. For instance, we have two eyes and ten fingers, our cars carry a spare wheel, graphical and audio elements in Web pages often have alternative textual descriptions, etc. Yet, as well as providing a back-up in case there are failures or weaknesses in systems, we often observe emergent properties arising from such redundancy: two eyes give us stereoscopic vision, multiple fingers give us dexterity, alternative text in Web pages helps indexing and supports people with certain disabilities. Indeed, the Internet itself incorporates much redundancy, where data can take
numerous routes between client and server depending on circumstances. We'll return to emergent properties (as they emerge) in the chapter summary. Returning to our own site’s architecture, we have to consider the
possibility that we may need more than one Web server, database server, etc. If we had multiple servers, we could spread the load amongst them and if one fell over, the others hopefully could cope until the broken one is fixed. However, we would need to make sure they all possessed the same functionality and data and, as such, act and appear as one system to the user. The commonest approach to achieving cooperation and distribution across such servers is to use clustering. So what is clustering apart from a collection of servers? Well, clustering is a technology that has two core aspects. The first consists of customised operating systems, compilers, and applications. This means that you need to proactively take advantage of the technology as you develop your
applications. The second aspect is the hardware connections between the servers in the cluster. These need to be carefully designed since normal networks and cables would only introduce bottlenecks that would undermine the whole point of clustering in the first place. Once these two aspects have been dealt with, clustering software will attempt to balance loads and assign client requests as they arrive. An exercise at the end of the chapter asks you to research clustering products. In practice, many companies aim for 100 per cent redundancy. This means that companies know they can handle customer requests while they
take down half the architecture (perhaps hosted at a different physical geographic location for disaster recovery reasons) and conduct updates, maintenance and upgrades. In addition, companies often demand very close to 100 per cent availability. With sufficient redundancy and
thorough testing, this availability is generally achievable (in theory at least). Another exercise in this chapter encourages you to investigate capacity testing tools. For many Web infrastructure components, e.g. servers, the load they experience is related to the amount of online business the company is executing. This volume of business can be quantified by Natural Business Units (NBUs) — sometimes called key volume indicators or forecasting
business units. For instance, the number of registered users, the number of concurrent users at a certain time of day, the number of products on offer,
and so on could all constitute NBUs. However, the NBUs change over time as the market grows or environmental factors take hold. For the latter, a new marketing campaign (from within or from a competitor) might affect trade. Alternatively, government, media or stock exchange announcements can impact NBUs.
Therefore, having analysed and predicted demand from historical logs and via consideration of planned changes in NBUs — and having decided on availability and performance criteria — it is possible to specify necessary infrastructure component requirements, e.g. servers. However, building in reliability and performance can be expensive, so care should be exercised when ramping up capacity to ensure that there is a sensible business case and not just a technical whim. Not surprisingly, if you plan to outsource your application and data hosting, one major concern for you should be the availability and performance of the service. We cover this in some detail when we discuss law and ethics, but suffice to say, one aspect of the contract with the supplier should be agreed quality of service metrics. These are often called service level agreements. Finally, things do go wrong. Unforeseen circumstances may mean that there is poor performance or downtime. If this does happen, it makes good business sense to conduct some public relations and apologise to customers, advising them when the situation will be rectified. There may even be a case for enforced downtime if performance is too poor.
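A back-of-envelope version of such a specification might look like the following. Every figure here is invented purely for illustration; in practice the NBU forecasts would come from your historical logs and planned campaigns.

```python
import math

# NBUs and capacity figures below are invented for illustration only.
peak_concurrent_users = 2000       # an NBU forecast from historical logs
requests_per_user_per_minute = 6   # observed behaviour per user
server_capacity_rpm = 3000         # requests/minute one server can sustain
redundancy_factor = 2.0            # aim for 100 per cent redundancy

peak_demand = peak_concurrent_users * requests_per_user_per_minute
servers_needed = math.ceil(peak_demand / server_capacity_rpm
                           * redundancy_factor)
print(servers_needed)  # 8 servers: 4 to meet peak demand, doubled for redundancy
```

The doubling is what lets you take half the architecture down for maintenance (or lose it to a disaster) and still serve the forecast peak — which is also why each extra nine of availability needs a sensible business case.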
4.6
Beyond the Client-Server Paradigm
Here, we shall briefly review some of the competing paradigms that may yet rival the dominance of the client-server approach to network architecture. Let’s begin by considering a few scenarios.
Imagine I can read/write data from/to your PC and you can do the same on my PC. We agree that we can do this with each other, subject to certain limits on the amount of data and where it is stored, and
install software that enables us to do this. Now, instead of a server mediating the process of data sharing, we have two like-minded people/machines (so-called peers) that are in control. This is Peer to Peer (P2P) computing such as implemented by Napster and Gnutella. We mention Napster again when we discuss ethics and law.
But that's just data sharing. Imagine we could use the power (not just the data storage) of many small machines to tackle a really hard and time-consuming computational problem. This is sometimes called Internet computing. An example is the SETI@home project.

Now imagine we can take the notion of Application Service Providers (ASPs) one step further so that, instead of using the service of one ASP in isolation, we can plug together the services from a number of them and produce an integrated, customised, end-to-end B2B infrastructure for our organisation. These services are called Web Services.

Finally, imagine we have scientific instruments that can be controlled by remote users and which can make their results visible to remote users; databases of results gleaned from such instruments and elsewhere; and powerful computers that can analyse the data of the instruments and the databases. Now picture a technology that can help us string together and schedule tasks that run across all these resources. This technology, essentially a sophisticated middleware, can even find out what resources are available to use at a particular time and can bill you for any resources you use. This final scenario captures some of the concepts behind the Grid, a word that is analogous both to the electrical power grid and the World Wide Web.

These example scenarios and allied technologies are still evolving and may yet converge. In essence they represent a paradigm that allows computing to become even more ubiquitous, distributed, heterogeneous and interoperable. Exciting times!
CHAPTER SUMMARY

When we talked generally about architecture, we mentioned the management of complexity. We achieve this by looking at the meta-architecture of the site and breaking it down into components that we can consider separately. Having done so, we, as architects, can decide whether the service that these components supply should be built in-house, bought off-the-shelf from someone else or outsourced completely in terms of maintenance, configuration and hosting. Issues of in-house expertise, cost, speed, reliability and risk will help us make such decisions.

Next we considered the ways in which these components fit together, that is, how they communicate. This we achieved through employing well-known and well-understood protocols. Many of these protocols are built into the applications and components we either build or buy. This means we do not need to worry too much about their low-level implementation, but we should understand the mechanisms they employ.
Internet Technology and e-Commerce

Furthermore, by putting individual components together we would hope to witness some degree of synergy. By synergy we mean that the whole is greater than the sum of the individual parts. Consider water (H₂O). If you had a bucket of hydrogen on its own and another bucket containing just oxygen, you could neither quench a thirst, wash a car nor grow a tree. However, if the two are chemically combined (this is the equivalent of our protocol) to produce water, we get a fundamental enabler for life on this planet. Such synergy usually results in emergent properties. In the case of water, the properties that emerge are taste, wetness, etc.

So as e-Architects we have put various components together and created the infrastructure to enable a Web site to be hosted. In isolation, hardware, software, cables and protocols would not allow this to happen. But put these together correctly and our emergent properties are that the site becomes maintainable, scalable, secure, fast, cost-effective and reliable.
EXERCISES

1. Find an online resource that will give you an up-to-date picture of the relative usage of the different Web server software products. Date and quote the most recent figures, taking account of (and describing) the distinction between active and inactive servers. Search terms you may wish to employ are: survey; Web; server; apache; iis; iplanet.

2. We have already noted the popularity of Apache server software. Now write a short article (not more than 200 words) about the Apache project, specifically answering the questions: what is the project about; what is its history; what is its development methodology; and what is its justification for being freely distributed?

3. Write a one-sentence description of each of the following application layer protocols: FTP, POP, Telnet, IMAP.

4. Find a service on the Web that will convert IP addresses to host names and vice versa. Use it to convert three domain names that you use on a regular basis and record their IP addresses.

5. Find and critically compare at least two clustering products.

6. Find and critically appraise at least two capacity planning tools (sometimes called network simulators) that can test your site and look for problems.
FURTHER READING
Deitel, HM, Deitel, PJ and Nieto, TR (2001) e-Business and e-Commerce: How to Program, Prentice Hall (Chapter 8).

Eaglestone, B and Ridley, M (2001) Web Database Systems, McGraw-Hill.

Foster, I, Kesselman, C and Tuecke, S (2001) 'The anatomy of the Grid: enabling scalable virtual organizations', International Journal of High Performance Computing Applications, 15(3), pp. 200–22.

Menascé, DA and Almeida, VAF (2002) Capacity Planning for Web Services: Metrics, Models and Methods, Prentice Hall.

Turban, E, King, D, Lee, J, Warkentin, M and Chung, HM (2002) Electronic Commerce 2002: A Managerial Perspective, Prentice Hall, NJ (Chapter 12).

Turban, E, Lee, J, King, D and Chung, HM (1999) Electronic Commerce: A Managerial Perspective, Prentice Hall, NJ (Chapter 11).
Languages of the Internet

CHAPTER OVERVIEW

In order to develop and understand e-Commerce applications you need an awareness of the different Web programming languages that can be used. In this chapter we give an overview of some of the commonly used languages and technologies, and:

- Explain the implementation, applications and relative merits of various Web programming/scripting languages.
- Differentiate between client- and server-side technologies.
- Introduce cookies and XML.

In the following chapters we look more closely at some of the most important and widely available languages, enabling you to develop interactive Web services.
5.1 Introduction

When the Web first appeared it provided a static set of pages that contained information. A page might have contained hyperlinks that pointed at other pages. At the same time we had applications running on our own computers that exchanged data over networks. This data was often dynamic, in other words up to date and constantly changing. The Web was therefore missing an opportunity to exchange and display
dynamic data. It did not take long before we realised that there was a need to be able to create Web pages (and elements of Web pages) on the fly, in real time. For example, imagine a static page displaying share prices. The content of the page would have to be constantly updated by someone typing the new data into the HTML file, saving the file and then starting over again until the markets closed. Even if this pointless task were carried out, the page's users on remote client machines would be endlessly hitting refresh/reload on their browser interface in order to see the latest data. The preferred alternative would be for the client to access a Web page that is
itself constantly and automatically refreshing the data from a real-time database.

Another limitation of the static Web page is its inability to allow the user to communicate. Imagine a situation where we had text boxes, radio buttons, etc. in a Web page form, and there was no facility to process what the user had entered: where do the data go and what actions are performed on them? In order to introduce the necessary dynamism, we needed to use computer software to process, control, access, report, format and filter data. This was the motivation for using the Web programming languages we discuss here.

This introductory section is not intended as a comprehensive discussion of PHP, Perl, ASP, ColdFusion, JavaScript, VBScript or Java Applets and Servlets. All we hope to achieve is a raised awareness of where such languages are deployed, what they might look like at the most basic of levels and their relative merits. In the following chapters we discuss some of the core technologies in more detail, so that you are able to develop simple but complete interactive applications.
5.2 The Client and Server Sides of Web Programming

If we consider an oversimplified topology of the Internet's client-server architecture, as in Figure 5.1, then the Web browser lives on the client machine and the Web server lives on the server. The client requests services (e.g. 'Please send the contents of an HTML file') and the server sends them back. The browser on the client then displays the data.

The technologies that allow us to create dynamic Web pages can live on either the server or the client. Client-side technologies are executed in the browser on the client machine, and the programs that use such technologies are included in the HTML files the server sends. Server-side technologies run on the Web server and can send their outputs to the client for display.

Figure 5.1 Simplistic client-server architecture over the Internet
One advantage of client-side programming for the Web is that, once the page has been sent to the browser, no additional communication may be necessary with the server. This avoids the delays associated with a slow network connection. However, the main disadvantage is that more responsibility is given to the client machine. If the client is slow or the browser is old (and therefore often incompatible with the necessary client-side technologies) the page may not display as expected.

On the other hand, if the process is executed on the server side, the developer has more control over how the code behaves and is less at the mercy of the browser in use on the client machine. The downside is that developers have to be sure that their service is scalable, such that it can handle multiple requests to run a program without the service performance degrading significantly.
5.3 Server-Side Scripting Languages

In this section we shall look at the scripting languages that run on the server side. We deal with Perl in a separate chapter (Chapter 7), so we only touch on it here, along with PHP, and to a lesser extent ASP, ColdFusion and SSI. Let us start by mentioning the Common Gateway Interface.

5.3.1 Common Gateway Interface

The Common Gateway Interface (CGI) is a standard feature of many Web servers. CGI specifies how to pass arguments to a program via a Web server and defines a set of environment variables that will be made available. This allows us to invoke the program from the Web, and pass it the necessary inputs. The CGI program can then, for example, access information in a database on the server and format the results as HTML, returning this to the client browser.

A CGI program can be any program which can accept command line arguments. Perl is a common choice for writing CGI scripts, although C/C++ programs are often used. CGI programs often reside in a sub-directory of the directory containing the HTML files. For instance, you may have a Web directory called '.../~username/www' and place your CGI programs in '.../~username/www/cgi-bin'.

Whenever the server receives a CGI execution request it creates a new process to run the program. If the process fails to terminate for some reason, or if requests are received faster than the server can respond to them, the server may become swamped with processes. In order to improve performance, Netscape devised NSAPI and Microsoft developed the ISAPI standard, which allow CGI-like tasks to run as part of the main server process, thus avoiding the overhead of creating a new process to handle each CGI invocation.
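The mechanism just described can be sketched in outline. Python is used here purely for illustration (the book's own CGI examples use Perl), and the helper function name is our own: the Web server hands the script request details through environment variables such as QUERY_STRING, and the script writes an HTTP header, a blank line, then the HTML body to standard output.

```python
#!/usr/bin/env python3
# Minimal CGI-style sketch: read the query string from the
# environment, build an HTML response, print header + body.
import os
from urllib.parse import parse_qs

def respond(environ):
    # Extract "name=..." from the query string, defaulting to "World"
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["World"])[0]
    # A CGI response starts with a Content-type header and a blank line
    return ("Content-type: text/html\n\n"
            "<html><body>Hello " + name + "!</body></html>")

if __name__ == "__main__":
    print(respond(os.environ), end="")
```

Requesting the script with `?name=Alice` appended to its URL would make the server set QUERY_STRING to `name=Alice` before launching the process.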
5.3.2 Perl
Perl is a widely used and flexible server-side scripting language, particularly popular for writing CGI scripts. To run a Perl script your Web server needs to have certain Perl binaries installed and be configured appropriately. However, Perl is Open Source software, freely available for most platforms, so the chances are that your Web server will be set up appropriately.

Perl is particularly useful for string manipulation, with powerful string matching methods built in. It also has freely available program libraries (modules) for many of the common tasks needed for Web scripting. As a simple example of some Perl in action, consider Code 5.1, revisited in Chapter 7. This outputs a simple HTML 'Hello World' page, as displayed in Figure 5.2.

CODE 5.1 Perl script

#!/usr/local/bin/perl
print "Content-type: text/html\n\n";
print "<html>\n";
print "<head>\n";
print "<title>My First Perl program</title>\n";
print "</head>\n";
print "<body>\n";
print "Hello World!\n";
print "</body>\n";
print "</html>\n";
Figure 5.2 Perl script running in a browser (a browser window titled 'My First Perl program' showing the text 'Hello World!')
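The string-matching strength noted above can be hinted at with a comparable sketch. Python's re module is used here only for illustration; Perl's built-in regular expressions and =~ operator provide the same capability even more tersely, and the function name and URL-parsing task are our own example.

```python
import re

# Extract the host name from a URL-like string: the kind of
# pattern-matching-with-capture-groups task Perl excels at.
def host_of(url):
    match = re.match(r"https?://([^/]+)", url)
    return match.group(1) if match else None
```

In Perl the equivalent match would be a one-liner such as `$url =~ m{https?://([^/]+)}`, with the captured host left in `$1`.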
5.3.3 PHP

PHP stands for 'PHP Hypertext Preprocessor'. It is a server-side, cross-platform, HTML-embedded scripting language used to create dynamic Web pages, and is open source. The significant term here is HTML-embedded. This means that we can write PHP code directly into our HTML files: see the Hello World example in Code 5.2, which produces the same output as Code 5.1 (see Figure 5.2) except for the different title.

CODE 5.2 PHP script

<html>
<head>
<title>My First PHP program</title>
</head>
<body>
<?php
echo "Hello World!";
?>
</body>
</html>
Note that, for PHP to run, the Web server needs to have certain files installed and be configured to parse files with certain extensions (e.g. .php or .phtml) appropriately.

An interesting issue that arises with PHP is that of separating the presentation code and the process logic (e.g. business rules). In some organisations there will be people who specialise in making the site usable and attractive and others who specialise in getting the applications to talk to each other, crunch numbers and communicate with databases. If both aspects are combined in the same file, ownership, responsibility and concurrent development become difficult. For this reason, PHP applications often separate out the presentation elements and business logic elements of the code into separate files.
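The separation just described might be sketched as follows. Python is used for illustration only; the function names and the shopping-cart scenario are hypothetical. The point is that the "business logic" side computes values and contains no HTML, while a separate "presentation" side formats values and does no computation.

```python
# Business logic: calculations only, no markup
def cart_total(prices):
    return sum(prices)

# Presentation: markup only, no calculations
def render_total(total):
    return "<p>Total: {:.2f}</p>".format(total)

# Gluing the two layers together
page_fragment = render_total(cart_total([9.99, 5.01]))
```

With this split, a designer can restyle `render_total` without touching the pricing rules, and a developer can change `cart_total` without risking the page layout.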
5.3.4 Active Server Pages

Let us briefly mention Active Server Pages (ASP). These essentially mirror the implementations of JavaScript and VBScript that we mention shortly but, instead of being processed on the client side, they are processed on the server side. Interestingly, whilst JavaScript is more popular on the client side, VBScript appears more common in the ASP world.

One advantage of this approach is that, since the processing is being done on the server side, we are not at the mercy of the vagaries of different browsers and have better control over how our pages appear. Generally, ASP pages only run on Microsoft Web servers, like IIS or PWS, since they have the requisite functionality. In addition, ASP files usually have a '.asp' extension so that the server knows to parse them appropriately and not treat them as conventional HTML files.
5.3.5 ColdFusion

Another brief mention goes to ColdFusion, which is a proprietary software product from Allaire. It uses its own markup language (CFML) that the Web server needs to be configured to recognise. In effect, we are embedding specialist markup tags in our HTML; the files often have a .cfm extension. Primarily, ColdFusion simplifies the integration of Web pages with databases, so we could have a database query inside <CFQUERY> tags, then output the results inside a <CFOUTPUT> tag pair, for instance.
5.3.6 Server-Side Include (SSI)

SSIs are commands embedded in HTML documents. They direct the Web server to dynamically generate data for the Web page whenever it is requested. The basic format for an SSI is:

<!--#command parameter="value" -->

You may recognise the comment tags around the SSI. These are required so that servers which do not support SSIs simply ignore the instructions as comments. The #command can be one of several instructions to the server, the simplest being #include, which inserts the contents of another file. This is especially useful for ensuring that standard components, e.g. headers and footers, are the same on all pages throughout a Web site. To change a standard element, you need only modify the include file, instead of updating every individual Web page.

SSIs can also be used to execute programs and insert the results. For HelloWorld:

CODE 5.3 Server-Side Include
<html>
<head>
<title>My First SSI</title>
</head>
<body>
<!--#exec cmd="echo Hello World!" -->
</body>
</html>
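As a further illustration of the #include directive discussed above, a page might pull in shared header and footer fragments. The file paths below are hypothetical; the directive syntax is that of servers supporting SSI, such as Apache's mod_include.

```html
<!-- Shared fragments: edit header.html or footer.html once
     and every page that includes them changes -->
<!--#include virtual="/includes/header.html" -->
<p>Page-specific content goes here.</p>
<!--#include virtual="/includes/footer.html" -->
```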