Internet Technology and e-Commerce [1 ed.] 0333989996, 9780333989999



English Pages [260] Year 2004


Grassroots Series

Internet Technology and e-Commerce

Alison Cawsey and Rick Dewar

Palgrave Macmillan

© Alison Cawsey and Rick Dewar 2004

All rights reserved. No reproduction, copy or transmission of this publication may be made without written permission.

No paragraph of this publication may be reproduced, copied or transmitted save with written permission or in accordance with the provisions of the Copyright, Designs and Patents Act 1988, or under the terms of any licence permitting limited copying issued by the Copyright Licensing Agency, 90 Tottenham Court Road, London W1P 9HE.

Any person who does any unauthorised act in relation to this publication may be liable to criminal prosecution and civil claims for damages.

The authors have asserted their rights to be identified as the authors of this work in accordance with the Copyright, Designs and Patents Act 1988.

First published 2004 by PALGRAVE MACMILLAN, Houndmills, Basingstoke, Hampshire RG21 6XS and 175 Fifth Avenue, New York, N.Y. 10010. Companies and representatives throughout the world.

PALGRAVE MACMILLAN is the global academic imprint of the Palgrave Macmillan division of St. Martin’s Press, LLC and of Palgrave Macmillan Ltd. Macmillan® is a registered trademark in the United States, United Kingdom and other countries. Palgrave is a registered trademark in the European Union and other countries.

ISBN 0-333-98999-6

This book is printed on paper suitable for recycling and made from fully managed and sustained forest sources.

A catalogue record for this book is available from the British Library.

Printed in China

Dedication

For Sophie and Andrew (AJC)
For Finlay (RGD)

Between them they kept this book short.


Contents

Preface

Chapter 1 Introduction
  Chapter Overview
  Introduction
  The e-Commerce Transaction
  Further Reading

Chapter 2 HTML
  Chapter Overview
  Introduction to Markup Languages and HTML
  Managing the Presentation: Stylesheets
  Managing Interaction with the User: Introduction to Forms
  Getting it Wrong: HTML Markup Pitfalls
  Chapter Summary
  Exercises
  Further Reading

Chapter 3 Web Site Design and Accessibility
  Chapter Overview
  Introduction
  Design Considerations
  Design Communicates Trustworthiness
  Accessibility
  Generating the Pages
  Validation and Testing
  Chapter Summary
  Exercises
  Further Reading

Chapter 4 Internet Architecture and Protocols
  Chapter Overview
  Introduction
  Network Architectures
  Web Site Meta-architecture
  Protocols
  Capacity Planning
  Beyond the Client-Server Paradigm
  Chapter Summary
  Exercises
  Further Reading

Chapter 5 Languages of the Internet
  Chapter Overview
  Introduction
  The Client and Server Sides of Web Programming
  Server-Side Scripting Languages
  Client-Side Scripting Languages
  Java
  Flash
  Cookies
  XML
  Hybrid Approaches
  Chapter Summary
  Exercises
  Further Reading

Chapter 6 Client-Side Scripting in JavaScript
  Chapter Overview
  Introduction to JavaScript
  Document Objects
  Event Handlers
  Animating your Documents: DHTML
  Cookies in JavaScript
  Chapter Summary
  Exercises
  Further Reading

Chapter 7 Server-Side Scripting in Perl
  Chapter Overview
  Introduction
  Getting Started
  Variables and Arrays
  Controlling the Script’s Execution
  File Handling
  String Operations
  Built-in Perl Functions
  A Simple Script for the Web
  Passing Values to the Script using CGI.pm
  Cookies in Perl
  Chapter Summary
  Exercises
  Further Reading

Chapter 8 XML: eXtensible Markup Language
  Chapter Overview
  Introduction to XML
  Defining Document Types
  Displaying Documents: Stylesheets
  Substituting and Inserting Text
  Manipulating XML Documents: JavaScript and the Document Object Model
  Chapter Summary
  Exercises
  Further Reading

Chapter 9 Case-Study: A Simple Shopping Site
  Chapter Overview
  Introduction
  Design of the Site
  Using JavaScript to Check Form Submissions
  Using Perl to Handle Requests
  Using Cookies to Keep Track of Purchases
  Using XML for Product Lists
  Business and Practical Issues
  Chapter Summary
  Exercises

Chapter 10 e-Business Models
  Chapter Overview
  Introduction
  Storefronts and Malls
  Auctions
  Portals and Community Sites
  Dynamic Pricing
  Barter
  The Virtual Organisation
  Application Service Providers
  Extranets
  Industry Consortia
  B2B Trading Hubs
  Disintermediation and Reintermediation
  Chapter Summary
  Exercises
  Further Reading

Chapter 11 e-Marketing
  Chapter Overview
  Introduction
  The 4 Ps of Marketing
  So What is e-Marketing?
  Doing e-Marketing
  Globalisation
  Sticky Sites
  Chapter Summary
  Exercises
  Further Reading

Chapter 12 e-Payment
  Chapter Overview
  Introduction
  Offline and Online Payment
  The Online Credit Card Process
  A Word on EDI and EFT
  The Debit Card Process
  Alternative Payment Methods
  Chapter Summary
  Exercises
  Further Reading

Chapter 13 e-Security
  Chapter Overview
  Introduction
  Security per se
  The e-Security Issues
  Encryption
  Cryptography
  Key Protocols
  Digital Signatures
  Timestamping
  Digital Certificates and Certification Authorities
  Public-Key Infrastructure
  Security Protocols
  Network Security with Firewalls
  Directory Access Control
  Chapter Summary
  Exercises
  Further Reading

Chapter 14 e-Law and e-Ethics
  Chapter Overview
  Introduction
  e-Law
  e-Ethics
  Chapter Summary
  Exercises
  Further Reading

Chapter 15 e-Methodology
  Chapter Overview
  Introduction
  The System Development Life Cycle
  Methodologies
  Chapter Summary
  Exercises
  Further Reading

Glossary

Index

Preface

This book provides an overview of key technologies for the World Wide Web and important issues in e-Commerce. It is intended to get you started, so that you have the confidence to try things out, and to give you an awareness of what can be done and what must be taken into account when designing a site.

We don’t introduce the latest Web development tools. To do so would guarantee that this book would be out of date in two years. Our aim instead is to provide a base from which you will be able to understand new tools and techniques as they emerge. By developing an awareness of all the issues surrounding e-Commerce you should also be able to use new tools and technology appropriately.

The book is based on the authors’ experience in teaching introductory courses on e-Commerce and Internet technology at Heriot-Watt University. We hope that it will be useful to students on similar courses, and also to anyone wanting a fairly quick overview of some of the key technologies and issues.

To learn as much as possible from the book it is important that you try things out. You also need to be able to find up-to-date sources of more detailed information. We have therefore included a variety of exercises in the book: some practical exercises on aspects of Web site development, and some exercises asking you to find out about current Web-based resources. By researching these for yourself you will be able to find and evaluate Web sites that are interesting to you or culturally relevant. We do make some suggestions for further reading in each chapter, including some key Web resources. However, you should be aware that to develop professional skills in this area you should be consulting the latest standards and the texts that support these.

The book is split loosely into three sections. Chapters 2-4 introduce HTML, Web site design, and the architectures and protocols of the Internet. These chapters provide basic background material, and should also enable you to get quickly up to speed in developing simple static Web sites.

Chapters 5-9 build on this, looking at the languages associated with the development of dynamic, interactive Web sites. Chapter 5 introduces a range of Web programming and scripting languages and how they are used. Chapters 6-8 build on these and provide more detail on some basic technologies: client-side scripting in JavaScript; server-side scripting in Perl; and XML (eXtensible Markup Language). Practical experience in these languages should provide you with a much better understanding of how dynamic Web sites really work.

Some of the technologies discussed in Chapters 5-8 are revisited in Chapter 9, which presents a simple e-Commerce case-study. A shopping site is developed using client- and server-side scripting. In terms of the technologies used it may seem old fashioned, but we aim in this chapter to show you how we can bring together different technologies in the development of an actual site. By seeing how different client and server-side technologies work together you should be in a strong position to learn and use appropriate new tools and technology as they emerge.

Chapters 10-15 discuss key issues in e-Commerce. Chapter 10 (e-Business Models) looks at different ways in which e-Businesses can arrange themselves and some of the options available to them. Chapter 11 (e-Marketing) considers basic marketing in the real and virtual worlds, but goes on to cover some of the marketing initiatives that are specific to e-Commerce. Chapter 12 (e-Payment) covers a selection of online payment options that are available as alternatives to the ubiquitous credit card and explains the difference between B2B and B2C payment. Chapter 13 (e-Security) begins by considering the wider context of security, then looks at cryptography in some detail. Issues of security protocols and architecture are also discussed. Chapter 14 (e-Law and e-Ethics) considers the legal liabilities of the e-Commerce company and looks at the role of the contract in this context. The chapter also covers some of the ethical dimensions of e-Commerce. Chapter 15 (e-Methodology) attempts to pull together issues presented in other chapters and discusses the likely methodologies for developing online applications with reference to development life cycle models.

A short glossary is included at the end of the book, covering some, but not all, of the many technical terms and acronyms that you will encounter. We indicate terms that occur in the glossary by putting them in bold when they first occur in the text.

All the chapters include exercises. Working through these exercises will test and broaden your understanding. In general the exercises in Chapters 5-9 are very practical, developing basic Web programming skills, while the exercises in Chapters 10-15 involve doing some research on the Web to broaden your knowledge of the area.

There are a number of possible routes through the book, perhaps emphasising technical or e-Commerce issues. However, Chapters 2, 6, 7, 8 and 9 should be tackled in the order given.

We have kept to a minimum the special purpose languages and tools required for the exercises and examples. Most of the examples simply require a text editor (e.g. emacs or Notepad) and a fairly recent browser (e.g. Internet Explorer 6 or later, Netscape 6, Mozilla 1.1 or later). It is better when starting out NOT to use specialised editors (e.g. HTML or XML editors) as these tend to hide part of what you are meant to be learning. Just type examples into your text editor. Some of the server-side scripting examples require you to have access to a Web server, and permissions to place files in an appropriate cgi-bin directory. They also require a Perl interpreter, but these are free and widely available for most platforms. Occasionally we make assumptions about the particular browser or platform you are using, and of course new browsers are always emerging. We will aim where possible to provide alternative versions of important examples on the book Web site.

This book assumes relatively little prior knowledge. We assume that you are familiar with browsing the Web, and know how to use search engines to find information. We also assume in certain parts of the book that you have done some programming before, although we make no assumptions about the language.

We hope that by working through this book you will reach a point where you have enough practice in core Internet technologies, and enough understanding of the issues in modern e-Commerce, that you will then be able to rapidly pick up the more specialised skills that may be required for a commercial project.

There is a book Web site at:

http://www.macs.hw.ac.uk/~alison/netbook/

This contains the code examples from the book, links to key Web resources, and any errata. Please feel free to contact the authors with any suggestions or comments.

Chapter 1 Introduction

CHAPTER OVERVIEW

This book introduces key issues in e-Commerce, and some of the basic Internet technologies supporting e-Commerce developments. In this first chapter we set this material in context and:

  • Provide a little historical background.
  • Discuss what is involved in an e-Commerce transaction.

1.1 Introduction

e-Commerce describes the buying and selling of products, services, and information via the Internet. It’s not just about online shopping, where the customer carries out a transaction with a business (so-called Business to Consumer, or B2C, trade). It is also about transactions between businesses. These are commonly referred to as Business to Business (B2B) exchanges.

e-Commerce has been around a long time. In the early 1970s businesses were already exchanging data using computer networks; this was referred to as Electronic Data Interchange, or EDI. Initially private (and expensive) networks were used, but in the early 1990s the Internet started taking their place, opening up EDI to smaller businesses and individuals.

The development of the World Wide Web (WWW) in the early 1990s, with the release of the first widely used graphical Web browser (Mosaic) in 1993, led to the beginnings of what we now think of as e-Commerce. By 1994 you could order pizzas from Pizza Hut online (if you lived in the right area!).

e-Commerce boomed from 1995 to 2000. It allowed even small suppliers to reach a potentially enormous market, with low startup and operating costs. Money was thrown at entrepreneurs with new ideas (some with little business experience). But many businesses failed to break even, and were living on promises. The crash came in 2000, with large businesses folding. Boo.com, for example, took over a hundred million dollars of investment money with it.

Despite the crash, e-Commerce continued to grow in 2001. More and more organisations are expanding into e-Commerce, and more people are buying products over the Web. Although the future is still not clear, it does seem that we have learned from the heady days of the late 1990s, and are now beginning to take a more thoughtful approach to new developments. We’re therefore now ready to take a proper look at the issues.

e-Commerce is an area where new ideas and new technology are emerging all the time. It may be tempting to follow up on the latest buzzwords and the latest tools, and to rush in and learn the technology of the moment. However, we believe that it is better to start with a grounding in the basics, and develop a basic competence in the core Internet technologies and business ideas that may be used in e-Commerce.

Understanding e-Commerce includes consideration of the supply chain, customer service, procurement and intrabusiness tasks. It also involves developing an understanding of and competence in the underlying technology. In this book we aim to introduce both aspects.

1.2 The e-Commerce Transaction

We will now look in more detail at the e-Commerce transaction, so you can get a feel for some of what is involved. Consider a simple transaction where a list of products is displayed, and the user selects one item for purchase, then enters payment details. We can look at the transaction from a technical and from a user-centered point of view. From a technical perspective:

  • When the user requests the Web page from his browser, a request is sent to the server machine. The server locates the relevant page and sends it back to the user (client) machine. In general the server’s job is just to accept requests from the client, deal with them appropriately, and send responses.
  • That page is (probably) written in HTML. The browser works out how to display the HTML appropriately, given the browser size and settings.
  • The page will be displayed with product information, and (perhaps) checkboxes for the user to select items of interest. The user selects one or more items and submits his order.
  • Now another request is sent to the server. This time the request includes some data (the user’s selections). The server may run a program (e.g. a Perl program) to process the data, perhaps recording the order. It then sends another response. This time the response might consist of some HTML containing a form allowing the user to enter his payment details, which, when submitted, results in another request to the server.
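To make the technical steps above a little more concrete, here is a minimal sketch of the kind of product page the server might send back. It is only an illustration: the product names, the field name item and the script address /cgi-bin/order.pl are all invented for this example, and forms themselves are introduced properly later in the book.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Product List</title>
</head>
<body>
<h1>Our Products</h1>
<!-- Pressing the Order button sends a new request to the server.
     The request carries the ticked items as data.
     The script address below is a made-up placeholder. -->
<form action="/cgi-bin/order.pl" method="post">
<p><input type="checkbox" name="item" value="mug"> Mug</p>
<p><input type="checkbox" name="item" value="tshirt"> T-shirt</p>
<p><input type="submit" value="Order"></p>
</form>
</body>
</html>
```

If the user ticks both boxes and presses Order, the browser sends the server a request whose data would look something like item=mug&item=tshirt; a server-side program can then read these values and build the next page.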

From a more user-centered perspective, the transaction involves an exchange between a user and the organisation selling the products. Site designers must consider this transaction as well as the technical one! They must take into account:

  • How to design the site so it is attractive, and easy to navigate.
  • How to market the site, and manage payments.
  • How to ensure that the transactions are secure, and can’t be intercepted by a third party.
  • How to ensure that the transaction with the user is acceptable from a legal and ethical perspective, and, related to this, how to engender trust in the user, so that he or she comes back for more.

We hope that this book will enable you to understand both some of the technologies involved in e-Commerce transactions, and the more user-centered aspects.

FURTHER READING

Bates, C (2001) Web Programming: Building Internet Applications, John Wiley.

Davis, WS and Benamati, J (2002) E-Commerce Basics, Addison-Wesley.

Deitel, HM, Deitel, PJ and Nieto, TR (2001) Internet and the World Wide Web: How to Program, 2nd edition, Deitel.

Turban, E, King, D, Lee, J, Warkentin, M and Chung, HM (2002) Electronic Commerce 2002: A Managerial Perspective, Prentice Hall.

Chapter 2 HTML

CHAPTER OVERVIEW

The basic foundation for developing Web-based applications is the proper use of HTML. In this chapter we discuss:

  • What we mean by a markup language.
  • The various HTML standards, including XHTML, and the importance of validation.
  • The use of stylesheets to control presentation, and forms to allow user input.

2.1 Introduction to Markup Languages and HTML

In this chapter you will learn about HTML: a markup language for the World Wide Web. Markup languages have been around for a very long time. A markup language is a language for annotating some text with additional information, such as how the text should be displayed. Copyeditors have long used their own markup language to annotate books and articles prior to typesetting, while text formatting languages are based on adding additional markup to a text to indicate how it should be displayed.

Code 2.1 shows how we indicate that some text is to be displayed in a bold typeface in the LaTeX text formatting language:

CODE 2.1 LaTeX markup example

    Here is an example of some {\bf bold} type.

HTML, or Hypertext Markup Language, is a markup language geared to the needs of documents which are to be displayed on the World Wide Web. As well as having special markup instructions which are concerned with how a document is to be displayed, it has markup which allows the author of the document to specify how the document is linked to others, thereby allowing the reader to navigate from one document to another.

HTML is based on SGML (Standard Generalised Markup Language). SGML provides a standard metalanguage on which a great variety of markup languages may be based. In SGML text is marked up by adding start and end tags to label particular elements in the text. For example, the following code uses the start tag <author> and end tag </author> to label ‘Joe Bloggs’ as an author element:

CODE 2.2 An SGML element

    <author>Joe Bloggs</author>

SGML provides a way of defining a particular markup language, but it does not specify, for example, what elements are allowed and how they may be nested. A markup language based on SGML provides a set of allowed elements (and allowed element nestings) appropriate for a particular application. A publishing application might, for example, allow an author element, as illustrated above. The World Wide Web is an application with its own requirements. HTML is therefore defined to have elements appropriate for marking up documents for the World Wide Web.

HTML as a language has evolved as technology has developed over the last decade. HTML was invented by Tim Berners-Lee around 1990. The original HTML was very simple (with a small set of allowed elements), but Web users and developers rapidly started adding to it, with new elements for a range of purposes. There was a need for a standard version, encompassing new developments, but which all users could accept. HTML version 2.0 was the first attempt at this, followed by version 3.2 and then HTML 4.0. Version 4.01 (a minor update on 4.0) is now widely used and supported. However the latest is XHTML (discussed later in the chapter). The examples in this book are either in HTML 4.01 or, once we have discussed it, XHTML. These standards developments are coordinated by the W3C (World Wide Web Consortium), an international consortium that promotes and coordinates the development of the Web.

The rest of this section will briefly introduce some of the most important elements used in HTML so you can develop simple Web pages. Much fancier looking Web pages may be developed in shorter time using the many authoring tools now available. However, it is important to start by using a text editor and write your HTML by just typing it in. That way you can gain a proper understanding of the language, and go on to understand and use the more advanced Web technologies.

2.1.1 Authoring Simple HTML

HTML uses markup tags to indicate how a document is structured, and how elements within it should link to other files on the Web. The structure of the document in turn determines how it will be displayed in a browser. An HTML file is a plain text file, and can be created using a simple text editor (such as Notepad under Microsoft Windows).

Code 2.3 is a very simple piece of HTML, which you can copy into a text editor and view in your browser. Your file can be called whatever you like, but the name must end with .html or .htm.

CODE 2.3 HTML Example 1 [listing not preserved]
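A very simple page of this kind might look like the sketch below. It uses the six elements the text goes on to discuss (html, head, title, body, h1 and p) and the title ‘My Page’ mentioned there; the wording of the heading and the paragraph is invented for illustration and is not taken from the book’s own listing.

```html
<html>
<head>
<title>My Page</title>
</head>
<body>
<!-- The heading and paragraph text are placeholders -->
<h1>Welcome to My Page</h1>
<p>This is the first paragraph of my first Web page.</p>
</body>
</html>
```

Saved as, say, example1.html and opened in a browser, the title should appear in the window’s title bar, with the heading and paragraph shown in the main window.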

Code 2.3 appears as shown in Figure 2.1 when viewed from a Web browser (in this case Netscape).

Figure 2.1 Example 1 displayed in browser

The example shows how HTML tags are used to mark up the document. Tags always start and end with angle brackets. A piece of text (or a section of HTML) is surrounded by a start tag and an end tag with the same name (e.g. <title>My Page</title>). End tags are indicated by the additional slash (/). The two tags are used to define elements. For example, the whole section surrounded by <body> and </body> is a body element, containing the main content to be displayed on the page.

Altogether there are six different types of element in the example: html, head, title, body, h1 and p.

The document starts with <html> and ends with </html>. These tags define the top level html element. We then have a head and a body element. The head element contains some information about the HTML file (including its title, which is a required element). The main content to be displayed is in the body. Within the body h1 defines a heading element and p is used to indicate paragraphs.

There are of course many other elements defined for HTML. There are elements for defining lists, tables, forms, and for incorporating various media elements in a page. Codes 2.4 and 2.5 show how an ordered list of items, and a simple 2 x 2 table, are defined. Note how a table consists of a number of table rows (tr), which in turn consist of a number of table data cells (td).

CODE 2.4 HTML ordered list

    <ol>
    <li>First Item</li>
    <li>Second Item</li>
    <li>Third Item</li>
    </ol>

CODE 2.5 HTML table

    <table>
    <tr> <td>Item 1</td> <td>Item 2</td> </tr>
    <tr> <td>Item 3</td> <td>Item 4</td> </tr>
    </table>

Elements can also have attributes. An attribute gives additional information about an element. This additional information sometimes relates to the appearance or layout of an element, such as colour or size. The examples in Code 2.6 illustrate how attributes are associated with an element by listing attribute-value assignments in the start tag.

CODE 2.6 [listing not preserved; its last example is a link reading ‘go to my site’]
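As an illustration of attribute-value assignments in a start tag, here are three small fragments. They are a sketch rather than the book’s own examples: the attribute names (align, border, href) are standard HTML 4.01, but the particular values and the address are placeholders chosen for this example. Note that align is a presentational attribute allowed in transitional, not strict, HTML 4.01.

```html
<!-- align controls the layout of the heading -->
<h1 align="center">My Page</h1>

<!-- border gives the table a visible border one pixel wide -->
<table border="1">
<tr> <td>Item 1</td> <td>Item 2</td> </tr>
</table>

<!-- href gives the address the link points to; this URL is a placeholder -->
<p>go to <a href="http://www.example.com/">my site</a></p>
```

In each case the attribute is written inside the start tag as name="value"; the end tag never carries attributes.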

The last example in Code 2.6 is perhaps the most important. The a element denotes an anchor, and can be used to define hypertext links to other pages anywhere on the Web. The value of the href attribute is the URL (Uniform Resource Locator) specifying the address, or location, of the document to link to. You should be familiar with URLs from browsing the Web. Be aware that sometimes the more general term URI, or Uniform Resource Identifier, is used for this address. For our purposes you needn’t worry about the distinction. Anyway, whatever comes between the start anchor tag and the end anchor tag will be displayed as a hyperlink (e.g. by underlining). In the above example, the text ‘my site’ will be a hyperlink. Clicking on a hyperlink takes the user to the new location.

Using HTML markup we can also add multimedia elements such as images, audio and video clips. To incorporate an image we use the img element. We need to specify the location of the image file, so it can be incorporated on the page. This is done by specifying a src attribute. We must also give a text description of the image, in case the image is not accessible (see Chapter 3). This is done by including an alt attribute, as illustrated in Code 2.7. Other attributes are available to control the size and layout of the media elements.

CODE 2.7 [listing not preserved]
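A minimal sketch of an img element with the two attributes just described is given below. The file name and the description are invented for illustration; width and height are standard optional attributes, shown only as an example of controlling size.

```html
<!-- src says where the image file is; alt is the text shown or read
     out when the image itself is not available.
     The file name and sizes are placeholders. -->
<p><img src="house.gif" alt="A drawing of a house" width="200" height="150"></p>
```

Note that img has no end tag in HTML 4.01: the element is empty, and everything it needs is carried by its attributes.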

Note that recent versions of HTML also support an object element which allows different media types to be incorporated, from images to applets, but this element may not be properly supported by older browsers.

We’ve now covered a small but useful subset of HTML. Code 2.8 combines several of the tags introduced, while Figure 2.2 shows how it is displayed under Netscape.

CODE 2.8 [markup not preserved; only the text content of the page survives]

    A Second Simple Web Page
    My Second Web Page
    You can find out more about me here, or look at this table:
    Name    Alison
    Age     38
    Here’s a picture that I drew.
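The page described by Code 2.8 can be sketched as follows, combining a title, heading, paragraph, link, table and image. The text content is the book’s, but the markup is a reconstruction under assumptions: the link target about.html, the image file picture.gif, its alt text, the border attribute, and the choice of ‘find out more’ as the link text are all guesses made for illustration.

```html
<html>
<head>
<title>A Second Simple Web Page</title>
</head>
<body>
<h1>My Second Web Page</h1>
<!-- The link target is a placeholder file name -->
<p>You can <a href="about.html">find out more</a> about me here,
or look at this table:</p>
<table border="1">
<tr> <td>Name</td> <td>Alison</td> </tr>
<tr> <td>Age</td> <td>38</td> </tr>
</table>
<p>Here's a picture that I drew.</p>
<!-- The image file name and description are placeholders -->
<p><img src="picture.gif" alt="A picture that I drew"></p>
</body>
</html>
```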

Figure 2.2 HTML Example 2

Before we leave the basics of HTML we should mention comments. You may be used to writing comments in your computer programs. Well written HTML should also be commented, so you, or anyone else maintaining your page, will be able to understand why your HTML code is written in a certain way. To insert the comment ‘I’m a comment.’ you would add the following to your HTML:

CODE 2.9

    <!-- I'm a comment. -->

Obviously many more elements are used in HTML. There are also strict rules about which elements may be contained in which. To develop more complex pages look at one of the many HTML primers available. See the Further Reading section for some starting points.

2.1.2 Checking it is Correct

The world is now full of badly written HTML, much of it not even obeying the basic HTML syntax rules. These documents may happen to display satisfactorily on some browsers (which are often designed to be quite forgiving), but more by chance than judgement. If it displays nicely on your browser this does not mean it will display well on another. As you can’t control which browsers people use to access your site, this is quite a problem.

However, if you write HTML which is correct according to the standard definitions of the language you can be more confident that it will display on all browsers supporting (that version of) HTML. Before we even start we need to be quite clear about which version of HTML we are using. Correct HTML 3.2 may well not be correct XHTML. We can add to our document a declaration which states which version of HTML we are using. For HTML 4.01 this document type declaration would look like Code 2.10.

CODE 2.10 DOCTYPE declaration

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">

    In fact there are a number of alternative declarations for HTML 4.01, depending on whether you are writing in strict HTML 4.01 or allow older, deprecated features (transitional), and whether you are using frames. The declaration given above is for transitional HTML 4.01. You should really aim for strict HTML 4.01, but note that there is tighter control on elements for strict HTML 4.01. For example, Code 2.11 is NOT correct according to the strict HTML 4.01 definition. In strict HTML 4.01 there must be a title element, and the body can’t contain text which is not contained within what is known as a block-level element, such as p.

    CODE 2.11 Invalid HTML 4.01 example

    My Page

    This is great.

    The above example also fails to include a document type declaration. This is required in HTML 4.01. If you miss it out you are writing incorrect HTML. The remaining HTML examples in this chapter will include an appropriate declaration.

    Now, once you have declared which version of HTML you are using, you can use a validator to check that it is correct. There are a number of validators that can be used, but the validator available online on the W3C site is fairly definitive (available at http://validator.w3.org/). Note that the validator may require that you specify a character encoding telling it what character set is used in the document. You can specify this by adding a statement at the start of your head section:

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

    This sets the encoding to UTF-8 (Unicode), a fairly standard encoding for Web documents.

    If you get in the habit of writing complex Web pages without trying to validate them, you will almost certainly get many odd errors when you do try to validate your pages. Validate early and often and your pages are more likely to be correct and work on the full range of browsers supporting your version of HTML.

    2.1.3

    XHTML: Making it XML-compatible

    Recently there has been more and more interest in XML (eXtensible Markup Language). XML may be viewed as a replacement for SGML which is simpler, but stricter in some respects, and which allows new


    specialised markup languages to be defined and used over the World Wide Web. So, while HTML was originally based on SGML, there is now demand for an XML-compliant version, which will allow XML tools to work with HTML. This is the motivation behind XHTML (eXtensible HyperText Markup Language).

    A valid XHTML document will also be valid HTML (4.01), so there are no real disadvantages to writing XHTML from the start. If you are learning HTML it is worthwhile learning to write correct XHTML. If you already know HTML you should become aware of the differences so you can convert documents if required.

    The key differences to note are that XHTML tag names must be in lower case, and there must be an end tag to go with every start tag. The second point is perhaps the most problematic one, as it is quite common in HTML to omit certain end tags (e.g. for paragraphs or list items). This isn’t allowed in XHTML. It is also common to use empty elements such as <br> (for line break), where the markup just applies to a particular point in the document. In XHTML again this isn’t allowed. Strictly we should be writing <br></br>. However, in the case of the empty element a handy abbreviation is defined. We put a slash (/) at the end of a single tag, as in <br />. You will notice this in many of the examples in this chapter.

    The following lists slightly more formally the main points about XHTML:

    To be recognised as XHTML the document must start with a document type declaration. Code 2.12 is the document type declaration for XHTML 1.0 (Strict).

    CODE 2.12
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

    All elements must be closed. In XHTML when you have a start tag it must have a matching end tag.

    Tag names must be in lower case. This is because XML (and hence XHTML) is case sensitive, so for example <P> and <p> are different tags.
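The two rules just listed (lower-case tag names, and an end tag for every start tag) can be checked mechanically. The following is only a rough sketch in Python using the standard library's html.parser module; it is not a substitute for the W3C validator, and the function name check_xhtml_rules, the EMPTY_ELEMENTS set and the message wording are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser

# A few elements that HTML treats as empty (illustrative, not a complete list).
EMPTY_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class _TagBalanceChecker(HTMLParser):
    """Tracks start tags that are still waiting for a matching end tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag in EMPTY_ELEMENTS:
            # Written as <br> rather than the XHTML abbreviation <br />.
            self.problems.append(
                f"<{tag}> is an empty element and should be written <{tag} />"
            )
        else:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # <br /> style: opened and closed in a single tag, nothing to track.
        pass

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append(f"unexpected end tag </{tag}>")


def check_xhtml_rules(source):
    """Return a list of problems found; an empty list means both rules hold."""
    problems = []
    # HTMLParser lower-cases tag names itself, so check case on the raw text.
    for name in re.findall(r"</?([A-Za-z][A-Za-z0-9]*)", source):
        if name != name.lower():
            problems.append(f"tag name '{name}' is not lower case")
    checker = _TagBalanceChecker()
    checker.feed(source)
    checker.close()
    problems.extend(checker.problems)
    problems.extend(f"<{tag}> has no end tag" for tag in checker.open_tags)
    return problems


if __name__ == "__main__":
    for problem in check_xhtml_rules("<UL><LI>Number 1<LI>Number 2</UL>"):
        print(problem)
```

A fragment written in the older HTML style, with upper-case names and omitted list item end tags, produces several messages, while the same list written with lower-case names and explicit end tags produces none.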

    The following two examples should illustrate some of these differences. Code 2.13 is valid HTML 4.01, but not a valid XHTML file. Code 2.14 is valid XHTML. See if you can spot the differences.

    You can check whether your document is valid XHTML using the W3C validator referred to in the last section.


    CODE 2.13

    My HTML page

    My HTML page

    The first paragraph.

    Some items:

  • Number 1
  • Number 2

    CODE 2.14

    My HTML page

    My HTML page

    The first paragraph.

    Some items:

  • Number 1
  • Number 2





    Managing the Presentation: Stylesheets

    Separating Content, Structure and Presentation

    As noted in the last section, there is a lot of bad HTML out on the World Wide Web. But it’s often not just the syntax that is incorrect, it is the way that it is used. HTML is designed to work on a whole range of browsers, which may be configured in a whole range of different ways by the individual user. Default font sizes may vary, or the size of the browser window. An HTML document should normally be designed so that it can be read by anyone, using any browser, in any reasonable setup.

    However, it is tempting to develop Web pages using a kind of try-it-and-see approach, seeing whether using a particular tag results in a page that looks nice in your browser using your settings. This is NOT a good idea. Using this approach may result in a document that looks good in your browser, but many other folk looking at your page may find that it appears odd or even unreadable.

    HTML markup should be primarily about the structure of a document. It allows you to declare that something is a heading, or a quoted section, or an itemised list. But the details of the presentation (e.g. how to list items in an itemised list, or how to display a heading) are not specified. Different browsers may treat these differently. It may be tempting to use, say, the h1 (level one heading) element as a general instruction to present something in a large bold font. But you might find that another browser presents level one headings using italics, and automatically adds them to a contents page.

    Of course, some HTML markup IS about presentation. You can add markup to indicate what font to use, or to indicate the colour. But the ideal approach is to describe a document in terms of structure, and, if it is necessary to specify the presentation, to do that separately. The content (text and multimedia elements), structure, and presentation of a document are all distinct components (see Figure 2.3). We will see later when discussing stylesheets, and when talking about XML, how keeping these three components separate allows increased flexibility in the sharing of information.

    Figure 2.3 Content, structure and presentation

    HTML is intended as a declarative markup language. It is intended to be used to add markup indicating facts about the document (e.g. this part is a heading). It can be contrasted with procedural markup languages, where the markup is intended to be read as instructions by a particular application. Some typesetting markup languages are intended to be procedural: the markup allowed is intended to be read by a typesetting device. It is dangerous to start using HTML in this way. Tags should be used to add information about the document, not as instructions to, for example, change the margin indentation.

    Some WYSIWYG (‘what you see is what you get’) HTML editors and authoring tools may encourage a procedural, try-it-and-see approach to HTML markup. Although authoring tools are undoubtedly useful for creating complex effects once you understand the technology, by initially using a text editor you can check more easily that the actual markup makes sense according to the structure of the document you are creating or editing. In the next section we will see how stylesheets can be used as one way of keeping the presentation of a document separate from the structure.

    2.2.2

    The Role of Stylesheets

    In some versions of HTML it is possible to use elements and attributes to control document presentation rather than structure, for example, by using the <font> tag. Code 2.8 included two examples of this, using the align and bgcolor attributes to control appearance. HTML 3.2 includes a number of such elements/attributes, and they are still in wide use.

    However, since the release of the HTML 3.2 specification it has been realised that mixing structure and presentation is a bad thing. Later versions of the HTML standard (starting with HTML 4.01) attempted to clean this up by removing or deprecating tags that relate to presentation, and requiring that presentation is managed using stylesheets. For example, <font> and <center> are no longer allowed in strict HTML 4.01; the author has to control these aspects of appearance using a stylesheet.

    A stylesheet is a set of instructions stating how various elements should be displayed. A stylesheet might, for example, state that a particular font and colour should be used for headings, or that all paragraphs should be indented by a certain amount. Stylesheets may be used in a number of different ways. If you are only concerned with managing the style of a single document, you can put style instructions in the head of the document. This is called an internal stylesheet. An example using CSS (Cascading Stylesheets) is given in Code 2.15. More frequently you want to develop a stylesheet that can be applied to a whole set of Web documents, perhaps to create a consistent look in a Web site. To do this use an external stylesheet.


    CODE 2.15

    CSS Demo

    body {background-color: blue}
    h1 {color: red}

    Isn’t it a nice page.

    To specify that a particular stylesheet should be applied to a site we can use the link element in the head:

    CODE 2.16
    <link rel="stylesheet" type="text/css" href="mystyle.css" />

    The attributes specify the link type (stylesheet), the type (or strictly MIME type) of the document (text/css) and its location. The stylesheet in the specified location will be retrieved and applied to the document. The file mystyle.css might contain something like the following:

    CODE 2.17
    body {background-color: blue}
    h1 {color: red}

    This single stylesheet may be used by as many HTML documents as is desired, so they would all have the same blue background and red headings. Code 2.17 shows how you can specify the display characteristics of particular elements. In this case the background colour of the body should be blue, and level one headings should be displayed in red. It is also possible to specify things like spacing, alignment, margins and font sizes, thus controlling the detailed formatting of the document.

    What if the user’s browser settings conflict with a defined style, or an internal style conflicts with an external one? In general the style information can combine (e.g. a blue background could be defined in an external style file, while the red headings are defined internally). Where there are conflicts (e.g. a blue background vs a red one) then more specific rules override more general rules. This means that internal styles override external ones, and external ones override browser settings. We refer to styles cascading, hence the name.
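The ordering just described (browser settings, then external styles, then internal styles) can be modelled as a simple merge in which later layers win any conflict. This is a deliberately simplified Python sketch: the cascade defined in the CSS specification also takes other factors into account, and the function name cascade and the sample style values are assumptions made for illustration.

```python
def cascade(*style_layers):
    """Merge style layers listed from most general to most specific.

    Each layer maps a selector to a dictionary of property/value pairs.
    Non-conflicting settings combine; on a conflict the later layer wins.
    """
    result = {}
    for layer in style_layers:
        for selector, declarations in layer.items():
            result.setdefault(selector, {}).update(declarations)
    return result


# Hypothetical layers: browser defaults, an external file, an internal stylesheet.
browser_settings = {"body": {"background-color": "white"}, "h1": {"color": "black"}}
external_styles = {"body": {"background-color": "blue"}}
internal_styles = {"h1": {"color": "red"}}

combined = cascade(browser_settings, external_styles, internal_styles)
print(combined["body"]["background-color"])  # the external blue background
print(combined["h1"]["color"])               # the internal red headings
```

If the internal layer also set a red background, that value would replace the external blue one, because the internal layer is passed last.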

    Writing Simple Cascading Stylesheets (CSS)

    Whether you use internal or external stylesheets, the basic syntax for CSS is the same. CSS instructions consist of three parts: a selector, a property and a value:

    CODE 2.18
    selector {property: value}

    The selector is the name of the HTML element whose appearance you wish to control. For example, using the selector h1 you would alter how level one headings appear. The property is the attribute you wish to alter (e.g. color, font-family), and the value is the new value, for example:

    CODE 2.19
    h1 {color: red}
    p {text-align: center}

    To make full use of stylesheets you will have to learn, or look up, which properties can be altered, and what the allowed values are. Look up one of the many references available on the Web. To be reasonably up to date you should use a reference to CSS2, an update to the original CSS1 specification.

    There are various ways that you can make CSS instructions a little more concise, without having to write a separate instruction for every property and value. You can list several properties and values, and you can group selectors so that your properties and values apply to a whole set:

    CODE 2.20
    h1, h2, h3 {color: green; font-family: times}
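To make the shorthand concrete, the sketch below expands a single grouped instruction into the separate selector, property and value triples it stands for. It is a toy Python parser written for this illustration (the name parse_rule is an assumption) and it handles only the simple one-rule form shown above, not the full CSS grammar.

```python
def parse_rule(rule):
    """Expand 'sel1, sel2 {prop: value; prop: value}' into a dictionary.

    The result maps each selector to its own copy of the declarations.
    """
    selector_part, _, rest = rule.partition("{")
    body = rest.rsplit("}", 1)[0]
    selectors = [s.strip() for s in selector_part.split(",") if s.strip()]
    declarations = {}
    for declaration in body.split(";"):
        if ":" in declaration:
            prop, value = declaration.split(":", 1)
            declarations[prop.strip()] = value.strip()
    return {selector: dict(declarations) for selector in selectors}


expanded = parse_rule("h1, h2, h3 {color: green; font-family: times}")
for selector, declarations in expanded.items():
    for prop, value in declarations.items():
        print(f"{selector} {{{prop}: {value}}}")
```

The one grouped instruction is equivalent to six single instructions, one for each combination of the three selectors and two properties.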


    Using the basic CSS instructions that we have described so far, it is difficult to have, say, some level one headings one colour, and some another. If this is required, it is possible to define element classes and have different styles for each class. We can even identify a specific element which can have its own set of style instructions. We use the class and id attributes to manage this. For example, you can define two different paragraph styles, and specify which one is to be used by setting the class attribute in the relevant HTML element:

    CODE 2.21
    p.red {color: red}
    p.blue {color: blue}

    <p class="red">A red paragraph.</p>



    In Code 2.21 the p stylesheet selectors are followed by a class name (red or blue). Within a particular paragraph, the class attribute is set to red so the red paragraph style is selected. You can also omit the element name, to define very general styles to be applied to any element when wanted (see Code 2.22).

    CODE 2.22
    .red {color: red}
    .blue {color: blue}

    <h1 class="red">A red heading.</h1>

    To define a style that is to be applied to just one unique element you can use the id attribute, referencing this by using a # sign in the CSS selector:

    CODE 2.23
    #heading11 {color: red}

    <h1 id="heading11">A red heading.</h1>
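The three selector forms covered so far (an element name, optionally followed by a class, a class on its own, and an id) can be summarised as a small matching function. This Python sketch covers only those forms; the function name selector_matches and the id value main-title are hypothetical, chosen for illustration.

```python
def selector_matches(selector, tag, class_name=None, element_id=None):
    """Decide whether a simple CSS selector applies to one element.

    Supports 'p', 'p.red', '.red' and '#some-id' only.
    """
    if selector.startswith("#"):
        return element_id is not None and selector[1:] == element_id
    element, dot, wanted_class = selector.partition(".")
    if element and element != tag:
        return False  # the selector names a different element
    if dot and wanted_class != class_name:
        return False  # the selector needs a class the element lacks
    return True


print(selector_matches("p.red", "p", class_name="red"))    # paragraph of class red
print(selector_matches(".red", "h1", class_name="red"))    # any element of class red
print(selector_matches("#main-title", "h1", element_id="main-title"))
print(selector_matches("p.red", "p", class_name="blue"))   # wrong class, no match
```

Only the last call reports no match: the paragraph belongs to class blue, so the p.red style is not selected for it.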

    There are CSS properties concerned with colour, fonts, margins and other aspects of layout and appearance. Table 2.1 lists a small number of the most useful.

    Table 2.1 Example CSS properties

    property          description                                           possible values
    color             text colour                                           a colour
    text-align        aligns text in element                                left, right, center, justify
    background-color  background colour of element                          a colour, or ‘transparent’
    font-family       font to be used in element                            a font family name (e.g. helvetica)
    font-size         absolute or relative font size to be used in element  small, medium, large (etc), pt size, or %
    margin-left       left margin of an element                             a length (e.g. 3in) or %

    2.3

    Managing Interaction with the User: Introduction to Forms

    We’ve now described how to mark up an HTML document to indicate its structure, and how to control the presentation of the document using stylesheets. The advantages of XHTML, which allows compatibility with XML applications, have been discussed. But many Web pages do more than just present information. They allow the user to make selections, choose options, and generally interact with the information on the site. For example, a shopping site will allow the user to select items from some collection, and enter name/address details, etc.

    The simplest way to allow the user to interact in this way is to use HTML form elements, allowing simple types of user input. We start by looking at the role and syntax of form elements, and consider in later chapters how to handle the data entered.

    The HTML form element is used to define an area where the user may enter information. Within that form element there may be any number of input elements which allow the user to enter information in a number of different ways. Note that in strict HTML 4.01 and XHTML the input elements must be contained in what are known as block-level elements, such as paragraphs, rather than directly inside the form. Other HTML elements are also allowed within the form, so additional information and prompts can be included, marked up with the appropriate tags.

    A very small example form allowing the user to enter text into text boxes is given in Code 2.24.

    CODE 2.24
    <form action="">
    <p>
    Name: <input type="text" name="name" />
    Age: <input type="text" name="age" />
    </p>
    </form>

    Note how input elements have the attributes type and name. These attributes allow you to specify the kind of input (e.g. text or checkbox), and give the input a name that allows it to be referenced by whatever application uses this input. The form itself has an action attribute (which for now we leave empty). This specifies what is to be done with the form’s contents. Figure 2.4 shows how Code 2.24 is displayed in a Web browser.

    Figure 2.4 Simple form
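To see why each input needs a name, it helps to look at one common way form contents reach an application: as a list of name=value pairs joined by & (the URL-encoded form). The Python sketch below uses the standard urllib.parse module to build and then read back such a string. The field names name and age and the sample values are assumptions made for illustration, and how a real application receives the data is left to later chapters.

```python
from urllib.parse import parse_qs, urlencode


def encode_form(fields):
    """Encode form fields the way a submitted form is commonly sent."""
    return urlencode(fields)


def decode_form(encoded):
    """Recover the fields by name; each name maps to a list of values."""
    return parse_qs(encoded)


submitted = encode_form({"name": "Alison", "age": "30"})
print(submitted)

received = decode_form(submitted)
print(received["name"][0], received["age"][0])
```

The receiving application looks values up by the same names that were given in the name attributes of the input elements.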

    There is a range of different types of input element that can be used, apart from text. The other main kinds are radio buttons, checkboxes and buttons.

    » Radio buttons allow users to choose one from a range of mutually exclusive alternatives:

    CODE 2.25
    Are you: 0-30

    CHAPTER OVERVIEW

    In order to make effective use of Internet technology for e-Commerce applications you need a basic understanding of the architecture and protocols associated with the Internet. This will allow you in turn to understand the design of complex Internet applications. In this chapter we:

    » Discuss the role of the e-Architect.
    » Explain the components in, and rationale for having, a tiered architecture.
    » Describe the various components that make up a Web site’s meta-architecture.
    » Discuss the role and selection of the Web server.
    » Explain the purpose of network protocols.
    » Describe the components and operation of the TCP/IP protocol suite.
    » Consider capacity planning issues.

    4.1

    Introduction

    Let us begin our investigation into the architectural issues of e-Commerce by defining what we mean by architecture, before we look at e-Architecture. By architecture, we mean the way that the components of any design (e.g. a car, a building or computer system) fit together. The architecture might include consideration of what the components are, where they exist in time or space, how components communicate or are held together and the performance expected of them or the system as a whole. For instance, the architect for a building might consider the specification:

    » to be located at 35 Anywhere Street;
    » planned to be complete by next Spring;
    » access from the ground floor on its east and south sides;
    » constructed using a bolted steel frame with a sandstone outer skin;
    » to comprise 4 floors connected by 1 stairwell and 2 elevators;
    » to cover 2000 square metres of land;
    » to be used for office accommodation;
    » should not require major maintenance work for 20 years;
    » to have electrically powered heating and air conditioning;
    » will consume a maximum of 2 MW of electrical power under extreme conditions.

    Looking at this example for a building, we notice two things. Firstly, these considerations look very much like customer requirements. Indeed, the role of an architect (for a building or an information system) is often to spend a great deal of time making sure their design meets the customer’s expectations (and by customers we can mean those inside and outside our organisation). Secondly, the concerns of architects from whatever domain are, at some level of abstraction, universal. Architects are worried about time, cost, quality, risk, scalability, maintainability, reliability and so on. They also need to be aware of the people who build, use and maintain the system. Finally, they need to be aware of the latest technology, trends and opportunities that might affect the decisions they make. So, architecture is not just a plan for how something works or how it looks, it is also a role that someone needs to undertake.

    Architects are also often concerned about managing complexity. We as humans have powerful, yet limited, abilities to imagine very complex systems in their entirety. Yet very complex systems exist: spacecraft, software applications, bridges, etc. The way we manage this complexity in order to build the system, or understand how something works, is to break it down into smaller, more manageable parts; e.g. a car is made up of sub-systems like transmission, engine, chassis, steering, etc.; in turn these are themselves made up of sub-systems. Since we can understand the smaller parts, we can build them. Once built, all we need to worry about is their interface with neighbouring parts; in other words, we can treat them like black boxes that hide their own internal complexity. Now, as architects, we can plug these parts together safe in the knowledge that the whole system should work. Sounds simple, but things often go wrong, particularly at the interfaces between parts when the communication and interaction mechanisms are not fully understood. We can even get it wrong if we consider the wrong collection of parts to be a sub-system. We shall continue this discussion when we look at protocols later in this chapter.

    So what is the objective of this chapter? Well, firstly we want to start thinking about the components you will need to build a site. By components we are not talking about HTML and graphics files; we shall consider components at a slightly higher level of abstraction. So we are interested in the hardware, the databases and the functionality of certain key software elements. Once you, as an architect, have decided on these,


    then you can start worrying about the Web site’s layout, content, navigation and functionality. Although such Web site design concerns constitute a creative act and rightly belong under the general banner of architecture, for convenience (and by virtue of imposing our arbitrary level of abstraction) we considered them in Chapter 3 on Web Site Design. Having looked at the components, and thereby satisfied our role as managers of complexity, we need to stick/glue these parts together. In the physical world of the computer network, such connections are predominantly cables. However, the data that is transmitted along these cables by the sending component needs to be addressed to, and understood by, the recipient component. Therefore, we shall also consider the network protocols and tools that allow us to stick the constituent parts together.

    4.2

    Network Architectures

    In the past, the dominant network architecture in an organisation would have been the mainframe with a number of satellite dumb terminals on people’s desks. Although no longer dominant, mainframes are still commonplace. These dumb terminals have keyboards and sometimes mice, and connect directly to the mainframe computer. As such they are little more than windows on the data and processes that the mainframe hosts. All the data live on the mainframe and all the processing is done there as well. The disadvantages of this architecture are related to its reliance on the mainframe. If the mainframe goes down, you are unable to carry out your work or communicate. Mainframes can also be susceptible to load surges. For instance, if everyone is processing data on the mainframe at the same time, the response you get at your dumb terminal degrades to an unacceptable level.

    Subsequently, we saw a move to a client-server architecture (sometimes called a two-tier architecture) where the server took the place of the mainframe and the clients (typically PCs as we know them today) took over the roles of the dumb terminals on people’s desks. The difference was that some power and responsibility was devolved from the server to the client: the client was no longer dumb. The servers became responsible for some storage, communication and administration duties. For example, you might have your own local hard disk in the PC that sits on your desk, but you can also choose to put your files on a networked drive that lives on the server. Also, the server can allow or deny your request to log on to the network from your client machine.

    However, this trend for decentralising power and responsibility can go too far, whereby each client becomes fat. There might be different and incompatible applications on each client. This may be problematic since it compromises communication; the users of the clients may not actively back up their local files, making them vulnerable to losing their important and valuable data; licensing diverse suites of applications becomes costly and difficult; more expensive machines are needed to store and run


    applications on fat clients; and so on. At least with the mainframe approach, there was a central, controllable, editable, supportable environment. In an attempt to control such chaotic diversity, some organisations limit user access to local client storage and centralise applications. For instance, when a user wants to use the standard word processor and clicks on the relevant icon on their PC desktop, a copy is downloaded from the server to the client machine and run locally from memory.

    At the same time, there is a tentative move away from fat clients to so-called thin client-servers. A thin client-server architecture within an organisation hosts the minimum amount of data and number of applications that are needed; the rest is outsourced to Application Service Providers (ASPs). An example would be that we no longer necessarily need any email client software on our PC, or even on our server, since we can access remote accounts via our Web browser. The flip-side of this ‘diet’ is, however, that problems can be more acute with respect to network bandwidth (how much data can be transmitted per second), latency (the related issue of how long data take to travel down the line) and connection stability. If performance is too low, the ASP option becomes unattractive. The notion of ASPs is discussed further in Chapter 10 on e-Business Models and Chapter 14 on e-Law and e-Ethics.
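As a rough illustration of how bandwidth and latency combine, the Python sketch below estimates the time to fetch a single document from a remote provider as the latency plus the document size divided by the bandwidth. This is a simplification that ignores protocol overheads and connection set-up; the function name and the figures used are assumptions made for illustration.

```python
def transfer_time_seconds(size_bytes, bandwidth_bits_per_second, latency_seconds):
    """Estimate the time to receive one document over a network link.

    Latency is counted once; the payload is sent at the full bandwidth.
    """
    return latency_seconds + (size_bytes * 8) / bandwidth_bits_per_second


# A hypothetical 500,000-byte document with 50 ms latency on two different links.
slow_link = transfer_time_seconds(500_000, 1_000_000, 0.05)    # 1 Mbit/s
fast_link = transfer_time_seconds(500_000, 10_000_000, 0.05)   # 10 Mbit/s
print(f"{slow_link:.2f} s on the slow link, {fast_link:.2f} s on the fast link")
```

On the slower link the transfer is dominated by bandwidth, while for very small documents the fixed latency becomes the larger share of the total.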

    4.2.1

    The 3-Tier Architecture

    Having spoken about the architecture of our networks, let us now turn to that of our applications. A common approach is to use (or at least think in terms of) a 3-Tier Architecture. This is a model in which the user interface, functional process logic (‘business rules’) and data storage are developed and maintained as independent modules, often on separate platforms (see Figure 4.1). The 3-tier architecture is intended to allow any of the three tiers to be upgraded or replaced independently as requirements or technology change, as long as the messages sent and protocols used to communicate between the tiers are consistent and understood. The middle tier can also take on the responsibility of load balancing so that, for instance, if too many clients are connecting to the application server, which in turn is swamping the database with requests, the application server can use other database servers in order to spread or balance the load. Compare this model to the mainframe approach where data, application and interface code were often tightly coupled, making maintenance complex and expensive.

    Typically, the user interface runs on a desktop PC and uses a standard Graphical User Interface (GUI); functional process logic may consist of one or more separate modules running on an application server; and a Database Management System (DBMS) on a database server or mainframe contains the data storage logic. The middle tier may be multi-tiered itself (in which case the overall architecture is called an ‘n-tier architecture’).

    Figure 4.1 The 3-tier architecture (User Interface Client, Application Server, Database Server)

    Taking this approach outside the organisation it is possible to imagine that these three components do not even exist on the same local area network; they may be dispersed over the Internet. Consider the example of obtaining live, online share prices where the user interface is provided by a browser on the client machine, the application logic resides on the Web server of some e-portal company and the database lives at the stock exchange.
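The separation of the three tiers, and the middle tier's load-balancing role, can be sketched in a few lines of Python using the share price example. Everything here is hypothetical and chosen for illustration: the class and function names, the ACME symbol, the commission rule standing in for a business rule, and the simple round-robin way of spreading requests. In a real system the tiers would be separate programs communicating over a network.

```python
from itertools import cycle


class DatabaseServer:
    """Data tier: stores and returns data, nothing more."""

    def __init__(self, name, prices):
        self.name = name
        self._prices = dict(prices)
        self.requests_served = 0

    def lookup(self, symbol):
        self.requests_served += 1
        return self._prices[symbol]


class ApplicationServer:
    """Middle tier: applies business rules and spreads load over databases."""

    def __init__(self, database_servers):
        # Round-robin: each request goes to the next database server in turn.
        self._databases = cycle(database_servers)

    def price_with_commission(self, symbol, commission_rate=0.01):
        database = next(self._databases)
        price = database.lookup(symbol)
        return round(price * (1 + commission_rate), 2)


def render_quote(app_server, symbol):
    """Presentation tier: formats whatever the middle tier returns."""
    return f"{symbol}: {app_server.price_with_commission(symbol):.2f}"


db1 = DatabaseServer("db1", {"ACME": 100.0})
db2 = DatabaseServer("db2", {"ACME": 100.0})
app = ApplicationServer([db1, db2])
for _ in range(4):
    print(render_quote(app, "ACME"))
print(db1.requests_served, db2.requests_served)
```

Because the presentation code only ever calls the application server, a database server could be replaced or added without the user interface changing, which is the independence the model is intended to provide.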

    4.3

    Web Site Meta-architecture

    As we have already said, this chapter will not concern itself with the appearance and functionality of the site. Instead we will stand back a little and consider the infrastructure that supports that site architecture: the architecture of the architecture, or meta-architecture, if you like. So, as Web site architects, we exist within a larger system, namely the Internet, with all its protocols and client machines ready to access our site. Indeed, our site’s meta-architecture is a sub-system of the Internet and we need to start selecting the sub-sub-systems to build it with. Please note, however, that we are assuming here that you are about to build this meta-architecture and not, as so many online companies do, buy or outsource the facility from another company. This is the classic Build or Buy? dilemma that many face when they are considering developing a site. We discuss this issue elsewhere in Chapter 15 on e-Methodologies and Chapter 10 on e-Business Models.


    We need to decide on the hardware, software and functionality of our Web server, firewall, database, applications (client- and/or server-side? See Chapter 5 on Languages of the Internet for a more comprehensive treatment) and communication protocols. We could begin by considering the topology of the system (see Figure 4.2). Here we show servers as existing on separate computers, but be aware that they could be, and often are, housed on the same machine. In our Figure 4.2 example, the organisation’s internal network of servers and clients is denoted by LAN (Local Area Network). Different people with different authorisations will be able to access the other components we can see. Indeed the LAN will probably be able to access the Internet via the Gateway Server. These components are the parts of the architecture that are of interest to us in order to get our Web site up and running.

    Let us begin in the bottom right of Figure 4.2. Here we can see a Legacy System. If your organisation has been around for a number of years there is a fair chance that there will be old computer systems, e.g. mainframes, that were built some time ago but are still being used; these are your legacy (or heritage) systems. Although they are perhaps difficult to maintain and do not readily interoperate with other systems, they are a vital and valuable part of the business and their replacement would be risky and costly and is certainly not planned in the near future. They may hold your inventory, accounting software, customer records, etc. As well as the data they possess, they will probably encapsulate a great deal of organisational knowledge in terms of the programs that have been built and evolved over the years to cater for your own circumstances and processes. In fact, the new business you get from the Web may eventually end up living on the legacy. For this reason, we need to be able to get our new Web architecture interoperating with the legacy.

    Figure 4.2 Hardware architecture (labels recovered from the figure: Firewall, Gateway Server, Middleware Server, Database Server)


    A common means of enabling this interoperability is through the use of middleware products. These software applications are specially designed to help heterogeneous computer systems communicate and are sometimes referred to as the glue or the plumbing between components. In our example we have installed these on the Middleware Server.

    There are a variety of middleware products and genres. For instance, CORBA, DCOM and Enterprise JavaBeans provide a distributed object infrastructure: in effect, an application makes available a service to other components in the form of an object and hides the implementation of that object. Alternatively, message-oriented middleware can marshal predefined requests and responses into messages from and to applications to assist interoperability.

    Since both the Web Server and the Database Server may need to access the Legacy System’s logic and data, they can use the middleware applications to do so. Assuming the Legacy System is being used as the prime repository for operational data, the Database Server could act as a back-up for the legacy data, or store Internet-specific data that the legacy was not designed to accommodate.

    As we mention elsewhere (specifically in Chapter 13 on e-Security), we would ideally want a well-configured firewall between us and the Internet as a security measure. Therefore, the Web Server and the Gateway Server are both connected to the firewall.

    At this point, we should emphasise that Figure 4.2 constitutes only one of many possible configurations. Some components may live on the same physical machine and other components not mentioned here could come into play, such as an FTP, email or proxy server. Furthermore, you may have functionality requirements such as software for Customer Relationship Management. You may even be servicing multiple access channels like the World Wide Web, WebTV, wireless clients, intranets and extranets or dial-in services.
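As a rough illustration of the message-oriented style of middleware mentioned in this section, the sketch below marshals a predefined request into a text message, passes it through a queue standing in for the middleware, and lets an adapter in front of the legacy system answer it. The message fields, the `legacy_stock_lookup` routine and the stock figures are all hypothetical; real middleware products add delivery guarantees, routing and security that are not shown here.

```python
import json
import queue

# Hypothetical stand-in for logic held on the Legacy System.
LEGACY_STOCK = {"widget": 12, "gadget": 0}


def legacy_stock_lookup(product):
    """Pretend legacy routine: return the stock level for a product."""
    return LEGACY_STOCK.get(product, 0)


def marshal(kind, **fields):
    """Turn a predefined request or response into a text message."""
    return json.dumps({"kind": kind, **fields})


def unmarshal(message):
    """Turn a text message back into a dictionary."""
    return json.loads(message)


def legacy_adapter(requests, responses):
    """Read one request message, call the legacy routine, post a reply."""
    request = unmarshal(requests.get())
    if request["kind"] == "stock_request":
        level = legacy_stock_lookup(request["product"])
        responses.put(marshal("stock_response",
                              product=request["product"], level=level))


# Two queues stand in for the middleware channel in each direction.
to_legacy = queue.Queue()
from_legacy = queue.Queue()

# The Web server side only ever sees messages, never the legacy code.
to_legacy.put(marshal("stock_request", product="widget"))
legacy_adapter(to_legacy, from_legacy)
reply = unmarshal(from_legacy.get())
print(reply)
```

The point of the design is that the Web server and the legacy system agree only on the message format, so either side can be changed without the other needing to know.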

    4.3.1

    The Web Server

    Turning now to the Web server itself and considering Figure 4.3, perhaps the most important component is the server software. It is responsible for access control, running scripts and programs, enabling administration of the server configuration and (to some extent) the contents of the Web site, and logging transactions. When deciding on particular Web server software, considerations may include:

    - Cost: the most popular packages are free, although free may mean they come as part of a non-free suite of software like an operating system. Cost need not just include the purchase price of the server software; it could also mean incurring extra training costs for personnel if they are not familiar with what is being purchased, or indeed what it is being installed on.
    - Platform: some are specific to Windows and/or Unix.
    - Performance: the number of requests you expect your site to handle and the speed with which they are processed may be a deciding factor.
    - Security: the security protocols, access controls and filtering that are supported.
    - Added functionality: some server software claims to possess extra features, like built-in e-Commerce functions.

    Figure 4.3 Web server architecture (labels recovered from the figure: Web Server Machine, Operating System, Web Server Software, Applets/Servlets, HTML, Other Media)

    In fact, these considerations apply to almost all of the components you need to select in the meta-architecture of the site.

    There are many Web servers on the market but the big three are those from Apache, Microsoft and Netscape. Anecdotally, their notable quirks are that Apache is solid and comprehensive but more difficult to administer; Microsoft’s offerings are specific to Microsoft operating systems but have good performance; and Netscape servers can handle large volumes of traffic and are straightforward to administer.

    Having said that the server software may be the most important component, your choice of server may also be the most important decision that you as an architect make. Whilst following the herd is not always advisable, it may help to know that, according to surveys at the time of writing, almost two-thirds of the Web servers in the world were running Apache, with around one-quarter using Microsoft (Internet Information Server, Personal Web Server) and less than one-twentieth coming from iPlanet (the stable that includes the Netscape Enterprise Server).

    In terms of operating systems (OSs), the distinctions are less clear. Very approximately, Unix-like OSs are run on half the world’s Web servers (Linux appears to be the most popular of this genre) and Microsoft OSs are run on the other half.

    With the popularity of open-source products for supporting the world’s Web sites, there is a neat acronym that you may hear in connection with an approach to Web architecture: LAMP. LAMP stands for Linux, Apache, MySQL and the ‘P’ represents either Perl, PHP or Python. With the LAMP approach we cover the essential Web-enabling technologies of the operating system, the Web server, the database system and finally the application scripting language, respectively.

    4.3.2

    The Proxy Server

    We should also mention proxy servers here. For the context we are interested in, a proxy server (sometimes called a cache server) lives behind the firewall. It stores the result of requests made by clients and servers. Subsequently it intercepts other requests to determine if it can fulfil them itself. If not, it forwards the request to the appropriate server (remote or local) for processing. Proxy servers have two main purposes:

    - They can improve performance because they save (cache) the results of all requests for a certain amount of time. Consider the case where both user A and user B access the Web through a proxy server. User A requests a certain Web page on a remote Web server that is not in the cache. The proxy server notices this and saves a local copy of the page that is returned. Some time later, user B requests the same page, but instead of forwarding the request to the remote Web server, which takes time and costs money, the proxy server simply returns the page that it has already fetched for user A. This is fine for static pages that change little over time (particularly since the cached version expires after some period of time), but if the author of the page changes the content between users A and B requesting the page, user B will end up looking at out-of-date content. Most browsers allow users to select the degree to which they rely on previously visited material and even cache pages themselves on local hard disks to supplement the contents of the central organisational proxy server.
    - They can also be used to filter requests. For instance, a company might use a proxy server to prevent its employees from accessing certain sites. In fact, this aspect of the proxy server is an important component of the firewall concept we mention in Chapter 13 on e-Security.
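The two purposes just listed can be sketched in a few lines. This is a toy model under stated assumptions, not how any particular proxy product works: the class name `CachingProxy`, the host names, the 60-second expiry and the fake clock are all invented for illustration, and the "remote server" is simply a dictionary.

```python
import time


class CachingProxy:
    """Toy proxy: caches fetched pages for a fixed time and filters sites."""

    def __init__(self, fetch, ttl_seconds=60.0, blocked_hosts=(),
                 clock=time.monotonic):
        self.fetch = fetch                  # function that contacts the remote server
        self.ttl_seconds = ttl_seconds      # how long a cached copy stays valid
        self.blocked_hosts = set(blocked_hosts)
        self.clock = clock                  # injectable so expiry can be demonstrated
        self.cache = {}                     # url -> (time stored, page)

    def request(self, host, path):
        if host in self.blocked_hosts:
            return "403 Blocked by proxy"   # filtering role
        url = host + path
        entry = self.cache.get(url)
        if entry is not None:
            stored_at, page = entry
            if self.clock() - stored_at < self.ttl_seconds:
                return page                 # served locally, no remote request
        page = self.fetch(url)              # cache miss or expired copy
        self.cache[url] = (self.clock(), page)
        return page


# Hypothetical remote server whose page content can change over time.
remote_pages = {"www.example.org/news": "version 1"}
remote_hits = []


def fetch_from_remote(url):
    remote_hits.append(url)
    return remote_pages[url]


now = [0.0]                                  # fake clock, in seconds
proxy = CachingProxy(fetch_from_remote, ttl_seconds=60.0,
                     blocked_hosts={"games.example.com"},
                     clock=lambda: now[0])

page_for_a = proxy.request("www.example.org", "/news")    # user A: fetched remotely
remote_pages["www.example.org/news"] = "version 2"        # author updates the page
page_for_b = proxy.request("www.example.org", "/news")    # user B: cached copy
now[0] = 61.0                                             # cached copy has expired
page_later = proxy.request("www.example.org", "/news")    # fetched again
blocked = proxy.request("games.example.com", "/")
print(page_for_a, page_for_b, page_later, blocked, len(remote_hits))
```

User B receives the out-of-date page exactly as described above, and only two requests ever reach the remote server.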

    4.4

    Protocols

    We have already mentioned that one aspect of architecture is deciding on the constituent parts of a system, whether that be a building or a digital technology. However, in order for us all to agree on how such components communicate/interoperate effectively, we need protocols.

    Returning to our building analogy, imagine the architect had decided on a 100 mm mains water pipe coming into the building, but the water company supplying that part of town had run a 300 mm pipe up to the boundary of the building site after the 100 mm pipes had been installed. On discovering the mismatch, the site manager would have to order a reducer (incurring extra cost and time) and the building may be left without enough water volume and/or pressure as a result. In other words, the architect and the water company needed a protocol (an understanding) to help them make design decisions about the interface between the two systems.

    In the context of computer networks, a protocol is a set of formal rules describing how to transmit data between two or more communications devices or computers. In practice, we often consider network protocols to exist in a hierarchy and each level in this hierarchy we call a layer. Low-level protocols in the hierarchy define the electrical, optical and physical standards to be observed, whilst high-level protocols deal with data formatting, the syntax of messages, character sets and the sequencing of messages. There are a multitude of different network protocols and we do not intend to deal with this large subject area in any great detail. Instead, we shall look at those commonly found in e-Commerce.

    4.4.1

    The TCP/IP Protocol Suite

    TCP/IP

    TCP (Transmission Control Protocol) and IP (Internet Protocol) were originally envisioned as a single protocol, but they are in fact two different protocols that are often used together on the Internet, hence TCP/IP or ‘TCP over IP’, referring to the aforementioned hierarchy where TCP exists at a higher layer than IP. In practice, and rather confusingly, the TCP/IP protocol suite actually refers to a large collection of protocols and applications, which is usually called simply TCP/IP.

    HTTP

    The Hypertext Transfer Protocol is the basis for exchange of information over the World Wide Web and exists at the top level of the TCP/IP protocol suite as a so-called application layer protocol. You may recognise this acronym from your Web browser as it is prepended to the URLs you request. Other protocols at this level are FTP, Telnet, SMTP and POP, to name but a few. There are also security protocols (e.g. SET and SSL) which we will mention in Chapter 13 on e-Security. These live in an optional layer above TCP and below HTTP.

    Let us consider a (highly simplified) scenario involving two computers communicating on a network (or the Internet); see Figure 4.4. Computer A sends an HTTP request to computer B (e.g. the user selects a hyperlink on a Web page using a browser; the browser is in fact an HTTP client). Before the request is sent, TCP on A takes this message, breaks it up into ordered segments and adds its own headers (binary codes that tell B various things such as the order this segment comes in the message, etc.) around each segment. Then IP on A adds its own headers around the segments, turning them into packets. Each packet is sent down the wire to B where IP on B strips off the IP headers. One of these tells it to send the resulting segments to TCP. TCP now strips off its headers. One of these headers tells it that it needs to send the bare message portions in a prescribed order. Another header says that the destination of this ordered stream of bits is the HTTP server, namely the Web server.

    Figure 4.4 TCP/IP stack (Computer A runs the HTTP client, i.e. the browser; Computer B runs the HTTP server; requests and replies are split into segments and transferred as packets over the physical connection)

    The HTTP server then sends back its response with the requested data (i.e. the HTML code for a Web page). This descends the protocol stack on B, being split up and having the TCP then the IP headers added. It arrives at A to have the headers stripped as it ascends the stack and eventually arrives at the HTTP client. The browser interprets the HTML code and displays the page.

    PHEW!! Did you ever imagine there was so much involved in clicking on a hyperlink? The amazing thing is that this happens over vast distances, involves large amounts of data and often appears to happen instantaneously and without mishap.

    Please note that this description is highly simplified, but should give you a feel for the issues. In fact the TCP/IP protocol stack is usually a four-layer model. The physical connection is the fourth layer and its protocols include Ethernet, token ring and ISDN. The advantage of using a suite of protocols in this manner is analogous to the use of tiered architectures: we can provide modularity in case we need to change one aspect in the hierarchy.

    4.4.2

    IP Addresses

    Having mentioned the IP protocol, we should mention one of its most important header fields: the IP address. Contained in the headers are the source and destination addresses, each typically written as a sequence of four numbers. Since the values are separated by periods, the notation is called dotted decimal.
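To tie the addresses in the IP headers back to the layering described in Section 4.4.1, here is a toy sketch of a message being split into segments, wrapped in packets carrying source and destination addresses, delivered out of order and reassembled. The `Segment` and `Packet` classes, the 8-byte segment size and the client address are our own simplifications for illustration; real TCP and IP headers carry many more fields and are binary rather than Python objects.

```python
import random
from dataclasses import dataclass


@dataclass
class Segment:
    """TCP-like unit: a slice of the message plus simplified headers."""
    sequence: int        # position of this slice in the message
    dest_port: int       # which application should receive the data
    data: bytes


@dataclass
class Packet:
    """IP-like unit: one segment plus source and destination addresses."""
    source: str
    destination: str
    payload: Segment


def tcp_send(message, dest_port, size=8):
    """Break the message into ordered segments."""
    return [Segment(i, dest_port, message[start:start + size])
            for i, start in enumerate(range(0, len(message), size))]


def ip_send(segments, source, destination):
    """Wrap each segment in a packet carrying the two addresses."""
    return [Packet(source, destination, s) for s in segments]


def ip_receive(packets, my_address):
    """Strip the IP headers, keeping only packets addressed to us."""
    return [p.payload for p in packets if p.destination == my_address]


def tcp_receive(segments):
    """Strip the TCP headers and rebuild the message in sequence order."""
    ordered = sorted(segments, key=lambda s: s.sequence)
    return ordered[0].dest_port, b"".join(s.data for s in ordered)


# Addresses in dotted decimal. 137.195.150.50 is the sample address used
# in this chapter; the client address is made up for the example.
CLIENT, SERVER = "192.0.2.10", "137.195.150.50"

request = b"GET /index.html HTTP/1.0\r\n\r\n"
packets = ip_send(tcp_send(request, dest_port=80), CLIENT, SERVER)
random.shuffle(packets)          # packets may arrive out of order

port, rebuilt = tcp_receive(ip_receive(packets, SERVER))
print(port, rebuilt == request, len(packets))
```

Each layer only reads its own headers: the IP-like functions look at addresses, the TCP-like functions look at sequence numbers and the port, and neither inspects the HTTP text itself.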


    A sample IP address is 137.195.150.50. This address belongs to the main Web server at Heriot-Watt University in Scotland. As it happens, a block of such numbers beginning 137.195.xxx.xxx has been allocated to Heriot-Watt. Within Heriot-Watt there are a number of sub-networks. One such sub-network, owned by an academic department, has its own block of numbers starting 137.195.52.xxx. Some of the remaining numbers in this sub-network refer to individual machines in the department.

    4.4.3

    The Domain Name System

    While IP addresses are wholly numeric, most users do not memorise the numeric addresses of the hosts to which they attach; instead people are more comfortable with host names. Most IP hosts, then, have both a numeric IP address and a name. For instance, the IP address we mentioned previously was 137.195.150.50, which also has the host name www.hw.ac.uk. While this is convenient for people, the name must be translated back to a numeric address for routing the packets of data around the Internet. To handle these names, the Domain Name System (DNS) was created.

    The DNS is a distributed database containing host name and IP address information for all domains on the Internet. There is a single authoritative name server for every domain. This contains all DNS-related information about the domain; each domain also has at least one secondary name server that also contains a copy of this information. Several root servers around the world maintain a list of all of these authoritative name servers.

    When a host on the Internet needs to obtain another host’s IP address based upon its name, a DNS request is made by the initial host to a local name server. The local name server may be able to respond to the request with information that is either configured or cached at the name server. If the necessary information is not available, the local name server forwards the request to one of the root servers. The root server then determines an appropriate name server for the target host and the DNS request will be forwarded to the domain’s name server.

    Once the numeric IP address is known and the packets have been sent out onto the Internet, various pieces of hardware, called routers, handle the paths that the packets take based on their knowledge of the Internet. Since there is generally more than one route between any two IP addresses, the Internet can continue to operate even if some parts fail.
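The name lookup sequence described in this section (local name server, then root server, then the domain's own name server) can be sketched with plain dictionaries. Everything here is a simplification under stated assumptions: the name server name `ns.hw.ac.uk` is hypothetical, a real root server holds far more than one entry, and real DNS involves network messages rather than dictionary lookups.

```python
# Hypothetical data: which name server is authoritative for each domain,
# and the host records each authoritative server holds.
ROOT_SERVER = {"hw.ac.uk": "ns.hw.ac.uk"}
AUTHORITATIVE = {"ns.hw.ac.uk": {"www.hw.ac.uk": "137.195.150.50"}}


class LocalNameServer:
    """Toy local name server with a cache of earlier answers."""

    def __init__(self):
        self.cache = {}
        self.referrals = 0      # how often the root server had to be asked

    def resolve(self, host_name):
        if host_name in self.cache:
            return self.cache[host_name]          # answered from the cache
        self.referrals += 1
        domain = host_name.split(".", 1)[1]       # "www.hw.ac.uk" -> "hw.ac.uk"
        name_server = ROOT_SERVER[domain]         # root picks the name server
        address = AUTHORITATIVE[name_server][host_name]
        self.cache[host_name] = address
        return address


local = LocalNameServer()
first = local.resolve("www.hw.ac.uk")
second = local.resolve("www.hw.ac.uk")    # no second trip to the root server
print(first, second, local.referrals)
```

On a networked machine, the standard library call `socket.gethostbyname("www.hw.ac.uk")` performs a real lookup; its answer may well differ from the address quoted in this chapter.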

    4.5

    Capacity Planning

    So far, you might be under the impression that all we have to do to get your site off the ground is buy some hardware, install and build some software and wait for the orders to roll in. If only it were that simple! In developing a site you are exposing your hardware, networks and application to millions, even billions of people. What would happen if they all wanted to use your services at the same time? The answer is almost certainly that your architecture would slow down and probably fall over (i.e. break).

    From a commercial perspective, your customers will lose trust in you if the site’s performance is poor or services become inaccessible. If they lose trust, you lose business. A useful analogy might be to imagine a large brick-and-mortar supermarket during a busy period with only one checkout open. The queues and frustrations become unbearable. Therefore, we need to think about planning the capacity of our architecture so that it can handle the number of users that we expect without breaking or slowing down too much. But how?

    Firstly, let us consider the notion of redundancy, which occurs everywhere. For instance, we have two eyes and ten fingers, our cars carry a spare wheel, graphical and audio elements in Web pages often have alternative textual descriptions, etc. Yet, as well as providing a back-up in case there are failures or weaknesses in systems, we often observe emergent properties arising from such redundancy: two eyes give us stereoscopic vision, multiple fingers give us dexterity, alternative text in Web pages helps indexing and supports people with certain disabilities. Indeed, the Internet itself incorporates much redundancy, where data can take numerous routes between client and server depending on circumstances. We’ll return to emergent properties (as they emerge) in the chapter summary.

    Returning to our own site’s architecture, we have to consider the possibility that we may need more than one Web server, database server, etc. If we had multiple servers, we could spread the load amongst them and if one fell over, the others hopefully could cope until the broken one is fixed. However, we would need to make sure they all possessed the same functionality and data and, as such, act and appear as one system to the user. The most common approach to achieving cooperation and distribution across such servers is to use clustering.

    So what is clustering apart from a collection of servers? Well, clustering is a technology that has two core aspects. The first consists of customised operating systems, compilers, and applications. This means that you need to proactively take advantage of the technology as you develop your applications. The second aspect is the hardware connections between the servers in the cluster. These need to be carefully designed since normal networks and cables would only introduce bottlenecks that would undermine the whole point of clustering in the first place. Once these two aspects have been dealt with, clustering software will attempt to balance loads and assign client requests as they arrive. An exercise at the end of the chapter asks you to research clustering products.

    In practice, many companies aim for 100 per cent redundancy. This means that companies know they can handle customer requests while they take down half the architecture (perhaps hosted at a different physical geographic location for disaster recovery reasons) and conduct updates, maintenance and upgrades. In addition, companies often demand very close to 100 per cent availability. With sufficient redundancy and thorough testing, this availability is generally achievable (in theory at least). Another exercise in this chapter encourages you to investigate capacity testing tools.

    For many Web infrastructure components, e.g. servers, the load they experience is related to the amount of online business the company is executing. This volume of business can be quantified by Natural Business Units (NBUs), sometimes called key volume indicators or forecasting business units. For instance, the number of registered users, the number of concurrent users at a certain time of day, the number of products on offer, and so on could all constitute NBUs. However, the NBUs change over time as the market grows or environmental factors take hold. For the latter, a new marketing campaign (from within or from a competitor) might affect trade. Alternatively, government, media or stock exchange announcements can impact NBUs.

    Therefore, having analysed and predicted demand from historical logs and via consideration of planned changes in NBUs, and having decided on availability and performance criteria, it is possible to specify necessary infrastructure component requirements, e.g. servers. However, building in reliability and performance can be expensive, so care should be exercised when ramping up capacity to ensure that there is a sensible business case and not just a technical whim.

    Not surprisingly, if you plan to outsource your application and data hosting, one major concern for you should be the availability and performance of the service. We cover this in some detail when we discuss law and ethics, but suffice to say, one aspect of the contract with the supplier should be agreed quality of service metrics. These are often called service level agreements.

    Finally, things do go wrong. Unforeseen circumstances may mean that there is poor performance or downtime. If this does happen, it makes good business sense to conduct some public relations and apologise to customers, advising them when the situation will be rectified. There may even be a case for enforced downtime if performance is too poor.
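The sizing reasoning in this section (forecast an NBU, decide on redundancy, then specify servers) comes down to simple arithmetic, sketched below. All figures are invented for illustration: the peak user count, the growth factor and the per-server capacity would in practice come from your own logs and testing. The availability function also assumes that servers fail independently of one another, which is optimistic, and it measures only whether at least one server is up, not whether the survivors can carry the full load.

```python
import math


def servers_needed(peak_concurrent_users, users_per_server, redundancy=1.0):
    """Servers required for the peak load, plus spare capacity.

    redundancy=1.0 means 100 per cent redundancy: twice the servers the
    load strictly needs, so half can be taken down for maintenance.
    """
    base = math.ceil(peak_concurrent_users / users_per_server)
    return base, math.ceil(base * (1.0 + redundancy))


def cluster_availability(server_availability, servers):
    """Chance that at least one server is up, assuming independent failures."""
    return 1.0 - (1.0 - server_availability) ** servers


# Hypothetical forecast: the NBU is peak concurrent users.
current_peak = 4000
expected_growth = 0.5            # a marketing campaign adds 50 per cent
forecast_peak = int(current_peak * (1 + expected_growth))

base, total = servers_needed(forecast_peak, users_per_server=500)
print(forecast_peak, base, total)
print(round(cluster_availability(0.99, 2), 6))
```

Doubling the server count is exactly the kind of expense that needs the business case mentioned above, which is why the forecast and the redundancy level are kept as explicit inputs rather than buried in the calculation.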

    4.6

    Beyond the Client-Server Paradigm

    Here, we shall briefly review some of the competing paradigms that may yet rival the dominance of the client-server approach to network architecture. Let’s begin by considering a few scenarios.

    - Imagine I can read/write data from/to your PC and you can do the same on my PC. We agree that we can do this with each other, subject to certain limits on the amount of data and where it is stored, and install software that enables us to do this. Now, instead of a server mediating the process of data sharing, we have two like-minded people/machines (so-called peers) that are in control. This is Peer-to-Peer (P2P) computing such as implemented by Napster and Gnutella. We mention Napster again when we discuss ethics and law.
    - But that’s just data sharing. Imagine we could use the power (not just the data storage) of many small machines to tackle a really hard and time-consuming computational problem. This is sometimes called Internet computing. An example is the SETI@home project.
    - Now imagine we can take the notion of Application Service Providers (ASPs) one step further so that, instead of using the service of one ASP in isolation, we can plug together the services from a number of them and produce an integrated, customised, end-to-end B2B infrastructure for our organisation. These services are called Web Services.
    - Finally, imagine we have scientific instruments that can be controlled by remote users and which can make their results visible to remote users; databases of results gleaned from such instruments and elsewhere; and powerful computers that can analyse the data of the instruments and the databases. Now picture a technology that can help us string together and schedule tasks that run across all these resources. This technology, essentially a sophisticated middleware, can even find out what resources are available to use at a particular time and can bill you for any resources you use. This final scenario captures some of the concepts that are behind the Grid, a word that is analogous both to the electrical power grid and the World Wide Web.

    These example scenarios and allied technologies are still evolving and may yet converge. In essence they represent a paradigm that allows computing to become even more ubiquitous, distributed, heterogeneous and interoperable. Exciting times!

    CHAPTER SUMMARY

    When we talked generally about architecture, we mentioned the management of complexity. We achieve this by looking at the meta-architecture of the site and breaking it down into components that we can consider separately. Having done so, we, as architects, can decide if the service that these components supply should be built in-house, bought off-the-shelf from someone else or outsourced completely in terms of maintenance, configuration and hosting. Issues of in-house expertise, cost, speed, reliability and risk will help us make such decisions.

    Next we considered the ways in which these components fit together, that is, how they communicate. This we achieved through employing well-known and well-understood protocols. Many of these protocols are built into the applications and components we either build or buy. This means we do not need to worry too much about the low-level implementation of them, but we should understand the mechanisms they employ.

    Furthermore, by putting individual components together we would hope to witness some degree of synergy. By synergy we mean that the whole is greater than the sum of the individual parts. Consider water (H2O). If you had a bucket of hydrogen on its own and another bucket containing just oxygen, you could neither quench a thirst, wash a car nor grow a tree. However, if the two are chemically combined (this is the equivalent of our protocol) to produce water, we get a fundamental enabler for life on this planet. Such synergy usually results in emergent properties. In the case of water, the properties that emerge are taste, wetness, etc.

    So as e-Architects we have put various components together and created the infrastructure to enable a Web site to be hosted. In isolation, hardware, software, cables and protocols would not allow this to happen. But put these together correctly and our emergent properties are that the site becomes maintainable, scalable, secure, fast, cost effective and reliable.

    1. Find an online resource that will give you an up-to-date picture of the relative usage of the different Web server software products. Date and quote the most recent figures taking account of (and describing) the distinction between active and inactive servers. Search terms you may wish to employ are: survey; Web; server; apache; iis; iplanet.

    2. We have already noted the popularity of Apache server software. Now write a short article (not more than 200 words) about the Apache project, specifically answering the questions: what is the project about; its history; its development methodology; and its justification for being freely distributed.

    3. Write a one-sentence description about each of the following application layer protocols: FTP, POP, Telnet, IMAP.

    4. Find a service on the Web that will convert IP addresses to host names and vice versa. Use it to convert three domain names that you use on a regular basis and record their IP addresses.

    5. Find and critically compare at least two clustering products.

    6. Find and critically appraise at least two capacity planning tools (sometimes called network simulators) that can test your site and look for problems.


    FURTHER READING

    Deitel, HM, Deitel, PJ and Nieto, TR (2001) e-Business and e-Commerce: How to Program, Prentice Hall (Chapter 8).

    Eaglestone, B and Ridley, M (2001) Web Database Systems, McGraw-Hill.

    Foster, I, Kesselman, C and Tuecke, S (2001) ‘The anatomy of the Grid: enabling scalable virtual organizations’, International Journal of High Performance Computing Applications, 15(3), pp. 200-22.

    Menascé, DA and Almeida, VAF (2002) Capacity Planning for Web Services: Metrics, Models and Methods, Prentice Hall.

    Turban, E, King, D, Lee, J, Warkentin, M and Chung, HM (2002) Electronic Commerce 2002: A Managerial Perspective, Prentice Hall, NJ (Chapter 12).

    Turban, E, Lee, J, King, D and Chung, HM (1999) Electronic Commerce: A Managerial Perspective, Prentice Hall, NJ (Chapter 11).

    Languages of the Internet

    CHAPTER OVERVIEW

    In order to develop and understand e-Commerce applications you need an awareness of the different Web programming languages that can be used. In this chapter we give an overview of some of the commonly used languages and technologies, and:

    - Explain the implementation, applications and relative merits of various Web programming/scripting languages.
    - Differentiate between client- and server-side technologies.
    - Introduce cookies and XML.

    In the following chapters we look more closely at some of the most important and widely available languages, enabling you to develop interactive Web services.

    5.1

    Introduction

    When the Web first appeared it provided a static set of pages that contained information. So a page might have contained hyperlinks that pointed at other pages. At the same time we had applications running on our own computers that exchanged data over networks. This data was often dynamic, in other words up-to-date and constantly changing. Therefore, the Web was missing an opportunity to exchange and display dynamic data. It did not take long before we realised that there was a need to be able to create Web pages (and elements of Web pages) on the fly, in real time.

    For example, imagine a static page displaying share prices. The content of the page would have to be constantly updated by someone typing in the new data to the HTML file, saving the file and then starting over again until the markets closed. Even if this pointless task were carried out, the page’s users on remote client machines would be endlessly hitting refresh/reload on their browser interface in order to see the latest data. The preferred alternative would be for the client to access a Web page that is itself constantly and automatically refreshing the data from a real-time database.

    Another limitation of the static Web page is its inability to allow the user to communicate. Imagine a situation where we had text boxes, radio buttons, etc. in a Web page form, and there was no facility to process what the user had entered: where do the data go and what actions are performed on them?

    In order to introduce the necessary dynamism, we needed to use computer software to process, control, access, report, format and filter data. This was the motivation for using the Web programming languages we discuss here.

    This introductory section is not intended as a comprehensive discussion of PHP, Perl, ASP, ColdFusion, JavaScript, VBScript or Java Applets and Servlets. All we hope to achieve is a raised awareness of where such languages are deployed, what they might look like at the most basic of levels and their relative merits. In the following chapters we discuss some of the core technologies in more detail, so that you are able to develop simple but complete interactive applications.
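To make the two needs described in this introduction a little more concrete (pages built on the fly, and form input that is actually processed), here is a minimal sketch. It is written in Python purely for illustration and is not one of the languages this chapter surveys; the `PRICES` dictionary stands in for a real-time database, and the form field name `symbol` is an assumption of the example.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical stand-in for a real-time database of share prices.
PRICES = {"ACME": 101.5, "GLOBEX": 47.25}


def render_prices_page(prices):
    """Build the HTML for the page at the moment it is requested."""
    rows = "".join(
        f"<tr><td>{escape(name)}</td><td>{price:.2f}</td></tr>"
        for name, price in sorted(prices.items())
    )
    return f"<html><body><table>{rows}</table></body></html>"


def handle_form(query_string, prices):
    """Process what the user typed into a form field called 'symbol'."""
    fields = parse_qs(query_string)
    symbol = fields.get("symbol", [""])[0].upper()
    if symbol in prices:
        return f"<p>{escape(symbol)}: {prices[symbol]:.2f}</p>"
    return f"<p>No price held for {escape(symbol)}</p>"


before = render_prices_page(PRICES)
PRICES["ACME"] = 102.0               # the data changes...
after = render_prices_page(PRICES)   # ...and so does the next page served
print(after)
print(handle_form("symbol=acme", PRICES))
```

Nobody edits an HTML file here: the page is regenerated from the data each time, and the user's input is escaped before being echoed back so that it cannot inject markup into the page.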

    D2

    The Client and Server Sides of Web Programming If we consider an oversimplified topology of the Internet’s client-server architecture, as in Figure 5.1, then the Web browser lives on the client machine and the Web server lives on the server. The client then requests services (e.g. ‘Please send the contents of an HTML file’) and the server

    sends them back. The browser on the client then displays the data. The technologies that allow us to create dynamic Web pages can live on either the server or client. Client-side technologies are executed in the browser on the client machine and the programs that use such

    technologies are included in the HTML files the server sends. Server-side technologies run on the Web server and can send their outputs to the client for display. Figure 5.1 Simplistic client-server architecture over the Internet


    One advantage of client-side programming for the Web is that, once the page has been sent to the browser, no additional communication may be necessary with the server. This avoids the delays associated with a slow network connection. However, the main disadvantage is that more responsibility is given to the client machine. If the client is slow or the browser is old (and therefore often incompatible with the necessary client-side technologies) the page may not display as expected.

    On the other hand, if the process is executed on the server side, the developer has more control over how the code behaves and is less at the mercy of the browser in use on the client machine. The downside is that developers have to be sure that their service is scalable, such that it can handle multiple requests to run a program and not cause the service performance to degrade significantly.

    5.3

    Server-Side Scripting Languages

    In this section we shall look at the scripting languages that run on the server side. We deal with Perl in a separate chapter (Chapter 7), so we only touch on it here, along with PHP, and to a lesser extent ASP, ColdFusion and SSI. Let us start by mentioning the Common Gateway Interface.

    5.3.1

    Common Gateway Interface

    The Common Gateway Interface (CGI) is a standard feature of many Web servers. CGI specifies how to pass arguments to a program via a Web server and defines a set of environment variables that will be made available. This allows us to invoke the program from the Web, and pass it the necessary inputs. The CGI program can then, for example, access information in a database on the server and format the results as HTML, returning this to the client browser.

    A CGI program can be any program which can accept command line arguments. Perl is a common choice for writing CGI scripts, although C/C++ programs are also often used. The CGI programs often reside in a sub-directory of the directory containing the HTML files. For instance, you may have a Web directory called ‘.../~username/www’ and place your CGI programs in ‘.../~username/www/cgi-bin’.

    Whenever the server receives a CGI execution request it creates a new process to run the program. If the process fails to terminate for some reason, or if requests are received faster than the server can respond to them, the server may become swamped with processes. In order to improve performance, Netscape devised NSAPI and Microsoft developed the ISAPI standard, which allow CGI-like tasks to run as part of the main server process, thus avoiding the overhead of creating a new process to handle each CGI invocation.
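
    As a rough illustration of the mechanism just described, the following Python sketch shows the shape of a CGI program: it reads its input from an environment variable set by the Web server and writes a header, a blank line and then HTML to standard output. `QUERY_STRING` is one of the standard CGI environment variables; the form field `name` and the function `cgi_response` are assumptions made for this example.

```python
import html
from urllib.parse import parse_qs


def cgi_response(environ):
    """Build the text a CGI program would write to standard output."""
    # The Web server places the part of the URL after '?' in QUERY_STRING.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # 'name' is a hypothetical form field; use a default if it is absent.
    name = query.get("name", ["World"])[0]
    # Escape user input before placing it in HTML.
    body = "<html><body>Hello " + html.escape(name) + "!</body></html>\n"
    # A CGI response is a header block, a blank line, then the body.
    return "Content-type: text/html\n\n" + body


if __name__ == "__main__":
    import os
    import sys
    sys.stdout.write(cgi_response(os.environ))
```

    Keeping the logic in a function that takes the environment as a parameter means it can be tried out without a Web server, for example with `cgi_response({"QUERY_STRING": "name=Alison"})`.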


    5.3.2

    Perl

    Perl is a widely used and flexible server-side scripting language, particularly popular for writing CGI scripts. To run a Perl script your Web server needs to have certain Perl binaries installed and be configured appropriately. However, Perl is Open Source software, freely available for most platforms, so the chances are that your Web server will be set up appropriately.

    Perl is particularly useful for string manipulation, with powerful string matching methods built in. It also has freely available program libraries (modules) for many of the common tasks needed for Web scripting. As a simple example of some Perl in action, consider Code 5.1, revisited in Chapter 7. This outputs a simple HTML ‘Hello World’ page, as displayed in Figure 5.2.

    CODE 5.1 Perl Script

        #!/usr/local/bin/perl
        # The header line and the blank line after it must come first.
        print "Content-type: text/html\n\n";
        # The rest of the output is the HTML page itself.
        print "<html>\n";
        print "<head>\n";
        print "<title>My First Perl program</title>\n";
        print "</head>\n";
        print "<body>\n";
        print "Hello World!\n";
        print "</body>\n";
        print "</html>\n";

    Figure 5.2 Perl script running in a browser

    [Screenshot: a browser window titled ‘My First Perl program’ displaying the text ‘Hello World!’]


    5.3.3

    PHP

    PHP stands for ‘PHP: Hypertext Preprocessor’. It is an open source, server-side, cross-platform, HTML-embedded scripting language used to create dynamic Web pages. The significant term here is HTML-embedded. This means that we can write PHP code directly into our HTML files. See the Hello World example in Code 5.2, which produces the same output as Code 5.1 (see Figure 5.2) apart from the different title.

    CODE 5.2 PHP script

        <html>
        <head>
        <title>My First PHP program</title>
        </head>
        <body>
        <?php echo "Hello World!\n"; ?>
        </body>
        </html>
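
    To make the idea of an HTML-embedded language more concrete, the following Python sketch mimics, in a very reduced form, what a preprocessor does: ordinary HTML is passed through unchanged and each embedded block is replaced by that block's output. The `<?echo NAME ?>` marker syntax and the `render` function are invented for this illustration; they are not PHP and do not reflect how the PHP interpreter is implemented.

```python
import re

# Hypothetical marker syntax for this sketch only: <?echo NAME ?>
EMBEDDED = re.compile(r"<\?echo\s+(\w+)\s*\?>")


def render(page, variables):
    """Return the page with each embedded block replaced by its output."""
    def run_block(match):
        # Look up the named value; unknown names produce no output.
        return str(variables.get(match.group(1), ""))
    # Text outside the markers is passed through untouched.
    return EMBEDDED.sub(run_block, page)


page = "<html><body><?echo greeting ?></body></html>"
print(render(page, {"greeting": "Hello World!"}))
```

    The browser only ever sees the rendered result, so the embedded code itself never leaves the server.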



    Note that, for PHP to run, the Web server needs to have certain files installed and be configured to parse files with certain extensions (e.g. .php or .phtml) appropriately.

    An interesting issue that arises with PHP is that of separating the presentation code and the process logic (e.g. business rules). In some organisations there will be people who specialise in making the site usable and attractive and others who specialise in getting the applications to talk to each other, crunch numbers and communicate with databases. If both aspects are combined in the same file, ownership, responsibility and concurrent development become difficult. For this reason, PHP applications often separate out the presentation elements and business logic elements of the code into separate files.

    5.3.4

    Active Server Pages

    Let us briefly mention Active Server Pages (ASP). These essentially mirror the implementations of JavaScript and VBScript that we mention shortly but, instead of being processed on the client side, they are processed on the server side. Interestingly, whilst JavaScript is more popular on the client side, VBScript appears more common in the ASP world.

    One advantage of this approach is that, since the processing is being done on the server side, we are not at the mercy of the vagaries of different browsers and have better control over how our pages appear. Generally, ASP files only run on Microsoft Web servers, such as IIS or PWS, since these have the requisite functionality. In addition, ASP files usually have a ‘.asp’ extension so that the server knows to parse them appropriately and not treat them as conventional HTML files.


    5.3.5

    ColdFusion

    Another brief mention goes to ColdFusion, which is a proprietary software product from Allaire. It uses its own markup language (CFML) that the Web server needs to be configured to recognise. In effect, we are embedding specialist markup tags in our HTML; the files often have a .cfm extension. Primarily, ColdFusion simplifies the integration of Web pages with databases, so we could have a database query inside <CFQUERY> tags, then output the results inside a <CFOUTPUT> tag pair, for instance.

    5.3.6

    Server-Side Include (SSI)

    SSIs are commands embedded in HTML documents. They direct the Web server to dynamically generate data for the Web page whenever requested. The basic format for an SSI is <!--#command argument="value" -->. You may recognise the comment tags around the SSI. These are there for the benefit of servers that do not support SSIs, which will simply ignore the instructions as comments. The #command can be one of several instructions to the server, the simplest being #include, which inserts the contents of another file. This is especially useful for ensuring that standard components, e.g. headers and footers, are the same on all pages throughout a Web site. To change a standard element, you need only modify the include file, instead of updating every individual Web page.

    SSIs can also be used to execute programs and insert the results. For HelloWorld:

    CODE 5.3 Server-Side Include


    My First SSI
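
    The #include mechanism described above can be imitated in a few lines of Python. This is a sketch only: it handles just the `<!--#include file="..." -->` form, and the `expand_includes` function, the file name `footer.html` and the in-memory `files` dictionary are assumptions standing in for a real Web server reading files from disk.

```python
import re

# Matches directives of the form <!--#include file="name" -->
INCLUDE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')


def expand_includes(page, files):
    """Replace each #include directive with the named file's contents."""
    def substitute(match):
        name = match.group(1)
        # Leave the directive untouched if the file is unknown, so a
        # browser would still treat it as an ordinary HTML comment.
        return files.get(name, match.group(0))
    return INCLUDE.sub(substitute, page)


# Hypothetical site content held in a dictionary instead of on disk.
files = {"footer.html": "<p>Contact: webmaster</p>"}
page = '<body>Welcome<!--#include file="footer.html" --></body>'
print(expand_includes(page, files))
```

    Changing the entry for `footer.html` changes every page that includes it, which is the maintenance benefit noted above.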