Elements of programming with Perl 9781884777806, 1884777805

As the complexity of websites grows, more and more webmasters need to acquire programming skills. Naturally, such person


English Pages xvi, 312 [372] Year 2000


Elements of Programming with Perl
Concepts, skills and practices of effective programming

Andrew L. Johnson

MANNING
Greenwich (74° w. long.)

Digitized by the Internet Archive in 2014
https://archive.org/details/eiennentsofprograOOjohn_0

For Susanna, Joseph, and Thomas

For electronic browsing and ordering of this and other Manning books, visit http://www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact:

Special Sales Department
Manning Publications Co.
32 Lafayette Place
Greenwich, CT 06830
Fax: (203) 661-9018
email: [email protected]

© 2000 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning's policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end.

Library of Congress Cataloging-in-Publication Data
Johnson, Andrew L., 1963-
  Elements of programming with Perl / Andrew L. Johnson.
    p. cm.
  Includes bibliographical references and index (p. 348).
  ISBN 1-884777-80-5 (alk. paper)
  1. Perl (Computer program language) I. Title.
QA76.73.P22J644 1999
005.13'3-dc21    99-42510
    CIP

Manning Publications Co.
32 Lafayette Place
Greenwich, CT 06830

Production services: TIPS Technical Publishing
Copyeditor: Adrianne Harun
Typesetter: Lorraine B. Elder
Cover designer: Leslie Haimes

Printed in the United States of America

contents

preface
acknowledgments

Part I Introductory elements

1 Introduction
    1.1 On programming
    1.2 On Perl
        Getting started, Running Perl, Getting help
    1.3 A bigger picture

2 Writing code
    2.1 Structure
    2.2 Naming
    2.3 Comments
    2.4 Being strict
    2.5 A quick style guide

3 Writing programs
    3.1 A first program
        Specification, Design, Coding, Testing and debugging, Maintenance
    3.2 faqgrep
    3.3 Exercises

Part II Essential elements

4 Data: types and variables
    4.1 Scalar data
        Scalar variables
    4.2 Expressions
    4.3 List data
        Array variables, Hash variables
    4.4 Context
    4.5 References to variables
    4.6 Putting it together
    4.7 Exercises

5 Control structures
    5.1 Selection statements
    5.2 Repetition: loops
    5.3 Logical operators
    5.4 Statement modifiers
    5.5 Putting it together
    5.6 Exercises

6 Simple I/O and text processing
    6.1 File handles
    6.2 Pattern matching
        Matching constructs, Regex language constructs, Matching and substitution operators
    6.3 Split and join
    6.4 The DATA file handle
    6.5 Putting it together
    6.6 Exercises

7 Functions
    7.1 Scope
    7.2 Global variables
    7.3 Parameters
    7.4 Return values
    7.5 Designing functions
    7.6 Parameters and references
    7.7 Recursion
    7.8 Putting it together
        Routine examples, Revisiting the mathq program
    7.9 Exercises

8 References and aggregate data structures
    8.1 Creating references
        Scope and references
    8.2 Nested data structures
        Nested or multi-dimensional arrays, Nested hashes, Mixed structures
    8.3 References to functions
        Closures
    8.4 Nested structures on the fly
    8.5 Review
    8.6 Exercises

9 Documentation
    9.1 User documentation and POD
    9.2 Source code documentation
        Tangling code, A simple tangler, Other uses of LP
    9.3 Further resources
    9.4 Exercises

Part III Practical elements

10 Regular expressions
    10.1 The basic components
        Search and replace: capitalize headings
    10.2 The character class
        Character class shortcuts
    10.3 Greedy quantifiers: take what you can
    10.4 Non-greedy quantifiers: take what you need
    10.5 Simple anchors
    10.6 Grouping, capturing, and backreferences
        Prime number regex
    10.7 Other anchors: lookahead and lookbehind
        Inserting commas in a number
    10.8 Exercises

11 Working with text
    11.1 The match operator
        Context of the match operator
    11.2 The substitution operator
    11.3 Strings within strings
    11.4 Translating characters
    11.5 Exercises

12 Working with lists
    12.1 Processing a list
    12.2 Filtering a list
    12.3 Sorting lists
    12.4 Chaining functions
    12.5 Reverse revisited
    12.6 Exercises

13 More I/O
    13.1 Running external commands
    13.2 Reading and writing from/to external commands
    13.3 Working with directories
    13.4 Filetest operators
    13.5 faqgrep revisited
    13.6 Exercises

14 Using modules
    14.1 Installing modules
    14.2 Using modules
    14.3 File::Basename
    14.4 Command line options
    14.5 The dating game
    14.6 Fetching webpages
        Stock quotes and graphs
    14.7 CGI.pm
    14.8 Reuse, don't reinvent
    14.9 Exercises

15 Debugging
    15.1 Debugging by hand
    15.2 The Perl debugger

Part IV Advanced elements

16 Modular programming
    16.1 Modules and packages
    16.2 Making a module
    16.3 Why make modules?
    16.4 Exercises

17 Algorithms and data structuring
    17.1 Searching
    17.2 Sorting
    17.3 Heap sort
    17.4 Exercises

18 Object-oriented programming and abstract data structures
    18.1 What is OOP?
    18.2 OOP in Perl
        The basics, Inheritance
    18.3 Abstract data structures
    18.4 Stacks, queues, and linked lists
        Stacks, Queues, Linked lists
    18.5 Exercises

19 More OOP examples
    19.1 The heap as an abstract data structure
    19.2 Grades: an object example
    19.3 Exercises

20 What's left?

appendix A Command line switches
appendix B Special variables
appendix C Additional resources
appendix D Numeric formats
glossary
index

preface

The Norse god Odin had two ravens, Hugin and Munin (Thought and Memory). He would send them out each day to fly to the corners of the earth. At night, they would return and tell him all their secrets. Thought and memory, cogitation and recall, processing and storage: Odin knew how to manage his resources. As a programmer, these are the important resources you too must tame. This book aims to be your guide in this endeavor. By the time you finish this book, you should have the skills to manage your own Hugin and Munin. In other words, you will be able to write your own hugin program to scour the web for interesting information, as well as a munin program to manage and query the database of information you collect.

There are a lot of books about Perl on the market today, and I recommend some of them highly. (See appendix C, "Additional resources.") However, many authors of these other Perl books assume readers are already familiar with programming. Other authors take the side-effect approach, teaching readers the vocabulary and syntax of the language but offering few guidelines on how to use it effectively. I do not believe that the side-effect approach is an effective means of teaching programming.

This book instead presents the basic elements of programming using the context of the Perl language. I do not assume that you've programmed before, nor do I merely hammer you with syntax and function names. This book is designed to teach you both programming and Perl, from the basics to the more advanced skills you need to become an accomplished Perl programmer.

Audience

This book is intended for two types of readers: those approaching Perl as their first programming language and those who may have learned programming off the cuff but now want a more thorough grounding in programming in general, and Perl in particular.

More people than ever are learning Perl. Undoubtedly, Perl's widespread use for Common Gateway Interface (CGI) and web-client programming contributes to its popularity. Some people need Perl programming skills for their jobs, while others just think it is cool. Whatever your motivation, you need to understand up front that this book is not about using Perl for web-related programming, although an example or two illustrates that application. Instead, this book is about how to program using Perl. Once you have that knowledge under your belt, you can apply it to a multitude of problem domains.

This book does not assume that you know what variables, arrays, and loops are, or that you've programmed before. However, familiarity with basic mathematical concepts and logic will certainly be helpful. Readers with no prior programming experience should, of course, begin at the beginning and work their way through the first nine chapters in order. Chapters 10 through 15 are largely independent and can be read in any order. Chapters 16 through 19 introduce advanced Perl concepts. Each of these chapters lays a foundation for the following chapter, so read these four chapters in order.

If you are already familiar with elementary Perl programming, you may want to read chapters 2 and 3, and then pick and choose chapters that tackle areas in which you wish to improve. For example, chapters 6, 10, and 11 cover different aspects of regular expressions and matching operators. Chapter 8 covers references.

If you are a competent programmer in another language, this book may still be useful in demonstrating Perl's way of doing things. However, the discussions may not be as concise as you'd like, and the content is not organized as a reference book.

Organization

The book is organized in four main parts, starting with things to consider before you begin programming, followed by the essential aspects of programming with Perl. The third section explores a few of the more practical and Perl-specific areas. Finally, the later chapters introduce more advanced concepts, such as abstract data structures and object-oriented programming using Perl.

Introductory elements
The three chapters in this section provide elementary information on programming and the Perl language. Chapters 2 and 3 also delve into the basics of program structure and design. In chapter 3, we work through two examples, providing a whirlwind tour of the Perl language in the process.

Essential elements
Chapters 4 through 9 cover the essential concepts and structures you need to learn to program effectively. Here you will find everything from variables to loop control constructs to file input and output to basic regular expressions to subroutines to references and nested data structures. When you finish this section of the book, you will have all the tools you need to build real-world applications.

Practical elements
Chapters 10 through 15 take you into areas more specific to the Perl language, exploiting some of Perl's unique and powerful strengths. Here we explore regular expressions in more detail, string and list processing, more input and output techniques, using modules, and the Perl debugger.

Advanced elements
Chapters 16 through 19 provide an introduction to more advanced programming techniques, including building modules and abstract data structures. You are also introduced to object-oriented programming features in Perl. Chapter 20 mentions a few areas not covered in this book and suggests references for further study.

Appendices
The four short appendices cover command-line switches, special Perl variables, additional resources for readers, and a brief explanation of binary, octal, and hexadecimal numeric representation. Following the appendices is a small glossary of technical terms used in this book.

Source code, solutions, and errata

The source code for many of the example programs and modules presented in this book may be obtained from Manning's website. Point your browser to http://www.manning.com/Johnson for links to the online resources for the book, including source packages.

Many chapters have a small number of exercises at the end. There is no appendix of answers to these exercises. However, the web page mentioned above contains a link to a solutions page.

Finally, although we have strived to eliminate mistakes from the manuscript, some errors may have slipped through. The previously mentioned web page contains a link to an online errata sheet listing corrections to errors discovered after the book was published. If you find any errors, please let us know so we can list them on the errata page and fix them in a later printing. The errata page lists an email address to which you can submit error reports.

Conventions

In this book, "Perl" (uppercase P) refers to the Perl programming language, while "perl" (lowercase p) refers to the perl compiler/interpreter or the perl distribution.

Filenames and URLs appear in italics. Code, program names, and any commands you might issue at the command line prompt appear in a fixed-width font. Some blocks of code are written using a form of literate programming (LP) syntax to break the code into smaller chunks for presentation (explained in chapters 3 and 9). In these cases, the real Perl code, or the pseudo-code, is in a plain fixed-width font, while the lines representing literate programming syntax are in an italic fixed-width font.

Many of the technical terms introduced in this book are defined in the glossary. When they first appear in the text, they are italicized.

You will also see the words foo and bar throughout this book. These are commonly used "dummy" terms in syntax examples to represent the name of a variable or the contents of a string. They are used when the point of the example is not directly related to whatever is being called "foo." You might encounter these and many similar generic terms (such as foobar, baz, qux, and quux) in publicly available articles and examples on programming.

Author online

Purchase of Elements of Programming with Perl includes free access to a private Internet forum where you can make comments about the book, ask technical questions, and receive help from the author and from other Perl users. To access the forum, point your web browser to www.manning.com/Johnson. There you will be able to subscribe to the forum. This site also provides information on how to access the forum once you are registered, what kind of help is available, and the rules of conduct on the forum.

acknowledgments

No author works in a vacuum, and I am certainly no exception. A large number of people deserve my gratitude for their help and support while I undertook this "little" project.

First, I would like to thank Marjan Bace, the publisher, for taking a chance on my ideas, letting me run with them, and reining me in when necessary. He seems to take seriously the view that an author and publisher are partners, and it shows. His pleasant and personal style during our many phone conversations was always a welcome relief from the stresses and toils of making a living while writing a book.

Many other people at Manning contributed considerable efforts to make the book you now hold in your hands something of which I can be proud. I'd like to thank Brian Riley; Ben Kovitz, for his many insightful suggestions; Ted Kennedy, for managing the review process; Mary Piergies, for managing the production process; Syd Brown, for patiently protesting the results of my script to convert the LaTeX chapters into MML format; Adrianne Harun, for her steadfast attention to detail in copyediting; and Robert Kern, Lynanne Fowle, and Lorraine Elder, for turning the manuscript into typeset pages.

A great many reviewers also caught embarrassing errors and glitches and provided helpful comments and suggestions. I would like to thank Tad McClellan, Randy Kobes, Brad Fenwick, Jim Esten, Paul Holser, Dave Cross, Patrick Gardella, Mike Mullins, Michael Weinrich, Peter Murray, Richard Nilsson, Umesh Nair, Vasco Patricio, and Richard Kingston. My brother Brad Johnson also provided crucial advice at a couple of junctures in the book. All of these people are responsible for helping make this a better book. Any problems or errors that remain are my responsibility.

Parts of this book were written, or at least conceptualized, in my notebook at Bellamy's, the local pub and restaurant, where I could almost always find one of my preferred beers on tap. Bellamy's offered a quiet table or corner booth when I needed to work, and a seat at the bar when I needed to rejoin the living. I'd like to thank Linda E., the bartender, for not only serving up good ale, but also being a cheerful friend.

Anyone who uses Perl and enjoys it as much as I do owes a serious debt of gratitude to Larry Wall for creating this gem and giving it to us so freely. Also, many thanks are due to Tom Christiansen, whose long and continued efforts have been instrumental in creating and maintaining a vast and informative documentation set that, in my opinion, remains unmatched in any other software documentation. Thanks are also due to the many and varied regular posters on comp.lang.perl.misc from whom I have learned much, the perl5-porters who continually work to improve Perl, and the rest of the Perl community who keep sharing code and ideas and taking Perl into new territory.

Lastly, but certainly not least, I'd like to thank my mother and three brothers, who are always encouraging and never lacking in good advice whether you want it or not; my wife, Susanna, just for being who she is (and for putting up with my irregular hours and countless other eccentricities); and my sons, Thomas and Joseph, for their patience and understanding when Dad disappeared into his office, and for continually keeping me in touch with the pleasant side of reality: playing games, reading stories, building snowmen, and other simple pleasures.

Part I Introductory elements

1 Introduction
    1.1 On programming
    1.2 On Perl
    1.3 A bigger picture

To write a story, fiction or fact, one must grasp the basics of the language in use, at least well enough that one can make simple statements that may be understood by readers. One must also be able to string together a series of such statements in a manner that communicates meanings and events in relation to time and space. Stories also have certain compositional elements, like a beginning, a middle, and an end, although the presentation need not proceed in that order.

Listen to writers talk about writing, and you'll find that writing is seldom the result of any "holistic inspiration" where the writer realizes the entire story at once in the mind and simply writes it down. Much more often, you'll find writers claiming that inspiration is rare and fleeting and usually comes during the writing process, not the other way around.

Programming is similar in many ways to writing. One must know the basics of the language and be able to string together a series of statements such that events and meaning are described in space and time. Structural elements (beginnings, middles, and ends) are also important in program composition. Finally, real inspiration generally occurs while we are immersed in the development process, not before.

Whether programming is an art or a science is often a subject of discussion. But programming is neither an art achievable only by innately talented right-brain visionaries, nor a strictly scientific left-brain logic of first principles. It is a craft, like writing, that can be learned, practiced, developed, and honed to a variety of skill levels.

Some writers write great literary novels, others write pulp fiction, and many people who are not writers by trade crank out all kinds of narratives, from business reports to postcards to notes on little squares of adhesively backed yellow paper. Similarly, some programmers write elegant programs solving massively complex problems; others produce solid utilitarian code daily; and many people who are not programmers by trade write all manner of tools and big and little programs for a multitude of purposes, and then go on with their real jobs.

So, if programming is a craft like writing, and writing is about telling narratives, what is programming about? Programming is about solving problems.

1.1 On programming

Computers are mindless devices capable only of doing what they are told. Before we talk about the activities of programming, we need to have a basic understanding of what a program is and the role it plays in turning an expensive piece of mindless hardware into a useful device.

Imagine the following scenario: You are taken to a room containing a desk and a chair. On the left of the desk is an "in-box" full of pages of numbers, and on the right is an empty "out-box." In between lies a manila envelope, a calculator, a pad of paper, and a stack of blank forms. You are told to open the envelope and follow the instructions you'll find inside. The simple and explicit instructions tell you to take the first page of numbers from the in-box, perform certain simple arithmetic operations on particular sets of those numbers, and write the results of these calculations into certain numbered boxes on one of the blank forms. When you've completed one page of numbers, you must place the newly filled out form into the out-box and begin again with the next page of numbers in the in-box.

In this scenario, you are operating essentially as a mindless computing device: taking input, performing simple operations, and writing out the results. The instructions are simple and do not require "thought" or interpretation, merely that you can read numbers, operate the calculator, temporarily store intermediate results on the pad of paper, and control the output device (pencil) to write out the results (see figure 1.1).

Figure 1.1 Processing data. [Diagram components: Input Device; CPU (brain); ALU (calculator); Memory, holding the Instructions and Other Storage (scratch pad); Output Device.]
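To make the scenario concrete, here is a minimal Perl sketch of the clerk's job. It is only an illustration. The scenario never says which arithmetic the instructions call for, so the two operations below (the sum of a page in box 1 and its largest number in box 2) are hypothetical stand-ins, as are the names `process_page`, `@in_box`, and `@out_box`. Each page is a list of numbers, and each filled-out form is a hash that maps box numbers to results.

```perl
use strict;
use warnings;

# Hypothetical instructions: box 1 gets the sum of the numbers on a page,
# box 2 gets the largest number. The scenario's real instructions are not given.
sub process_page {
    my @numbers = @_;                    # input: one page of numbers
    my $sum = 0;
    $sum += $_ for @numbers;             # simple arithmetic (the "calculator")
    my $max = $numbers[0];
    for my $n (@numbers) {
        $max = $n if $n > $max;          # intermediate result kept in $max (the "scratch pad")
    }
    return { 1 => $sum, 2 => $max };     # a filled-out form: box number => result
}

my @in_box  = ([3, 1, 4], [1, 5, 9, 2]); # pages waiting in the in-box
my @out_box;

# Take the next page until the in-box is empty, then file the finished form.
while (my $page = shift @in_box) {
    push @out_box, process_page(@$page);
}

# The output device: write out each completed form.
for my $i (0 .. $#out_box) {
    my $form = $out_box[$i];
    print "form $i: box 1 = $form->{1}, box 2 = $form->{2}\n";
}
```

As in the scenario, nothing in the loop "knows" what the numbers mean. It simply repeats the same steps for every page.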

An important point to note in this scenario is that you have no idea what you are doing. You may be just one part of a team performing various steps in a complex encryption/decryption scheme, or you might simply be balancing my checkbook. At any rate, you would not likely enjoy this particular activity. In addition to being boring and repetitive (the usual definition of mindless), such activities nonetheless require attention to many small details. They occupy your brain but don't free your mind. It would be hard to meditate while performing such a task.

The most important component in the above situation is the set of instructions. The mindless brain can be replaced by a mindless central processing unit (CPU) controlling simple mechanical/electrical devices for input, output, arithmetic operations, and temporary storage. However, it takes a mind to conceive a given set of instructions to control such a machine. Anyone who has stayed up late on Christmas Eve trying to put together a "some assembly required" toy for their child can appreciate the problems of working with incomplete, out of order, and badly written instructions.

I said programming is about solving problems, but that is not entirely accurate. You not only have to think of a way to solve a given problem, you also have to develop that solution into a complete step-by-step set of simple, repeatable instructions for solving the problem. When a method for solving a problem is reduced to a series of simple instructions, we call that set of instructions an algorithm. An algorithm can be specified in any language, but regardless of how complex the algorithm might be, each step or instruction must be simple and not subject to interpretation or intuition. An algorithm must also always generate the same result for the same input: if each step is performed flawlessly, the outcome is guaranteed. Programming is about creating algorithms to perform simple or complex tasks and translating these algorithms into particular instructions that a computer can perform.
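To illustrate these properties, here is Euclid's classic algorithm for the greatest common divisor, sketched in Perl. This example is our own choice and is not one the chapter uses. Each numbered step is simple enough to need no interpretation, and the same pair of inputs always yields the same answer. The subroutine name `gcd` is hypothetical, and the sketch assumes non-negative integer inputs.

```perl
use strict;
use warnings;

# Euclid's algorithm, written as a series of simple, repeatable steps.
sub gcd {
    my ($x, $y) = @_;          # step 1: start with two non-negative integers
    while ($y != 0) {          # step 2: stop when the second number is zero
        my $r = $x % $y;       # step 3: find the remainder of x divided by y
        ($x, $y) = ($y, $r);   # step 4: replace (x, y) with (y, remainder), then repeat
    }
    return $x;                 # step 5: the first number is the answer
}

print gcd(48, 18), "\n";       # prints 6
```

Tracing gcd(48, 18) by hand gives the pairs (18, 12), (12, 6), and (6, 0). Anyone who follows the steps exactly, whether a person or a CPU, arrives at 6.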

The instructions a computer can execute are in a form known as "machine language" or simply "machine code." Machine code instructions consist of sequences of numbers, ultimately reducible to zeros and ones (i.e., binary code). These instructions deal with low level operations such as storing a number into a particular memory location, reading a number out of a particular memory address, or adding a number into an accumulator. For example, the following snippet of machine code (shown here in hexadecimal) causes two numbers, which are stored in memory, to be added together and saved into another memory location for later use:

    8B45FC 0345F8 8945F4

Of course, we humans have a hard time trying to formulate algorithms in a machine language, so we soon developed a language consisting of abbreviated words to stand for the particular operations a given machine could perform. We could write programs in this "assembly" language, then translate the language into the corresponding machine code using one machine instruction for each assembly instruction.

Assembly code was easier to write than the machine code it replaced, but because different machine architectures used different machine code, each had to be programmed in its own version of an assembly language. A program written for one machine type had to be rewritten to run on another machine type. Another problem with assembly languages is that they still dealt with the low level instructions of moving around bits of information.

Higher level languages were later developed to allow instructions to express higher level concepts. With a low level language like an assembly language, one had to write individual instructions to place a number from a specific location in memory into a temporary holding area; add a number from a different memory location; and, finally, store the result of that operation in a third location in memory. With a high level language, one could write a single instruction that encapsulates the concept (see figure 1.2).

    Machine code    Assembler code    High level code
    8B45FC          movl a, %eax      c = a + b
    0345F8          addl b, %eax
    8945F4          movl %eax, c

Figure 1.2 Comparison of machine, assembly, and high level languages
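For comparison, here is the high level statement from figure 1.2 written in Perl, the language used throughout this book. This is a minimal sketch with arbitrary values of 2 and 3. The variables are named `$x`, `$y`, and `$z` rather than a, b, and c because `$a` and `$b` are special variables used by Perl's `sort`.

```perl
use strict;
use warnings;

my $x = 2;
my $y = 3;
my $z = $x + $y;   # one statement; perl handles the underlying load, add, and store work
print "$z\n";      # prints 5
```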

Aside from the obvious advantage of making the instructions easier for humans to read and write, compilers were created that translated each high level statement down to the corresponding assembly and machine codes for different machines. This meant that one could write a program once in the high level language and be able to run it on any machine that had a compiler for that language.

Early high level languages included Fortran, designed primarily for mathematical computing; COBOL, intended for business related programming; and BASIC and Pascal, initially designed as teaching languages. Today, a wealth of high level languages exists from which to choose, each with particular strengths and weaknesses. The remainder of this chapter is about the language that will be taught in this book.

1.2 On Perl

Perl was created by Larry Wall and initially released in 1987. Perl's inception is illustrative of its whole nature (and certainly says something about its creator as well). Larry was faced with a problem that involved complex text processing and report generation exceeding the capabilities of what could be done easily with standard Unix tools like sed and awk. He made a choice. Rather than restricting his viewpoint to solving this particular task, he saw that his problem was just one of a class of problems for which existing tools and languages provided no simple solutions. So, in the spirit of long-term laziness, Larry Wall created a new tool for solving such problems.

Perl did not simply fill a gap in the existing toolset. Perl incorporated the capabilities of existing tools and borrowed freely from other languages. It soon became the language of choice for getting things done in the Unix environment (and its usefulness soon spread to other operating systems and environments). Put another way, Perl didn't fill one particular gap in the toolset; it became the tool to use for filling gaps everywhere.

Designed not only for practical usefulness, but also for continued expansion, Perl has become a powerful general purpose programming language. In its current incarnation, it offers extremely powerful regular expression enhancements, object-oriented programming, a well defined module system, references, nested data structures, and quite a bit more. Perhaps some, or even all, of the features I just mentioned don't mean a lot to you at the moment. You just want to know what you can do with Perl. Telling you that you can do anything you want wouldn't be accurate, but it wouldn't be too far from the truth either. Perl isn't well suited for some kinds of programming: for example, writing operating systems, writing device drivers, or doing heavy numeric analysis.

So what is Perl good for? Perl excels at reading, processing, transforming, and writing plain text. Processing text may not seem like a big deal until you realize all the ways you might want to interact with textual data. The plain text files you save on your hard drive are textual data, and so are the file and directory names themselves. With Perl, you can easily write a program that will read in directory listings, create new files or directories, rename files, delete (unlink) files, and more. Automating many kinds of system administration and backup tasks is but one common use of Perl.

Client-server interactions are largely plain or simply encoded text. This includes the Structured Query Language (SQL) statements you send to a database server and the results it returns, as well as the HTTP requests you send to a webserver and the HTML pages it sends back. Similarly, communications between a webserver and external programs use a simple, encoded text protocol known as the Common Gateway Interface (CGI). A significant portion of Internet sites that provide dynamic content or search capabilities are powered by Perl programs. These programs handle requests from the webserver; turn them into queries for a database server; collect, transform, and format the resulting data; and pass it back to the webserver. Because Perl makes it so easy to write programs that work between other programs, Perl is often referred to as a "glue" language. And because of its wide use sticking various things together throughout the Internet, Perl has also been called the "duct tape" of the Internet.

mentioned

that Perl

module system. This means

mon for

8

ones

—have

anyone

was designed that

to

grow and

many of the common

has a well defined

that

it

tasks

—and some uncom-

already been encapsulated into modules that are freely available

to use (see the discussion of

CPAN

later in this chapter).

CHAPTER

1

When

you

INTRODUCTION

want

to write a

CGI program,

you can use

(FTP), or interact with a database, that provides

your

do

write client software to

file

and debugged module

a well tested

most of what you need, leaving you only

transfer protocol

add the code

to

to deal with

specific task.

Not only do modules make

it

modules provide extensions that allow you itself

might be

less

well suited.

I

common

easy to use Perl for

but some

problems for which

to use Perl for

mentioned above that

tasks,

may

Perl

Perl

not be a good

choice for numeric analysis. However, there are modules designed to extend

Perl's

handling of numeric data by providing arbitrarily large integers and floating point

numbers. There

is

Data Language (PDL) module package, which pro-

also the Perl

on

vides extensions for doing fast mathematical computations
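Here is a small illustration of the arbitrarily-large-integer kind of module. The example is ours, not the author's. It uses Math::BigInt, which ships with the standard perl distribution, to compute a number too large for perl's native numbers to hold exactly. PDL is a separate CPAN package and is not shown here.

```perl
use strict;
use warnings;
use Math::BigInt;

# Native arithmetic: 2**100 is stored as a floating point number,
# so perl can only print an approximation of it.
my $native = 2**100;

# Math::BigInt keeps every digit. bpow() raises the object to the
# given power, modifying the object in place.
my $big = Math::BigInt->new(2);
$big->bpow(100);

print "native: $native\n";
print "bigint: $big\n";
```

The native value prints in exponent notation with only about 15 significant digits. The Math::BigInt object prints all 31 digits.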

Some programming languages are interpreted. A program written in such a language is read by an interpreter program, which translates each statement into the appropriate machine code and executes it. In these languages, many errors cannot be detected until the program is already running and the interpreter encounters a statement that generates an error. Other languages are compiled. The whole program is read by a compiler and translated into machine code before it can be run, so many kinds of errors can be caught before the program is even run. (Some errors, called run-time errors, still will not be caught until the program is running.)

Perl is both interpreted and compiled. When you run a Perl program, perl reads the entire program and compiles it into an internal format (not machine code). It then interprets this internal representation like a regular interpreter. Thus, in places in this book, I will sometimes refer to the perl interpreter and sometimes to the perl compiler, but they both refer to perl itself. Perl, however, refers to the Perl language.

The advantage of a compiled/interpreted system is that many kinds of errors can be detected during the compilation phase, before a program begins executing. Yet you do not need to separately compile and link your code into a binary executable each time you make a change, as you would for a strictly compiled language such as C. You can simply type in your program and run it. The perl compiler/interpreter takes care of the rest.

The tradeoff is a little loss in speed. A program compiled into machine executable code runs somewhat faster than one whose statements are individually interpreted with each run through. That said, Perl programs usually run very fast, especially for text processing types of tasks.

Yet another benefit of Perl's interpreted nature is memory management. In many lower level compiled languages, like C, your program is required to deal specifically with allocating and releasing the memory needed for storing data while it is running. When you program in Perl, the perl interpreter takes care of allocating extra memory when needed. It also releases that memory when it is no longer needed, so it can be used again. This doesn't mean you can completely ignore memory issues. You still need to make choices such as whether to read a file completely into memory or read a file one line at a time. But you don't have to worry about actually allocating and releasing the memory yourself.
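The read-it-all versus one-line-at-a-time choice looks like this in practice. This is a minimal sketch of our own, not code from the book. So that it runs without a real file, it opens a filehandle on a string, which needs perl 5.8 or later, newer than the releases discussed in this chapter. The sample text is made up.

```perl
use strict;
use warnings;

# Hypothetical sample data. Opening a reference to a string gives an
# in-memory filehandle that stands in for a real file.
my $text = "first line\nsecond line\nthird line\n";

# Line at a time: only the current line is held in $line.
open my $in, '<', \$text or die "cannot open: $!";
my $longest = 0;
while (my $line = <$in>) {
    chomp $line;
    $longest = length $line if length $line > $longest;
}
close $in;

# Whole file at once: in list context every line is read into @lines.
open $in, '<', \$text or die "cannot open: $!";
my @lines = <$in>;
close $in;

print "longest line: $longest characters\n";
print "lines held in memory: ", scalar(@lines), "\n";
```

In both cases perl allocates and frees the memory itself. The choice only affects how much data is held at once, which matters for very large files.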

Later in this chapter, I will continue this discussion of Perl and why it is the greatest thing since peanut butter (sliced bread really wasn't such a big deal until peanut butter arrived on the scene). Right now, let's turn to the more practical matter of ensuring that you have a perl distribution up and running so that you can begin your Perl programming journey.

1.2.1 Getting started

If you are not using a system that has perl installed, you will have to obtain a perl distribution and install it yourself. The latest source distribution can be found on the Comprehensive Perl Archive Network (CPAN) at http://www.perl.com/CPAN/src. Pointers to binary distributions for various platforms can also be found there in the /ports directory. The distributions contain detailed installation information, and the process is usually painless, though it can be a little time consuming.

Unix-like systems

On a Unix-like system, you probably want to compile your own version of perl, assuming you have a C compiler installed. First, download the latest distribution from the CPAN site mentioned above. Unpack it, go into the resulting directory, and read the README file for your system and the INSTALL file. This should provide you with enough information to build your own version of perl. Essentially, the process is just:

    $ rm -f config.sh Policy.sh
    $ sh Configure
    $ make
    $ make test
    $ make install

The configure step will take a while. You will have to answer a variety of questions about your system and where you want things installed. Picking the defaults is all that's needed in most cases.

Win95/NT systems

On Win32 systems, your best bet is probably to obtain the ActiveState version of perl, which is available at http://www.ActiveState.com/. For Win95, you will probably need to get the DCOM package and install it before starting to install the perl distribution. You can find it at http://www.microsoft.com/com/dcom/dcoml_2/download.asp.

Installing the ActiveState version of perl is a matter of double-clicking the archive. The install process will ask a few questions. Accept the defaults unless you have good reason not to. After this, you should be able to run perl from the command prompt, and to run the perldoc utility to access the documentation (see later in this chapter).

MacOS

You can get a compiled binary distribution of Perl for the MacOS in the ports section of CPAN: http://www.perl.com/CPAN/ports/index.html#mac. This link should automatically redirect you to a nearby mirror site. Installing this version involves unpacking the archive, starting the program, and setting a few configuration details that are pointed out in the included README file. The major sections of the documentation should be accessible via the help menu.

1.2.2 Running Perl

Once you have perl installed, creating and running a Perl program is a simple process. The following is a simple, one line program that prints the string "Hello World":

    print "Hello World\n";

Create a new text file (plain text) using any editor with which you are comfortable. This should be a text editor, not a word processing program. You can find a list of decent editors for various platforms by pointing your browser at http://reference.perl.com/query.cgi?editors. Now enter the above statement and save the file as "first." You can then run the program from the command line as:

    perl first

On many Unix-like systems, you can set up your script to run as if it were an executable program. This method means adding a special first line to your program and setting the executable bit on the new file with the chmod Unix command. Here is the new program:

    #!/usr/bin/perl
    print "Hello World\n";

The first line is called the "pound-bang" or shebang line. It starts with the two characters #! followed by the full path to where perl is located on your system (in other words, the absolute directory path to the perl program). Save this as first and then type

    chmod +x first

at the prompt. You can then invoke the program like this:

    first

If the current directory is not in your PATH (the environment variable listing search paths for executable programs), you may need to qualify the above call as ./first.

The ActiveState port comes with a pl2bat utility to turn your perl program into a batch file that can be placed in your PATH and called like any other program. If you are using MacPerl, you should be able to simply choose new from the file menu, type in the script, and then choose run script from the script menu. With MacPerl, you can also save the script as a Mac-specific file called a "droplet," which you can execute by double-clicking its icon.

The shebang line is not necessary on non-Unix systems, but it is always a good idea to put one in, because perl will check it for command line switches such as -w (see chapter 2).

1.2.3 Getting help

Perl is a relatively easy language to learn, but it is not a small language. This book does not attempt to be a reference for the Perl language. If you have perl installed, however, you already have the most up-to-date language reference available. The perl distribution includes a large amount of documentation, installed as Unix manpages and/or HTML pages (or some other format, depending on installation configuration details). The raw documentation is in a plain text mark-up format called Plain Old Documentation (POD). It is also readable using the included perldoc utility (or shuck on the Mac). To view the initial perl pod-page, enter perldoc perl at your command line prompt. This document provides a list of the remaining sections of the core documentation. Another useful starting page is perldoc perltoc, which provides a more in-depth table of contents of the Perl documentation.

One extremely useful set of documents is the set of perlfaq documents. Very often one begins to tackle a problem by breaking it down into smaller problems and addressing those. However, when learning a new language, some of the smaller problems are often difficult because you do not yet know how to express the solution in the context of the language you are learning. This is when it is time to turn to the Frequently Asked Questions (FAQs). Do not assume that the FAQs only address simple or "little" questions, or that your question will not be found there. Many FAQs do have simple answers, but they are no less valuable for that. On the other hand, Perl's FAQs also address many real programming issues. No matter how easy or difficult your particular problem seems to be, you'll often have good luck finding something of use in the FAQs. In chapter 3, we will begin developing a tool to quickly search the FAQs for information we might need.

The FAQs are divided into nine sections, or files, named perlfaq1 to perlfaq9. They can be read with the perldoc utility. The perltoc page describes each of these and lists all of the questions you will find answers to in each document.

Another source of help is the Usenet community. There are separate newsgroups for discussions on miscellaneous Perl topics (comp.lang.perl.misc), discussions on perl modules (comp.lang.perl.modules), and the perl Tk graphics toolkit (comp.lang.perl.tk). There is also a moderated group that you can read, but participation requires registration (comp.lang.perl.moderated). If you have never participated in Usenet newsgroups before, I recommend that you first take a look at news.announce.newusers.

Although all of the newsgroups are open to public participation, they are not forums for questions addressed in the perl documentation and FAQs. The people participating in these groups are knowledgeable and helpful, but you are expected to have tried to find answers to your questions in the documentation before turning to the newsgroup. These newsgroups are not free help desks. If you treat them as such, you will likely get ignored or worse.

On a related note, remember that programming is about problem solving. Beginners often approach a programming language as simply another application to learn. This can lead to asking questions like "How do I do "X" in Perl?" When learning a new word processor application, one might formulate a question such as "How do I create footnotes in my documents?" But a programming language is not simply an application. It does not provide single commands or functions for every conceivable problem. It provides ways to formulate solutions to problems.

Before you ask a "How do I..." question, search through the documentation on Perl's built-in functions (see perldoc perlfunc) to see if one meets your needs. Then search the FAQs to see if your question has already been answered. If these two approaches fail, ask yourself how you would go about solving the problem without a computer. For example, a question recently appeared on the comp.lang.perl.misc newsgroup (and not for the first time) asking how to tell if an integer is even or odd. There is no simple built-in Perl function that solves this problem directly. The obvious question to ask yourself is "How do I tell if an integer is even or odd?" Most people simply notice whether the last digit is one of 0, 2, 4, 6, or 8. If so, the integer is even. Algorithms often arise from such simple beginnings. You may not succeed in solving your particular problem, or your solution may not be the optimal one. But this common sense approach is the first step in thinking like a programmer.
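The last-digit rule translates almost word for word into Perl. The sketch below is ours, not taken from the newsgroup thread. It sets the rule beside the usual arithmetic version, which uses the % (modulus) operator. Both assume the input is a non-negative integer written in ordinary decimal digits.

```perl
use strict;
use warnings;

# Common sense version: look at the last digit, as a person would.
sub is_even_by_digit {
    my ($n) = @_;
    return $n =~ /[02468]$/ ? 1 : 0;
}

# Arithmetic version: an even number leaves no remainder when divided by 2.
sub is_even_by_modulus {
    my ($n) = @_;
    return $n % 2 == 0 ? 1 : 0;
}

foreach my $n (0, 7, 42, 1999) {
    printf "%d is %s\n", $n, is_even_by_digit($n) ? 'even' : 'odd';
}
```

Both subroutines give the same answers. The modulus version is the one most programmers eventually settle on, but it grows out of the same "how would I do this by hand?" question.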

One more avenue to try before you ask your question in the newsgroup is a Usenet search engine (for example, www.dejanews.com). Use it to search the Perl newsgroups for similar questions that have been asked and answered in the past. Finally, if you have exhausted all avenues of inquiry without success, ask your question on the appropriate newsgroup. Be sure to include information on what you've tried. That way the helpful people there will know you are not just looking for free handouts but are actually interested in learning.

Another place to begin exploring a wealth of Perl related information is the Perl home page at www.perl.com. From there you will find links to HTML versions of the Perl documentation and FAQs, plus pointers to the Comprehensive Perl Archive Network (CPAN). At CPAN, you can find and download the Perl source distribution with its standard libraries and modules. You will also find a large number of contributed modules and scripts for various problem domains.

If you are new to programming and/or Usenet in general, you may encounter quite a bit of jargon when using the above-mentioned resources. One very good, and quite extensive, resource you may want to look at is the Jargon File. This is a large dictionary or lexicon of common and not-so-common slang terms found in the hacker community. You can find this file at http://www.tuxedo.org/~esr/jargon/, or use your favorite search engine to locate a copy.

Just to get you started, I'll run through a few terms that you might happen upon as you begin to read the literature. For example, if you do ask a question on the newsgroup that is answered in the FAQ, you're unlikely to get responses beyond the standard RTFM, which means "Read The F****** Manual." Quite often, people quote material from the "camel" or "llama" books. These are two standard Perl books published by O'Reilly and Associates. The books feature pictures of the respective animals on their covers and are titled Programming Perl (camel) and Learning Perl (llama). See appendix C. Both are very good books, by the way.

A couple of other frequently used terms are "grep," meaning to search, and "grok," meaning to understand. Grep derives from the standard Unix search utility of the same name. If you want to grok Year 2000 (Y2K) issues in Perl, you should grep the FAQs for the relevant entries. Another common term is "parse." Specifically, parse means to break up a piece of data, such as a string (a sequence of characters or words), according to a specified set of syntactic rules. More generally, parse is often used in the context of simply recognizing and/or extracting particular bits of data from a larger chunk of data.

You may also see references to "p5p," which refers to the perl5-porters. This is a group of people responsible for maintaining and upgrading the actual Perl distribution across the many platforms on which it runs. Another group, or really set of groups, is the Perl Mongers, a collection of user groups distributed around the world. (Visit http://www.pm.org to find your nearest group or to start one.) I happen to be a member of the Winnipeg Perl Mongers (Winnipeg.pm for short).

A variation of the Monger moniker seems to be Perl M(o|u)nger. This is regular expression talk for either Monger or Munger (you'll learn about regular expressions in chapters 6 and 10). Munger, then, is the noun form of the verb "to munge." In the context of data processing, to munge essentially means to parse, process, slice, dice, julienne, massage, fold, bend, or otherwise mutilate (i.e., manipulate) data.
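The M(o|u)nger joke is an ordinary Perl pattern. The parentheses group the alternatives and the | separates them, so one pattern accepts either spelling. Here is a quick check, using a word list and subroutine name of our own:

```perl
use strict;
use warnings;

# True when the whole word is "Monger" or "Munger".
# The ^ and $ anchors stop partial matches such as "Mongers".
sub is_monger_or_munger {
    my ($word) = @_;
    return $word =~ /^M(o|u)nger$/ ? 1 : 0;
}

foreach my $word (qw(Monger Munger Manger Mongers)) {
    printf "%-8s %s\n", $word,
        is_monger_or_munger($word) ? 'matches' : 'does not match';
}
```
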

This little interlude barely scratches the surface of the jargon you will run into on your journey. If nothing else, at least I've warned you that you are entering not only a new area of study, but a new linguistic arena as well. Don't forget to check out the Jargon File mentioned above; it contains much more than mere definitions.

1.3 A bigger picture

The previous sections have dealt with basic information on programming and on Perl. Now we will take a brief step back and take in a larger view. My first choice for a title for this section was "Practical and Philosophical Remarks on Programming, Perl, and the Rest of this Book," but that was a tad long winded, even for me.

Several years ago, I worked as an inshore diver, doing a variety of underwater inspection, repair, and construction tasks, often in fast-moving water. Every job was different, and there was no such thing as a transportable stable work platform that could be used in every situation. Instead, we had the next best thing: a shop with a wide variety of surplus construction steel (rings, bolts, rods, angle iron, and I-beams) and an arc-welder. This was a hacker workshop. We created reusable bracing and clamping components. We also rigged up a variety of scaffold systems that could be lowered from the barge, positioned, and anchored in various ways to bridge piers of different sizes and shapes, pilings, or dam gate systems. That arc-welder was the key to being able to quickly and solidly connect a variety of components into working solutions we could deploy in the field.

In the programming world, Perl reminds me very much of that workshop and, in particular, the arc-welder. Perl has been nicknamed the Swiss Army chainsaw of programming languages due to its multitude of built-in tools and overall brute force utility. I think a better analogy is that of a Swiss Army arc-welder. It is quite capable of hacking out fast, sometimes crude, one-time solutions. It is equally capable of building and joining (virtually seamlessly) components for solving more complex or longer term problems.

Perl is not your average everyday programming language. In fact, Perl is an exceptional everyday programming language. There are currently more high level programming languages out there than you can shake a stick at. Of course, some computer scientists seem to get great pleasure from shaking a lot of sticks anyway. Perl seems to have more than its fair share of detractors among computer science purists, who complain that Perl is too big, ugly, and redundant.

So what makes Perl so great? No single feature of Perl makes it outstanding, and any program you write using Perl could also be written using another language. Certainly, Perl makes some things easier to accomplish than they might be in other languages, but this can't be all there is to it. Other languages offer pretty much the same high level functionality as Perl, and do so in a way that seems to satisfy the above mentioned stick wavers. Yet Perl remains wildly popular. The language continues to evolve, and its user base is still growing unchecked.

What appeal might Perl have beyond the sheer functionality that might first attract programmers from other languages? And why would experienced programmers fall in love with this language if other "better" programming languages exist? I lay the blame squarely on the shoulders of Perl's creator, Larry Wall. (Well, more precisely, slightly above and between Larry's shoulders.) Besides being a computer programmer, Larry also has a background in linguistics. Consequently, the fact that Perl has the qualities of a natural language is no accident. You can read some of Larry's own musings on these qualities at http://kiev.wall.org/~larry/natural.html. Here I will touch upon only a couple of points in this regard.

Perl receives criticism for, among other things, the richness and the redundancy of the language. In Perl, these are by no means unrelated issues. By richness, I mean that Perl is a big language, incorporating and supporting a large number of powerful and specialized features directly in the language. In contrast, other languages tend to be minimal, providing standard libraries for specialized tasks. The list of Perl's specialized features includes regular expressions, pattern match operators, process management, file and directory manipulation, and socket programming tools, to highlight just a few.

Another example of Perl's richness is its ability to provide more than one way of saying or accomplishing the same task. This is redundancy, and it exists both in natural languages and in Perl. Indeed, the Perl slogan is TMTOWTDI (pronounced "timtoady"), which stands for "There's More Than One Way To Do It."

Critics say that both richness and redundancy make the language harder to learn. But this is only true up to a point. Certainly, I may be able to learn a complete minimalist programming language rapidly. But if libraries provide the additional functionality, I would still have to learn the various libraries to do the things I might want to do. Perl is not really more demanding in this respect. You do not have to learn the entire language before you start, merely the essential elements plus any extra built-in features you find necessary for your present task. Similarly, Perl's redundancy doesn't require that you learn every possible way to say something before you begin. Redundancy merely expands your options. Indeed, the positive consequence of these natural language qualities is that they make it easier to think in Perl; they make it easier to express your thoughts in Perl.
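Here is a small illustration of TMTOWTDI, using an example of our own rather than the book's. It totals a list three different ways. The third way uses List::Util, a module bundled with perl 5.8 and later. That makes it newer than the perl releases discussed in this chapter, so its availability is an assumption.

```perl
use strict;
use warnings;
use List::Util qw(sum);

my @numbers = (3, 1, 4, 1, 5, 9, 2, 6);

# Way 1: an explicit foreach loop with a running total.
my $total_loop = 0;
foreach my $n (@numbers) {
    $total_loop += $n;
}

# Way 2: a statement modifier; the same idea, written more tersely.
my $total_modifier = 0;
$total_modifier += $_ for @numbers;

# Way 3: a ready-made function from a module.
my $total_module = sum(@numbers);

print "$total_loop $total_modifier $total_module\n";
```

None of the three is "the" right way. A beginner can use the first and never miss the others, which is the point made above about redundancy expanding your options rather than adding to what you must learn.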

There is also a strong sense of community among many Perl programmers. The Perl community is more than just a large number of users sharing information on various forums; it shares a sort of Perl spirit. Perl's "naturalness" lends itself to playfulness: not merely clever programming tricks, but poetry, puns, and other games of the sort people play with natural languages. Of course, this Perl spirit is more than fun and games. Still, I think it is significant that many programmers coming from other languages, who had found their calling growing tedious, have expressed gratitude that Perl has made programming fun again.

The Perl community also holds a strong concept of sharing. Help and advice are given freely on the newsgroups, and many talented programmers cooperate on the development and evolution of Perl itself. Beyond that, CPAN is a vast collection of contributed modules from Perl programmers all over the world. Programming, unlike most creative acts, places a high premium on reuse rather than originality, and the Perl community is no exception. Several hundred modules are available on CPAN. They range from database interfaces and development tools to interfaces for several graphics libraries, Internet programming modules, date manipulation modules, and much more. Whenever you find yourself facing a new programming challenge, check out CPAN. Chances are pretty good that someone has written a module that will make your task much easier.

By now you might be wondering when we will stop talking about Perl and start learning to program with it. The next two chapters concentrate on aspects of writing good code and the process of developing programs. In the latter chapter, we follow the development of two Perl programs from initial idea through to working programs. A good deal of Perl will be presented along the way.

CHAPTER 2

Writing code

2.1 Structure
2.2 Naming
2.3 Comments
2.4 Being strict
2.5 A quick style guide

Writing code, that is, typing in the set of instructions in a programming language, is only one part of the programming process. In fact, as we will see in the next chapter, it is a relatively small part of the process. But however small the act of writing code may be in the overall process, it is an important act. The decisions you make here can have a large effect on the other steps in the programming process. How well you write your code affects how easy it is to read. That, in turn, affects how easy it is to fix (if a problem arises), to upgrade, and to add features.

Perl is often derided as being a "write only" language. The claim is that programs written in Perl are inherently hard to read and understand, but this is simply false. Mostly, these accusations come from people unfamiliar with Perl. The presumed difficulties have several sources:

- all the funky symbols used in Perl (Perl uses every symbol on a standard keyboard for one purpose or another);
- the variety of shortcuts Perl offers the programmer;
- people simply writing bad code (this happens in every language).

At first glance Perl appears to be a difficult language to read, but it really isn't. Certainly, it is not terribly difficult to write hard-to-read code in Perl. But it is not terribly difficult to write easy-to-read code in Perl either. This chapter is about writing simple, clean code. Do not worry if you don't recognize or understand the Perl code in the following sections. They are only examples to illustrate style issues. You will have plenty of opportunity to learn Perl and apply these guidelines in the chapters that follow.
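To make the readability point concrete, here is one small computation written twice: once crammed together, and once laid out for a human reader. The computation adds the integers 1 through 10. The example is ours, not the one the chapter goes on to examine. perl treats both versions identically; only the person reading the code notices the difference.

```perl
use strict;
use warnings;

# Hard to read: legal Perl, but cramped, with cryptic one-letter names.
my $s=0;my $i=1;while($i<=10){$s+=$i;$i++}

# Easier to read: descriptive names, whitespace, one statement per line.
my $sum = 0;
for my $number (1 .. 10) {
    $sum += $number;
}

print "cramped: $s, readable: $sum\n";
```
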

As we mentioned previously, high level languages exist for the benefit of the programmer, not the computer. The computer does not execute the instructions in a high level language. As we saw in the last chapter, they must first be translated into the machine language of the particular computer system where the program will be run. High level languages offer three advantages over low level machine code:

- they provide a level of portability across all the different machine types that have translators for that language;
- they make it easier for humans to write the instructions, because certain operations that might take several lines of machine code can be written in a single simple statement in the high level language;
- they make it easier for humans to read those instructions.

This last benefit is the subject of this chapter. It is up to the programmer to take full advantage of whatever facilities the high level language offers to produce easy-to-read programs. How easy a program is to read affects how easy it is to write, debug, and maintain. Let's begin with structure.

2.1 Structure

This book is organized into parts, chapters, sections, subsections, paragraphs, sentences, and words. Whitespace plays a large role in delimiting these elements. Youwouldn'twanttoreadabookwithnowhitespaceinitnowwouldyou? Let's look at an example of bad code written in Perl:

    $s=0; $i=1;while ($i>= ...

    if ($response eq 'q' or $response =~ m/^\d+$/) {
        $is_valid = 1;
    } else {
        print "Invalid Input: enter an integer or 'q' to quit\n";
    }

The conditional reads like this: if the response equals q, or the response contains only digits, then set the valid indicator to 1. Finally, we translate the last chunk, which tests whether we should end the loop, and otherwise whether the response was right or wrong.

    if ($response eq 'q') {
        $quit = 1;
    } elsif ($response == $solution) {
        print "Correct\n";
    } else {
        print "Incorrect: $question $solution\n";
    }
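The validation test and the answer check shown above can be tried on their own by wrapping them in subroutines. This is a hypothetical harness. The names is_valid_response and check_response are ours, and it assumes, as the prompt text suggests, that answers are non-negative integers and that 'q' quits. The subroutines return values instead of setting $is_valid and $quit, so the results are easy to inspect.

```perl
use strict;
use warnings;

# Mirrors the validation chunk: 'q' or a string of digits is acceptable.
sub is_valid_response {
    my ($response) = @_;
    return 1 if $response eq 'q' or $response =~ m/^\d+$/;
    return 0;
}

# Mirrors the last chunk: returns 'quit', 'correct', or 'incorrect'.
# The 'q' test comes first, so == is only applied to numeric input.
sub check_response {
    my ($response, $solution) = @_;
    if ($response eq 'q') {
        return 'quit';
    } elsif ($response == $solution) {
        return 'correct';
    } else {
        return 'incorrect';
    }
}

foreach my $input ('12', 'q', 'twelve', '') {
    printf "%-8s %s\n", "'$input'", is_valid_response($input) ? 'valid' : 'invalid';
}
print check_response('12', 3 * 4), "\n";
```
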

Once again, performing our chunk substitution from the low level back up to the top level, we have our completed program:

    faqgrep sort
    perlfaq4.pod: How do I sort an array by (anything)?
    perlfaq4.pod: How do I sort a hash (optionally by value instead of key)?
    perlfaq4.pod: How can I always keep my hash sorted?

Users would then know that the information they seek is located in section 4 of the perlfaq documents, which can be read using the command perldoc perlfaq4.

Before we begin designing the program, we need to know a little bit about the format of the perlfaq pod files. The format of these files is relatively simple. Any question is contained on a single line and starts with a pod directive that looks like =head2. So for each file, we only need to read through it and search for our pattern on each line that begins with that sequence of characters.

You will also need to know where the perlfaq files were installed on your system. On my system, they were installed into a directory named /usr/local/lib/perl5/5.00502/pod. To find the installation directory on your system, you can type perl -V. This produces a lot of output regarding the configuration of Perl on your system. Near the end of this output is a list of directories. You should find a pod subdirectory under one of these listed directories.

    #!/usr/bin/perl -w
    use strict;
    <<set directory and filename list>>
    <<set search pattern>>

    <<set search pattern>>=
    my $pattern = $ARGV[0] or die "no pattern given: $!";

Any arguments given on the command line are stored in the special array @ARGV. The first argument is stored as the first element in the @ARGV array and can be referred to as $ARGV[0], because it occupies position 0 in the array.

We will explain the $ARGV[0] or die syntax shortly. In this case, it simply means that if $ARGV[0] is empty or contains a false value, then the program will exit with the given error message.
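To see the or die idiom in isolation, here is a small sketch that wraps it in a hypothetical get_pattern() function and passes the arguments in explicitly instead of reading the real @ARGV. It also shows a consequence of the "false value" rule: a pattern of 0 is rejected even though one was supplied.

```perl
use strict;
use warnings;

# Hypothetical helper: @_ stands in for @ARGV so the idiom can be
# exercised without a real command line.
sub get_pattern {
    my @args = @_;
    # "or" has lower precedence than "=", so the assignment happens
    # first and die() runs only when the assigned value is false.
    my $pattern = $args[0] or die "no pattern given\n";
    return $pattern;
}

my $found = get_pattern('sort');               # returns 'sort'

my $empty_ok = eval { get_pattern(); 1 };      # no argument: dies
chomp(my $error = $@);

my $zero_ok = eval { get_pattern('0'); 1 };    # '0' is false: also dies

print "pattern: $found\n";
print 'no argument: ', ($empty_ok ? 'accepted' : "rejected ($error)"), "\n";
print "pattern '0': ", ($zero_ok ? 'accepted' : 'rejected'), "\n";
```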

We have seen the while and until loop structures in the previous example. Perl has another looping construct that is designed specifically to loop over lists of values. This structure is called the foreach loop. The syntax of the loop is:

    foreach variable (list of values) {
        statements;
    }

In this structure, the loop executes once for every value in the list. During each execution, the loop variable is assigned the next value in the list. You may declare your loop variable prior to the loop or directly in the foreach line. To iterate over each file in our array of filenames, we may use:

    foreach my $filename (@faq_files) {
        <<read files and print matching lines>>
    }
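Here is a small, self-contained sketch of the foreach loop in action. It uses the example directory from the text (an assumption about your installation) and only builds the full path of each file instead of opening it. It shows both ways of declaring the loop variable; push, which appends to an array, is covered later.

```perl
use strict;
use warnings;

my $faq_directory = '/usr/local/lib/perl5/5.00502/pod';   # example path from the text
my @faq_files = ('perlfaq1.pod', 'perlfaq2.pod', 'perlfaq3.pod');

# Loop variable declared directly in the foreach line.
my @paths;
foreach my $filename (@faq_files) {
    push @paths, "$faq_directory/$filename";   # append the full path
}

# Loop variable declared before the loop. Perl localizes it to the loop,
# so $name gets its previous value (undef) back once the loop finishes.
my $count = 0;
my $name;
foreach $name (@faq_files) {
    $count++;
}

print "$_\n" foreach @paths;
print "$count files\n";
```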

Before you can read a file, you need to open it and associate it with a file handle. A file handle is a Perl data type that is associated with an input or output channel. Recall our method for reading input from the keyboard using <STDIN>. Perl automatically opens the file handle STDIN for reading; STDIN reads from what is called the standard input, which is the keyboard by default unless you've redirected it to be read from elsewhere. Once we open a file and associate it with a file handle, we may read lines from the file using the same syntax: <FILEHANDLE>. The syntax of the open() function is:

    open(FILEHANDLE, $filename);

Opening a file is a system operation that may fail for reasons such as the file does not exist, or the user does not have permission to read that file. We want to know if such a failure occurs, so we should always check the return value of the open() function. We may do this using an if conditional, but it is more commonly done using a logical or operator like so:

    open(FILEHANDLE, $filename) or die("can't open '$filename': $!");

You will often see this written using the || operator instead of the or operator. The former is just a higher precedence version of the same operator, which is why the parentheses around open()'s arguments matter when you use it. (See chapter 4 for a discussion of precedence.)

When Perl encounters such a statement, it first tries to evaluate the expression on the left: the open() function in this case. If that expression evaluates to true, Perl ignores the right-hand expression. If the left-hand expression fails, Perl then evaluates the right-hand expression. In this case, the right-hand expression is a call to the die() function, which causes the program to exit and print its argument as an error message (to the standard error stream, normally the screen). The special Perl variable $! holds the value of the current system error, so we include it in the string we pass to the die() function to provide a better diagnostic message about what went wrong.
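The sketch below shows what the open ... or die check reports when a file is missing. The path is deliberately made up so that open() fails, and the die() is wrapped in eval so the example can print the message instead of exiting. The exact wording of $! depends on your operating system. The last statement uses the three-argument form of open() with a lexical file handle, a later style not used in the text, to show the || variant with its required parentheses.

```perl
use strict;
use warnings;

# Deliberately nonexistent (hypothetical) path, so open() should fail.
my $filename = '/no/such/directory/perlfaq4.pod';

my $opened = eval {
    open(FILE, $filename) or die "can't open '$filename': $!\n";
    close FILE;
    1;
};
my $message = $@;
print $opened ? "opened\n" : "failed: $message";

# With ||, the parentheses around open's arguments are essential:
# without them, || would attach to $filename instead of to open().
my $ok = open(my $fh, '<', $filename) || 0;
print "second attempt returned false\n" unless $ok;
```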

We may define our next code chunk as:

    <<read files and print matching lines>>=
    open(FILE, $filename) or die "can't open '$filename': $!";
    while (<FILE>) {
        ...
    }
    close FILE;

It is important to remember that <FILE> is the input operator applied to the file handle being read. You do not close an input operator, just a file handle:

    close FILE;

Perl does some extra magic when we use the input operator within a while conditional. It automatically converts the conditional to read:

    while (defined($_ = <FILE>)) {

When you read from a file using <FILE>, the input operator returns an undefined value at the end of the file. So, in this conditional, a line is read from the file and assigned to a special Perl variable, $_. This value is then checked to see if it is defined. Testing for definedness rather than truth means that a line such as "0" is still processed, even though it is false. In this way, the loop will be executed once for every line in the file, setting $_ to each line in turn. The loop will exit when the end of the file is reached.
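The defined() test matters for a final line like "0" with no trailing newline: it is false as a string but still defined, so the loop must not stop there. This sketch reads from an in-memory file (opening a reference to a string, a feature of Perl 5.8 and later) so that it needs no file on disk; the text itself is made up for the example.

```perl
use strict;
use warnings;

# The last line is "0" with no trailing newline: false, but defined.
my $text = "first line\nsecond line\n0";

open(my $fh, '<', \$text) or die "can't open in-memory file: $!";
my @seen;
while (<$fh>) {        # Perl treats this as: while (defined($_ = <$fh>))
    push @seen, $_;
}
close $fh;

print scalar(@seen), " lines read\n";
```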

We will use regular expressions again to test for matches against our pattern. This time we introduce the substitution operator, s/pattern/replacement/. This works the same as the match operator with regard to matching the pattern, but, instead of just matching, Perl replaces the found pattern with whatever is in the second half of the s/// operator. The replacement part of the substitution operator is just a string, not a regular expression.

We will use this to find lines beginning with =head2 and to strip off those characters (replacing them with nothing) so we don't print them. Because s/// returns true only when it actually makes a substitution, only those lines go on to be tested against our pattern; if the rest of the line matches, we print out the filename and that line from the file:

    if (s/^=head2// and m/$pattern/) {
        print "$filename:\n$_";
    }

Notice that we did not use the =~ binding operator with either the substitution or the match operators. This is because when either operator occurs without being bound to a particular variable, it is automatically bound to the special $_ variable, which, remember, is the variable that contains each line in our file due to the magic while condition mentioned above.
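Here is a sketch of the matching step running against a tiny, made-up piece of perlfaq-style pod text, again read through an in-memory file handle. Notice that the body line mentioning sort() is never reported, because s/// fails on it and the and short-circuits before the match.

```perl
use strict;
use warnings;

my $pattern = 'sort';

# Made-up pod text in the perlfaq style.
my $pod = <<'END_POD';
=head2 How do I sort an array by (anything)?

Supply a comparison function to sort().

=head2 How do I reverse a string?
END_POD

open(my $fh, '<', \$pod) or die "can't open in-memory pod: $!";
my @matches;
while (<$fh>) {
    # No =~ here, so both s/// and m// operate on $_.
    if (s/^=head2// and m/$pattern/) {
        push @matches, $_;
    }
}
close $fh;

print @matches;
```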

That completes the whole program. All that remains is to insert the code chunks into their relative places to create the whole program listing:

    #!/usr/bin/perl -w
    use strict;

    my $faq_directory = '/usr/local/lib/perl5/5.00502/pod';
    my @faq_files = ('perlfaq1.pod', 'perlfaq2.pod', 'perlfaq3.pod',
                     'perlfaq4.pod', 'perlfaq5.pod', 'perlfaq6.pod',
                     'perlfaq7.pod', 'perlfaq8.pod', 'perlfaq9.pod');

    my $pattern = $ARGV[0] or die "no pattern given: $!";

    foreach my $filename (@faq_files) {
        open(FILE, $filename) or die "can't open '$filename': $!";
        while (<FILE>) {
            if (s/^=head2// and m/$pattern/) {
                print "$filename:\n$_";
            }
        }
        close FILE;
    }

This compiles fine with perl -c, but running it to search for the pattern sort produced an immediate error about not being able to open a file. This illustrates the kind of extra information that the $! variable can provide. We check our open() call and realize that we are trying to open the files without giving the full pathname. We knew we needed the FAQ directory location, and we stored it in $faq_directory, but we never used it where it was needed. We have to change the open() function to use the full path and filename:

    open(FILE, "$faq_directory/$filename") or die "can't open file: $!";

Now running perl faqgrep sort produces the output we showed in our initial specification. We will not proceed further with this script at this time, but we will return to it later and modify it to optionally print out the full answers to matching questions rather than just the filename locations.

We have covered a lot of ground in this chapter. We have taken two programming projects from initial ideas through to working programs. We have also been exposed to quite a bit of Perl code in the process. If you had no previous experience with the Perl language prior to this chapter, don't worry if some of it seemed a little over your head. The main purpose of this chapter was not to teach Perl, but to introduce you to the process involved in creating programs.

Too often, it is tempting to jump right into writing code when given a problem, especially if it seems like a simple problem. But simple problems often have a way of requiring more complex solutions than you first imagine. When you sit down to write an essay, you first need to have a clear idea of the topic and purpose or goal of the essay. Then you need to do the necessary research and develop an outline of your arguments. Finally, you can write the essay, and edit and revise it. Programming has a similar development cycle. The more attention you give to the early specification and design stages, the less time you'll have to spend debugging or redesigning and rewriting your program. There is no single right way to design a program, or to decompose a problem into subproblems. Still, the prevailing wisdom is that you should design your program first before diving in and writing code.

Exercises

1 Modify the mathq program to keep a running score of right and wrong answers.

2 Design and pseudo-code a program that simulates the rolling of two standard six-sided dice and prints out the total of the roll. Then consider how it might be modified to display the faces of each die rolled; for example, a roll of 3 and 5 might be displayed as:

    [ASCII-art drawing of a 3 face and a 5 face, built from # characters]

PART II Essential elements

CHAPTER 4
Data: types and variables

4.1 Scalar data
4.2 Expressions
4.3 List data
4.4 Context
4.5 References to variables
4.6 Putting it together
4.7 Exercises

At a basic level, a program operates on data. Not surprisingly, a source of fundamental variation among programming languages is in how they carve up the world of data into meaningful or interesting types of discrete data. Some languages are splitters, drawing fine-grained distinctions between various kinds of data. For example, in some languages, what we normally think of as simply numeric data (numbers) might be divided into integer numbers and floating point numbers. These may be further divided based on their size (i.e., how much storage space they require in memory). Other languages are lumpers and might differentiate only between numeric data and character data. These type distinctions are applied in two ways: first, in terms of the operations defined on a given type of data, and second, as restrictions on the types of data a variable may contain.

A variable, as we mentioned in the previous chapter, can be thought of as simply a named memory location where a value of a particular type can be stored. In a splitter language, you may have several different types of variables: an integer type that can hold only integers (again, perhaps further divided into short and long integer types based on the size of the integer), a float type for holding floating point numbers, a char type for holding a character, and perhaps other types as well. Variables may also be classified as primitive (scalar) or structured. Primitive or scalar variables hold a single piece of data; structured variables hold a collection of data. In a splitter language, you may have an array type of variable defined to hold a list of integers, and another array variable defined to hold a list of floating point numbers.

Perl is very much a lumper language. It draws a primary distinction between singular or scalar values and plural or list values. Perl has only one variable type for holding single pieces of data: the scalar variable. A scalar variable may hold a number (such as 42 or 3.14159) or a character string (such as h or hello or this string has 29 characters). It may also hold a reference to another variable or memory location (see section 4.5). Similarly, Perl has a type of variable called an array that can hold an ordered list of scalar values. The scalars need not all be the same type; for example, a list might be: (42, 'hello', 3.14159). Perl also has another plural variable type called the hash or associative array (see section 4.3.2).

4.1 Scalar data

The simplest way to use data within a program is as literal data, that is, data explicitly represented directly within the program. We have already seen this with our hello world program in chapter 1, where the program printed out the literal string Hello World followed by a newline (represented by \n). Here are a few literal representations of numbers with comments:

    print 42;        # an integer
    print 3.14159;   # a floating point number
    print -2;        # a negative integer

Perl also allows for a few other literal representations of numeric data such as scientific notation:

    2.31e4     # is 2.31 times 10 to the 4th power, or 23100
    2.31e-4    # is 2.31 times 10 to the -4th power, or 0.000231

Additionally, in literal representation only, a number preceded by a zero is taken to be a number in octal (base 8) notation, and a number preceded by 0x is taken as a number in hexadecimal (base 16) notation. A string such as '0213' that your program reads as input is not treated this way. (If you are unfamiliar with binary, octal, and hexadecimal numbers, please refer to appendix D.)

    0213     # is 139 in decimal notation (base ten)
    0x1fa    # is 506 in decimal notation (base ten)

Finally, Perl allows one further notational convenience for representing numbers. We can use underscores with numbers to enhance readability, just as we use commas when writing out large numbers:

    1_369_253            # is 1369253
    7_214_300.312_413    # is 7214300.312413

It is important to realize that these integers and floating point numbers (and character data for that matter) are not internally represented and stored as the sequence of digits you see on your screen (or in this book). They are stored in a binary format. Not all floating point decimal numbers have a precise binary representation. The number 0.2, for example, if printed out to 20 decimal places turns out to be a representation of 0.20000000000000001110. This leads to occasions when the results of some mathematical operations are not exactly what one might expect. This is not a Perl problem but a fact of binary representation (see perldoc perlfaq4 for further discussions of such data issues).

String or character data comes in two basic forms in Perl: single-quoted strings and double-quoted strings. Single-quoted strings, delimited with single quotation marks or apostrophes, are the more literal of the two forms. We will look at double-quoted strings first. Recall our Hello World program from chapter 1:

    print("Hello World\n");

This prints out the string of characters Hello World followed by a newline. The \n is a special sequence denoting a newline in a double-quoted string. This is referred to as backslash interpretation. There are several backslash interpretations available within double-quoted strings, as shown in table 4.1:

    Table 4.1 Backslash interpretation in double-quoted strings

    \a      alarm
    \b      backspace
    \e      escape
    \f      formfeed
    \n      newline
    \r      carriage return
    \t      tab
    \\      backslash
    \"      double quote
    \cX     control X
    \nnn    octal byte
    \xnn    hexadecimal byte
    \l      lowercase next letter
    \L      lowercase until \E
    \u      uppercase next letter
    \U      uppercase until \E
    \Q      backslash non-alphanumerics until \E
    \E      end \L, \U, or \Q

For any other character, the backslash means interpret the next character literally (losing any special meaning it may have had). This is usually referred to as escaping a character. So, to include a double-quote character or a backslash character within a double-quoted string, you escape them with a backslash:

    print "This \\ is a backslash";      # prints: This \ is a backslash
    print "Here \" is a double quote";   # prints: Here " is a double quote
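A few of the less familiar sequences from table 4.1 are easiest to understand by example. This sketch builds strings with \u, \U, \L, \Q, hex and octal bytes, and escaped characters, and stores them in variables so the results can be checked; the variable names are only illustrative.

```perl
use strict;
use warnings;

my $name = 'perl';

my $upper_first = "\u$name";                   # uppercase next letter: Perl
my $all_upper   = "\U$name\E rules";           # uppercase until \E: PERL rules
my $all_lower   = "\LLOUD\E quiet";            # lowercase until \E: loud quiet
my $quoted      = "a.b\Q.c*\E";                # backslash non-alphanumerics: a.b\.c\*
my $bytes       = "\x41\101";                  # hex 41 and octal 101: AA
my $escaped     = "back\\slash \"quoted\"";    # back\slash "quoted"

print "$_\n" for $upper_first, $all_upper, $all_lower, $quoted, $bytes, $escaped;
```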

Aside from backslash interpretation, double-quoted strings also allow variable interpolation. This means that a variable in a double-quoted string will be replaced by its present value:

    $variable = 'Hello';
    print "$variable World";    # prints: Hello World

Array variables may also be interpolated, but hash variables are not subject to interpolation in this manner. All scalar variables begin with a $ symbol and all array variables begin with a @ symbol, so a consequence of this interpolation is that if you want to have one of those symbols in your string you must precede it with a backslash so that it is interpreted literally:

    $variable = 'Hello';
    print "\$variable World";   # prints: $variable World

Single-quoted strings cannot interpolate variables. Single-quoted strings only allow for two special cases of backslash interpretation: a backslash may be used to escape a single quote (i.e., to allow a single-quoted string to contain a single quotation mark), or to escape a backslash as in a double-quoted string.
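The difference between the two kinds of string is summarized in the following sketch. It compares interpolation, an escaped $, a single-quoted string, an interpolated array, and the two escapes single-quoted strings do honor; the variable names are only illustrative.

```perl
use strict;
use warnings;

my $variable = 'Hello';
my @array    = (1, 2, 3);

my $double  = "$variable World";       # Hello World
my $literal = "\$variable World";      # $variable World
my $single  = '$variable World';       # $variable World (no interpolation)
my $list    = "values: @array";        # values: 1 2 3 (elements joined by spaces)
my $quote   = 'It\'s a back\\slash';   # It's a back\slash
my $plain   = 'a\nb';                  # backslash and n stay as two characters

print "$_\n" for $double, $literal, $single, $list, $quote, $plain;
```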

It is often helpful in terms of maintenance to adopt a style of coding where you only use double-quoted strings when you require a double-quotish interpolation or interpretation not offered by single-quoted strings. Use single quotation marks for all simple strings.

Of course, specifying data literally every time we want to use it in a program would be tedious and error-prone at best. If we could name an item of data, then just use the name to refer to it, we would be much better off. Well, we don't do exactly that, but we can create named containers (i.e., variables) to hold bits of data. Then we can access the data through the name of the container.

4.1.1 Scalar variables

A variable is a container or slot of memory, associated with a name, where you can store data values. Perl's scalar variable type can hold any scalar value: a number, a string, or a reference. We discussed variable naming in chapter 2, but let's quickly review the particular naming rules for variables in Perl here.

A variable name in Perl, apart from its type symbol, must begin with a letter or an underscore character and may be followed by any number of letters, digits, or underscore characters (well, any number less than 255 characters anyway). Perl has a large number of special built-in variables that do not follow these rules. Consequently, you seldom have to worry about giving your variables names that conflict or clash with built-in variable names. The following list provides examples of legal and illegal variable names you may use with your own variables:

    $amount                          legal
    $total_amount                    legal
    $_private                        legal
    $field_3                         legal
    $abc123                          legal
    $This_is_a_LONG_variable_name    legal
    $!var                            illegal (must start with letter or underscore)
    $13_var                          illegal (starts with digits)

While Perl does not force you to declare your variables before you use them, the strict pragma discussed in chapter 2 does force you to declare your variables or use fully qualified variable names (discussed in chapters 7 and 16). The simplest way to declare your variables is with the my declaration:

    my $variable;
    my $name;
    my ($foo, $bar);

As you can see, you can declare a list of variables by using parentheses around a comma-separated list of variables.

Once you've declared a variable, you'll need to know how to assign a value to it. The equals sign (=) is the assignment operator:

    $foo = 42;
    $bar = 'Hello';
    $greeting = "$bar World\n";
    print $greeting;

Figure 4.1 shows the relationship between a variable name and its value in memory during declaration and assignment statements.

    [Figure 4.1 Scalar variable declaration and assignment: the code my $foo; (declaration) and $foo = 42; (assignment), each shown with the variable name $foo and its value in memory]

Assignment may also be combined with declaration either singly or in list form:

    my $greeting = 'Howdy';
    print "$greeting World\n";
    my ($first, $second) = ('Hello', 'World');
    print "$first $second\n";

It is important to note that assignment writes a value into the variable (i.e., into the memory location), but using a variable, as in the print statements above, does not remove the value from memory; it only accesses the value stored in the variable.

If programs were confined to the literal data contained within them, they would be of limited use. We need to obtain data from outside the program from, for instance, a user typing at a keyboard or by reading data from a file stored on a disk. We will examine this in more depth in chapter 6 when we discuss input and output. For now, we consider only the simple case of obtaining scalar values from a user at the keyboard. We can do this using the input operator and the standard input file handle STDIN as follows:

    my $input;
    print "Enter a value: ";
    $input = <STDIN>;
    print "You entered $input";

In a scalar context, such as assignment to a scalar variable, the <STDIN> operator reads in one line of input from the standard input, which is usually the keyboard unless you've otherwise redirected it. So, in the snippet above, the first print statement prompts the user to enter a value. The assignment statement takes everything the user types at the keyboard up to and including the newline (generated by hitting the enter key) and assigns that input to the variable $input.

4.2 Expressions

An expression is something that evaluates to a value. It may be a literal value, a variable, a function that returns a value, or the result of an operation on one or more values or expressions. Perl supports the basic mathematical operations you are familiar with, addition, subtraction, multiplication, and division, represented by the operators +, -, *, and /, respectively. For example:

    $foo = 3 + 5;
    $bar = ($foo + 7) / 5;

The addition operator returns the value of the sum of its two operands, which in this case are the two simple expressions represented by the literals 3 and 5. In the second line, the division operator also has two operands: the result of the expression ($foo + 7) on the left and the literal value 5 on the right.

Operators also have a relative precedence level associated with them, as demonstrated in the above example. Of the four basic arithmetic operators, multiplication and division have a higher precedence than addition and subtraction, meaning that multiplication and division operations are evaluated before any addition and subtraction. In the example above, we used parentheses to override the standard precedence, causing the addition to take place before the division because parentheses have the highest precedence. Had we not used parentheses, then the value of $foo would have been added to the result of 7 / 5.

Perl has two additional numeric operators: exponentiation and modulus. The exponentiation operator is **. Like the four simple arithmetic operators above (and the modulus operator), it is a binary operator. In other words, it takes two operands. It returns the result of its left operand raised to the power of its right operand:

    $foo = 4 ** 2;      # $foo is 4 raised to the power of 2, or 16
    $bar = $foo ** 3;   # $bar is 16 raised to the power of 3, or 4096

The modulus operator (%) may be less familiar. It returns the remainder after dividing the left operand by the right operand. Both operands are taken to be integers (or converted to integers if necessary by removing any fractional portions). The result of 10 % 3 is 1 because 10 divided by 3 is 3 with 1 left over; 1 is the remainder and is the value returned by this expression. If you recall the question of testing an integer to see if it is even or odd, you might realize right away that this operator can provide a ready means of making such a test: any integer N, modulo 2, will only have a non-zero remainder if N is an odd integer.

Perl also has two binary string operators: concatenation, represented by a dot (.), and repetition, represented by x. There is also one built-in function, chomp(), which removes a trailing newline from a string. This is useful to remove the newline from an input value obtained using the <STDIN> operator described above.

    $foo = 'hello ';
    $bar = $foo . 'world';   # concatenation: $bar is now 'hello world'
    $foo = 'bo' x 3;         # repetition: $foo is now 'bobobo'

    print "Enter a value: ";
    $input = <STDIN>;
    chomp($input);           # removes newline from $input
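The operators described in this section can be checked together in one short sketch. It contrasts parenthesized and unparenthesized arithmetic, shows that ** binds more tightly than unary minus, that % truncates its operands, and how the string operators and chomp() behave. It also uses grep and the range operator (1 .. 6), which are not covered until later, and ends with a common example of the binary floating point issue mentioned in section 4.1.

```perl
use strict;
use warnings;

# Precedence and exponentiation
my $foo = 3 + 5;              # 8
my $bar = ($foo + 7) / 5;     # (8 + 7) / 5 = 3
my $baz = $foo + 7 / 5;       # 8 + 1.4: the division happens first
my $pow = 4 ** 2;             # 16
my $neg = -2 ** 2;            # -4: ** binds more tightly than unary minus

# Modulus: operands are truncated to integers first
my $rem   = 10 % 3;           # 1
my $trunc = 10.7 % 3;         # computed as 10 % 3, so also 1
my @odd   = grep { $_ % 2 != 0 } 1 .. 6;   # odd numbers: 1 3 5

# String operators
my $joined = 'hello ' . 'world';   # concatenation
my $boos   = 'bo' x 3;             # repetition: 'bobobo'

# chomp() on a value shaped like a line read from <STDIN>
my $input = "42\n";
chomp($input);                     # now '42'

# Binary floating point: 0.1 + 0.2 is not exactly 0.3
my $exact = (0.1 + 0.2 == 0.3) ? 'yes' : 'no';

print "$bar $baz $pow $neg $rem $trunc @odd\n";
print "$joined $boos [$input] exact: $exact\n";
```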

Perl also offers several shorthand assignment operators that combine a scalar operator with the assignment operator. The complete list of these is available in the perlop pod-page (perldoc perlop); here are a couple of examples to illustrate the concept:

    $a = 42;
    $a += 5;          # same as: $a = $a + 5;
    $b = 'hello ';
    $b .= 'world';    # same as: $b = $b . 'world';
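A few more of the shorthand operators, in a runnable sketch. It uses $num, $str, and $rep rather than $a and $b: those two names are special variables that Perl's sort() uses, so it is safer not to use them as ordinary variables in real programs.

```perl
use strict;
use warnings;

my $num = 42;
$num += 5;         # same as: $num = $num + 5;    now 47
$num *= 2;         # same as: $num = $num * 2;    now 94
$num %= 10;        # same as: $num = $num % 10;   now 4

my $str = 'hello ';
$str .= 'world';   # same as: $str = $str . 'world';

my $rep = 'ab';
$rep x= 3;         # same as: $rep = $rep x 3;    now 'ababab'

print "$num $str $rep\n";
```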

You might now wonder what happens if you attempt to use a numeric operator such as addition to add a scalar variable containing a string to a number. This is the second application of type distinction I mentioned in the opening section. Numeric operations are only defined for numeric values, and string operations are only defined for string values. In a strongly typed language where each variable can only hold a particular type of value, such as an integer or a character string, the compiler can detect an attempt to add two mismatched variables and cause an error before the program is actually run. In Perl, with only one scalar data type for both strings and numbers, such information is not available to the compiler. An attempt to add a string and a number cannot be detected until the program is running and the variables are evaluated for their values.

Perl solves this problem in a very relaxed manner. Any number used where a string is expected is converted to a string. (For example, the number 3.14 becomes the string of characters 3.14.) Similarly, any string used where a number is expected is converted to a number according to the following rule: leading spaces in the string are ignored. If the first non-space characters are something reasonably interpreted as a number (or a plus or minus sign followed by a number), they are taken as the number and any trailing non-numeric characters are ignored. If the string does not have an obvious numeric interpretation, a value of 0 is used.

    '3.14'       converts to 3.14
    '  3.14'     converts to 3.14
    '3.14abc'    converts to 3.14
    'abc123'     converts to 0

This conversion is a useful device that allows you to read a number from the keyboard, which is read as a string, and use it as a number. Similarly, after you have done some calculations and wish to print out the results, the number is converted back to a string for output.

Run the following example program a few times using different values for input. Try using: 42, 42abc, and hello.

    #!/usr/bin/perl -w
    use strict;
    print "Enter a value: ";
    my $input = <STDIN>;
    my $result = $input + 5;
    print "result is $result\n";

If you are using the -w switch for warnings, which I highly recommend, the program will issue warnings when a string value does not have an obvious numeric interpretation. Try running the preceding script again with the same inputs but without the -w switch.
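The conversion rules can also be tried without typing at the keyboard. This sketch converts a list of sample strings (chosen for illustration) by adding 0, silencing the "isn't numeric" warnings inside one block with no warnings 'numeric'. It then captures the single warning produced for 'hello' by temporarily installing a $SIG{__WARN__} handler, a technique not covered in this chapter. Note that '0213' and '0x1fa' show the "literal representation only" rule from section 4.1: as strings, they convert to 213 and 0.

```perl
use strict;
use warnings;

my @inputs = ('3.14', '  3.14', '3.14abc', 'abc123', '-7 apples', '0213', '0x1fa');

my @numbers;
{
    no warnings 'numeric';    # silence "isn't numeric" inside this block only
    @numbers = map { $_ + 0 } @inputs;
}
# @numbers: 3.14 3.14 3.14 0 -7 213 0

# Capture the warning that -w (or use warnings) produces for 'hello'.
my @warnings;
my $result;
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $word = 'hello';
    $result = $word + 5;      # 'hello' converts to 0, so the result is 5
}

my $as_string = 3.14 . '';    # a number used as a string: '3.14'

print "@numbers\n";
print "result is $result, with ", scalar(@warnings), " warning(s)\n";
```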

4.3 List data

A list is simply an ordered collection of scalar values. It is represented literally as a comma-separated list of scalar values or expressions enclosed within parentheses:

    (1, 2, 3, 4)          # list of four values
    ('a', 42, "red")      # list of three values
    (4 + 5, $foo, 1)      # list of three values

A nested list, where a list is inserted within a list, is simply evaluated to a single flat list:

    (1, 2, (3, 4), 5)     # is the same as (1, 2, 3, 4, 5)

We've already seen an example of using a list in the previous discussion of variable declaration using the my declaration. The print function also takes a list as its argument, though we've only used it so far with a single element in the list:

    print ("the value of \$a is $a\n");
    print ('the value of $a is ', $a, "\n");   # same output as above

In the second version we've used a list of three scalar values to produce the same output as the first version. The first element is the single-quoted string. (Hence, we did not have to use a backslash to get a $ symbol in the string.) The second element is the value of the variable $a. And the third element is simply a double-quoted string producing a newline. The practical utility of variable interpolation in double-quoted strings is demonstrated here by the simplicity of the first version.

Consider the following list:

    ('one', 'two', 'three', 'four')

Creating this type of list in your code can make it difficult to read and prone to the error of forgetting a quote. Perl has a convenient list-making function for strings: the qw() quote function. The qw() function takes a sequence of whitespace-separated "words" and produces a list of quoted "words":

    qw(one two three four)

is the same as:

    ('one', 'two', 'three', 'four')

I used quotes in the above paragraph to describe "words" because they don't have to be words in the ordinary sense, just sequences of non-whitespace characters separated by any amount of whitespace. The qw() function does not have to use parentheses to delimit its argument. One can use any non-alphanumeric, non-whitespace character. Common choices are slashes or vertical bars:

    qw/one two three/
    qw|one two three|
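The list forms above can be checked in a few lines. This sketch stores each example list in an array (array variables are introduced in the next section) so the results can be inspected; $foo is given an arbitrary value for the illustration.

```perl
use strict;
use warnings;

my $foo   = 'x';
my @exprs = (4 + 5, $foo, 1);           # (9, 'x', 1)
my @flat  = (1, 2, (3, 4), 5);          # nested list flattens: 5 elements

my @quoted = ('one', 'two', 'three', 'four');
my @words  = qw(one two three four);    # the same list, without quotes or commas
my @slash  = qw/one two three/;
my @bars   = qw|one two three|;

# print takes a list; the first element is single-quoted, so no
# backslash is needed before the $ in it.
my $value = 7;
print 'the value of $value is ', $value, "\n";

print scalar(@flat), " elements: @flat\n";
```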

4.3. 1

A list

prefixed w^ith an @

array variable (see figure 4.2).

symbol (think of it

as a stylized "a"

fiar

An

array),

array variable

but otherwise

lows the same naming rules as those for scalar variables mentioned

ment

to an array variable

assignment operator to assign a

@array =

same

the

is

list

as for scalar variables.

earlier.

is

fol-

Assign-

Simply use the

value to an array variable:

(42, 13, 2);

@array#

$arrav[1]#

An

Figure 4.2

array variable

is

a

list

of scalars.

    @foo  = (1, 2, 'three');
    @bar  = ($a, $b, 3 + 4);
    @copy = @foo;

Whenever you picture an array, you should keep the list representation in mind. That way you won't find it surprising that you cannot store an array as an element of a list:

    @foo = (1, 2, 'three');
    @bar = ('four', 5, 6);
    @new = (0, @foo, @bar, 7);

is the same as:

    @new = (0, (1, 2, 'three'), ('four', 5, 6), 7);

which resolves to:

    @new = (0, 1, 2, 'three', 'four', 5, 6, 7);

Lists and arrays are ordered.

We can access individual elements of them using a subscript notation to refer to the position, or index, of a value in the list:

    @array  = (9, 10, 11, 12);
    $second = $array[1];   # assigns 10 to $second

You might find two things odd at this point: why the second position is numbered 1, and why the access into the array is prefixed with a $ instead of an @ symbol. First things first. Perl, like many programming languages, starts counting from zero. This means that the four elements stored in the array have positions, or indices, numbered from 0 to 3. On the second point, recall that an array holds a list of scalars, so the type of any given element in the array is always a scalar type denoted by a $ symbol.

When we say

    @foo = (42, 12, 2);

you might think this performs the equivalent assignment of

    ($foo[0], $foo[1], $foo[2]) = (42, 12, 2);

Well, it's not quite the same thing. In the first case, the entire array is set equal to that three-element list. In the latter case, only the first three elements are set to those values. If the array previously held ten items, the last seven remain unchanged.
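The difference between assigning to an entire array and assigning to its first three elements is easy to see with a ten-element array. This is a minimal sketch with illustrative names. It uses the range operator (1 .. 10) as a shorthand for the list 1 through 10. It uses scalar() to report how many elements each array holds.

```perl
use strict;
use warnings;

# (1 .. 10) is the range operator: the list 1, 2, ..., 10
my @foo = (1 .. 10);
($foo[0], $foo[1], $foo[2]) = (42, 12, 2);   # only indices 0, 1, and 2 change
print scalar(@foo), ": @foo\n";   # prints: 10: 42 12 2 4 5 6 7 8 9 10

my @bar = (1 .. 10);
@bar = (42, 12, 2);               # the whole array is replaced
print scalar(@bar), ": @bar\n";   # prints: 3: 42 12 2

print "$bar[0] $bar[2]\n";        # indices start at 0: prints 42 2
```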

Each element of the array is a scalar in its own right. All the array variable does is allow us to refer to the whole collection as one list. It also lets us perform certain operations on the list itself, such as adding or removing elements at the beginning, the middle, or the end of the list.

Before we proceed to discuss some of these array operations, we should realize that the above assignment of a list of scalar values to a list of scalar variables does not just apply to array elements. You can assign a list of scalar values to any list of scalar variables, just as we saw earlier when declaring and assigning multiple scalar variables:

    ($foo, $bar) = (42, 12);       # $foo gets 42, $bar gets 12
    ($foo, $bar) = ($bar, $foo);   # swaps $foo and $bar

That last line is possible because the two variables on the right create the list that is evaluated first. That list is then assigned to the list of variables on the left. I point this out so that you don't mistakenly assume the assignments happen sequentially (that is, $bar assigned to $foo, then $foo assigned to $bar). That would leave both variables with the same value.

Aside from accessing individual elements, we may access what is referred to as a slice of an array or list. A slice is a sublist corresponding to a list of indices into the array:

    @foo = (10, 12, 14, 16);
    @bar = @foo[1,2];     # @bar gets the list (12, 14)

When we are accessing a slice (sublist), we use the @ symbol in combination with a subscript list. A slice of an array is still a list or array type value. Also, these slices need not be in a consecutive order:

    @foo = (10, 12, 14, 16);
    @bar = @foo[3,0,1];   # @bar gets the list (16, 10, 12)

There are two pairs of built-in functions for adding and removing elements from the beginning or end of an array. For adding to or removing from the beginning of an array, we use unshift() and shift() respectively:

    @foo = (11, 12, 13);
    unshift(@foo, 10);       # @foo is now (10, 11, 12, 13)
    unshift(@foo, (8, 9));   # @foo is now (8, 9, 10, 11, 12, 13)
    $bar = shift(@foo);      # $bar gets 8; @foo is now (9, 10, 11, 12, 13)

For working at the end of an array, we use the push() and pop() functions for adding and removing respectively:

    @foo = (1, 2, 3);
    push(@foo, 4);           # @foo is now (1, 2, 3, 4)
    push(@foo, (5, 6));      # @foo is now (1, 2, 3, 4, 5, 6)
    $bar = pop(@foo);        # $bar gets 6; @foo is now (1, 2, 3, 4, 5)

One last point to make about array variables concerns how they are interpolated within double-quoted strings. If you print an array using print @array, no space or comma separates the values printed. Perl will just treat the array as a comma-separated list of values to be printed. However, if you print an array in a double-quoted string, like print "@array", the array will be printed with a space between each value. Both behaviors are good defaults. They can be changed by altering the values of the special variables $, (the output field separator) and $" (the list separator).
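The following sketch exercises several operations described above in one runnable script:

* the list swap
* slices
* the four end-of-array functions
* the two separator variables

The names are illustrative. The sketch uses local, which is not covered in this chapter, to change $" and $, only inside a block, so the defaults come back afterwards.

```perl
use strict;
use warnings;

# List assignment builds the right-hand list before assigning, so this swaps
my ($foo, $bar) = (42, 12);
($foo, $bar) = ($bar, $foo);       # $foo is now 12, $bar is now 42

# Slices: @ plus a list of indices, in any order
my @nums  = (10, 12, 14, 16);
my @pair  = @nums[1, 2];           # (12, 14)
my @mixed = @nums[3, 0, 1];        # (16, 10, 12)

# Adding and removing at both ends
my @list = (11, 12, 13);
unshift(@list, 8, 9, 10);          # (8, 9, 10, 11, 12, 13)
my $front = shift(@list);          # $front is 8
push(@list, 14);                   # (9, 10, 11, 12, 13, 14)
my $back = pop(@list);             # $back is 14; @list is (9, 10, 11, 12, 13)

print @list, "\n";                 # no separator: 910111213
print "@list\n";                   # $" defaults to a space: 9 10 11 12 13
{
    local $" = ', ';               # change the list separator in this block only
    print "@list\n";               # prints: 9, 10, 11, 12, 13
}
{
    local $, = '-';                # output field separator between print's arguments
    print @list;                   # prints: 9-10-11-12-13
    print "\n";
}
```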

4.3.2 Hash variables

Perl has one other variable type that holds collections of scalar values: the hash variable (formerly known as an associative array). A hash variable is prefixed with a % symbol. A hash does not have a convenient literal representation in the way an array can be represented as a list. This is because the elements of a hash are stored in an order completely independent of the order in which they were assigned.

A hash is like an array that is not ordered. Each value is associated with a name, called a key (which must be a string), rather than a positional index. These keys are used as subscripts to access individual elements, just as indices are used as subscripts into arrays. But while array subscripts are contained inside square brackets, hash subscripts are contained in curly braces.

Even though a hash is not a list, a list can be used to assign values to a hash. In such a case, the list is taken to be a list of key/value pairs:

    %hash = ('first', 42, 'second', 12);

The hash above has two elements, one corresponding to the key first and one corresponding to the key second:

    %hash = ('first', 42, 'second', 12);
    print "$hash{second}\n";   # prints: 12

Perl has an alternative to the comma operator that is useful when using lists to assign to hash variables: the => operator. This works the same as the comma, except that it also automatically causes the value on the left to be quoted:

    %hash = (first => 42, second => 12);
    print "$hash{second}\n";   # prints: 12

The list in the first statement still has four elements. But it is easier to read, because it now looks more like two pairs of elements separated by a comma, and the quotes are no longer necessary. Just remember that a hash is not a list, but a list may be interpreted as a hash (see figure 4.3 on page 72).

Figure 4.3 A hash variable associating keys with scalar values (%hash = (first => 42, second => 13); element $hash{first})

Because a hash is not a list, no functions exist to add or remove elements from the beginning or end of a hash: there is no beginning or end of a hash. You can always add a new element by simply assigning a value to a new key in the hash:

    %hash = (first => 42, second => 12);
    $hash{third} = 7;

And you can delete a key (and its value) using the delete() function:

    delete $hash{first};   # removes the key 'first' and its value

You can get a list of the keys or the values in a hash using the appropriately named keys() and values() functions. However, do not expect these functions to return lists of keys or values in any particular order. Remember, a hash is not an ordered list. The keys() function merely returns a list of keys in whatever order Perl internally stored those keys.

    %hash = (first => 42, second => 12);
    $hash{third} = 7;
    @keys = keys(%hash);
    print "@keys\n";   # printed: first third second
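Because keys() returns keys in whatever order Perl stored them, a common habit is to sort the keys before printing. This sketch adds a key, deletes one, and lists what remains in a repeatable order. The names are illustrative, and sort and the foreach loop are used here ahead of their coverage.

```perl
use strict;
use warnings;

my %hash = (first => 42, second => 12);   # => quotes first and second
$hash{third} = 7;                         # add a new key/value pair
delete $hash{first};                      # remove the key 'first' and its value

# keys() and values() come back in Perl's internal order,
# so sort the keys when a repeatable order matters
my @keys = sort keys(%hash);
print "@keys\n";                          # prints: second third

foreach my $key (@keys) {
    print "$key => $hash{$key}\n";        # second => 12, then third => 7
}
```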

Perl also allows hash slices. Like array slices, they let you access a list of values by providing a list of keys as a subscript to the hash. You use an @ symbol, rather than the normal % symbol, for hash slices:

    %hash    = (third => 7, first => 42, second => 12);
    @h_slice = @hash{'third', 'second'};   # @h_slice gets (7, 12)

Here, @h_slice is an ordinary array and @hash{'third', 'second'} is the hash slice.

The each() function iterates through key/value pairs in a hash. This is generally done with one of the looping constructs shown in the next chapter. Hashes are powerful tools for organizing sets of data. We will make heavy use of them in examples in later chapters.

4.4 Context

Perl draws a primary distinction between scalar values and list values. This distinction extends farther than just what type of value may be assigned to a particular type of variable. It also affects how certain expressions are evaluated. Expressions are evaluated within a context that may be either scalar or list (or void, but we won't consider that here). Consider the following:

    @foo = (11, 12, 13);
    $bar = @foo;

In the first statement, we have a list on the right being assigned to an array. An array expects a list, so the list is evaluated in a list context. In the second statement, an array on the right is being assigned to a scalar variable. In a strongly typed language, this would be a type mismatch. How can you assign an array to a scalar? The assignment is expecting a scalar value, so it provides scalar context for the array. Perl has a rule for evaluating an array in scalar context: return the number of elements contained in the array. So, in the above example, $bar would be assigned a value of 3.

Similarly, you might assign a scalar to an array in one of two ways:

    @foo = (12);
    @foo = 12;

The first statement assigns a list with one element to the array @foo. The second statement tries to assign just the scalar value 12 to the array. In this case, the scalar value is evaluated in list context. This produces a single-element list, (12), which is assigned to the array.

We saw an example of context in the earlier examples of array assignment:

    @foo = (2, 3, 4);
    @bar = (0, 1, @foo);   # @bar gets (0, 1, 2, 3, 4)

A list provides a list context to any element within it. That is why @foo is evaluated as a list in the second statement rather than as a scalar.

Many of Perl's built-in functions return different values depending upon the context in which they are called. If a function's return value depends on context, the perlfunc pod-page entry for that function will clarify that different behavior.
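This sketch shows the same array evaluated in scalar and list context side by side. The variable names are illustrative. The parenthesized my ($head) makes the assignment a list assignment, so $head receives the first element instead of the count.

```perl
use strict;
use warnings;

my @foo = (11, 12, 13);

my $count  = @foo;          # scalar context: the number of elements, 3
my ($head) = @foo;          # list assignment: $head gets the first element, 11
my @bar    = (0, 1, @foo);  # @foo is flattened in list context

my @one = 12;               # a scalar in list context becomes the list (12)

print "$count | $head | @bar | ", scalar(@one), "\n";
# prints: 3 | 11 | 0 1 11 12 13 | 1

# scalar() asks for scalar context explicitly
print "There are ", scalar(@foo), " elements\n";   # prints: There are 3 elements
```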

4.5 References to variables

Earlier we mentioned that a scalar value can be a number, a string, or a reference, but we never said what a reference is. If you think of a variable as a storage bin with a name and an address, then you can think of a reference as a forwarding address.

Internally, Perl maintains lists of variable names and their associated addresses. Examine the variable diagrams given earlier in this chapter. You'll see that each variable name is immediately followed by a black circle pointing to a storage bin. That circle represents the address associated with the variable name. Whenever you access a variable, Perl looks up the address for that variable name and follows it to the correct storage bin. When you store a reference to another variable in a scalar variable, you are not storing that variable's value, but the address where it is stored.

Whenever Perl encounters a variable name like $foo, it expects the $ symbol to be immediately followed by something that evaluates to an address where a scalar value is stored. Perl checks its internal scalar name and address lists for the name foo. If it finds it, foo is then replaced by its address. The scalar value in that bin is then retrieved (or set, if we are assigning something to $foo).

Perl allows you to assign the address of one variable's storage bin, rather than its value, to another variable using the backslash operator. This is called assigning a reference to a variable:

    $foo = 42;
    $bar = \$foo;   # $bar gets address of $foo's bin

It is important to realize that $bar is a scalar variable with its own storage bin. In this case, however, the value stored in that location is the address of another bin. Hence, if you print out the value of $bar, you will not see 42. Instead you will see a representation of the address that looks like SCALAR(0x8050fe4). This printed representation consists of the type of reference (a reference to a scalar) and a representation of the address.

To actually use the reference to access the contents of that storage bin, you need to dereference it. Remember that Perl expects to find something that resolves to an address immediately following the $ symbol. So if we place another $ symbol before our reference variable, it will be followed by something that resolves to an address:

    $foo = 42;
    $bar = \$foo;
    print "$bar\n";    # prints the reference: SCALAR(0x8050fe4)
    print "$$bar\n";   # prints: 42

In the first print statement, Perl sees $bar as a $ followed by a scalar variable name that immediately resolves to an address. Perl looks up the address in its internal tables and prints the value stored there (another address).

In the second print statement, Perl discovers a dollar sign followed by another dollar sign. It first evaluates the second dollar sign, which is followed by a valid variable name, and retrieves the value. That value is an address, as we have already seen. Now the first dollar sign is followed by an address, so Perl follows that address, retrieves the scalar value stored there (42), and prints it out.

Figure 4.4 shows a graphic depiction of a scalar variable containing a value, and another scalar variable containing a reference to that variable. As in the earlier diagrams, an address is denoted by a black circle pointing to a storage bin.

Figure 4.4 A reference to a scalar variable ($scalar = 42; $s_ref = \$scalar;)

A reference value is always a scalar value, although it may refer to the address of an array value or a hash value:

    @array = (11, 12, 13);
    $aref  = \@array;
    print "@$aref\n";

In this case, Perl expects the @ symbol to be followed by one of two things: a valid array name, which corresponds to an address, or an address pointing to an array value. As in the previous example, $aref evaluates to an address. So, in the print statement above, the @ symbol is followed by an address that Perl uses to retrieve the stored array value. Figure 4.5 depicts an array variable and a scalar variable containing a reference to that variable.

Figure 4.5 A reference to an array variable (@array = (42, 13, 2); $a_ref = \@array;)
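To see that a reference really points at the original storage bin, this sketch assigns through the reference and then prints the original variable. It also uses ref(), a built-in that returns the kind of value a reference points to (SCALAR, ARRAY, and so on). The exact address printed for $bar will differ from run to run.

```perl
use strict;
use warnings;

my $foo = 42;
my $bar = \$foo;           # $bar holds the address of $foo's bin, not 42

print "$bar\n";            # something like SCALAR(0x8050fe4); the address varies
print "$$bar\n";           # dereference: prints 42

$$bar = 43;                # assigning through the reference changes $foo itself
print "$foo\n";            # prints: 43

my @array = (11, 12, 13);
my $aref  = \@array;
print "@$aref\n";          # prints: 11 12 13

# ref() reports what kind of value a reference points to
print ref($bar), " ", ref($aref), "\n";   # prints: SCALAR ARRAY
```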

At this point, it is only important that you understand the basic concepts of references. Their usefulness won't come into play until later chapters, when we deal with passing parameters to functions and creating data structures. We will cover references more fully in those chapters.

4.6 Putting it together

In this chapter, you've been presented with all of Perl's basic data and variable types. You have also seen many simple operators and functions for reading in data, manipulating data, and displaying the results. You may not realize it, but already you can write a variety of simple but complete programs using only what has been covered so far.

For example, consider a program that asks for a measurement in inches and displays the equivalent measure in centimeters. Here is a version of such a program, with excessive comments to remind you of various operations. We convert inches to centimeters using the approximation that 1 inch equals 2.54 centimeters:

    #!/usr/bin/perl -w
    use strict;
    my ($inches, $centimeters);                  # declare variables
    print "Enter a measurement in inches: ";     # display a prompt
    $inches = <STDIN>;                           # read input value from user
    chomp($inches);                              # remove newline from input value
    $centimeters = $inches * 2.54;               # calculate converted value
    print "$inches inches is approximately $centimeters centimeters\n";

You could write more complex programs as well, taking many more values from the standard input and calculating complex mathematical formulas using these values. Such programs may become long because they must repeatedly prompt for a value and read a line of input. The next chapter addresses ways of repeatedly executing the same piece of code to simplify such problems.

Quite a few more functions and operators exist in addition to those we have presented here. We will cover many of them in the following chapters. Remember, the perlfunc and perlop pod-pages are always available to you. Use them either to get a second viewpoint on things we've already covered or to take a first look at some functions we haven't yet explored. I mentioned earlier, and repeat here, that you will learn far more by writing and running Perl programs than by reading about them. I encourage you to do the exercises that follow.

4.7 Exercises

1 Write a program that asks for a weight in pounds and displays the equivalent weight in kilograms (1 kilogram equals 2.2 pounds).

2 Write a program that calculates the gross pay of an employee. The program should ask for the hourly rate of the employee, how many regular hours, and how many overtime hours the employee worked. Pay for overtime hours should be calculated at time and a half (1.5 times the regular hourly rate).

CHAPTER 5

Control structures

5.1 Selection statements 80
5.2 Repetition: loops 84
5.3 Logical operators 89
5.4 Statement modifiers 92
5.5 Putting it together 92
5.6 Exercises 97

A program is a series of statements that are, by default, executed in sequence from beginning to end. This sequence of execution is referred to as the flow of control of the program. Not all problems are amenable to being solved by executing a series of instructions in a strictly linear fashion. Some problems require a choice among two or more courses of action, depending on particular criteria or conditions. Other problems require a series of steps to be repeated a certain number of times or until some condition is met.

In the previous chapter, we discussed simple expressions, but we did not explicitly address the concept of statements. In short, a statement is an expression (or combination of expressions) followed by a semicolon. For example, a bare literal value followed by a semicolon is a statement, but it doesn't do anything: it is just a literal value in a void context. Such a statement will generate a warning when using the -w switch (and we are assuming that the -w switch is being used throughout this book). More realistic examples of statements are

    $foo = 12;
    print "$foo\n";

Statements are usually written on a single line, but this is not a requirement of the language. Occasionally, statements are too long to fit reasonably on a single line. Consider the following statement, which might have been used in the solution to the final exercise in the previous chapter:

    $gross_pay = $regular_hours * $hourly_rate + $overtime_hours * $hourly_rate * 1.5;

This is a perfectly valid statement, though it is not particularly readable when written in this fashion. This statement might be better written as

    $gross_pay = $regular_hours  * $hourly_rate
               + $overtime_hours * $hourly_rate * 1.5;

Using whitespace in this manner helps the reader understand that this is a single statement. An even better way to rewrite this statement is to use three statements:

    $regular_pay  = $regular_hours  * $hourly_rate;
    $overtime_pay = $overtime_hours * $hourly_rate * 1.5;
    $gross_pay    = $regular_pay + $overtime_pay;

Multiple statements may be grouped into compound statements, called blocks, by enclosing them within a pair of curly braces. A block also introduces a scope on variables declared within it. This simply means that a variable declared in a my declaration within a block only exists within that block. In other words, the variable in question is visible only within that block:

    my $foo = 42;
    {
        my $bar = 13;
        print "$foo and $bar\n";   # both $foo and $bar exist within this block
    }
    print "$foo and $bar\n";       # warning: $bar does not exist here
                                   # (under use strict, a compile-time error)

Scope is an issue we will address more closely in chapter 7. For now, we will be declaring all of our variables with the my declaration, and such variables are private to the block in which they are declared. Blocks can be nested. Your entire program is considered a block: think of Perl automatically putting curly braces at the beginning and end of your source code file. Aside from introducing scope, a block serves to group statements into a compound statement that can be selected or repeated with the use of control statements.
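Because a my variable does not exist outside its block, the sketch below records what the block sees in an outer variable. It does not print $bar after the block, which use strict would reject at compile time. The names are illustrative. The second half shows nested blocks, where an inner my hides an outer variable of the same name.

```perl
use strict;
use warnings;

my $foo = 42;
my $inside;
{
    my $bar = 13;                 # $bar is private to this block
    $inside = "$foo and $bar";    # $foo, declared outside, is visible here
}
print "$inside\n";                # prints: 42 and 13

# Blocks nest; an inner my variable hides an outer one with the same name
my $name = 'outer';
{
    my $name = 'inner';
    print "$name\n";              # prints: inner
}
print "$name\n";                  # prints: outer
```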

5.1 Selection statements

The basic selection statement in Perl (and many other languages) is the if statement. This statement has the following general syntax:

    if ( Condition ) { Block }

There is no semicolon after the closing brace of a block. Often, the block contains multiple statements, and the structure is written as follows (according to the principles laid out in chapter 2):

    if ( Condition ) {
        statement;
        statement;
    }

When Perl encounters an if statement, it evaluates the condition. If it is true, then the block of statements is executed. Otherwise, execution skips the block and continues with the next statement following the block (see figure 5.1).

Figure 5.1 Flow diagram of an if statement

But what is a condition, and what is true and false? The condition is simply any expression. This expression is evaluated in a scalar context to produce a scalar value. Remember, Perl interprets variables and expressions slightly differently depending upon context (i.e., scalar or list context). When something is evaluated in a scalar context, the result of the evaluation is the scalar value. A value is false in any of these cases:

* it is undefined (a variable that has no value stored in it)
* it is a zero
* it is an empty string
* it is a string containing only a zero

Every other value is considered true.

    my $foo = 1;
    if ( $foo ) {
        print "$foo is a true value\n";
    }

If you run the snippet above as a Perl program, it will execute the print statement inside the if block. Now change the initial assignment to $foo to one of the following: 0 (or any numeric expression that evaluates to zero, such as 0.0 or 0e12), '', or '0'. Then the if block will be skipped.
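The rule for true and false can be checked directly. This sketch runs each of the false values named above, plus a few look-alikes that are true, through a test. It uses the foreach loop from later in this chapter. Note that the string '0.0' is true even though the number 0.0 is false. This is because the only non-empty string that counts as false is '0'.

```perl
use strict;
use warnings;

# The false values named in the text, and some values that are true
my @false = (undef, 0, 0.0, 0e12, '', '0');
my @true  = (1, -1, '0.0', '00', ' ', 'q');

my ($false_count, $true_count) = (0, 0);
foreach my $value (@false) {
    if (!$value) { $false_count = $false_count + 1 }
}
foreach my $value (@true) {
    if ($value) { $true_count = $true_count + 1 }
}
print "false: $false_count, true: $true_count\n";   # prints: false: 6, true: 6
```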

Often we do not want to merely test a variable's value. We want to evaluate that value in terms of another value. In other words, is the variable's value less than, greater than, or equal to another specific value? There are operators, called relational operators, for each of these common tests. These operators come in two forms: one tests numeric relations and one tests string relations (see table 5.1 on page 82). It is a common mistake to use the wrong type of relational test, especially to mistakenly use a numeric test for string data. Perl will issue a warning about this if -w is being used. (You are using -w, aren't you?)

Table 5.1 Relational operators

    Numeric     String      Meaning
    $a < $b     $a lt $b    True if $a is less than $b
    $a <= $b    $a le $b    True if $a is less than or equal to $b
    $a > $b     $a gt $b    True if $a is greater than $b
    $a >= $b    $a ge $b    True if $a is greater than or equal to $b
    $a == $b    $a eq $b    True if $a is equal to $b
    $a != $b    $a ne $b    True if $a is not equal to $b
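Mixing up the two families of operators gives quiet but wrong answers when numbers are held as strings. This sketch compares 10 and 9 both ways. The variable names $x and $y are illustrative. The table's $a and $b are avoided because Perl's sort uses those names.

```perl
use strict;
use warnings;

my ($x, $y) = (10, 9);

my $numeric_less = ($x < $y);    # false: 10 is not numerically less than 9
my $string_less  = ($x lt $y);   # true: "10" sorts before "9", since "1" lt "9"

if ($numeric_less) { print "10 < 9\n" } else { print "10 is not < 9\n" }
if ($string_less)  { print "'10' lt '9'\n" }

# Equal as numbers but different as strings
if (1.0 == 1)     { print "1.0 == 1\n" }
if ('1.0' ne '1') { print "'1.0' ne '1'\n" }
```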

You can use the if statement to set up a choice between two alternatives. Perl can do one thing if the condition is true, or something else if it is false. This form has the general syntax

    if ( Condition ) { Block } else { Block }

Again, this is better written as an indented structure, according to principles outlined in chapter 2. Figure 5.2 shows the flow of control in such a structure.

Figure 5.2 Flow diagram of an if/else statement

The following example displays one thing if $foo is greater than $bar, or something else if the relation is false:

    if ( $foo > $bar ) {
        print "$foo is greater than $bar\n";
    } else {
        print "$foo is not greater than $bar\n";
    }

Additionally, you may select one of several blocks to execute by using multiple selection criteria in elsif clauses, which can be inserted between an if block and an else block (see figure 5.3). Note that there is no "e" before the "i" in the elsif keyword.

Figure 5.3 Flow diagram of an if/elsif/else statement

In this manner, we can determine if a value is greater than, equal to, or less than another value:

    if ( $foo < $bar ) {
        print "$foo is less than $bar\n";
    } elsif ( $foo > $bar ) {
        print "$foo is greater than $bar\n";
    } else {
        print "$foo must be equal to $bar\n";
    }

5.2 Repetition: loops

The other primary flow of control structure is the loop. A loop allows you to specify a block of code that is to be repeatedly executed either a specific number of times (determinate loop) or until some condition is met (indeterminate loop). Consider a program to calculate the average of ten values input by a user. You could write out ten pairs of statements that prompt for and accept user input. Or you could write the code once and loop over it 10 times. What if you do not know in advance how many values will be averaged? In this case, you must use a loop to repeat the input statements until a special value, called a sentinel value, is entered.

The basic indeterminate loop, which you'll find in many other languages, is the while loop:

    while ( Condition ) { Block }

Here the condition is tested and, if it is true, the statements in the block are executed. Once the block is finished, control returns to the top of the loop and the condition is tested again. The block will be repeatedly executed as long as the condition evaluates to a true value. Once the condition returns a false value, execution will continue with the next statement following the block.

Whenever you write an indeterminate loop, you should always double-check the statements in your loop block to ensure that the conditional will eventually fail. Otherwise, you will have an infinite loop that won't stop running until you kill the program.

Here is an example that calculates the average of an undetermined number of grades:

    #!/usr/bin/perl -w
    use strict;
    my ($average, $total, $count) = (0, 0, 0);
    my $grade = '';
    while ($grade ne 'q') {
        print "Enter a grade or 'q' to quit: ";
        $grade = <STDIN>;
        chomp $grade;
        if ($grade ne 'q') {
            $count = $count + 1;
            $total = $total + $grade;
        }
    }
    # avoid division by zero if nothing entered
    $count ||= 1;
    $average = $total / $count;
    print "The average is $average\n";

In the example above, we first declare our variables for the average, the total, and the count of how many values will eventually be entered. These variables are initialized with values of zero, using the list assignment we saw in the previous chapter. We also declare a variable to hold each grade and initialize it with the empty string.

We then enter the loop statement. If the value of the grade variable is not equal (ne) to the string q, the sentinel value, the block is executed. Inside the block, we do three things:

* prompt the user for a grade, or the sentinel value q
* read this into the grade variable
* chomp() off the newline

We also use an if statement to determine if the sentinel was entered. If the sentinel was not entered, we add one to our count and add the grade to the total. This entire loop will be repeatedly executed until the user enters a single q. At that point the loop is finished, and the program goes on to calculate the average and display the result.
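To try the averaging loop without typing at a prompt, this sketch replaces <STDIN> with shift() on an array of hypothetical input lines. The loop body is otherwise the one shown above. The sketch adds one thing not in the original program: a defined() check. It treats running out of input the same as entering q, so the loop cannot spin forever at end of file.

```perl
use strict;
use warnings;

# Hypothetical stand-in for what a user might type, ending with the sentinel
my @typed = ("90\n", "85\n", "77\n", "q\n");

my ($average, $total, $count) = (0, 0, 0);
my $grade = '';
while ($grade ne 'q') {
    $grade = shift(@typed);               # in the real program: $grade = <STDIN>;
    $grade = 'q' unless defined $grade;   # treat running out of input like 'q'
    chomp $grade;
    if ($grade ne 'q') {
        $count = $count + 1;
        $total = $total + $grade;
    }
}
$count ||= 1;                             # avoid division by zero if nothing entered
$average = $total / $count;
print "The average is $average\n";        # prints: The average is 84
```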

A while loop may also have an optional continue block immediately following the main loop block:

    while ( Condition ) { Block } continue { Block }

The continue block is executed each time the loop block finishes and the loop is about to continue, just before the condition is tested again (see figure 5.4 on page 86). The continue block is not used much in practice. When it is, it is usually in conjunction with additional loop control statements, which we will discuss later in this chapter. However, we can use the while statement, complete with its continue block, to help define the next kind of loop, the for loop:

    for (initialization; condition; iteration) { Block }

The first time this loop is encountered, it evaluates the initialization expression and tests the conditional expression. If the condition is true, the loop body is executed. When the loop body is finished executing, the iteration expression is evaluated, and the condition is tested again.

Figure 5.4 Flow diagram of a while loop statement
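The relationship between the for loop and a while loop with a continue block can be sketched as follows. This is an illustration under one assumption: the loop variable should not outlive the loop. That is why the while version sits inside an extra bare block. Both loops collect the same values.

```perl
use strict;
use warnings;

# A for loop: initialization; condition; iteration
my @from_for;
for (my $i = 0; $i < 3; $i = $i + 1) {
    push(@from_for, $i);
}

# The same loop spelled out as a while loop with a continue block
my @from_while;
{                                  # outer block limits the scope of $i
    my $i = 0;                     # initialization
    while ($i < 3) {               # condition
        push(@from_while, $i);     # loop body
    } continue {
        $i = $i + 1;               # iteration
    }
}
print "@from_for | @from_while\n"; # prints: 0 1 2 | 0 1 2
```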

An example will help show what happens:

    for (my $count = 0; $count ...
        print "$count\n";
    ...

    ... { print "still running\n" }
    while (1) { print "still running\n" }

Infinite loops have no value in programming unless you can somehow force the loop to exit with some other statement. Perl provides a last statement to do just this sort of thing:

    while (1) {
        print "Enter a value: ";
        my $input = <STDIN>;
        chomp $input;
        if ($input eq 'q') {
            print "You are exiting the loop\n";
            last;
        }
        print "You entered $input, here we go again\n";
    }

The last statement will force an exit of the immediately enclosing loop. Perl provides two other loop control statements in addition to the last statement: the next and redo statements. Briefly, the next statement causes the rest of the loop block to be skipped and the continue block to be executed. Control then returns to the condition of the loop. In the case of the for loop, the rest of the loop body is skipped, and the iteration expression is evaluated before returning to the conditional expression. A simple example might be to count the number of non-empty lines in a file:

    #!/usr/bin/perl -w
    use strict;
    my $count = 0;
    while (...) {
        next if length($_) ...
        $count++;
    ...
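Here is a self-contained sketch of next and last in the same spirit as the line-counting example. It uses a hypothetical in-memory list in place of a file, so it runs without input. An empty line is skipped with next. A hypothetical STOP marker ends the loop with last.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the lines of a file
my @lines = ("alpha\n", "\n", "beta\n", "\n", "gamma\n", "STOP\n", "delta\n");

my $count = 0;
while (@lines) {                     # array in scalar context: true while non-empty
    my $line = shift(@lines);
    chomp $line;
    last if $line eq 'STOP';         # leave the loop entirely
    next if length($line) == 0;      # skip the rest of this pass
    $count++;
}
print "non-empty lines before STOP: $count\n";   # prints: non-empty lines before STOP: 3
```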


    print "$aref->[2]->[0]\n";
    print "$$aref[2]->[0]\n";
    print "$$aref[2][0]\n";
    print "$aref->[2][0]\n";   # each prints: 42

Notice that the arrow operator -> may be omitted between subscripts, so all four statements above refer to the same element. ... Now, consider a nested structure of arrays ...

... anonymous array or hash values in Perl ... (We could also create anonymous scalars, but they are seldom used in practice.) In other words, if you dereference a scalar variable that did not previously contain a reference to a value, Perl automatically creates an anonymous value of the appropriate type and stores a reference to it in the variable:

    my $variable;
    $variable->[0] = 42;
    print "$variable\n";        # prints: ARRAY(0x804a96c)
    print "$variable->[0]\n";   # prints: 42

Here, the variable $variable did not contain a reference, but it

an

(

;

"

it

to the first

element of that

when we used an anonymous

array, storing a reference

145

;

to that array in the scalar variable.

As we saw

containing a reference just prints out

its

type and

This way of creating anonymous values quite useful. In fact,

we

will use this

lem given above. Here's a top

level

in chapter 4, printing out a variable

method

is

in

memory address.

called autovivification

and can be

our solution to the marking prob-

breakdown of the program:

    #!/usr/bin/perl -w
    use strict;
    my $data_file = 'scores.txt';
    my %students;
    <<read file and build data structure>>
    ...

This code is all relatively straightforward, so let's jump right into the data structure. We begin by opening our data file and looping through it. After chomp()ing the newline off a line, we split() it into an array of fields: the first element will be the student name, the second is the assignment number, and the last field is the score for that assignment. We use these fields to build our hash of arrays.

    <<read file and build data structure>>=
    open(SCORES, $data_file) || die "can't open file: $!";
    while (<SCORES>) {
        chomp;
        my @fields = split /:/;
        ...
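The rest of this chunk is not reproduced here. The following self-contained sketch illustrates the technique it relies on: it builds a hash of arrays through autovivification. The sample lines are hypothetical, and so is the assumption that each line has the form name:assignment:score.

```perl
use strict;
use warnings;

# Hypothetical input in the name:assignment:score layout described above.
my @lines = ("Anne:1:85", "Bill:1:72", "Anne:2:90");

my %students;
foreach my $line (@lines) {
    my ($name, $assignment, $score) = split /:/, $line;
    # $students{$name} starts out undefined; using it as an array
    # reference autovivifies an anonymous array for that student.
    $students{$name}[$assignment] = $score;
}

foreach my $name (sort keys %students) {
    # Element 0 is never assigned, so skip undefined slots.
    my @scores = grep { defined } @{ $students{$name} };
    print "$name: @scores\n";
}
```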

    my %employees = (
        asmith => { name => 'Anne Smith', age => 35, beer => 'Pale Ale'  },
        bjones => { name => 'Bill Jones', age => 21, beer => 'Dark Ale'  },
        stims  => { name => 'Sara Tims',  age => 32, beer => 'Wheat Ale' },
    );

    foreach my $id (keys %employees) {
        print "$employees{$id}{name} drinks: $employees{$id}{beer}\n";
    }

This snippet of code produces the following output. The order of the lines may vary, because keys returns a hash's keys in no particular order.

    Sara Tims drinks: Wheat Ale
    Bill Jones drinks: Dark Ale
    Anne Smith drinks: Pale Ale

8.1.3 Mixed structures

Some languages provide a special data type for a variable that can contain a mixed collection of other basic data types, for example the "record" in Pascal or the "struct" in C. Perl's nested structures give you complete flexibility over the kinds of things you can nest. We've already seen one mixed structure: the hash of arrays used for the students' assignment scores. Consider a more elaborate kind of record for each employee than the one used in the example above. We will consider just a single employee record here, but the hash could contain any number of records, just as the example above does:

    my %employees = (
        asmith => {
            name     => 'Anne Smith',
            age      => 35,
            children => ['Amanda', 'Amy'],
            beer     => ['Pale Ale', 'Lager'],
        },
    );
    print "@{ $employees{asmith}{children} }\n";   # prints: Amanda Amy
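The fields of such a record hold different kinds of values, so code that walks a record often checks each value with ref. Here is a minimal sketch that uses the record above. The loop and the $num_children variable are illustrative additions.

```perl
use strict;
use warnings;

my %employees = (
    asmith => {
        name     => 'Anne Smith',
        age      => 35,
        children => ['Amanda', 'Amy'],
        beer     => ['Pale Ale', 'Lager'],
    },
);

# Walk one record, printing array-reference fields as lists and
# plain scalar fields as they are.
my $record = $employees{asmith};
foreach my $field (sort keys %$record) {
    my $value = $record->{$field};
    if (ref($value) eq 'ARRAY') {
        print "$field: ", join(', ', @$value), "\n";
    } else {
        print "$field: $value\n";
    }
}

# An array in scalar context gives its number of elements.
my $num_children = @{ $employees{asmith}{children} };
print "Anne has $num_children children\n";   # prints: Anne has 2 children
```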

8.2 Scope and references

You know that a lexical variable only exists during its current scope (i.e., its immediately enclosing block). Most importantly, when that variable goes out of scope, its memory can be released and used for something else. So how does Perl decide whether a value stored in some memory is still needed by the program, or whether that memory can be released? When Perl sets aside memory to hold a value, it also stores some extra information about that value. In particular, it keeps a count of how many things are pointing at that value, called a reference count. So what happens when we take a reference to a variable and that variable goes out of scope (i.e., no longer exists)? Let's consider the simplest of cases:

    my $outer;
    {
        my $inner = 42;
        $outer = \$inner;
    }
    print "$$outer\n";   # prints: 42

Now, you might be thinking, "Hey, how can $outer still refer to $inner after $inner has gone out of scope?" That's a good question. The answer is that $outer never referred to $inner. It referred to $inner's stored value. Remember, the variable $inner itself is just a name associated with a particular memory location that holds a value. When we take a reference to a variable, we get a reference to the memory location with which the variable name is associated, not to the name itself. When $inner goes out of scope, the value's reference count drops by one. It does not reach zero, because $outer still points at the value, so the memory is kept. Figure 8.4 depicts graphically what is going on before, during, and after the inner scope. The reference count is depicted by how many arrows are pointing at a value.

Figure 8.4 Reference to a variable going out of scope

A reference can outlive the variable it was taken from. This is useful when you want to create a reference within a subroutine and store it in a lexical variable while you work with it, and then return that reference to the main program:

    sub get_ref {
        my $temp = [1, 2, 3];
        # ... work with $temp here ...
        return $temp;
    }

    my $foo = get_ref();
    print "@$foo\n";   # prints: 1 2 3
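The same reference-count behavior is why a loop can safely push references to a lexical variable declared inside it. In this sketch, each pass through the block creates a new $value. The reference stored in @refs keeps each value alive after the block ends.

```perl
use strict;
use warnings;

my @refs;
foreach my $n (1 .. 3) {
    my $value = $n * 10;   # a fresh lexical on every pass through the block
    push @refs, \$value;   # the reference raises that value's reference count
}

# Each $value has gone out of scope, but every stored value survives
# because an element of @refs still points at it.
print join(' ', map { $$_ } @refs), "\n";   # prints: 10 20 30
```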

    sub foo {
        print "$_[0]\n";
    }

    my $foo_ref = \&foo;
    $foo_ref->('arrow syntax');   # prints: arrow syntax

To create an anonymous subroutine, you use the sub keyword followed immediately by the block of code that will serve as the subroutine:

    my $sub_ref = sub {
        my $arg = shift @_;
        print "$arg\n";
    };
    $sub_ref->("I'm anonymous");
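You can call a named or an anonymous subroutine through a reference with either the arrow syntax or the ampersand syntax. Here is a small sketch. The name greet and the messages are hypothetical.

```perl
use strict;
use warnings;

sub greet { return "Hello, $_[0]" }

my $named = \&greet;                       # reference to a named subroutine
my $anon  = sub { return "Bye, $_[0]" };   # reference to an anonymous one

print $named->('world'), "\n";   # arrow syntax: Hello, world
print &$named('again'), "\n";    # ampersand syntax: Hello, again
print &{$anon}('world'), "\n";   # block form of the ampersand syntax: Bye, world
```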

References to subroutines are convenient for a variety of things, but the most popular use is probably building dispatch tables, which are hashes of function references. Consider an interactive program that asks the user for a command to execute:

    my %dispatch_table = (
        foo  => sub { print "You chose 'foo'\n" },
        bar  => sub { print "You chose 'bar'\n" },
        quit => \&quit,
        q    => \&quit,
        help => \&help,
    );

    print "Enter a command (or 'q' or 'quit' to exit): ";
    while (<STDIN>) {
        chomp;
        my $command = $_;
        if ($dispatch_table{$command}) {
            $dispatch_table{$command}->();
        } else {
            print "Illegal command: $command\n";
        }
        print "Enter a command (or 'q' or 'quit' to exit): ";
    }

    sub help {
        print "The available commands are:\n";
        foreach my $com (keys %dispatch_table) {
            print "\t$com\n";
        }
    }

    sub quit {
        exit 0;
    }
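This interactive loop reads from the keyboard, so it is awkward to try out on its own. Here is a non-interactive sketch of the same dispatch idea. The command names and sample input are hypothetical. Before calling, it checks with exists, so an unknown command is reported instead of being dereferenced. Two of its keys share one code reference.

```perl
use strict;
use warnings;

my @log;
my %dispatch = (
    add  => sub { push @log, "add @_" },
    list => sub { push @log, 'list' },
);
$dispatch{ls} = $dispatch{list};   # a second key sharing the same code reference

# Hypothetical commands standing in for lines typed by a user.
foreach my $line ('add apples', 'ls', 'frobnicate') {
    my ($command, @args) = split ' ', $line;
    if (exists $dispatch{$command}) {
        $dispatch{$command}->(@args);
    } else {
        push @log, "Illegal command: $command";
    }
}
print "$_\n" for @log;
```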

Here we have two different keys in the hash referring to the same quit function. This is allowed, since a "thing" can be referenced any number of times, and it is also encouraged. The alternative would have been to create a duplicate anonymous function for both the q and quit keys in the hash. Perl would then have to make and store two separate but identical functions when only one was really necessary.

8.3.1 Closures

When you create an anonymous subroutine in Perl, it is deeply bound to its current lexical environment. Such a subroutine is called a closure. Put another way, an anonymous subroutine carries its lexical environment around with it, even when that environment is no longer in scope. This is best shown by example:

    sub speak {
        my $saying = shift;
        return sub { print "$saying $_[0]!\n" };
    }

    my $batman = speak('Indeed');
    my $robin  = speak('Holy');
    $robin->('mackerel');   # prints: Holy mackerel!
    $batman->('Robin');     # prints: Indeed Robin!

Here, the subroutine speak() assigns the first element of the parameter list passed to it to the lexical variable $saying. It then returns a new anonymous subroutine. That subroutine uses the lexical variable together with the first argument it will be given when it is called. Both $batman and $robin are assigned a new anonymous subroutine from speak(), but with different values of $saying. These anonymous subroutines are then called using the arrow dereference syntax. Each one still contains the original value of $saying with which it was created. The referenced subroutines are called in a different scope from the one where they were created. Even so, for any lexical variables used inside them, they behave as if they were still in that original scope.

The kinds of things closures are good for are somewhat specialized, but we will consider one particular usage here: creating a stream based on a particular mathematical function.
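Before we turn to that function, a smaller sketch shows the property it depends on. Each call to a generator creates a fresh lexical variable, and each returned closure keeps its own. The generator name make_counter is hypothetical.

```perl
use strict;
use warnings;

# make_counter is a hypothetical generator: each call creates a new
# lexical $count, and the returned closure keeps that copy alive.
sub make_counter {
    my $count = shift;
    return sub { return $count++ };
}

my $c1 = make_counter(5);
my $c2 = make_counter(100);
my @seen = ($c1->(), $c1->(), $c2->(), $c1->());
print "@seen\n";   # prints: 5 6 100 7
```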

You may or may not be familiar with a mathematical series of numbers called the Fibonacci numbers. The first two Fibonacci numbers are 0 and 1. Every number after that is obtained by adding together the two previous numbers. So the third number is also 1 (the result of 0 + 1), and the fourth is 2 (the result of 1 + 1). The first ten Fibonacci numbers are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34. We can envision a recursive definition for the nth Fibonacci number in the sequence. For any position n in the sequence, the Fibonacci number at that position is

$$
\mathrm{fibo}(n) =
\begin{cases}
n & \text{if } n = 0 \text{ or } n = 1,\\
\mathrm{fibo}(n-1) + \mathrm{fibo}(n-2) & \text{if } n > 1.
\end{cases}
$$

We can translate this definition into a simple recursive subroutine that calculates the Fibonacci number for a given position in the sequence. We could also write an iterative routine that does the same task more quickly. But what if we wanted to print out the first five such numbers, then do some other things, and then print out the next five numbers in the sequence? The program would have to call the routine at least once for each position in the list. Each call would also repeat a lot of work already done, because calculating the number at position n means working through every earlier position in the sequence as well. One answer is to create a separate storage array, a cache, for numbers that have already been computed, and have the subroutine use this array to continue the sequence from where it left off. An alternative, and probably the best one in this case, is to use closures to provide a steady stream of Fibonacci numbers that can be continued at any time:

    # fibonacci stream generator:
    sub new_fib_stream {
        my ($current, $next) = (0, 1);
        return sub {
            my $fib = $current;
            ($current, $next) = ($next, $current + $next);
            return $fib;
        };
    }

    # create two new fibonacci streams
    my $fib_stream1 = new_fib_stream();
    my $fib_stream2 = new_fib_stream();

    # print out first 5 fibonacci numbers from stream 1
    foreach (1..5) {
        print $fib_stream1->(), "\n";
    }

    # print out first 10 fibonacci numbers from stream 2
    foreach (1..10) {
        print $fib_stream2->(), "\n";
    }

    # print out next 5 fibonacci numbers in stream 1
    foreach (1..5) {
        print $fib_stream1->(), "\n";
    }

When the new_fib_stream() function is called, it creates a new scope and defines two lexical variables within that scope. It then defines (and returns) an anonymous subroutine that uses those lexical variables. Calling the generator function again creates an entirely new scope with its own new lexical variables, and it returns a new closure that uses those variables. In this way, you can set up as many completely independent Fibonacci stream closures as you want. Closures are something of an advanced topic, so we are only introducing the concept here.
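For comparison with the stream, here is a sketch of the two other approaches mentioned earlier: a direct recursive translation of the definition, and a version that keeps a cache array. The subroutine names fibo and fibo_cached are illustrative choices and are not part of the program above.

```perl
use strict;
use warnings;

# Direct translation of the recursive definition of fibo(n).
sub fibo {
    my $n = shift;
    return $n if $n == 0 or $n == 1;
    return fibo($n - 1) + fibo($n - 2);
}

# Cache-array variant: numbers already computed are kept in @cache, so
# each call only extends the list from where earlier calls left off.
my @cache = (0, 1);
sub fibo_cached {
    my $n = shift;
    push @cache, $cache[-1] + $cache[-2] while $#cache < $n;
    return $cache[$n];
}

print join(', ', map { fibo($_) } 0 .. 9), "\n";   # prints: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34
print join(', ', map { fibo_cached($_) } 0 .. 9), "\n";
```

The recursive version recomputes the same positions many times over. The cached version computes each position once, but it keeps one shared cache, while each closure stream keeps its own independent position.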

8.4 Nested structures on the fly

Imagine a file of employee data, with one employee record per line recording employee ID number, first name, last name, department, and type (full or part time):

    142a13:John:Doe:Sales:pt
    971a22:Jane:Doe:Operations:ft
    131b21:Amanda:Smith:Sales:pt
    119d17:Frank:Cannon:Support:pt
    123a12:Ron:Gold:Support:pt
    666s66:Lucy:Kindser:Operations:ft
    777q42:Bob:Norman:Sales:ft

If we wanted to read this file and create an array of arrays, we could simply loop through it, splitting each line on the colons to produce an array. We could then push a reference to that array into our main array:

    #!/usr/bin/perl -w
    use strict;
    open(FILE, 'employee.dat') || die "can't open file: $!";

    # build structure
    my @employees;
    while (<FILE>) {
        chomp;
        my @fields = split /:/;
        push @employees, \@fields;
    }

    # print info from structure
    foreach my $person (@employees) {
        print "Name: $person->[1] $person->[2]\n";
        print "Department: $person->[3]\n";
    }

This can be simplified further by using the split function inside an anonymous array constructor:

    my @employees;
    while (<FILE>) {
        chomp;
        push @employees, [ split /:/ ];
    }

You can use any expression inside an anonymous array (or hash) constructor, and the expression will be evaluated in a list context. The result of the evaluation will be the contents of the new anonymous array (or hash). In the case of data of this sort, a hash of arrays may be a better choice for a structure:

    #!/usr/bin/perl -w
    use strict;
    open(FILE, 'employee.dat') || die "can't open file: $!";

    # build structure
    my %employees;
    while (<FILE>) {
        chomp;
        my @fields = split /:/;
        my $id = shift @fields;
        $employees{$id} = [@fields];
    }

    # print info from structure
    foreach my $id (keys %employees) {
        print "ID = $id. Name = $employees{$id}->[0]\n";
    }

Or maybe you'd like to read the data into a hash of hashes so that each field could be accessed by a field name:

    #!/usr/bin/perl -w
    use strict;
    open(FILE, 'employee.dat') || die "can't open file: $!";

    # build structure
    my @field_names = qw(fname lname dept type);
    my %employees;
    while (<FILE>) {
        chomp;
        my ($id, @fields) = split /:/;
        @{ $employees{$id} }{@field_names} = @fields;
    }

    # print info from structure
    foreach my $id (keys %employees) {
        print "$id: $employees{$id}{fname} $employees{$id}{lname}\n";
    }

Another way to structure this data, for generating a report, would be a hash keyed on department names. Each department points to a hash keyed on ID numbers, and each ID in turn points to a hash of the rest of the record's fields. To give you an idea of the structure, here is what part of it would look like if you were to build it manually:

    my %departments = (
        Sales   => {
            '142a13' => { fname => 'John',   lname => 'Doe',    type => 'pt' },
            '131b21' => { fname => 'Amanda', lname => 'Smith',  type => 'pt' },
        },
        Support => {
            '119d17' => { fname => 'Frank',  lname => 'Cannon', type => 'pt' },
        },
    );
    print "$departments{Sales}{'131b21'}{lname}\n";   # prints: Smith

And here is one way we could build such a hash of hashes of hashes on the fly:

    #!/usr/bin/perl -w
    use strict;
    open(FILE, 'employee.dat') || die "can't open file: $!";

    # build structure
    my %departments;
    my @field_names = qw(fname lname type);
    while (<FILE>) {
        chomp;
        my @fields = split /:/;
        my %record;
        @record{@field_names} = @fields[1, 2, 4];
        $departments{$fields[3]}{$fields[0]} = {%record};
    }

    # print employee data by department
    foreach my $dept (keys %departments) {
        print "$dept:\n";
        foreach my $id (keys %{ $departments{$dept} }) {
            my $record = $departments{$dept}{$id};
            print "\t$id: $record->{fname} $record->{lname}";
            print " ($record->{type})\n";
        }
    }
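You can try the hash-of-hashes-of-hashes builder without creating employee.dat. The following sketch reads three of the sample records from an in-memory filehandle (opening a reference to a scalar, supported since Perl 5.8). It sorts the keys so the report order is stable. It also fills each record through a hash slice on the autovivified inner hash instead of copying a temporary %record. Both approaches build the same structure.

```perl
use strict;
use warnings;

# Three sample records from the employee file shown above.
my $data = <<'END';
142a13:John:Doe:Sales:pt
131b21:Amanda:Smith:Sales:pt
119d17:Frank:Cannon:Support:pt
END

open(my $fh, '<', \$data) or die "can't open in-memory file: $!";
my %departments;
my @field_names = qw(fname lname type);
while (<$fh>) {
    chomp;
    my @fields = split /:/;
    # Autovivification creates the department and ID levels as needed.
    @{ $departments{$fields[3]}{$fields[0]} }{@field_names} = @fields[1, 2, 4];
}
close($fh);

foreach my $dept (sort keys %departments) {
    print "$dept:\n";
    foreach my $id (sort keys %{ $departments{$dept} }) {
        my $record = $departments{$dept}{$id};
        print "\t$id: $record->{fname} $record->{lname} ($record->{type})\n";
    }
}
```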

8.5 Review

Reference creation:

    $scalar   = 42;
    $sc_ref1  = \$scalar;     # explicit reference to $scalar's location
    $$sc_ref2 = $scalar;      # implicit creation of new scalar location holding
                              # value of $scalar (autovivification)

    @array    = (42, 13);
    $a_ref1   = \@array;      # explicit reference to @array's location
    $a_ref2   = [@array];     # explicit creation of new array location holding
                              # copy of @array
    @$a_ref3  = @array;       # implicit creation of new array location holding
                              # copy of @array

    %hash     = (a => 42, b => 13);
    $h_ref1   = \%hash;       # explicit reference to %hash's location
    $h_ref2   = {%hash};      # explicit creation of new hash location
                              # holding copy of %hash
    %$h_ref3  = %hash;        # implicit creation of new hash location
                              # holding copy of %hash

Dereferencing:

    print "$$sc_ref2";        # prints: 42
    print "@$a_ref3";         # prints: 42 13
    print "${$a_ref2}[1]";    # prints: 13
    print "$a_ref2->[1]";     # prints: 13
    my @ary = keys %$h_ref2;
    print "@ary";             # prints: a b (in no particular order)
    print "${$h_ref3}{b}";    # prints: 13
    print "$h_ref3->{b}";     # prints: 13
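The difference between an explicit reference and an explicit copy shows up as soon as the original array changes. Here is a minimal sketch that reuses the review's array values:

```perl
use strict;
use warnings;

my @array  = (42, 13);
my $a_ref1 = \@array;    # refers to @array itself
my $a_ref2 = [@array];   # refers to a new anonymous array holding a copy

push @array, 7;          # change the original array

print scalar(@$a_ref1), "\n";   # prints: 3 (sees the change)
print scalar(@$a_ref2), "\n";   # prints: 2 (independent copy)
```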

8.6 Exercises

1 Write a routine that reads the following table into a two-dimensional array, such as an array of arrays or a matrix:

    one   two   three
    four  five  six
    seven eight nine

Then have the routine transpose the rows and columns to produce the following output:

    one   four  seven
    two   five  eight
    three six   nine

CHAPTER 9
Documentation

9.1 User documentation and POD
9.2 Source code documentation
9.3 Tangling code
9.4 Further resources

At this point you have acquired the tools and concepts for creating both simple and sophisticated programs using the Perl language. However, unless you document your programs, they are incomplete. Some would even consider them "broken." With each program you write, you need to supply both user documentation and source documentation. User documentation is simply the instruction manual for using your program. Source documentation is what you or another programmer will want when maintaining, revising, or just trying to understand code that you or someone else has written.

We discussed techniques for source documentation back in chapter 2: trying to make the source code as self-documenting as possible by using good formatting, choosing good variable names (which applies to choosing good names for subroutines and filehandles as well), and using informative comments. Sometimes all this is not enough. Later in this chapter we will discuss other ways of documenting your source code using Literate Programming (LP) techniques. First, however, we will cover creating user level documentation using the standard Plain Old Documentation (POD) format.

9.1 User documentation and POD

User level documentation is, as its name implies, intended for the user of the program or module. For programs, this documentation should cover the essentials of running the program: what arguments it expects, what options it might take, and what outputs it produces. For modules (discussed in parts 3 and 4), the intended user is another Perl programmer. Documentation in that case should cover the programmer interface to the module: the availability of functions and/or methods, as well as the calling interface and return values of each of those functions or methods.

Perl's current standard way to provide user level documentation is the POD markup language. POD is a simple markup language, and the Perl compiler understands it just well enough to ignore it. This means you can embed the documentation directly in the Perl source file of your script and not worry about it. These two benefits shouldn't be underestimated. Learning to use POD also means your documentation will be available in a standard form, since all of the standard documentation included in the Perl distribution is written in POD format. Using POD to include your documentation within your program means that any user of your program or module automatically has the documentation. You may also be more likely to properly update your documentation when it is right there in the same file as the code rather than in a separate text file.

Let's first consider POD on its own, before we discuss embedding it in Perl source code. POD is just a way of marking up plain text so that another program (a translator) can convert the POD source into input for one or another formatting program. You can write your document using POD tags. Then you can use the various utility translators (such as pod2html or pod2latex) to produce LaTeX files, HTML files, PostScript, manpages, or any other format that has a translator. The set of POD tags is small, and the full set is covered in the perlpod pod-page. We will just consider the basic elements here.

The first tags we'll consider are structural tags that indicate headings, lists, and items within lists:

    =head1 This is a level 1 heading

    This is a small paragraph of text below the level 1 heading.

    =head2 This is a level 2 heading

    And this is a paragraph of plain text below the second level heading, followed by a list:

    =over 4

    =item *

    first bulleted list paragraph

    =item *

    second bulleted list paragraph

    =back

    =cut

The =head1 and =head2 tags are fairly self-explanatory: they create headings and sub-headings. The =over 4 tag starts a list with an indent size of 4 (used by some formatters). Each item in the list is tagged with an =item tag. If the =item tag is followed by an asterisk, it produces a bulleted list item. A numbered list can be created simply by using increasing digits in place of the asterisk. The =back tag ends the list. The =cut tag ends the current section of POD until another =xxx style POD tag is encountered.

All the POD command or structure tags need to start at the left margin. Plain paragraphs also need to start at the left margin, and they need to be separated by blank lines. (Note, though, that seemingly blank lines containing spaces can cause problems.) Within a plain paragraph, several formatting tags can be used:

    =head1 Simple Formatting

    Within a plain paragraph some text might be tagged as I<italic> or B<bold> or as C<code> (presumably formatted in a fixed width font). The S<non-breaking text> tag means that the text inside the tag delimiters should not be broken across lines at the spaces.

    You can enclose filenames in an F<filename> tag, index entries using X<index entry>, and links with the L<name> tag. Links are for manpage references, and a tag like L<blah> would translate to "the blah manpage", but this added text can be controlled (see L<perlpod> for further information).

    =cut

Aside from structural elements and plain paragraphs, you can also create "verbatim" paragraphs, where what you write should get typeset as it is, with no wrapping, no interpretation of tags, and so on. You create verbatim paragraphs by indenting each line:

    =head2 Verbatim Example

    This is a plain paragraph that might be wrapped by the formatter and will have B<tags> interpreted in the process. A I<verbatim> paragraph, perhaps showing a code example, can be achieved by indenting:

        #this is a verbatim paragraph
        my $foo = "nothing B<interpreted> in here";
        print $foo;

    =cut

And that's most of what it takes to write simple POD documents. The added bonus of POD is that the Perl compiler ignores everything between an =command type tag and an =cut tag. Therefore, it is simple to write your user documentation directly within your program or module, in the manpage format we will discuss next. By keeping your documentation inside your program, you are more likely to remember to update it to reflect changes and new features. You can also be sure that anyone who receives your program also receives the documentation. The user can then use one of the many translators to produce a nicely formatted document for viewing or printing. Or the user can view the document using the perldoc utility included with the Perl distribution. (Some distributions of Perl may not include the perldoc utility. The MacPerl distribution, for example, comes with a viewer called shuck.)

The standard convention for writing embedded POD documentation is to supply at least the minimum sections of a standard Unix-like manpage. Table 9.1 shows the minimal level-1 sections that you should include in the POD documentation for a program called foo that solves the world's problems.

Table 9.1 Sections to include in POD documentation

Section      Description
-----------  -----------------------------------------------------------------
NAME         The name of the program or module and a few words about what it
             does.
SYNOPSIS     A brief usage example of the program showing its calling syntax,
             or a representative sample of the functions within the module.
DESCRIPTION  A more extended discussion of the program, possibly with
             subsections using =head2 tags.
OPTIONS      If there are any, list them here, or perhaps under the
             description above. If you choose the latter option, or if there
             are none, leave this section out.
FILES        A list of the files used by the program.
SEE ALSO     A list of manpages for related programs and/or documentation.
BUGS         If you have bugs in your program that aren't ironed out yet, list
             them in this section, or risk being told about them again and
             again...
AUTHOR       Author's name and contact information, plus any copyright
             statement you want to include (which also could be under its own
             level 1 heading).

There are other relatively standard sections you may want to include. (See perlpod.) You can and should add any additional headings and information about your program that you deem appropriate.
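Putting the pieces together, here is a sketch of a script that carries its own manpage-style POD ahead of the code. The program name foo comes from table 9.1. Its -v option, description, and author line are placeholders. Running perldoc on the file shows the formatted documentation. Running the file with perl executes only the code after =cut.

```perl
#!/usr/bin/perl
use strict;
use warnings;

=head1 NAME

foo - solve the world's problems

=head1 SYNOPSIS

  foo [-v] file ...

=head1 DESCRIPTION

B<foo> reads each I<file> named on the command line and solves
whatever problems it finds there.

=head1 AUTHOR

A. Programmer (placeholder)

=cut

# The compiler skipped everything from the first =head1 through =cut,
# so execution continues here.
my $status = 'running';
print "foo is $status\n";
```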

You may include POD almost anywhere within your program. The basic rule is this: if the compiler reads a POD tag directive when it is looking for the start of a new statement, it will ignore everything up to and including the next =cut directive it finds. Some authors put all the POD information at the beginning of the program, some place it at the end, and others mix it throughout the source code. Like many things in Perl, it is a choice left up to the programmer.

9.2 Source code documentation

Source code documentation was discussed to some extent in chapter 2. That chapter was all about using good style and comments to produce readable, maintainable code. Often those guidelines are all you need. (Some programmers, and well-respected ones at that, go as far as to say that if your code needs extra comments to be understood, you need to rewrite your code.) Sometimes, however, this is not enough. You may want to include detailed explanations of certain algorithms, provide diagrams, or present the code in an order different from the more linear order the compiler expects. Literate Programming (LP) techniques can be used for such things and more.

LP is neither specific to Perl nor necessary for programming with Perl, but it can be such a useful tool that I will devote the remainder of this chapter to using LP in your Perl programming.

Literate Programming is a method of programming developed by Donald Knuth in the early 1980s (Knuth, D. E. "Literate Programming." The Computer Journal 27(2):97-111, 1984). The essence of LP is embodied in a quote from Knuth:

    Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to humans what we want the computer to do. (Knuth, 1984.)

The basic concept is that one should be able to write out the program description and the program source code together in a source file. The file can be presented in an order suited to explaining the code to humans. The program source can be extracted from the file and tangled together into its proper order for the compiler. The documentation, then, is the original file, which is run through a process called weaving. Weaving produces the description and code in a form ready for typesetting, usually by LaTeX, although some LP tools can target other formats such as HTML.

Quite a few LP systems exist out there. Many are designed for a particular programming language, but there are also a few language-independent LP systems. You are already familiar with some of the syntax of one of them, the noweb system created by Norman Ramsey. Some Perl programmers use POD as a form of LP, intermixing POD-formatted description within the source code. Personally, I feel that POD is much better suited to documenting the interface of a program than to documenting the source code itself. Thus, in this section we will consider the noweb system of LP.

Back in chapter 3 (and again in chapter 5), we used a simple little syntax to name and define chunks of code. That syntax comes from noweb, though we did not use the complete noweb syntax. This syntax allowed us to break our programs up into manageable little units. We could then discuss those units in any order we wished, independent of the order in which the lines of code had to be assembled back into the main program (i.e., the root chunk).

The actual syntax to define a new chunk of code in noweb begins with double angle brackets enclosing a chunk name, immediately followed by an equals sign (and nothing else on that line). A chunk of documentation is begun with a single @ symbol at the left margin, followed by a space or a newline. A chunk is terminated when a new chunk begins or the end of the file is encountered.

    @ This is a documentation chunk in which we would explain why the
    following code assigns the answer to the universe (42) to the variable
    $foo, then does other stuff, and finally prints out what $foo divided
    by 2 is:
    <<chunk>>=
    my $foo = 42;
    <<some other chunk>>
    print "$foo / 2 is $bar\n";
    @ This is another documentation chunk, in which we would explain the
    significance of dividing $foo by 2 if doing so had any significance,
    which, of course, it doesn't.
    <<some other chunk>>=
    my $bar = $foo / 2;

Now, in a

file

if

the above were to represent a complete (albeit useless) program, saved

named

useless.

—on

notangle



nw, then running the tangler

that source

-Rchunk

notangle

as:

in noweb, the tangler

is

called

useless. nw would produce

the following output:

my $foo = 42; my $bar = $foo / 2 print "$foo / 2 is $bar\n";

Had the placement of the two chunks been reversed, the tangled output would have been the same. The chunk given on the command line is found and printed, and any referenced chunks are replaced with their definitions. This means you can begin designing your program using high-level concepts as chunk names, then design and define each of the chunks in the order that makes the most sense to you, much as we did for the faqgrep and primes programs in chapters 3 and 5, respectively.
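To make the tangling idea concrete, here is a minimal sketch of what a tangler does with a source like useless.nw. It is not notangle. It assumes a simplified syntax: a line consisting only of <<name>>= starts a code chunk, a line beginning with @ starts a documentation chunk, and a code line consisting only of <<name>> (possibly indented) is a chunk reference. The subroutine names read_chunks and expand are hypothetical. The sketch also concatenates repeated definitions of the same chunk name, in the order they appear.

```perl
use strict;
use warnings;

# Collect code chunks from noweb-style source text (simplified syntax).
# Repeated definitions of the same chunk name are concatenated in order.
sub read_chunks {
    my ($text) = @_;
    my (%chunks, $current);
    for my $line (split /^/, $text) {        # split into lines, keeping "\n"
        if ($line =~ m/^<<(.+)>>=\s*$/) {
            $current = $1;                   # start (or continue) a code chunk
            $chunks{$current} .= '';
        } elsif ($line =~ m/^\@(?:\s|$)/) {
            undef $current;                  # "@" starts a documentation chunk
        } elsif (defined $current) {
            $chunks{$current} .= $line;      # a code line in the current chunk
        }
    }
    return \%chunks;
}

# Expand a chunk recursively, replacing each reference line with the
# referenced chunk's code and keeping the reference's indentation.
sub expand {
    my ($chunks, $name, $indent) = @_;
    die "undefined chunk <<$name>>\n" unless exists $chunks->{$name};
    my $out = '';
    for my $line (split /^/, $chunks->{$name}) {
        if ($line =~ m/^(\s*)<<(.+)>>\s*$/) {
            $out .= expand($chunks, $2, $indent . $1);
        } else {
            $out .= $indent . $line;
        }
    }
    return $out;
}

my $source = <<'END_NW';
Documentation explaining the code below.
<<chunk>>=
my $foo = 42;
<<some other chunk>>
print "$foo / 2 is $bar\n";
@ More documentation.
<<some other chunk>>=
my $bar = $foo / 2;
@
END_NW

print expand(read_chunks($source), 'chunk', '');
```

Because expand works downward from the named root chunk, the order in which the chunks appear in the file does not affect the result. This sketch does not check for a chunk that (directly or indirectly) references itself.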

A particular chunk definition may also be continued (i.e., not fully specified in one place) by simply using the same chunk name when starting another chunk definition. Continued chunks must occur in the order they are to appear in the final tangled code, because notangle simply concatenates all continued chunks together to produce a single chunk:

    Documentation stuff.
    <<chunk>>=
    my $foo = 42;
    <<another chunk>>
    @ More documentation.
    <<another chunk>>=
    my $bar = $foo / 2;
    @

then we continue defining the original chunk:

    Yet more documentation...
    <<chunk>>=
    print "$foo / 2 is $bar\n";
    @

Tangling the above version produces the same output as the previous example. All the continued component chunks of a chunk are concatenated together, and then the chunk is printed. Any embedded chunks are replaced with their definitions in the same manner.

The -R option told notangle which code chunk to use as the root chunk. A root chunk represents a complete program or module or something you want to tangle out into its own file. A given noweb source file might hold more than one root chunk. For example, you may write a library module that defines several functions as a literate source file. You can also include in the same file a testing program whose purpose is to test each of the functions to ensure they are working. In this way, you could even have each chunk of test code follow the chunk (or chunks) of module code that it tests, thus keeping related code close together in the literate source file. Then you could tangle out either the module code or the test code or both.

Of course, we may still make errors in our code, and Perl will print error messages to the screen, noting the file name and line number where Perl thinks the error is located. The problem is that we've gone to all the trouble of writing our source code in chunks in a separate literate source file, but the file in which Perl found the error is not really the file we wrote: it is the tangled program that we actually run, and its line numbers do not correspond to the lines of our literate source file. Hence, the problem.

Many language compilers or interpreters understand special directives that do nothing except tell the compiler what file it is reading and what line number it is on. These are useful for lying to the compiler about what file it is reading. We can use them to tell the compiler where the corresponding code is located in the useless.nw file. When an error is encountered, the error messages will point to the corresponding code.

In Perl, a line directive takes the form of a special comment that looks like

    #line 13 "file"

by itself on a line. Actually, Perl will recognize directives matching the pattern

    /^#\s*line\s+(\d+)\s*(\s"([^"]*)")?\s*$/

In that pattern, the $1 variable would hold the line number and the $3 variable would hold the file name. The file name is optional. If the file name is not included, Perl uses whatever it currently thinks is the file name.
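Here is a small sketch of both halves of this idea. It applies the directive pattern above to a sample directive, then shows Perl honoring a #line directive inside a string eval, so that the die message names the pretend file and line. The name useless.nw is only the running example; nothing is read from disk.

```perl
use strict;
use warnings;

# Apply the line directive pattern: $1 is the line number, $3 the file name.
my $directive = '#line 13 "file"';
my ($line, $file);
if ($directive =~ m/^#\s*line\s+(\d+)\s*(\s"([^"]*)")?\s*$/) {
    ($line, $file) = ($1, $3);
    print "line $line, file $file\n";
}

# Perl honors #line in string evals too, so this error is reported
# against useless.nw line 11 rather than the eval's real position.
eval qq{#line 11 "useless.nw"\ndie "oops";\n};
my $error = $@;
print $error;
```

The trailing \s*$ in the pattern matters: without an end anchor, the optional file name group could simply be skipped after \s* consumes the space, leaving $3 undefined.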

The tangler in noweb can be given a -L option, and it will emit line directives into the tangled code referring to positions in the original .nw source file. Using the useless.nw file as an example, a call to notangle as

    notangle -Rchunk -L useless.nw > xxx

would produce the following output in a file named xxx:

    #line 3 "useless.nw"
    my $foo = 42;
    #line 11 "useless.nw"
    my $bar = $foo / 2;
    #line 5 "useless.nw"
    print "$foo / 2 is $bar\n";

That is, in effect, a line directive is emitted whenever the tangled code enters or returns from a code chunk. Now, if there was an error, such as my $bar = foo / 2; (where the $ symbol on $foo was missing), then running the resulting tangled code as perl -w xxx would produce the following errors:

    Unquoted string "foo" may clash with future reserved word at useless.nw line 11.
    Argument "foo" isn't numeric in divide at useless.nw line 11.

So we could proceed directly to our original noweb source file to find the problem.

If that were all noweb (and other LP systems) allowed you to do, it would still be beneficial. We would write, debug, revise, and maintain the literate source, and only use the tangled code to run the program. The original .nw source file would still be your program's documentation.

However, there is another side of the LP system: "weaving." The noweb system can weave your source file to produce LaTeX or HTML documentation for formatting and viewing. This means you can write the documentation chunks using whatever capabilities the target formatter allows, such as typeset mathematical formulas, diagrams, cross-references, lists, and indexes. Both the LaTeX and HTML backends perform additional formatting and cross-referencing of your actual code chunks, and of any identifiers you specify, through the use of command line options to the noweave part of the system.

When you end a code chunk, you may use a special directive to specify a list of identifiers (variables, filehandles, subroutine names, etc.) considered "defined" in that chunk. In the woven version, such identifiers will be cross-referenced and indexed, and an index can be produced listing all defined identifiers, the chunks where they were defined, and all the chunks that used each identifier. Chunks themselves will also be cross-referenced for easy reference purposes.

You may specify such identifiers by following the @ symbol that ends the code chunk with a space and the %def directive, followed by a space-separated list of identifiers on the same line:

    In a documentation chunk. Here is a code chunk which includes
    identifier definitions marked at the end of the chunk:
    <<chunk>>=
    my $foo = 42;
    my $bar = 13;
    @ %def $foo $bar
    Now in another documentation chunk.

Unfortunately, because this book is not being typeset using LaTeX and the noweb system, it is not possible to show you here exactly what the typeset documentation actually looks like. However, I have created noweb source files for a more sophisticated version of the faqgrep program (shown in chapter 13) and the simple tangler program shown later in this chapter. These will be available at http://www.manning.com/Johnson/, where you will find a link to additional online resources for the book, including source code and typeset PostScript and Portable Document Format (PDF) versions of the documentation for the two programs mentioned.

9.2.1 Other uses of LP

One good use of an LP style of programming is the presentation of source code for teaching purposes, which is exactly why I have used a form of LP when presenting many of the programs you have encountered thus far in this book.

A couple of related uses arise from the fact that a single literate source file may contain the source code for more than one program, each with its own root chunk. Of what possible use is this? Consider that you are writing a program or module with many small components (chunks). You may also wish to write a comprehensive test program that verifies that each component works correctly. Using LP, you can create two root chunks, one for the program or module and one for the test suite, then develop each component of the program, followed by its related testing code. In this way, test code remains close to the code it is intended to test, and the documentation can deal with issues of both at the same time. If a component needs to be fixed or modified, the appropriate test chunk is easily adjusted at the same time.

This idea applies to test data as well: you may keep chunks of code that deal with particular kinds of data next to chunks of data designed for testing that particular chunk of code.

Similarly, a program may read an external configuration file or a file describing a set of parsing rules. As above, the program and the external configuration or parsing data can be written in the same source file in a parallel fashion.

None of this implies that literate code need be contained in a single file (though our tangling script given below in section 9.3.1 does only operate on one file at a time). You are free to use multiple files to create your literate sources. The multiple literate files do not need to match up with the multiple files produced by the tangling process. Instead, you might choose to break your literate sources into files along chapter or section boundaries, or whatever works best for presenting the source code in a logical fashion. Like Perl, LP is ultimately about giving you more freedom and flexibility in how you approach both code design and documentation.

9.3 Tangling code

The following section presents a simple tangle-like program that will work with the noweb syntax we've just described. The presentation is only partially literate because it lacks all the formatting and cross-referencing that would be available in the real typeset version. (When typeset using noweb and LaTeX, identifier definition and cross-reference material is automatically added.) We will include a few identifier definition markers so you can see what they look like. Remember, the full literate plain text source and typeset versions of the following program are available for download at http://www.manning.com/Johnson/.

The following program will allow you to try out writing literate source code without fetching and installing the actual noweb system. It will also serve to tie together many of the things you've learned in the previous chapters and introduce a couple of new functions you haven't seen yet.

9.3.1 A simple tangler

Now that we know what the chunk definition and the chunk reference syntax are, we can build a limited tangler program to allow us to write our Perl programs using noweb's syntax, intermixing code chunks and documentation chunks (we are in a documentation chunk right now) throughout the source file.

We want our tangler to operate similarly to notangle, allowing a -R option to specify the root chunk and a -l option to include line directives. We add two differences. The notangle program simply prints the tangled code to STDOUT, so you have to redirect it to a file yourself. We assume that the root chunk name is also the file name you want to use for the tangled code. So, running our tangler program with a root option of -Rblah creates a file named blah and writes the tangled code to it. The second difference is that, if no -R option is given, our tangler will automatically find all root chunks (i.e., any chunk not used inside another chunk) and print them to their respective program files based on their chunk names.

We call our tangler pqtangle, for Perl Quick Tangler, and write it in a file named pqtangle.nw. Our initial program outline (i.e., our root chunk) looks like:

    <<pqtangle>>=
    #!/usr/bin/perl -w
    use strict;

When we finished reading through the chunk, we used the autovivification syntax to push a string containing the byte offset, file name, and line number information of the code chunk we just parsed. (The $1 variable here is the one matched in the outer if statement. The one matched inside the inner while loop was localized to that block and is now out of scope.) At this point, we know the byte offset location where every chunk definition starts in the file and the line number of the first line in that chunk definition. In other words, the %chunks hash contains keys that are chunk names, and values that hold an array of offset/line information strings for every location where that chunk's definition is continued throughout the file.

We also have a record of every chunk used in another chunk and, hence, every chunk that cannot be a root chunk. We now populate our @roots array with the root chunks we need to tangle: either the root chunk name given with the -R option (held in $Root) or, if none was given, all the root chunks, which we find by checking each key of %chunks (since the keys are the chunk names) against the %used hash:

    if ($Root) {
        @roots = ($Root);
    } else {
        foreach my $key (keys %chunks) {
            push @roots, $key if not $used{$key};
        }
    }
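The root-finding rule (a root chunk is any chunk never used inside another chunk) can be tried in isolation. In this sketch the %refs hash and its chunk names are hypothetical: it maps each chunk name to the names of the chunks it references, standing in for the information pqtangle gathers in %chunks and %used while parsing.

```perl
use strict;
use warnings;

# Hypothetical chunk graph: chunk name => chunks referenced inside it.
my %refs = (
    'pqtangle'               => ['subroutine definitions', 'main loop'],
    'main loop'              => [],
    'subroutine definitions' => ['sub make_line_dir'],
    'sub make_line_dir'      => [],
    'test suite'             => [],
);

# Record every chunk that is used inside another chunk.
my %used;
for my $name (keys %refs) {
    $used{$_} = 1 for @{ $refs{$name} };
}

# Any chunk that was never used is a root chunk.
my @roots = sort grep { not $used{$_} } keys %refs;
print "@roots\n";
```

The sort is only there to make the output predictable, since keys returns the names in no particular order. Here both pqtangle and the test suite come out as roots, matching the "module plus test suite" arrangement described in section 9.2.1.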

Printing out the root chunks is simply a matter of opening a file named after each root chunk and printing out its tangled code.

    my $shebang_special = 0;
    $shebang_special = 1 if $line =~ m/^#!/;

At this point, we need to create a formatted line directive, substituting the correct information for the placeholders we used in the $Line_dir variable. We format this line directive in a separate function. Then we need to print out this line directive, but only if the current line is not a shebang line or an embedded chunk reference. (In those cases, we want to print a line directive after the shebang line, or when processing the embedded chunk.) We use a simple set of logical ORs that terminates at the first true expression, and thus only prints out the line directive when needed:

    my $line_dir;
    if ($Line_dir) {
        $line_dir = make_line_dir($line_number, $filename);
        $line =~ m/^\s*<<.*>>\s*$/ || $shebang_special || print PROGRAM $line_dir;
    }

Since we have just used the make_line_dir function, we should go ahead and define it here. This example also illustrates the point about being able to write the literate source in the order that makes sense for discussion. First, let's add it to the subroutine definitions chunk:

    <<subroutine definitions>>=
    <<sub make_line_dir>>
    @

Now the function to format our line directive. The function is passed parameters for the current line number and file name, which we immediately assign to lexical variables. We then declare a new lexical $line_dir variable and assign it the value of our file-scoped $Line_dir variable, which holds the line directive string with placeholders. Finally, we simply replace the placeholders with their proper values using a simple set of substitution operations and return the value of the $line_dir variable:

    <<sub make_line_dir>>=
    sub make_line_dir {
        my ($line_no, $file) = @_;
        my $line_dir = $Line_dir;
        $line_dir =~ s/\%L/$line_no/;
        $line_dir =~ s/\%F/$file/;
        $line_dir =~ s/\%\%/%/;
        $line_dir =~ s/\%N/\n/;
        return $line_dir;
    }
    @
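make_line_dir performs four separate substitutions, each replacing only the first occurrence of its placeholder. One possible alternative, sketched here assuming the placeholders are exactly %L, %F, %N, and %% as described above, does all the replacements in a single global substitution driven by a hash. The function name make_line_dir_onepass and the sample format string are hypothetical. Because each % sequence is consumed exactly once, a literal %% can never be re-read as part of another placeholder, and repeated placeholders are all replaced.

```perl
use strict;
use warnings;

# Hypothetical format string: %L = line number, %F = file name,
# %N = newline, %% = a literal percent sign.
my $Line_dir = '#line %L "%F"%N';

sub make_line_dir_onepass {
    my ($line_no, $file) = @_;
    my %value = (L => $line_no, F => $file, N => "\n", '%' => '%');
    # One left-to-right pass: each placeholder is looked up in %value.
    (my $line_dir = $Line_dir) =~ s/%([LFN%])/$value{$1}/g;
    return $line_dir;
}

print make_line_dir_onepass(11, 'useless.nw');
```

With a format such as '%%N', the four-substitution version first turns %% into % and then treats the resulting %N as a newline placeholder; the one-pass version leaves the literal text %N.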

In order to tangle out our chunk, we use a loop similar to that used when parsing the chunks in the first place. That is, we continually loop as long as the current line does not match a chunk-terminating line. We must make sure we read in another line in both blocks of the if/else statement, or we would be looping forever on the same line. Inside the loop, the current line might be an embedded chunk reference, in which case we need to tangle out that embedded chunk. Note that we are capturing the leading whitespace if there is an embedded chunk reference, as well as the chunk name. This way we can call the print_chunk routine and pass it a string representing the current indentation level so our tangled code has the appropriate indentation. If the line does not contain a chunk reference, we will print the code line (and print out a line directive following it if it was a shebang line):

    <<tangle out current chunk>>=
    while ($line !~ m/^\@\s*$|^\@\s\%def/) {
        if ($line =~ m/^(\s*)<<(.*)>>\s*$/) {

The substitution s/ ([a-z])/ \u$1/g matches a space followed by a lowercase letter, which is captured, and replaces the matched text with a space followed by the captured letter in uppercase. Let's assume the document is in a file called article.html. We can simply rename this file as article.html.bak and use the following program to create a new version of article.html:

    #!/usr/bin/perl -w
    use strict;
    open(INPUT, 'article.html.bak') || die "can't open file: $!";
    open(OUTPUT, '>article.html')   || die "can't open file: $!";
    while (<INPUT>) {
        if (m/<[Hh][1-3]>/) {
            s/ ([a-z])/ \u$1/g;
        }
        print OUTPUT $_;
    }

A more general version of this program would simply read a file given on the command line and print the resulting output on standard output so that it could be redirected to another file:

    #!/usr/bin/perl -w
    use strict;
    while (<>) {
        if (m/<[Hh][1-3]>/) {
            s/ ([a-z])/ \u$1/g;
        }
        print;
    }

If you named this program cap_heads, you could run it from the command line like this:

    perl cap_heads article.html.bak > article.html

This way you can use the program to modify other such documents without having to edit the program to replace the filenames.

10.2.2 Character class shortcuts

There are three special escape sequences that stand for commonly used character classes. Each of these has a variant to stand for the corresponding negated character class. We saw each of these and some examples of their use in chapter 6. We repeat them here for review:

Table 10.1 Escape sequences for commonly used character classes

    Escape sequence   Description
    \w                equivalent to [a-zA-Z0-9_]: a word character (any letter, digit, or underscore)
    \W                equivalent to [^a-zA-Z0-9_]: a non-word character (any character that is not a letter, digit, or underscore)
    \d                equivalent to [0-9]: any single digit character
    \D                equivalent to [^0-9]: any non-digit character
    \s                equivalent to [ \n\f\r\t]: a whitespace character (a space, newline, formfeed, return, or tab character)
    \S                equivalent to [^ \n\f\r\t]: a non-whitespace character

10.3 Greedy quantifiers: take what you can get

Another greedy quantifier that operates similarly to the star is the plus (+) quantifier. This one matches one-or-more of the previous components. You can think of m/f(o+)bar/ as being the same as m/f(oo*)bar/, in that matching one-or-more is the same as matching one thing, then zero-or-more of the same thing. The pattern m/fo*bar/ would match against the string fbar, matching the f, then zero o characters, followed by bar. The pattern m/fo+bar/ would not match against fbar because there isn't at least one o following the f in that string.

The star is an indeterminate quantifier that can match any number of characters. The plus is only slightly determinate in that it must match at least one thing, but could match any number of additional characters. Perl also offers a few other greedy quantifiers with varying degrees of indeterminacy. These are listed in table 10.2 along with their meaning and an example with an equivalent formulation using constructs we already know:

Table 10.2 Greedy quantifiers

    Quantifier   Description                        Example
    ?            match zero-or-one time             m/fo?bar/ is equivalent to m/f(o|)bar/
    {n}          match exactly n times              m/fo{2}bar/ is equivalent to m/foobar/
    {min,}       match min-or-more times            m/fo{2,}bar/ is equivalent to m/foo+bar/
    {min,max}    match at least min times,          m/fo{2,4}bar/ is equivalent to m/f(oooo|ooo|oo)bar/
                 but at most max times

In the first example above, the equivalent m/f(o|)bar/ might seem strange. The grouped alternation means match an o or match nothing. Also, if the ordering of the alternatives in the last example surprised you, remember that these are all greedy quantifiers and will first try to match as much as possible (or as much as they are allowed) before trying lesser amounts.
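The equivalences in table 10.2 can be checked mechanically. This sketch pairs each quantifier with its equivalent formulation (adding ^ and $ anchors so that each pattern must account for the whole test string) and runs both patterns over strings with zero to six o characters; the two patterns in each pair should always agree. The last lines show the greediness just described: {2,4} takes four o characters when it can.

```perl
use strict;
use warnings;

# Each quantifier from table 10.2 next to its equivalent formulation.
my @pairs = (
    [ qr/^fo?bar$/,     qr/^f(o|)bar$/ ],
    [ qr/^fo{2}bar$/,   qr/^foobar$/ ],
    [ qr/^fo{2,}bar$/,  qr/^foo+bar$/ ],
    [ qr/^fo{2,4}bar$/, qr/^f(oooo|ooo|oo)bar$/ ],
);

my $disagreements = 0;
for my $n (0 .. 6) {
    my $s = 'f' . ('o' x $n) . 'bar';
    for my $pair (@pairs) {
        my ($left, $right) = map { $s =~ $_ ? 1 : 0 } @$pair;
        $disagreements++ if $left != $right;
    }
}
print "disagreements: $disagreements\n";

# Greedy: {2,4} matches as many o characters as it is allowed to.
my ($taken) = 'foooooobar' =~ m/f(o{2,4})/;
print "$taken\n";
```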

10.4 Non-greedy quantifiers: take what you need

Often, greedy quantifiers are simply too greedy for your intended purpose. Consider trying to match and capture all the text on a line up to the first occurrence of a15:

    $line = "one two three a15 four five six a15 seven\n";
    $line =~ m/(.*)a15/;
    print "$1\n";    # prints: one two three a15 four five six

What happened? The star is greedy and matched all the way to the end of the string, and then backtracked until an a15 could match. What we need is something that will match as little as possible and then check the rest of the expression to see if it matches yet. All of the greedy quantifiers have a non-greedy form that is simply the quantifier followed by a question mark. A revised version of our example above using a non-greedy star quantifier would be:

    $line = "one two three a15 four five six a15 seven\n";
    $line =~ m/(.*?)a15/;
    print "$1\n";    # prints: one two three

The (.*?) tries to match zero characters followed by an a15, then one character followed by an a15, and so on until it finally matches fourteen characters (one two three, including the trailing space) and succeeds in finding a following a15. The other non-greedy versions of the quantifiers are given in figure 10.1 and operate in a similar manner.

10.5 Simple anchors

An anchor is a form of zero-width assertion. This means it matches not a character, but a position with certain properties. You have already seen two such elements in chapter 6: the ^ (caret) can be used to match at the beginning of a string, and the $ can be used to match the end of a string. Another anchor type regex element that you've already seen is the word-boundary element \b. This matches at a position between a word character (\w) and a non-word character (\W), or between the beginning or end of a string and a \w character. To get an idea of what it means to match a position rather than a character, let's consider another simple example depicted graphically.

In figure 10.5 we step through running the pattern m/\bfoo\b/ against the strings foo bar and foodbar. At step 1, \b matches at the start of the string (between the start of the string and the f character) in both cases. Because this is a zero-width assertion, the pointer remains pointing at the same position in the target string. In order to show this, we advance the pointer to the next regex component but drop the first regex components down below the regex. This is simply my way of showing that the first \b and the f regex element both are successful at the first position in the string. The pointer then advances along in the usual manner in each string until we hit the final \b. In the first case, there is a match because the position between the o and the space is a word-boundary position. Thus the regex succeeds at step 6. In the second string, the position lies between an o and a d, which are both word characters, so the regex fails at this point. (Note: although not shown in the figure, on a failure the pointer would return to the beginning, and the whole regex would repeatedly attempt to match the first component against every position in the string.)

Figure 10.5 Stepping through a pattern match with anchors

The word-boundary anchor has a complement, the \B anchor, which matches at a position in a string between two word characters or two non-word characters.

When the /m modifier (we will discuss modifiers in the next chapter) is used on a match or substitution operator, it means that the ^ and $ anchors can match at the start and end of lines within a multi-line string. The \A and \z anchors match only the beginning and end of a string respectively, regardless of whether it is a multi-line string or not.

The final simple anchor is the \G anchor, which works in conjunction with the /g modifier and anchors the match to the last position matched in a repeated match operation.
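The remaining anchors are easiest to see side by side. This sketch uses made-up strings to show \b versus \B on the two strings from figure 10.5, ^ with and without the /m modifier compared with \A, and \G walking through a string one match at a time with /g.

```perl
use strict;
use warnings;

# \b and \B on the two strings from figure 10.5.
for my $s ('foo bar', 'foodbar') {
    my $b_result  = $s =~ m/\bfoo\b/ ? 'match' : 'no match';
    my $nb_result = $s =~ m/foo\B/   ? 'match' : 'no match';
    print "$s: \\bfoo\\b $b_result, foo\\B $nb_result\n";
}

# ^ matches at the start of every line only under /m; \A never does.
my $text = "first line\nsecond line\n";
my @with_m    = $text =~ m/^(\w+)/mg;     # first, second
my @without_m = $text =~ m/^(\w+)/g;      # first
my @with_A    = $text =~ m/\A(\w+)/mg;    # first
print "/m: @with_m; no /m: @without_m; \\A: @with_A\n";

# \G anchors each attempt where the previous /g match left off, so the
# loop stops at the first non-digit instead of skipping ahead to 456.
my @digits;
my $str = '123abc456';
while ($str =~ m/\G(\d)/g) {
    push @digits, $1;
}
print "@digits\n";    # 1 2 3
```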

10.6 Grouping, capturing, and backreferences

Until now we have only used plain parentheses for grouping subexpressions. A disadvantage of this technique is that anything matched by the parenthesized subexpression is captured and assigned to a special variable based on the position of the parentheses in the overall expression. This takes extra time and memory for the regex machinery, and, often, you are only interested in grouping a subpattern, not capturing its matched text. A form of parenthesization that will only group an expression without capturing the matched text is the (?:subexpression) form. For example, if you are not interested in capturing the text, the earlier example using m/f(u|oo)bar/ would be better written as m/f(?:u|oo)bar/.

When capturing parentheses are used, the captured text is stored in a set of special variables ($1, $2, $3, ...) where the digits reflect the order of occurrence of the subexpressions themselves, counting from left to right. If the pattern m/(foo(bar)baz)(xyz)/ did match against a target string, then $1 would contain the string foobarbaz, $2 would contain the string bar, and $3 would contain the string xyz. These variables may be used within the replacement part of a substitution or in statements following a successful pattern match. These special variables are global, but automatically localized within their immediate enclosing block. They will retain their values until either another pattern match successfully matches or the current scope or block is exited. This is useful for extracting particular bits of data from within strings of text. (We saw examples of this sort of thing in chapter 6.)

Using capturing parentheses in a pattern also makes the captured text available later within the same pattern using the \1, \2, \3, ... escape sequences. These are called backreferences because they refer back to previously matched text. A simplistic approach to searching a file for double words would be to use a pattern such as m/\b(\w+)\s+\1\b/i. This would match something resembling a word followed by one-or-more whitespace characters, followed by whatever word was matched by the capturing parentheses. The /i modifier tells the match operator to ignore case so we can catch doubled words that might differ in case. If used on multi-line strings, such as when we set the special $/ variable to an empty string to read a file by paragraphs instead of line by line, we can still catch double words even if one is at the end of one line and the other is at the beginning of the next line. This is possible because we used the \s sequence instead of just a space. Remember that the \s sequence represents the character class [ \n\f\r\t], so the two words may be separated by one or more of any of those characters. Also note that we placed a word boundary following the backreference. This ensures that the backreferenced text here is not simply the beginning of a larger word, as in the perfectly logical string This thistle is bristly.
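A short sketch to check the numbering rules and the doubled-word pattern described above. The sample strings are made up (apart from This thistle is bristly), and has_double is a hypothetical helper name.

```perl
use strict;
use warnings;

# Capture variables are numbered by the position of their left parenthesis.
my ($one, $two, $three) = 'foobarbazxyz' =~ m/(foo(bar)baz)(xyz)/;
print "$one $two $three\n";    # foobarbaz bar xyz

# (?:...) groups without capturing, so (bar) is the only capture here.
my @caps = 'fubar foobar' =~ m/f(?:u|oo)(bar)/g;
print scalar(@caps), " captures: @caps\n";    # 2 captures: bar bar

# The doubled-word pattern, with a \b after the backreference.
sub has_double { return $_[0] =~ m/\b(\w+)\s+\1\b/i ? 1 : 0 }

for my $s ('the the cat', "Paris in the\nthe spring", 'This thistle is bristly') {
    (my $shown = $s) =~ s/\n/\\n/g;    # show the newline visibly
    print "$shown: ", (has_double($s) ? 'doubled' : 'no double'), "\n";
}
```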

A program that makes use of this pattern to locate paragraphs containing doubled words and to highlight them somehow can be as simple as:

    #!/usr/bin/perl -w
    use strict;
    $/ = "";    # read files in paragraph mode
    while (<>) {
        print if s/\b(\w+)(\s+)(\1)\b/*$1*$2*$3*/gi;
    }

We captured the first occurrence of the word, the intervening whitespace, and the second occurrence of the word into three separate variables so we could replace the text with a few asterisks inserted to make the doubled words stand out. We also used the /g modifier so that we could highlight multiple occurrences of doubled words within a paragraph. Jeffrey Friedl gives a more involved version of this program in his book¹ that allows an intervening HTML tag between the doubled words, uses ANSI escapes to highlight the text, and prints out only highlighted lines rather than the whole paragraph. You are encouraged to add similar improvements to the above version of the program.
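To try the paragraph-mode program without creating a file, this sketch reads made-up text from an in-memory filehandle (open on a reference to a string, available since Perl 5.8) and localizes $/ so that paragraph mode does not leak into the rest of the program. The substitution is the same one used in the program above.

```perl
use strict;
use warnings;

my $text = "Paris in the\nthe spring.\n\nNo doubled words here.\n\nIt is is done.\n";
open(my $fh, '<', \$text) || die "can't open in-memory file: $!";

my @highlighted;
{
    local $/ = "";    # paragraph mode, only inside this block
    while (my $para = <$fh>) {
        push @highlighted, $para
            if $para =~ s/\b(\w+)(\s+)(\1)\b/*$1*$2*$3*/gi;
    }
}
close($fh);
print @highlighted;
```

Only the first and third paragraphs are printed, with *the* and *is* marked; the doubled the is caught even though it straddles a line break.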

10.6.1 Prime number regex

Back in chapter 5, we developed a program to list all the prime numbers from 2 to N (where N was a number entered by the user of the program). That program was straightforward and relatively efficient. Here we will show another program to list prime numbers, one that is neither straightforward nor efficient, but nonetheless a marvelous example of something (of what, we're just not sure).

If you visit http://www.dejanews.com and search the comp.lang.perl.misc newsgroup archives for the terms "Abigail" and "prime," you'll eventually find a rather surprising usage of regular expressions to determine if a given number is prime. Abigail is a frequent poster to the comp.lang.perl.misc newsgroup, and this example originated (as far as I know) as a clever little one-line program in one of Abigail's sign-off signatures. Since then, it has been commented on within the group on a few occasions and with a few variations, so you are sure to find quite a few articles when you search the archives. The following is an extended version that lists all the primes from 2 to N (where N is given as an argument to the program):

    #!/usr/bin/perl -w
    use strict;
    my $N = shift @ARGV;    # get the number
    for (my $number = 2; $number ...

¹ Friedl, Jeffrey. Mastering Regular Expressions. Sebastopol, CA: O'Reilly and Associates, 1997.

        my ($dir, $file) = ...
        $dir ||= '.';    # default initial path
        return ($dir, $file);
    }

This version is much simpler overall. A quick benchmark shows that the first version is a little faster (around 18%) than the regex version. So unless you were going to be doing a large number of such operations, the simplicity of the regex version probably outweighs any efficiency concerns your program might have.

The substr() function is also commonly used to pick apart fixed column data files, where each field of data starts in the same column position and is the same width for every line (or record) in the file. (The unpack() function might be a better choice for this type of task, though.) In the following example, we have some fixed column data with a few missing fields. The data consists of a four-character group designation followed by measurement data with field widths of 2, 3, 3, 2, 3, 2, 2, 2, 2, and 3. We have already identified the columns of complete data, where no missing data occurs. Now we simply want to extract only those fields, which we will print as comma-separated data for use in another program:

    #!/usr/bin/perl -w
    use strict;
    while (<DATA>) {
        chomp;
        my @fields = (substr($_,  0, 4),
                      substr($_,  4, 2),
                      substr($_,  9, 3),
                      substr($_, 14, 3),
                      substr($_, 17, 2),
                      substr($_, 21, 2),
                      substr($_, 25, 3));
        print join(',', @fields), "\n";
    }
    __DATA__
    120B2212 110622 116953 13 632101 1021911793 3929090
    220b26
    220b29125111 118952934 096
    220bl81231182811596233 63 0093
    140D2611810821112882831 092
    140D23 1062011291293833096
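Since the text mentions unpack() as a possibly better choice, here is a sketch comparing the two approaches on one record. The record is made up: it is built from fields of the widths given above, so it is exactly 28 characters long. The unpack() template is an assumption worked out from those widths: a reads a field of the given width, and x skips that many characters.

```perl
use strict;
use warnings;

# Hypothetical 28-character record: a 4-character group designation,
# then fields of width 2, 3, 3, 2, 3, 2, 2, 2, 2 and 3.
my $record = join '', '120B', '22', '123', '110', '62', '211',
                      '69', '53', '13', '63', '210';

# The same seven fields as the substr() version above.
my @by_substr = (substr($record,  0, 4), substr($record,  4, 2),
                 substr($record,  9, 3), substr($record, 14, 3),
                 substr($record, 17, 2), substr($record, 21, 2),
                 substr($record, 25, 3));

# unpack(): 'a' keeps a field of the given width, 'x' skips characters.
my @by_unpack = unpack('a4 a2 x3 a3 x2 a3 a2 x2 a2 x2 a3', $record);

print join(',', @by_substr), "\n";
print join(',', @by_unpack), "\n";
```

A single template string describes the whole record layout, which is easier to keep in sync with the data than seven offset/length pairs. Using A instead of a would strip trailing spaces from each field, which may or may not be what you want when spaces mark missing data.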

An interesting thing happens when you take a reference to the substr() function: you do not get a reference to the literal substring itself, but to the given region of the string:

my $string = 'foobar';
my $ref = \substr($string, 1, 4);
print "$$ref\n";          # prints: ooba
$string = 'scoolly';
print "$$ref\n";          # prints: cool

Here, the reference in $ref is not to the particular substring ooba, but to the four-character slice of $string starting at position 1 (after the first character). So, when we now assign a new string to $string, our reference refers to the four-character slice of this new string.
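Because the reference points at a region rather than at a copy, assigning through it also changes the underlying string. A minimal sketch reusing the 'foobar' example; the replacement text 'X' is arbitrary:

```perl
use strict;
use warnings;

my $string = 'foobar';
my $ref = \substr($string, 1, 4);   # region: four characters starting at offset 1
print "$$ref\n";                    # ooba

$$ref = 'X';                        # replace that whole region with one character
print "$string\n";                  # fXr
```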

11.4 Translating characters

Another operation that often arises is translating characters. Assume you want to change all spaces in a string into underscores. One obvious method, and one that seems like a good choice, is to use the substitution operator with the /g modifier:

$_ = 'this is a string';
s/ /_/g;
print;                    # prints: this_is_a_string

The translation operator (tr//) accomplishes the same task:

$_ = 'this is a string';
tr/ /_/;
print;                    # prints: this_is_a_string

The first thing to realize about the tr// operator is that it does not treat the first part as a regular expression: it treats both portions as lists of characters. The first part is a list of characters for which to search, and the second part is a corresponding list of replacement characters: tr/SEARCHLIST/REPLACEMENTLIST/. By default, it operates on the $_ variable, but it can be bound to any variable using the binding operator:

tr/abc/cab/;    # replace a with c, b with a, and c with b
tr/a-z/A-Z/;    # replace lowercase letters with corresponding uppercase letters

# replace uppercase letters in $string with their counterparts
# in a reversed alphabet
$string =~ tr/A-Z/ZYXWVUTSRQPONMLKJIHGFEDCBA/;

(A descending range such as Z-A is not allowed in tr//, so the reversed alphabet has to be spelled out.)

You can use this as an easy method of doing rot13 encoding. ROT13 is a simple encoding scheme where the alphabet is divided into two halves and swapped, so that a maps to n and b maps to o, and vice versa. You can call the following rot13() function to encrypt a string, then call it again on the encrypted string to get the original text back:

sub rot13 {
    my $string = shift;
    $string =~ tr/a-zA-Z/n-za-mN-ZA-M/;
    return $string;
}
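A quick way to check the claim that rot13() undoes itself is to apply it twice. This sketch repeats the rot13() sub from the text so that it runs on its own; the sample string is arbitrary:

```perl
use strict;
use warnings;

sub rot13 {
    my $string = shift;
    $string =~ tr/a-zA-Z/n-za-mN-ZA-M/;   # rotate each letter by 13 places
    return $string;
}

my $plain   = 'Hello, World!';
my $encoded = rot13($plain);
print "$encoded\n";            # Uryyb, Jbeyq!
print rot13($encoded), "\n";   # Hello, World!  (punctuation passes through)
```

Since the alphabet has 26 letters, rotating by 13 twice is a full rotation, which is why the same function both encodes and decodes.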

The tr// operator returns the number of characters in the search list that are found. When the replacement list is empty and no modifiers are used, the search list is replicated as the replacement list. This allows you to count characters in a string:

$count = tr/a//;             # $count gets number of 'a' characters in $_
$count = tr/aeiouAEIOU//;    # $count gets number of vowels in $_

When the replacement list is shorter than the search list, the last character of the replacement list is repeated to equalize the two lists:

tr/a-z/ABC/;    # a to A, b to B, c to C, and all other
                # lowercase letters translate to C as well
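Because tr// returns its count, it can also count characters in a variable other than $_ without changing it. A minimal sketch; count_vowels() is a hypothetical helper name, and the sample sentence is the one used in the exercises below:

```perl
use strict;
use warnings;

sub count_vowels {
    my $text = shift;
    # Empty replacement list: each vowel maps to itself, so $text is
    # left unchanged, but tr// still returns how many characters matched.
    return ($text =~ tr/aeiouAEIOU//);
}

my $line = 'This is the winter of our discontent';
print count_vowels($line), "\n";   # 11
```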

Three modifiers may be used with the tr// operator, in the same way that modifiers are used with the match and substitution operators: /c, /d, and /s (which stand for complement, delete, and squash, respectively). The /c modifier means that the search list is taken as a complement; in other words, the search list is all characters not in the given search list. The /d modifier means that, instead of replicating the last character of the replacement list until it is the same size as the search list, all matching characters that do not have a counterpart in the replacement list are deleted from the target string. The /s modifier means to squash consecutive matching characters with one copy of the replacement character.

tr/aeiou/*/c;     # replace all non-vowels with an asterisk
tr/aeiou/x/d;     # replace a with x and delete all other vowels
tr/aeiou//cd;     # delete all non-vowels
tr/ / /s;         # replace consecutive spaces with a single space

I would like to stress again that the tr// operator does not use regular expressions. The expression tr/a*b/xyz/ does not replace zero-or-more a characters in a row followed by a b with xyz. It replaces each a with an x, each asterisk with a y, and each b with a z.
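To see the modifiers side by side, this sketch applies each example from the list above to its own copy of a sample string. The (my $copy = $original) =~ tr/// idiom copies first and then translates the copy, so $original stays untouched; the sample strings are made up:

```perl
use strict;
use warnings;

my $original = 'education is fun';

(my $non_vowels = $original) =~ tr/aeiou/*/c;   # complement
(my $deleted    = $original) =~ tr/aeiou/x/d;   # delete
(my $vowels     = $original) =~ tr/aeiou//cd;   # complement and delete

my $spaced = 'a   b  c';
(my $squashed = $spaced) =~ tr/ / /s;           # squash

print "$non_vowels\n";   # e*u*a*io**i***u*
print "$deleted\n";      # dcxtn s fn
print "$vowels\n";       # euaioiu
print "$squashed\n";     # a b c
```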

11.5 Exercises

1 Write a function that returns a list of all doubled words (two words repeated in a row, such as: "The the way to...") in a string. A doubled word may have different case and be separated by any amount of whitespace, including an embedded newline.

2 Write a regex to substitute every occurrence of the word apple with orange only if it is followed by a space and the word peel. Do not change the apple in pineapple.

3 Write a function that prints out a summary of the frequencies of each vowel in a string. For example, if passed the string This is the winter of our discontent, it would print:

a 0
e 3
i 4
o 3
u 1

CHAPTER 12

Working with lists

12.1 Processing a list
12.2 Filtering a list
12.3 Sorting lists
12.4 Chaining functions
12.5 Reverse revisited
12.6 Exercises

Lists are an important and powerful feature of the Perl language, so it should come as no surprise that, just as with strings, Perl has a few list manipulation tools up its sleeve. In earlier chapters, we've seen the essential built-in functions for working with arrays and hashes (push(), pop(), shift(), unshift(), keys(), and exists(), to name a few), as well as a few functions and operators for making or joining lists of data: the range operator, and the split() and join() functions. In this chapter, we will examine a few more built-in functions designed explicitly for processing list data: the map(), grep(), and sort() functions. We will also consider more uses of the reverse() function.

12.1 Processing a list

The standard way to process a list of data, or to perform some action on each element of a list, is to iterate through the list in a foreach loop:

foreach my $item (@list) {
    # do something with $item
}

This should probably be your first choice when you need to process a list of data, but it does have certain limitations. Consider a simple case where you want to create a new array that contains the value of each element of an existing array multiplied by a factor of two:

my @list = (1, 2, 3);
my @new_list;
foreach my $item (@list) {
    push @new_list, $item * 2;
}
print "@list\n";          # prints: 1 2 3
print "@new_list\n";      # prints: 2 4 6

The map() function allows us to write the preceding code more directly as a list assignment from one list to another:

my @list = (1, 2, 3);
my @new_list = map { $_ * 2 } @list;
print "@list\n";          # prints: 1 2 3
print "@new_list\n";      # prints: 2 4 6

The map() function has two basic forms:

map BLOCK LIST
map EXPR, LIST

The block or expression is evaluated once for each element in the list argument, and the return value of the function is the list of each such evaluation's results. Each time the block or expression is evaluated, the $_ variable is set to the current element in the list. (Actually, $_ is an alias to the element in the list, but changing it is not recommended under most circumstances.)

When the first argument is a block (a series of statements enclosed in curly braces), the value of the last statement in the block is used as the return value of each evaluation. Also, you do not use a comma between the block argument and the list:

@new_list = map { $_ * 2 } @list;

The map() function may also be used to transform a list in place. Consider how you might do this with a foreach loop (remembering that the loop variable is an alias to the current element in the list):

my @list = (1, 2, 3);
foreach my $item (@list) {
    $item *= 2;           # same as: $item = $item * 2;
}
print "@list\n";          # prints: 2 4 6

It is not wise to use this "aliasing" feature too much, because it can tend to hide the real object of the code in the indirection. The map() function allows a more direct assignment that is unmistakably changing the contents of the array:

my @list = (1, 2, 3);
@list = map { $_ * 2 } @list;
print "@list\n";          # prints: 2 4 6

In this version, it is perfectly clear that @list is being assigned a new list value that is a modification of its previous value.

The map() function isn't limited to returning just a single element for each element it receives in the list; it can return a scalar or a list value. Consider initializing a hash from an array where each array element is taken to be a key and given a value of 1:

my @key_list = qw/one two three/;
my %hash = map { $_ => 1 } @key_list;

This also has the effect of filtering out any duplicate elements in the key list array, because there can be only one of each key in a hash.
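Here is a short sketch of the two map() forms side by side, along with map() returning more than one value per element. Returning an empty list () to drop an element is an extra trick not covered in the text above; the data and variable names are illustrative:

```perl
use strict;
use warnings;

my @list = (1, 2, 3, 4);

my @by_expr  = map $_ * 2, @list;      # EXPR form: comma after the expression
my @by_block = map { $_ * 2 } @list;   # BLOCK form: no comma after the block

# Each evaluation may return zero, one, or several values.
my @pairs = map { ($_, $_ * $_) } @list;     # two values per element
my @evens = map { $_ % 2 ? () : $_ } @list;  # () drops the odd elements

# Duplicate keys collapse when the pairs are assigned to a hash.
my %seen = map { $_ => 1 } qw/one two two three/;
my @keys = sort keys %seen;

print "@by_expr\n";    # 2 4 6 8
print "@by_block\n";   # 2 4 6 8
print "@pairs\n";      # 1 1 2 4 3 9 4 16
print "@evens\n";      # 2 4
print "@keys\n";       # one three two
```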

12.2 Filtering a list

Often we don't want to process a list; we want to filter it, that is, to get all the elements that meet some criteria. Another function, similar in syntax to the map() function, is the grep() function:

grep BLOCK LIST
grep EXPR, LIST

Unlike the map() function, which returns a value (or a list of values) for every element in the list, this function returns the list of all the elements for which the block or expression evaluates to true. (Again, the $_ variable is assigned to each value in the list in turn.) This function is commonly used with a regular expression as its first argument:

my @list = qw/one two three four/;
my @new_list = grep m/^t/, @list;
print "@new_list\n";      # prints: two three

Think of grep() as a filter that allows through only things that pass a test. In the case above, only those elements for which the regex m/^t/ is true (those elements that start with a t) are passed through to the new list. The grep() filter is by no means limited to using a regex as a first argument. Here's one of the standard FAQ answers for extracting the unique elements in an array:

my @array = qw/one two two three four three two/;
my %seen;
my @unique = grep { ! $seen{$_}++ } @array;
print "@unique\n";

Here we use the block form and create a hash to count occurrences of each element. The grep() filter here allows only those elements that we have not seen already.
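Two further grep() behaviors, sketched below with illustrative data: in scalar context grep() returns the number of matching elements rather than the elements themselves, and the block form can hold any true-or-false test, such as a numeric comparison.

```perl
use strict;
use warnings;

my @list = qw/one two three four/;

my @t_words = grep { /^t/ } @list;   # list context: the matching elements
my $t_count = grep { /^t/ } @list;   # scalar context: how many matched

my @numbers = (3, 18, 7, 42, 25);
my @big     = grep { $_ >= 20 } @numbers;   # any true/false test will do

print "@t_words ($t_count)\n";   # two three (2)
print "@big\n";                  # 42 25
```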

12.3 Sorting lists

Perl's built-in sort() function is versatile, allowing you to supply a function (or block) that performs the comparisons, or simply to use the default comparison routine, which uses stringwise comparisons. The basic form of the function is

sort SUBNAME LIST
sort BLOCK LIST
sort LIST

To sort an array of strings, you can simply use the default sorting routine:

my @list = qw/one two three four five/;
@list = sort @list;
print "@list\n";          # prints: five four one three two

When you want to provide a different sorting method, you need to realize how sorting is accomplished. Any sorting method must compare two elements at a time to each other and determine if the first element is larger than, smaller than, or equal to the second element. While sorting the list, the sort() function takes care of figuring out which pairs of elements need to be compared at any given time. When you supply a comparison method, the sort() function places these pairs of elements into the localized variables $a and $b. Your comparison function must return -1 if the first element is smaller, 0 if they are equal, and 1 if the first element is larger (remember the cmp and <=> operators). For example, to duplicate the default sort method using stringwise comparison, you could use

@list = qw/one two three four five/;
@list = sort { $a cmp $b } @list;
print "@list\n";

And, to do it as a subroutine, you could use

@list = qw/one two three four five/;
@list = sort stringwise @list;
print "@list\n";
sub stringwise { $a cmp $b }

To sort a list numerically, you would use

@list = (3, 4, 2, 9, 1);
@list = sort { $a <=> $b } @list;
print "@list\n";          # prints: 1 2 3 4 9

To reverse the sense of the sort order, you simply swap the two special sort variables:

@list = (3, 4, 2, 9, 1);
@list = sort { $b <=> $a } @list;
print "@list\n";          # prints: 9 4 3 2 1
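Because the default comparison is stringwise, sorting numbers without a comparison block can give surprising results. A minimal sketch with made-up numbers:

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 2);

my @as_strings = sort @numbers;                 # default: stringwise (cmp)
my @as_numbers = sort { $a <=> $b } @numbers;   # numeric, ascending
my @descending = sort { $b <=> $a } @numbers;   # numeric, descending

print "@as_strings\n";   # 10 100 2 9
print "@as_numbers\n";   # 2 9 10 100
print "@descending\n";   # 100 10 9 2
```

Stringwise, "10" and "100" both start with "1", which sorts before "2" and "9", so the digits rather than the magnitudes decide the order.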

As you know, the keys in a hash are not stored in any particular order, but you often want to retrieve them in some sorted fashion:

my %family = (Andrew => 35, Sue => 39, Joseph => 14, Thomas => 7);
foreach my $key (sort keys %family) {
    print "$key is $family{$key}\n";
}
__END__
this prints:
Andrew is 35
Joseph is 14
Sue is 39
Thomas is 7

To order the keys by their values instead (largest first), compare the hash values in the sort block:

my %family = (Andrew => 35, Sue => 39, Joseph => 14, Thomas => 7);
foreach my $key (sort { $family{$b} <=> $family{$a} } keys %family) {
    print "$key is $family{$key}\n";
}
__END__
this prints:
Sue is 39
Andrew is 35
Joseph is 14
Thomas is 7
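When two values are equal, sorting on values alone leaves their relative order up to chance, because keys() returns the keys in no particular order. A common refinement, sketched here with a hypothetical %scores hash, falls back to comparing the keys with || so that the result is fully determined:

```perl
use strict;
use warnings;

my %scores = (Andrew => 35, Sue => 39, Joseph => 14, Thomas => 35);

# Highest value first; equal values are ordered alphabetically by key.
my @order = sort { $scores{$b} <=> $scores{$a} || $a cmp $b } keys %scores;

print "$_ is $scores{$_}\n" for @order;
# Sue is 39
# Andrew is 35
# Thomas is 35
# Joseph is 14
```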

Sometimes, there is more than one field in the data that you wish to sort. Consider a colon-separated data file of first names, last names, and ages. We want to sort by last name, then by first name (if the last names are equal), and, lastly, by age (if all else is equal):

#!/usr/bin/perl -w
use strict;
my @data = <DATA>;
my @sorted = sort myway @data;
print @sorted;

sub myway {
    (split /:/, $a)[1] cmp (split /:/, $b)[1]
        ||
    (split /:/, $a)[0] cmp (split /:/, $b)[0]
        ||
    (split /:/, $a)[2] <=> (split /:/, $b)[2];
}

__DATA__
Sue:Johnson:39
Andrew:Johnson:35
Bill:Jones:37
Bill:Jones:36
Mike:Hammer:45

This makes use of a list facility we have not discussed yet: you can subscript (or take a slice of) a list just as you can with an array. The example above also makes use of the || (logical OR) operator. The entire function is a single Perl statement made up of three expressions that are OR'd together. First, we split the data at the colons and compare the second elements of that pair of lists (the last names). If they are equal (cmp returns zero), the expression is false, and we go on to the next expression to compare the first elements of the pair of lists (the first names). Again, if they are equal, we go on to numerically compare the third element of the pair of lists (age). The || operator short circuits (see chapter 5), so this subroutine returns the result of the first comparison that does not return zero, or returns zero if all comparisons are equal.

This is not a very efficient way to perform such multiple field comparisons. For every pair of elements that need to be compared (which can be a large number of comparisons), we are performing between two and six splits on the data. We can greatly improve this if we split the data once into anonymous arrays, then sort by each element of those arrays:

#!/usr/bin/perl -w
use strict;
my @data = <DATA>;
my @sorted;
@sorted = map { [$_, split /:/] } @data;
@sorted = sort myway @sorted;
@sorted = map { $_->[0] } @sorted;
print @sorted;

sub myway {
    $a->[2] cmp $b->[2]
        ||
    $a->[1] cmp $b->[1]
        ||
    $a->[3] <=> $b->[3];
}

__DATA__
Sue:Johnson:39
Andrew:Johnson:35
Bill:Jones:37
Bill:Jones:36
Mike:Hammer:45

In this example, we've first used the map() function to create an anonymous array for each line of data. This anonymous array contains the actual line of data as the first element, and the list of fields (the result of splitting the line of data) as the next three elements. The resulting list of anonymous arrays is assigned to the @sorted array. We then sort this array of anonymous arrays with the myway() sub, dereferencing each particular field we want to compare. Finally, we extract just the first element of each anonymous array (the data line itself) in another map() function, so that the final @sorted array now contains just the lines of data in the sorted order we wanted.

12.4 Chaining functions

One of the most famous Perl idioms is called the "Schwartzian Transform," named after Randal Schwartz. The Schwartzian Transform implements the sorting idea above in a more compact fashion by chaining the three map, sort, map operations together into a single statement. Before we tackle that particular idiom, let's consider what chaining is in the first place.

Chaining is simply using the results of one operation or function as the input into another operation or function. Try to work through the following simple example:

#!/usr/bin/perl -w
use strict;
my @unique;
my %seen;
while (<DATA>) {
    chomp;
    push @unique, grep !$seen{$_}++, split ' ';
}
print "@unique\n";
__DATA__
this is one line of data
this is another line of data
this is the last line of data

You might have guessed that this program creates an array of the unique words in the given data. But do you see how it works? We have chained together three functions in a single statement. Parentheses may help to make the chained statement more understandable for a first reading:

push(@unique, grep(!$seen{$_}++, split(' ')));

Looked at this way, the push() function has two arguments: the array to push onto and a list of things to push into that array. The list is the resulting list from the grep() function. The grep() function also has two arguments: the conditional expression and a list. The list argument to the grep() function is the resulting list from the split() function (using the special case of splitting on whitespace).

Most often you will not see chained functions with all those parentheses, so let's look at the original version again:

push @unique, grep !$seen{$_}++, split ' ';

The way to understand chained functions is to read them from right to left, one function at a time. Here we could read this in English as "split the line into a list of words, filter that list through the grep() function, and push the resulting list onto the @unique array."
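Another small chain, read right to left in the same way: split a line into words, keep the words longer than two characters, uppercase them, and join the result with commas. The sample line is arbitrary:

```perl
use strict;
use warnings;

my $line = 'this is one line of data';

# split -> grep -> map -> join, evaluated right to left
my $result = join ',', map { uc($_) } grep { length($_) > 2 } split ' ', $line;

print "$result\n";   # THIS,ONE,LINE,DATA
```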

Now let's consider the Schwartzian Transform and the last sorting method given above:

#!/usr/bin/perl -w
use strict;
my @data = <DATA>;
my @sorted = map  { $_->[0] }
             sort myway
             map  { [$_, split /:/] } @data;
print @sorted;

sub myway {
    $a->[2] cmp $b->[2]
        ||
    $a->[1] cmp $b->[1]
        ||
    $a->[3] <=> $b->[3];
}

__DATA__
Sue:Johnson:39
Andrew:Johnson:35
Bill:Jones:37
Bill:Jones:36
Mike:Hammer:45

Notice how we have reversed the order of the map() calls, because the chained series is evaluated from right to left. First, each element of @data is turned into an anonymous array, then the resulting list of these anonymous arrays is sorted by myway, and the sorted list is passed to the leftmost map() function, which simply returns the first element of each anonymous array. We could have even done the whole sorting chain using a block instead of a named subroutine:

my @sorted = map  { $_->[0] }
             sort { $a->[2] cmp $b->[2]
                        ||
                    $a->[1] cmp $b->[1]
                        ||
                    $a->[3] <=> $b->[3] }
             map  { [$_, split /:/] } @data;
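The same map-sort-map shape works for any sort key that is costly or awkward to compute inside the comparison. A sketch of the Schwartzian Transform sorting words by length (computed once per word), with alphabetical order breaking ties; the word list is illustrative:

```perl
use strict;
use warnings;

my @words = qw/pear fig banana kiwi apple/;

my @sorted = map  { $_->[0] }                  # 3. recover the original item
             sort { $a->[1] <=> $b->[1]        # 2. compare precomputed keys
                    || $a->[0] cmp $b->[0] }
             map  { [$_, length $_] }          # 1. pair each item with its key
             @words;

print "@sorted\n";   # fig kiwi pear apple banana
```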

12.5 Reverse revisited

The reverse() function is context sensitive. While we've only used it thus far to reverse a scalar, you should be aware that it takes a list as its argument. We have only used it like this:

$string = 'halb';
$string = reverse $string;
print $string;            # prints: blah

Here, the function is taking $string as a one-element list. It does what we expect because it is called in a scalar context. In scalar context, this function concatenates all of its arguments into a single string and reverses that string. Consider the following:

@array = ('blah');
@array = reverse @array;
print "@array\n";         # prints: blah

In a list context, such as this example, the reverse() function reverses the order of the list, leaving each element unchanged. With a one-element list such as this, the resulting list is unchanged:

@array = ('one', 'two', 'three');
@array = reverse @array;
print "@array\n";         # prints: three two one
$string = reverse @array;
print "$string\n";        # prints: enoowteerht

But what do you suppose the following will produce?

$string = 'halb';
print reverse $string;

The print() function takes a list, i.e., it provides a list context, so reverse() is performed in a list context and the single-element list ($string) is printed in reverse order, which leaves it unchanged: this prints halb. This explains the bug mentioned previously regarding the commify() subroutine in chapter 10. To get the intended meaning, you need to use the scalar() function to explicitly put the reverse() into scalar context:

$string = 'halb';
print scalar reverse $string;     # prints: blah

The reverse() function can be used to print a file in reverse order (by lines):

print reverse ;

It can also be used to invert a hash (so long as the values of the hash are unique) to create a new hash that has the values of the former hash as its keys and the keys of the former as the values:

%mon2num = (Jan => 1, Feb => 2, Mar => 3, Apr => 4,
            May => 5, Jun => 6, Jul => 7, Aug => 8,
            Sep => 9, Oct => 10, Nov => 11, Dec => 12);
%num2mon = reverse %mon2num;
print "$mon2num{Mar}, $num2mon{3}\n";    # prints: 3, Mar
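A sketch pulling these context rules together, with illustrative data: the same reverse() call behaves differently depending on its context, and inverting a hash whose values are not unique silently loses entries, because later pairs overwrite earlier ones with the same new key.

```perl
use strict;
use warnings;

my @array = ('one', 'two', 'three');

my @backwards = reverse @array;          # list context: element order reversed
my $mirrored  = reverse @array;          # scalar context: joined, then reversed
my $word      = scalar reverse 'halb';   # scalar() forces scalar context

print "@backwards\n";   # three two one
print "$mirrored\n";    # eerhtowteno
print "$word\n";        # blah

# Two keys share the value 1, so the inverted hash has only two entries
# (which of red or crimson survives is not predictable).
my %color_code = (red => 1, crimson => 1, blue => 2);
my %code_color = reverse %color_code;
print scalar(keys %code_color), "\n";   # 2
```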

12.6 Exercises

1 Write a function that, given a hash, returns the list of hash values sorted by the hash keys.

2 Write the same thing in one line using a map() function.

3 Modify the one-line version above to also filter out any value less than 25.

CHAPTER 13

More I/O

13.1 Running external commands
13.2 Reading and writing from/to external commands
13.3 Working with directories
13.4 Filetest operators
13.5 faqgrep revisited
13.6 Exercises

In chapter 6, we covered the basics of reading from and writing to a file using the built-in open() function. This is not the only way for your programs to read or write data. You can also capture the output of external programs, run external programs (which might write data to files or elsewhere), and open directories and read their contents. This chapter takes us on a brief foray through these alternate I/O mechanisms.

13.1 Running external commands

Running an external program may not seem like an I/O operation, especially if you are not sending that program data or receiving its output. Nevertheless, you are communicating with the world outside your program, causing programs to run and, perhaps, data to be read from a file or printed to files or the console.

Perl has two mechanisms for running external commands or programs when you are not immediately interested in capturing the output of those commands: the system() and exec() functions. Each of these will run an external program. The difference between the two is that the exec() function causes the external program to replace the currently running Perl program, while the system() function spawns a child process (using a shell if the command contains shell metacharacters) to run the given program and waits for it to complete before continuing.

system('ls');             # runs the 'ls' command and waits for it to complete
print "Still running\n";  # will print when 'ls' finishes

exec('ls');               # substitutes 'ls' command for current running script
print "Still running\n";  # will only print if 'ls' failed

We can detect failure in a manner similar to our tests of the open() call, but with one significant difference: system() returns 0 (a false value) for success and an error code for failure. To test for failure, you need to use the logical && operator with the die statement:

system('ls') && die "hmm, 'ls' command failed\n";

So, what happens to the output from the command? In both cases, the standard filehandles are inherited from the Perl program, and the output goes to STDOUT. In the second example above, as indicated in the comments, the print statement will not be executed if the exec() call was successful. Because the new process has completely replaced the current Perl program, nothing after the exec() can be executed.
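The "error code" that system() returns is the child's wait status, which Perl also leaves in $?; the program's own exit code is the high byte, $? >> 8, and a return value of -1 means the command could not be started at all. A minimal sketch: it uses the list form of system() with $^X (the path of the running perl) so that no shell is involved, and the exit code 3 is arbitrary.

```perl
use strict;
use warnings;

# Run a tiny child perl that exits with status 3.
my $rc = system($^X, '-e', 'exit 3');

die "failed to run command: $!\n" if $rc == -1;

my $exit_code = $? >> 8;   # high byte of the wait status
print "child exited with $exit_code\n";   # child exited with 3
```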

Why would you want to run external programs using these functions? Well, even though you can write routines in Perl to do pretty much anything an external command could do, it is sometimes simpler to just use the external commands. On Unix systems, there are a variety of command line utilities that are often used for such things as sorting, searching, and finding files. Perhaps your program has just collected and printed a large amount of data to a file that you would now like to sort and remove duplicate elements from before exiting the program:

# ... code that writes lots of raw data to $tmp_file
exec("sort $tmp_file | uniq > $final_file");

Or, similarly, say you wanted to do the same thing to an existing file before reading it, in which case you would use the system() function so your program could continue after running the given command. The perldocs offer a little more detail regarding these functions and how they process their arguments (see perldoc -f system and perldoc -f exec). More often you want to capture or read the output of external commands. Let's look at some aspects of these last operations.

13.2 Reading and writing from/to external commands

Perl also provides mechanisms to run external commands, to collect or read their output, or to send data from your Perl program into the standard input of an external command. Backticks are the simplest mechanism for collecting data from an external program. The syntax is to simply enclose the external command in reverse (or opening) single quotation marks, also referred to as backticks:

$listing = `ls`;     # $listing has output as single string
@listing = `ls`;     # @listing has output as array of lines
$listing = `dir`;    # same as above on DOS
@listing = `dir`;    # same as above on DOS

In scalar context, backticks collect and return the output of the given command as a single string. In list context, the output is returned as a list of lines (where the meaning of "line" is dependent on the current value of the special $/ variable).

Alternately, you can open a filehandle onto an external process using the open() function by giving an external command followed by a pipe as an argument:

open(DIR, 'ls |') || die "can't fork: $!";
while (<DIR>) {
    chomp;
    print "$_\n" if -s $_ > 5000;
}
close(DIR) || die "failed: $!";

Here we open the ls process and pipe it into our filehandle so we can read it one line at a time, and print only the names of files that are over 5000 bytes in size (see section 13.4). Please note that catching open pipe failures is not straightforward; see perlfaq8 for a solution to this problem.
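Perl also accepts a three-argument form of open() with '-|' followed by the command and its arguments as a list; the command then runs without a shell, and you read its standard output. A minimal sketch, assuming a Unix-like system (the list form is not available on every platform or older perl); the child is a tiny perl program started via $^X:

```perl
use strict;
use warnings;

# '-|' means: read from the command's standard output.
open(my $from_child, '-|', $^X, '-e', 'print "alpha\nbeta\n"')
    or die "can't fork: $!";

my @lines;
while (my $line = <$from_child>) {
    chomp $line;
    push @lines, $line;
}
close($from_child) or die "child failed: status $?";

print "@lines\n";   # alpha beta
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces or special characters.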

We can also open up a whole pipeline of processes, such as the one we used in the exec() example above:

open(SORTED, "sort $tmp_file | uniq |") || die "can't fork: $!";
while (<SORTED>) {
    print;
}
close(SORTED) || die "problem with SORTED: $!";

Here we have opened the sort program on the file $tmp_file, piped its output to the uniq utility to remove duplicate lines, and, finally, piped that output to our filehandle so we can read it line by line.

This piping mechanism works either way, but only one way at a time. We can write data to an external process as well by using a leading pipe symbol:

open(OUT, "| sort | uniq > $final_file") || die "can't fork: $!";
while (...) {
    print OUT;
}
close(OUT) || die "problem with OUT: $!";

In this case, we are sending our data to the sort command, then to the uniq utility, and finally redirecting it to a file. These are standard utilities on Unix systems. The utilities available on your system may differ.
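Writing works the same way with the three-argument form and '|-'. In this sketch (again assuming a Unix-like system, with a made-up child program), the child counts the lines it receives and reports the count as its exit status; close() waits for the child to finish and sets $?, and it returns false here because the child exits with a non-zero status on purpose.

```perl
use strict;
use warnings;

# '|-' means: write to the command's standard input.
open(my $to_child, '|-', $^X, '-e', 'my $n = 0; $n++ while <STDIN>; exit $n')
    or die "can't fork: $!";

print {$to_child} "$_\n" for qw/one two three/;

close($to_child);          # flushes, waits for the child, and sets $?
my $lines_seen = $? >> 8;
print "child read $lines_seen lines\n";   # child read 3 lines
```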

13.3 Working with directories

Although we gave a few examples above of using external commands (ls or dir) to obtain directory listings, Perl also has a built-in set of functions for opening, reading, and closing directories (directory handles) and reading their contents:

opendir(DIR, '/home/ajohnson') || die "can't: $!";
my @listing = readdir DIR;
closedir DIR;

And, if you wanted to print out all the files with a .txt extension, you could use the grep() function to filter the output of the readdir() function:

opendir(DIR, '/home/ajohnson') || die "can't: $!";
my @listing = grep /\.txt$/, readdir DIR;
closedir DIR;

One thing to remember, though, is that the directory entries produced by readdir() do not contain leading path information, just the actual entries themselves. Thus, you couldn't try to open one of the files just by its name, because chances are it doesn't exist in your current directory:

my $dir = '/home/ajohnson/public_html';
opendir(DIR, $dir) || die "can't: $!";
my @listing = grep /\.html$/, readdir DIR;
closedir DIR;
foreach my $file (@listing) {
    # open(FILE, $file) || die "can't: $!";    # won't work
    # must use full pathname to the file
    open(FILE, "$dir/$file") || die "can't: $!";
    # more stuff
}
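A sketch of the same idea with a lexical directory handle, which also skips the . and .. entries that readdir() returns and builds the full paths with File::Spec. The temporary directory and the two file names are assumptions made so that the example runs anywhere:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Make a throwaway directory containing two empty files to list.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw/a.html b.txt/) {
    open(my $fh, '>', File::Spec->catfile($dir, $name)) or die "can't: $!";
    close($fh);
}

opendir(my $dh, $dir) or die "can't: $!";
my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
closedir $dh;

# readdir() gives bare names; join each with the directory before using it.
my @paths = map { File::Spec->catfile($dir, $_) } sort @entries;
print "$_\n" for @paths;
```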

Of course, how do we really know if the $file in the example above is really a file? It might be a directory (or a FIFO or a socket). The filetest operators are often used in conjunction with reading directories to determine what kind of entity a particular directory entry really is.

13.4 Filetest operators

There are several filetest operators that can be used to find out information about a particular directory entry. Table 13.1 on page 230 shows many of the commonly used tests. (See perldoc perlfunc for the complete list of filetest operators.) Most of these return a simple true or false value and are often used in conditional expressions or grep() expressions. If a filename or filehandle is not given, then the test is performed against the filename contained in the default variable ($_):

my $dir = '/home/ajohnson/bin';
opendir(DIR, $dir) || die "can't: $!";
my @listing = grep -f "$dir/$_", readdir DIR;
closedir DIR;
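A sketch that classifies directory entries with a few more filetest operators: -d for directories, -f for plain files, and -s for the size in bytes (here applied to the special _ filehandle, which reuses the information fetched by the previous test). As before, each name is joined with its directory before testing; the temporary directory and its contents are made up for the example:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
mkdir File::Spec->catdir($dir, 'subdir') or die "can't: $!";
open(my $fh, '>', File::Spec->catfile($dir, 'notes.txt')) or die "can't: $!";
print {$fh} 'hello';    # 5 bytes, no newline
close($fh);

opendir(my $dh, $dir) or die "can't: $!";
my %kind;
for my $entry (grep { $_ ne '.' && $_ ne '..' } readdir $dh) {
    my $path = File::Spec->catfile($dir, $entry);
    if    (-d $path) { $kind{$entry} = 'directory' }
    elsif (-f $path) { $kind{$entry} = 'file, ' . (-s _) . ' bytes' }
    else             { $kind{$entry} = 'other' }
}
closedir $dh;

print "$_: $kind{$_}\n" for sort keys %kind;
# notes.txt: file, 5 bytes
# subdir: directory
```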

The -M, -A, and -C tests return an age in days (fractional) relative to the start of the current program. Therefore, to ... started,

you could

test

if

As a simple example, tories

(

entries in the

readdir

-m 'filename'
        ($in->[$i], $in->[$max]) = ($in->[$max], $in->[$i]);
    }
}

Another simple sorting routine is the insertion sort. Here we take the first element of the array and consider it a sorted list of length 1. We then take the second element in the array and insert it into its correct position in the sorted list comprised of the first two array elements. We leave it where it is if it is larger than element 1, or we put it into position 1 and move element 1 up one position. Similarly, we add the third element by finding its proper position and moving any higher elements up one position. Here are the intermediary results for the array (3, 1, 4, 5, 2):

    3 1 4 5 2    start
    1 3 4 5 2    1 inserted, 1 3 sorted
    1 3 4 5 2    4 inserted, 1 3 4 sorted
    1 3 4 5 2    5 inserted, 1 3 4 5 sorted
    1 2 3 4 5    2 inserted, 1 2 3 4 5 sorted

        …
            while ($i >= 0 and $in->[$i] > $val) {
                $in->[$i+1] = $in->[$i];
                $i = $i - 1;
            }
            $in->[$i+1] = $val;
        }
    }
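For reference, here is a complete insertion sort built around that inner loop. It is a sketch that rests on two assumptions: the routine name insertion_sort() is hypothetical, and the array is passed by reference so that it is sorted in place.

```perl
use strict;
use warnings;

# Insertion sort sketch: $in->[0 .. $j-1] is already sorted; the value at
# position $j is inserted by moving every larger element up one position.
sub insertion_sort {
    my $in = shift;                       # reference to the array to sort
    for my $j (1 .. $#$in) {
        my $val = $in->[$j];
        my $i   = $j - 1;
        while ($i >= 0 and $in->[$i] > $val) {
            $in->[$i+1] = $in->[$i];      # move the larger element up
            $i = $i - 1;
        }
        $in->[$i+1] = $val;               # drop $val into the gap
    }
}

my @data = (3, 1, 4, 5, 2);
insertion_sort(\@data);
print "@data\n";    # prints: 1 2 3 4 5
```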

Our final simple sorting algorithm is called the bubble sort because larger elements "bubble up" to the top (or end) of the array. In this routine, we step through each element of the array, comparing it with the next element and swapping the two if the first is larger than the second. In doing so, the largest element is moved to the end of the list, and the rest of the array is slightly closer to being sorted. We continue making such passes until we don't make any swaps, which indicates that the array is now sorted. Let's consider one pass on our array of (3, 1, 4, 5, 2):

    3 1 4 5 2    3 > 1: swap
    1 3 4 5 2    3 < 4: no swap
    1 3 4 5 2    4 < 5: no swap
    1 3 4 5 2    5 > 2: swap
    1 3 4 2 5    end of the pass: 5 is in its final position
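Here is a sketch of the bubble sort just described. As with the other routines, the name bubble_sort() is hypothetical and the array is passed by reference. After each pass the routine shrinks the range it examines, because the largest remaining value has already bubbled to the end. It stops after a pass that makes no swaps.

```perl
use strict;
use warnings;

# Bubble sort sketch: compare neighbours, swap when the first is larger,
# and repeat passes until a pass makes no swaps.
sub bubble_sort {
    my $in = shift;                          # reference to the array to sort
    my $last = $#$in;                        # last index still unsorted
    my $swapped = 1;
    while ($swapped) {
        $swapped = 0;
        for my $i (0 .. $last - 1) {
            if ($in->[$i] > $in->[$i+1]) {
                @$in[$i, $i+1] = @$in[$i+1, $i];
                $swapped = 1;
            }
        }
        $last--;                             # largest remaining value is in place
    }
}

my @data = (3, 1, 4, 5, 2);
bubble_sort(\@data);
print "@data\n";    # prints: 1 2 3 4 5
```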

To begin with, rather than just assuming the size of the heap, we store the size of the heap at index 0 in the heap-array; the heap itself starts at index 1. Using the heap in figure 17.3 as an example, we store 7 in element 0 of the array because there are seven elements in the heap.

A heap is only partially sorted, so let's consider how we might produce a fully sorted array from it. To turn this heap-array into a sorted array, the first thing to do is swap the first element in the heap with the last element in the heap, thus putting the largest element at the end of the array. We then decrease the heap size from 7 to 6, so the last element in the array is no longer considered part of the heap, even though it is still in the array. What remains is no longer a heap, because the element now at the root is not larger than both of its children. To restore the heap property on this new, smaller six-element heap, we use a routine named pushdown(). This routine pushes the current element down through the heap while it is smaller than the greater of its two children: at each step it determines which child node is greater and swaps the current element with that child. Only the path of the element being pushed down needs attention, because any nodes further down the heap already satisfy the heap property.

We continue looping (swapping the root node with the last node, decreasing the heap size by one, and calling pushdown() on the root) until eventually our heap holds only two elements, and our final swap completes the sorting of the array.

The pushdown() routine needs to be passed two parameters: a reference to the heap-array and the index of the current element. We will use this same routine when we build the heap in the first place. Here is what the implementation of the pushdown() routine looks like:

CHAPTER 17   ALGORITHMS AND DATA STRUCTURING

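    sub pushdown {
        my ($heap, $i) = @_;
        my $size = $heap->[0];          # heap size is stored in element 0
        while ($i <= $size / 2) {       # while node $i has at least one child
            my $child = 2 * $i;         # left child
            # use the right child instead if it exists and is larger
            $child++ if $child < $size and $heap->[$child+1] > $heap->[$child];
            # stop once the current element is not smaller than its larger child
            last if $heap->[$i] >= $heap->[$child];
            ($heap->[$i], $heap->[$child]) = ($heap->[$child], $heap->[$i]);
            $i = $child;
        }
    }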

Figure 17.4   The action of pushing down

Now that we know how we will sort the heap-array once it's built, we need to figure out how we will build a heap-array from an ordinary array. To do this, we take our ordinary array, unshift() its size onto the array as its first element (element 0), and build it into a heap from the bottom up. Let's consider an array containing (4, 9, 7, 10, 8, 12, 6). Figure 17.5 shows this array in binary tree form and in array form after unshifting the size onto the array.

Figure 17.5   Ordinary array shown in heap and array form
    index:  0   1   2   3   4   5   6   7
    value:  7   4   9   7  10   8  12   6

Obviously, this binary tree does not satisfy the heap property. But you'll note that if we call pushdown() on node 3, the last parent node in the structure, this causes the 7 and the 12 to be swapped, turning node 3 along with its children into a small heap. Similarly, calling pushdown() on node 2 causes the 9 and the 10 to be swapped, turning that subtree into a proper heap. Finally, calling pushdown() on node 1, the root node, works as it did in our previous example: the 4 in node 1 is swapped with the 12 in node 3; the 4 now in node 3 is tested and swapped with the 7 in node 6; and we have created a heap from our original array.
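To check the walk-through above, the sketch below builds the heap for (4, 9, 7, 10, 8, 12, 6). It repeats the pushdown() logic from the text so that it runs on its own. build_heap() is a hypothetical name for the bottom-up loop, and the array is passed by reference.

```perl
use strict;
use warnings;

# pushdown() as in the text: size in element 0, root at element 1,
# children of node $i at 2*$i and 2*$i+1.
sub pushdown {
    my ($heap, $i) = @_;
    my $size = $heap->[0];
    while ($i <= $size / 2) {
        my $child = 2 * $i;
        $child++ if $child < $size and $heap->[$child+1] > $heap->[$child];
        last if $heap->[$i] >= $heap->[$child];
        ($heap->[$i], $heap->[$child]) = ($heap->[$child], $heap->[$i]);
        $i = $child;
    }
}

# Hypothetical helper: unshift the size, then push down every parent
# node, from the last parent back up to the root.
sub build_heap {
    my $aref = shift;
    unshift @$aref, scalar @$aref;
    for (my $i = int($aref->[0] / 2); $i >= 1; $i--) {
        pushdown($aref, $i);
    }
}

my @array = (4, 9, 7, 10, 8, 12, 6);
build_heap(\@array);
print "@array\n";    # prints: 7 12 10 7 9 8 4 6
```

Element 0 still holds the size (7). Every parent in the result is at least as large as its children.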

We can now construct a heap sort algorithm that sorts an array in place by first using one loop to build the heap from the bottom up. Then we use a second loop to swap the first and last nodes of the heap, to reduce the heap size, and to restore the heap property on the new smaller heap that remains. The algorithm, which calls the pushdown() routine shown above, is:

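    sub heap_sort {
        my $heap = \@_;                  # elements of @_ alias the caller's array
        unshift @$heap, scalar @_;       # store the heap size in element 0
        # build the heap from the bottom up, starting at the last parent node
        for (my $i = int($heap->[0] / 2); $i >= 1; $i--) {
            pushdown($heap, $i);
        }
        # move the largest element to the end, shrink the heap, restore it
        for (my $i = $heap->[0]; $i >= 2; $i--) {
            ($heap->[1], $heap->[$i]) = ($heap->[$i], $heap->[1]);
            $heap->[0]--;
            pushdown($heap, 1);
        }
        shift @$heap;                    # remove the size again
    }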
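heap_sort() relies on the elements of @_ being aliases for the caller's array, so it has to be called with an array, as in heap_sort(@data). The sketch below is a hypothetical variant, heap_sort_ref(), that takes an array reference instead, which makes the in-place update explicit. It repeats pushdown() so that the block runs on its own.

```perl
use strict;
use warnings;

# Same pushdown() as in the text (size in element 0, root at element 1).
sub pushdown {
    my ($heap, $i) = @_;
    my $size = $heap->[0];
    while ($i <= $size / 2) {
        my $child = 2 * $i;
        $child++ if $child < $size and $heap->[$child+1] > $heap->[$child];
        last if $heap->[$i] >= $heap->[$child];
        ($heap->[$i], $heap->[$child]) = ($heap->[$child], $heap->[$i]);
        $i = $child;
    }
}

# Hypothetical variant of heap_sort() that sorts the referenced array in place.
sub heap_sort_ref {
    my $heap = shift;
    unshift @$heap, scalar @$heap;               # element 0 holds the heap size
    for (my $i = int($heap->[0] / 2); $i >= 1; $i--) {
        pushdown($heap, $i);                     # build the heap bottom up
    }
    for (my $i = $heap->[0]; $i >= 2; $i--) {
        ($heap->[1], $heap->[$i]) = ($heap->[$i], $heap->[1]);   # largest to the end
        $heap->[0]--;                            # shrink the heap by one
        pushdown($heap, 1);                      # restore the heap property
    }
    shift @$heap;                                # remove the size again
}

my @data = (4, 9, 7, 10, 8, 12, 6);
heap_sort_ref(\@data);
print "@data\n";    # prints: 4 6 7 8 9 10 12
```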

Earlier I said that this sorting routine has a running time based on N * log2(N). Well, log2(N) is roughly the height of the heap, so we can say it runs in N * h time, where h is the height of the heap. You'll notice that the height of the heap grows slowly relative to N. For N = 1000, h is 10, and for N = 1,000,000, h is only 20. Thus, it is easy to see that even if the heap sort is doing more underlying work than the other routines, as N grows large, the heap sort is certainly not doing it as often: for N = 1000, the other routines do work on the order of 1000 * 1000, far more than the heap sort's 1000 * 10.
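The heights quoted above can be checked with a few lines of integer arithmetic. heap_levels() is a hypothetical helper. It counts the levels of a complete binary tree holding N elements by repeatedly halving N. This avoids the floating-point rounding that log($n)/log(2) can introduce at exact powers of two.

```perl
use strict;
use warnings;

# Hypothetical helper: number of levels in a complete binary tree with $n nodes,
# i.e. floor(log2($n)) + 1, computed by repeated halving.
sub heap_levels {
    my $n = shift;
    my $levels = 0;
    while ($n >= 1) {
        $levels++;
        $n = int($n / 2);
    }
    return $levels;
}

for my $n (1000, 1_000_000) {
    my $h = heap_levels($n);
    printf "N = %d: h = %d, N*h = %d, N*N = %.0f\n", $n, $h, $n * $h, $n * $n;
}
```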

A quick series of benchmarks for various sizes of arrays of random numbers produced the following rough timings:


    N       Bubble           Insert    Select    Heap
    50       0.05             0.01      0.03     0.02
    100      0.18             0.06      0.06     0.05
    500      4.79             1.21      1.31     0.34
    1000    19.27             4.84      5.26     0.76
    5000    (way too long)  122.11    132.48     4.43
    (all times in seconds)

Of course, the goal here was not to create the fastest sort routine in Perl code. The built-in sort() function can sort a good deal faster than any of the routines shown here. (For example, the built-in sort routine sorts a list of size 5000 in about 0.29 seconds on the same machine that produced the above timings.)
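One practical point matters when comparing these routines with the built-in sort(). Without a comparison block, sort() compares its elements as strings, so numeric data needs an explicit { $a <=> $b } block. A short illustration:

```perl
use strict;
use warnings;

my @nums = (10, 9, 2, 33, 4);

# With no comparison block, sort() compares as strings...
my @as_strings = sort @nums;                  # 10 2 33 4 9
# ...so numbers need an explicit numeric comparison with <=>.
my @ascending  = sort { $a <=> $b } @nums;    # 2 4 9 10 33
my @descending = sort { $b <=> $a } @nums;    # 33 10 9 4 2

print "@as_strings\n@ascending\n@descending\n";
```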

No, the goal of this chapter was to use sorting algorithms as a context to introduce you to alternate ways of structuring your data that can lead to improved algorithms. We have also touched upon some basic terminology you will encounter in further studies of abstract data structures (nodes, trees, binary trees, and heaps) and given an informal introduction to comparing the order of growth of running times in different algorithms. We will encounter some of these concepts again in the following chapters.

17.4 Exercises

1 Rewrite the pushdown() routine to use recursion rather than a while loop to push an element down the heap. Will this help or hurt the running time of the heap sort, or make little difference?

2 An alternate way to build a heap is by insertion (recall how the insertion sort built its sorted list). Write a routine that builds a heap by insertion rather than in a bottom up fashion.

3 Can you think of other problems besides sorting where a heap or heap-like structure might be useful?

CHAPTER 18

Object-oriented programming and abstract data structures

18.1 What is OOP?
18.2 OOP in Perl
18.3 Abstract data structures
18.4 Stacks, queues, and linked lists
18.5 Exercises

Perhaps you've heard of object-oriented programming (OOP) sometime during the past decade. You might have heard that it is just a bunch of horse hooey, or perhaps that it is the greatest productivity advancement since caffeine. OOP does indeed have its share of vocal proponents and detractors. Fortunately, we can happily ignore the extremists and make up our own minds.

This chapter will begin with a brief overview of OOP in general, to give you a feel for the subject and to introduce a few new fancy terms to your vocabulary. We will then settle right down into using Perl's OOP features to begin building classes and using them in our programs. Then we'll use classes and objects to create a few well-known abstract data structures.

If you are already familiar with the concepts and terminology of object-oriented programming, you may want to skip the next section and jump straight to section 18.2 (straight to the POOP, as it were). Perl's OOP capability makes use of references, so, if you had any trouble understanding references, you might want to review chapter 8 and the perlref pod-page. In recent releases of Perl (version 5.00503), there is also a perlreftut pod-page, which provides a short tutorial on Perl's references.

18.1 What is OOP?

Entire books could be written about object-oriented programming and how one might use it effectively. In fact, a great many such books have been written. Indeed, the publisher of the book you are now reading is also publishing a book on object-oriented programming using Perl that should come out around the same time as this book. I tell you this only so you understand that here I can only present a rather abbreviated and informal introduction to OOP: enough to get you started, but not the whole messy enchilada.

To begin with, let's take a look at some of the terminology that is commonly used in OOP-speak: abstraction, encapsulation, inheritance, and polymorphism. Big words, but not difficult to grasp. We've already touched upon the first two back when we discussed functions and subroutines in chapter 7. We will discuss each in turn as they come up again in the following discussions.

Typically (at least so far in this book), we approach a programming problem from a perspective we can call procedural, or one of algorithmic decomposition. That is, we first identify the tasks that need to be accomplished to solve the problem. If we are given a problem, we tend to focus on the verbs in the problem statement; in other words, the actions that need to be performed on the data. In some cases, as with the heap sort routine in the previous chapter, we may look more closely at the data to consider how it might be organized or structured before we decide on a processing approach.


In an OOP approach to a problem, dividing and structuring the data space is a central concern. We want to identify the things with which we will be working. In this case, we tend to focus on the nouns in the problem statement and how we might model them using different data structures and functions.

Let's take a trip back to our childhood and use a few well-known sentences from a children's book to illustrate the OO approach. Our story begins with "See Spot run." For those who never read this particular series of books, or who don't remember, you should know that Spot is a dog. (See … for an online version of this book.)

Now let's say that we want to write a program to do animation for this book. In a traditional approach, we might immediately start sketching out a run() procedure that would move a picture of Spot around the screen while animating Spot's legs. Obviously, this subroutine would need to know quite a lot about how Spot is represented graphically. Consider that the next page of our story says, "See Dick run" (Dick is a human). Our run() subroutine won't work with Dick because he has only two legs to animate while Spot has four. Now we need either two subroutines, spot_run() and dick_run(), or one big run() subroutine, called as run(dick) or run(spot), that contains the code to do either animation. Things would get messier still if we had other creatures that were going to run.

Looking at this from an OOP perspective, we might first identify the nouns, Spot and Dick, and their behaviors. In figure 18.1, we show one way we might begin to think about our data types. Notice that in this diagram we have also added category names, "human" and "dog."

Figure 18.1   Modeling Dick and Spot

    Human                     Dog
    name: Dick                name: Spot
    number of legs: 2         number of legs: 4
    run()                     bark()
    jump()                    run()
    talk()                    jump()

If we have modules that define such data types, then in our main book animation program we need only use these modules to create a Spot object and a Dick object. Then we can ask each object to run on the appropriate pages of the book. What we have done is create abstractions for our data: humans and dogs. We call these abstractions classes. When we use an abstraction to create a specific instance of a thing (for example, creating an instance of a Dog, called Spot), we call that instance an object.

We have also used encapsulation here. Each class encapsulates the data and the behaviors of the objects it defines. We usually refer to the data as the attributes or properties of the object, and the behaviors as methods. The latter are really just ordinary functions.

We could have created a more general class called Mammals that just contained the "name" and "number of legs" attributes and a run() method. Then we could have defined the other two classes as special types of Mammals. This approach would use inheritance; that is, each special subtype inherits the properties of its parent class. This approach would also involve polymorphism, because each subtype's run() method would have to be redefined for that particular subtype. Polymorphism means that child classes can alter their properties and methods so they are not identical to their parents.

You can't really appreciate this until you actually start doing it, so without further ado, let's get back to programming.
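Here is a sketch that makes the Mammals idea concrete before we look at Perl's OOP features in detail. It uses hypothetical classes Mammal, Dog, and Human; the text's "Mammals" is named in the singular here. Dog redefines run() and reaches the parent's version through SUPER::. Human inherits run() unchanged. The sketch uses our @ISA, which needs Perl 5.6 or later. The sections below use the older use vars form.

```perl
use strict;
use warnings;

# Hypothetical base class with the shared attributes and a run() method.
package Mammal;
sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name}, legs => $args{legs} };
    return bless $self, $class;
}
sub name { my $self = shift; return $self->{name} }
sub legs { my $self = shift; return $self->{legs} }
sub run {
    my $self = shift;
    return $self->name . ' runs on ' . $self->legs . ' legs';
}

# Dog inherits from Mammal and redefines run() (polymorphism).
package Dog;
our @ISA = ('Mammal');
sub new {
    my ($class, %args) = @_;
    return $class->SUPER::new(%args, legs => 4);
}
sub bark { return 'Woof!' }
sub run {
    my $self = shift;
    return $self->SUPER::run() . ', wagging its tail';
}

# Human inherits run() unchanged and adds talk().
package Human;
our @ISA = ('Mammal');
sub new {
    my ($class, %args) = @_;
    return $class->SUPER::new(%args, legs => 2);
}
sub talk { my $self = shift; return 'Hi, I am ' . $self->name }

package main;
for my $runner (Dog->new(name => 'Spot'), Human->new(name => 'Dick')) {
    print $runner->run, "\n";
}
```

The main loop asks each object to run() without knowing its class. Each object then supplies its own behavior.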

18.2 OOP in Perl

By creating a class, you are defining a whole new data type to use in your programs. This class defines how the data is stored, how it is accessed, and what functions (i.e., methods) this data type can perform. Think of the array data type: you know how to assign data to particular places in the array and how to access those values again later using array indices. You also know several functions defined to work on arrays: shift, push, pop, unshift, splice. Similarly, you know how to assign key/value pairs to a hash and access those values later. You also know how to use several functions defined to work on hashes: exists, each, keys, values. And you already know how important and useful these data types can be for many purposes.

18.2.1 The basics

So, how do we create a class in Perl? We begin by using the package declaration to define a new namespace, the same way we did when building an ordinary module. This package/module will contain the definition of our class. We will store all of this package in a file of the same name as the package but with a .pm extension added, just as we did for ordinary modules:

    package Student;

Here we have started the definition of a class named Student. This will provide us with a Student data type. We do not have to worry about any exporting when making classes because OOP modules should not export anything. The first thing this class has to define is a method of creating a new Student object that we can use in our programs. Such a method is called a constructor method. We can name this method anything we want, but most people prefer to call it new(). This method will return a reference to the underlying data structure it creates. Not just any reference, but a reference that has been specially tagged by the bless() function. This blessing mechanism is at the heart of Perl's OOP capability:

    sub new {
        my $class = shift;
        my $self  = {};
        $self->{name}    = undef;
        $self->{courses} = [];
        bless $self, $class;
        return $self;
    }

When a constructor is invoked using an arrow syntax that you'll soon be very familiar with (Student->new('arg1', 'arg2')), the class name is actually passed to the function as the first argument, moving the given arguments back one position in the argument list. In the constructor above, we have shifted off the class name, created a reference to a new anonymous hash, provided a couple of initial values for a name and course key in this hash, blessed it, and returned it. By blessing it to the package name, Perl will always know that this particular hash reference belongs to the package Student: it is now a Student object. If you leave off the second argument to the bless() function, it will default to the current package/class, but using the second argument leaves room for other benefits, as you will see shortly.
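The argument passing described above is easy to see with a throwaway class whose methods record what they receive. Echo and count_args() are hypothetical names used only for this illustration.

```perl
use strict;
use warnings;

package Echo;
sub new {
    my $class = shift;                               # the class name comes first...
    my $self  = { class_arg => $class, rest => [@_] };   # ...then the caller's arguments
    return bless $self, $class;
}
sub count_args {
    my $self = shift;                                # for a method call, the object comes first
    return scalar @_;                                # what is left are the real arguments
}

package main;
my $obj = Echo->new('arg1', 'arg2');
print "$obj->{class_arg}: @{ $obj->{rest} }\n";     # prints: Echo: arg1 arg2

my $same = Echo::new('Echo', 'arg1', 'arg2');       # the same call, spelled out
print ref($same), "\n";                             # prints: Echo
print $obj->count_args('x', 'y'), "\n";             # prints: 2
```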

We said that a class also defines the behaviors of the object in question, so let's give our Student object the ability to tell us who it is and what courses it is taking. The following two functions go in the Student package:

    sub name {
        my $self = shift;
        $self->{name} = shift if @_;
        return $self->{name};
    }

    sub courses {
        my $self = shift;
        @{$self->{courses}} = @_ if @_;
        return @{$self->{courses}};
    }

In Perl, when you call an object's method, using the arrow syntax shown above for the constructor, the object reference itself is automatically passed as the first argument to the function. So, in the two functions above, we shift off the first argument into a variable that we usually call $self but could call anything we wanted. That variable now holds the reference to the hash. We can access any value in this hash reference in the usual ways. Above, we set the values for these fields if the functions are passed arguments, or we just return the values if no arguments were passed. Some people prefer to create separate functions for setting and retrieving object attributes. If we have the above package saved in a file named Student.pm and included a final statement of just 1; to return a true value, as we did with our modules in chapter 16, we can use this new data type in a program:

    #!/usr/bin/perl -w
    use strict;
    use Student;
    my $student = Student->new();
    $student->name('Bill Jones');
    $student->courses('Math', 'English');
    print $student->name(), "\n";                  # prints: Bill Jones
    print join(' ', $student->courses()), "\n";    # prints: Math English

We could have also accessed the name or courses of the student directly, because we know that a student is just a hash reference. We could have said print $student->{name} to get the student's name, but this is not a good practice. When you use an object, you should access it only through its documented functions. This way, the underlying structure of the object can be changed in the future without affecting your program. Perhaps in the future we will decide to change the Student class to use an array reference rather than a hash reference. We will change its methods accordingly:

    package Student;

    sub new {
        my $class = shift;
        my $self  = [];
        $self->[0] = undef;    # name
        $self->[1] = [];       # courses
        bless $self, $class;
        return $self;
    }

    sub name {
        my $self = shift;
        $self->[0] = shift if @_;
        return $self->[0];
    }

    sub courses {
        my $self = shift;
        @{$self->[1]} = @_ if @_;
        return @{$self->[1]};
    }

    1;

If our programs had been accessing the name and courses of the student objects by directly accessing the hash keys, the programs would no longer work. However, by only using the documented accessor functions, our programs can continue to work correctly regardless of how we change the underlying data structure in the class:

    print $student->{name}, "\n";     # no longer works
    print $student->name(), "\n";     # still works as advertised
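The sketch below illustrates why the documented methods matter. The two hypothetical classes, HashStudent and ArrayStudent, mirror the hash-based and array-based Student classes above. The describe() routine, also hypothetical, works with both because it never touches the underlying structure.

```perl
use strict;
use warnings;

package HashStudent;
sub new {
    my $class = shift;
    my $self  = { name => undef, courses => [] };
    return bless $self, $class;
}
sub name    { my $self = shift; $self->{name} = shift if @_; return $self->{name} }
sub courses { my $self = shift; @{ $self->{courses} } = @_ if @_; return @{ $self->{courses} } }

package ArrayStudent;
sub new {
    my $class = shift;
    my $self  = [undef, []];                 # [0] = name, [1] = courses
    return bless $self, $class;
}
sub name    { my $self = shift; $self->[0] = shift if @_; return $self->[0] }
sub courses { my $self = shift; @{ $self->[1] } = @_ if @_; return @{ $self->[1] } }

package main;

# Code that only uses the documented methods works with either version.
sub describe {
    my $student = shift;
    return $student->name . ': ' . join(' ', $student->courses);
}

for my $class ('HashStudent', 'ArrayStudent') {
    my $s = $class->new;
    $s->name('Bill Jones');
    $s->courses('Math', 'English');
    print describe($s), "\n";    # prints: Bill Jones: Math English
}
```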

Notice one additional thing about using objects that can be demonstrated with a simple script using our original, hash-based Student class above:

    #!/usr/bin/perl -w
    use strict;
    use Student;
    my $student = Student->new();
    $student->name('Bill Jones');
    $student->courses('Math', 'English');
    my $h_ref = {name => 'Bill Jones', course => 'English'};
    print "$student\n";            # prints: Student=HASH(0x80c44cc)
    print ref($student), "\n";     # prints: Student
    print "$h_ref\n";              # prints: HASH(0x80d1bb0)
    print ref($h_ref), "\n";       # prints: HASH

You can see that although we know the underlying data structure of a Student is a hash reference, it is not just an ordinary hash reference: it's a hash reference that also knows what class (package) it belongs to. The ref() function returns the type of reference for an ordinary reference as well as the class name of a blessed reference (i.e., an object). We can use this information to create more general constructor functions:

    package Student;

    sub new {
        my $type  = shift;
        my $class = ref($type) || $type;
        my $self  = {
            name    => undef,
            courses => [],
        };
        return bless $self, $class;
    }

    # other methods ...

    1;

We can call this constructor in the normal fashion or as an object method using an existing object:

    my $student1 = Student->new();
    $student1->name('Bill Jones');
    my $student2 = $student1->new();

It is important to realize that $student2 in the above example is a completely new (and empty) Student object of its own. It is not a copy of $student1. The $student1 object was only used to access the constructor method; it does not provide any additional parameters to the constructor. The constructor now tests whether it was called using the class name or from an existing object. If it was called from an object, it gets the class name by using the ref() function.
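The following self-contained sketch repeats the general constructor with a name() accessor. It then checks the claims just made: the object returned by $student1->new() is a fresh, empty Student that is distinct from $student1.

```perl
use strict;
use warnings;

package Student;
sub new {
    my $type  = shift;
    my $class = ref($type) || $type;    # works for Student->new and $obj->new
    my $self  = { name => undef, courses => [] };
    return bless $self, $class;
}
sub name { my $self = shift; $self->{name} = shift if @_; return $self->{name} }

package main;
my $student1 = Student->new();
$student1->name('Bill Jones');
my $student2 = $student1->new();        # a brand-new, empty Student

print ref($student2), "\n";                                 # prints: Student
print defined $student2->name ? "copied\n" : "empty\n";     # prints: empty
print $student1 == $student2 ? "same\n" : "different\n";    # prints: different
```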

18.2.2 Inheritance

If you have a general class such as our Student class above, you can derive more specific classes from it using inheritance. This means you do not have to create entire new classes that duplicate parts of existing classes. Let's say that we want a special type of Student to represent a part-time student. This student can only be registered for a maximum of three courses.

In Perl, we implement inheritance using a special array called the @ISA array. This array must be a package global variable, not a lexical variable, so if you are using the strict pragma inside your class modules, you will have to declare this variable using the use vars pragma (see below). The @ISA array holds the names of the classes from which you want to inherit. When an object of your derived class calls a method and that method does not exist in that object's class definition, Perl searches the classes within the @ISA array to try to locate that method. Perl will also search any classes listed in those packages' @ISA arrays as well. To start our new Student class, we can simply inherit everything from the parent class:

    package PT_Student;
    use strict;
    use Student;
    use vars '@ISA';
    @ISA = qw(Student);

    1;

We now have a new class called PT_Student that is exactly the same as our Student class. Because this class does not define any methods, Perl searches the classes listed in the @ISA array to find the method calls:

    #!/usr/bin/perl -w
    use strict;
    use PT_Student;
    my $pt_stud = PT_Student->new();
    $pt_stud->name('John Smith');
    $pt_stud->courses(qw/Math English Biology Chemistry/);
    print join(' ', $pt_stud->courses()), "\n";

You can see that this new class behaves just the same as our Student class; in other words, the PT_Student class inherits its functionality from its parent, and we didn't have to rewrite all the code in this new class. But, of course, we don't want our PT_Student class to be exactly the same as its parent; we want to allow a part-time student to have at most three courses in its course list. We can do this by creating a new courses() function for the part-time student:

    sub courses {
        my $self = shift;
        if (@_ > 3) {
            die "part time students can only have 3 courses.\n";
        } else {
            @{$self->{courses}} = @_ if @_;
        }
        return @{$self->{courses}};
    }

Now everything about part-time students is the same as that for regular students, except that if you try to set the course list to more than three courses, the part-time student will die with an error message. You might want to do something other than having the student die just for trying to take more courses. Perhaps you would only want to issue a warning message and return a false value. By redefining the courses() function, we have used polymorphism: our derived class has a slightly different shape, or functionality, than its parent class. Figure 18.2 shows the relation between the Student and PT_Student classes. A PT_Student IS A Student with a modified courses() method.

Figure 18.2   Relation between Student and PT_Student
    Student:    name, courses; name(), courses()
    PT_Student  ISA Student; redefines courses()
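The text suggests issuing a warning and returning a false value instead of dying. The sketch below shows one possible version; it is not the book's code. It assumes Perl 5.6 or later (for our @ISA) and uses Carp's carp() for the warning. It also calls the parent's courses() through SUPER::, so PT_Student never needs to know how Student stores its course list.

```perl
use strict;
use warnings;

package Student;
sub new {
    my $type  = shift;
    my $class = ref($type) || $type;
    my $self  = { name => undef, courses => [] };
    return bless $self, $class;
}
sub name    { my $self = shift; $self->{name} = shift if @_; return $self->{name} }
sub courses { my $self = shift; @{ $self->{courses} } = @_ if @_; return @{ $self->{courses} } }

# Hypothetical variant of PT_Student: warn and return an empty list (false)
# instead of dying when given more than three courses.
package PT_Student;
use Carp;
our @ISA = ('Student');
sub courses {
    my $self = shift;
    if (@_ > 3) {
        carp "part time students can only have 3 courses";
        return;                              # empty list: false in boolean context
    }
    return $self->SUPER::courses(@_);        # let Student store and return the list
}

package main;
my $pt = PT_Student->new;
$pt->name('John Smith');
$pt->courses(qw/Math English/);
my @result = $pt->courses(qw/Math English Biology Chemistry/);   # warns, changes nothing
printf "%d / %s\n", scalar(@result), join(' ', $pt->courses);   # prints: 0 / Math English
```

Delegating to SUPER::courses() is a design choice. If Student later switches to an array-based representation, this subclass keeps working unchanged.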

18.3 Abstract data structures

In the last chapter and the previous sections, you've learned a little about how to think about your data in a more abstract way (heaps, trees, students) and about representing your data as objects with behaviors. In this section, we will continue our practice with OOP techniques while exploring basic abstract data structures further.

18.4 Stacks, queues, and linked lists

Virtually any programming book that even mentions abstract data structures will almost always give examples of stacks and queues. This book is no exception. Stacks and queues are elementary data structures. Even though their functionality is largely built into Perl, both are useful to know about so you can see examples of building simple objects and better appreciate what Perl gives you for free.

18.4.1 Stacks

A stack data structure is much the same as a stack in your everyday life. Perhaps you have a stack of books lying next to your desk, and perhaps you took this very book off the top of that stack. When you take a break from reading, you might place this book back on top of the stack. In real life, you might try slipping a book out of the stack from the bottom, or lifting several books at a time off the top, but you also risk toppling your stack or injuring your back or both. A stack in the programming world limits you to placing or removing single items from only the top of the stack. This describes the fundamental property of a stack: you only add and remove things from the top. We generally refer to the operation of adding to a stack as pushing an element onto the stack, and to the removal operation as popping an element from the stack. We also give a name to this ordering of placement and removal with a stack: Last In, First Out (LIFO). Figure 18.3 depicts the basic stack and its operations.

Figure 18.3   Graphic representation of a stack (push(item) and pop() both act on the top)
    … now back to plain, now some bold and bold italic <em>text. And lastly, here is emphasized text containing <em>even more emphasized text, in which case you would probably want the word "containing" in the previous phrase to be in plain fonts again.

If you were parsing such text, you would not want to simply switch states when you hit a tag. What would you do when you hit the ending tag? What state would you switch back to? You need to be able to remember earlier states as you move into nested states and work your way back out again. With a stack, you can keep pushing any new state you enter onto the stack. Whenever you hit an end tag, you can simply pop the current state off the stack. Whatever state is left on the top of the stack is now your current state.

You may suddenly wonder what all this fuss is about. Don't Perl's arrays already allow this kind of behavior? You are right; they do. But arrays in a great many other languages are nothing at all like Perl arrays. Usually, all you can do with arrays in other languages is allocate a size for the array, then store and retrieve values using array indices only (no pushing, popping, splicing, or dicing). Perl's arrays are different and convenient because they can grow or shrink on demand, and they have the functionality of stacks built right in, as well as the functionality of queues, as you will see shortly.
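Here is a sketch of the stack-based approach just described, using a plain Perl array as the stack. The helper text_states() is hypothetical. It walks through simple tagged text and reports, for each run of text, the state on top of the stack. It assumes well-formed tags of the form <b> and </b>, and it does not check that an end tag matches the tag it closes.

```perl
use strict;
use warnings;

# Hypothetical helper: start tags push a new state, end tags pop back to
# the enclosing one, and each run of text gets the state on top of the stack.
sub text_states {
    my $text  = shift;
    my @stack = ('plain');                    # bottom of the stack: no tags open
    my @runs;
    # the capturing group makes split() keep the tags in its output
    for my $piece (split m{(</?\w+>)}, $text) {
        next if $piece eq '';
        if ($piece =~ m{^<(\w+)>$}) {
            push @stack, $1;                  # entering a nested state
        }
        elsif ($piece =~ m{^</\w+>$}) {
            pop @stack if @stack > 1;         # back to the enclosing state
        }
        else {
            push @runs, [$piece, $stack[-1]]; # current state is the stack top
        }
    }
    return @runs;
}

for my $run (text_states('plain <b>bold <i>both</i> again</b> done')) {
    print "$run->[1]: '$run->[0]'\n";
}
```

After the </i>, the state falls back to b. After the </b>, it falls back to plain, and no extra bookkeeping is needed.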

So do we need to bother with creating our own little stacks in Perl? Not often, but it is easy to do and, by doing so, we can add a little extra functionality, such as providing a size limit to the stack if desired and automatically producing warnings if we reach the bottom or top of the stack. Besides, creating stacks gives us a chance to demonstrate the use of inheritance when we create our queue object.

To keep things simple, we use Perl's arrays to implement our stack on the inside, but on the outside we just have a stack object that might have an optional size limit and provides only the following functions: push, pop, top, is_empty, and is_full. The top function merely returns the top element in the stack without removing it. This is useful when you want to compare the current state with the previous state without having to pop the previous state off the stack, compare them, and push it back on again. We also use another module in our stack object, the Carp module that comes as part of the Perl distribution. This module allows us to issue warnings using the carp() function or errors using the croak() function from inside our object's methods. These warnings point to the line in our main program that called these methods, as you'll see shortly.

To begin with, we start our package and create a simple constructor to return a reference to an anonymous array as the blessed object:

    package Stack;
    use Carp;

    sub new {
        my $type     = shift;
        my $class    = ref($type) || $type;
        my $max_size = shift;
        my $self     = [$max_size];    # element 0 holds the maximum size
        return bless $self, $class;
    }

We now have an anonymous array as an object with a maximum size attribute stored in its first position. We can now construct our two test methods to test if the stack is empty or full. We assume that, if the maximum size is 0 (no size was given when the object was created), we want a limitless stack, so the full test should always fail in that case.

    # $stack->is_empty() returns true if stack is empty
    sub is_empty {
        my $self = shift;
        return !$#$self;
    }

    # $stack->is_full() returns true if stack is full
    sub is_full {
        my $self = shift;
        return 0 unless $self->[0];
        return ($#$self == $self->[0]);
    }

STACKS, QUEUES, AND LINKED LISTS    303

What the heck is !$#$self? Well, $self, our object, is a reference to an array. The $# prefix, when used on an array, gives us the index of the last element of that array, and ! is just the logical not operator. Thus, is_empty() simply returns the logical negation of the last index of the array. That is, if the last index is 0 (a false value, meaning only the size slot is present), then !0 returns true. Similarly, if the last index is greater than 0, then it is a true value and !true returns false.
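To see why !$#$self works as an emptiness test, the sketch below builds two bare array references laid out the way the Stack constructor lays them out (size limit in slot 0) and prints their last indices. It is only an illustration, and the variable names are made up.

```perl
#!/usr/bin/perl -w
use strict;

# Slot 0 holds the size limit, so an "empty" stack still has one
# element and its last index ($#) is 0.
my $empty = [4];              # size limit only
my $two   = [4, 'a', 'b'];    # size limit plus two items

printf "last index: %d and %d\n", $#$empty, $#$two;    # last index: 0 and 2
printf "empty? %s and %s\n",
    (!$#$empty ? 'yes' : 'no'), (!$#$two ? 'yes' : 'no');   # empty? yes and no
```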

With these simple tests in place, we can now implement our remaining functions easily by first testing our stack for the appropriate condition, then using the features already built into Perl's arrays to do the rest. The rest of the module looks like this:

    # $stack->push($item) pushes $item onto stack if stack not full
    sub push {
        my $self = shift;
        my $item = shift;
        if ($self->is_full()) {
            carp "Stack is full:";
            return;
        }
        # CORE:: selects Perl's built-in push rather than this method
        return CORE::push @$self, $item;
    }

    # $stack->pop() pops the top item from the stack if not empty
    sub pop {
        my $self = shift;
        if ($self->is_empty()) {
            carp "Stack is empty:";
            return;
        }
        return CORE::pop @$self;
    }

    # $stack->top() returns the value of the top element if not empty
    sub top {
        my $self = shift;
        if ($self->is_empty()) {
            carp "Stack is empty:";
            return;
        }
        return $self->[$#$self];
    }

    1;
    __END__

Now we save this in a file named Stack.pm, and we write a simple little script to test its functionality:

    #!/usr/bin/perl -w
    use strict;
    use Stack;
    my $st = Stack->new(4);
    #test push to overflow
    for (3, 5, 2, 9, 11) {
        print "pushing: $_\n" if $st->push($_);
    }

    print "popped: 9\n" if $st->pop();
    print "pushed: 42\n" if $st->push(42);
    print 'top is: ', $st->top(), "\n";

    #test pop to underflow
    for (1..5) {
        print 'popped: ', $st->pop(), "\n";
    }
    print $st->top(), "\n";

This script produces the following output:

    perl stack.pl
    pushing: 3
    pushing: 5
    pushing: 2
    pushing: 9
    Stack is full: at stack.pl line 7.
    popped: 9
    pushed: 42
    top is: 42
    popped: 42
    popped: 2
    popped: 5
    popped: 3
    Stack is empty: at stack.pl line 16.
    popped:
    Stack is empty: at stack.pl line 18.

Notice how the carp() messages point to the line in the script we are running? Had we used ordinary warn() calls, the first message would have been:

    Stack is full: at Stack.pm line 26.

which wouldn't help us locate the line in our test script where the problem lies. With carp(), we can see that line 7 of our little script is where we are attempting to overflow the stack. The Carp module's croak() function provides a similar alternative to Perl's die() function.

Let's give an example of using a stack that is a little less trivial than our test script. Consider a program that accepts, parses, and then evaluates simple mathematical expressions involving basic arithmetic. Further, let's assume that this program allows two kinds of parentheses, round and square, presumably so that the person entering the expression can alternate between brace types to help keep them from mis-entering equations. While we won't write a full parser here, one of the things you might want to do as a first test for a valid equation is to check for mismatched braces:

    3 + [ ( 4 * [ 9 - 2 ] - 1 ) / 2 ]      is correctly nested
    3 + [ ( 4 * [ 9 - 2 ) - 1 ] / 2 )      is incorrectly nested
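The nesting rule behind these two examples can be sketched on its own before bringing in the Stack module. This version uses a plain array as the stack. check_nesting() is a hypothetical helper, not the book's check_braces(): instead of dying, it reports the 0-based position of the first offending character, or -1 if the nesting is correct.

```perl
#!/usr/bin/perl -w
use strict;

# Bracket-nesting check using a plain array as the stack.
# check_nesting() is a hypothetical helper: it returns the 0-based
# position of the first offending character, or -1 if all is well.
sub check_nesting {
    my $expr = shift;
    my %opener_for = (')' => '(', ']' => '[');
    my @stack;
    my $pos = 0;
    foreach my $ch (split //, $expr) {
        if ($ch eq '(' or $ch eq '[') {
            push @stack, $ch;                  # remember each opener
        } elsif (exists $opener_for{$ch}) {
            # a closer must match the most recent unmatched opener
            return $pos unless @stack and pop(@stack) eq $opener_for{$ch};
        }
        $pos++;
    }
    return @stack ? $pos : -1;    # leftover openers were never closed
}

print check_nesting('3+[(4*[9-2]-1)/2]'), "\n";   # -1
print check_nesting('3+[(4*[9-2)-1]/2)'), "\n";   # 10
```

The position 10 is the ")" that tries to close the innermost "[".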

A simple subroutine using a stack can work through each token in the statement and push any opening (left) brackets encountered onto the stack. When a closing bracket is encountered, it can pop the last opening bracket from the stack to see if it is of the correct type. Such a routine also catches instances of missing brackets. For readability, these equations are nicely spaced out, but we will split on nothing to allow for equations that are not spaced out. The following script provides the subroutine plus a helper routine for printing errors:

    #!/usr/bin/perl -w
    use strict;
    use Stack;

    my $expression1 = '3+[(4*[9-2]-1)/2]';
    my $expression2 = '3+[(4*[9-2)-1]/2)';

    check_braces($expression1);
    check_braces($expression2);

    sub check_braces {
        my $expr = shift;
        my $st   = Stack->new();
        my $pos  = 0;
        foreach my $token (split //, $expr) {
            if ($token =~ m/[(\[]/) {          # opening bracket: save it
                $st->push($token);
            } elsif ($token =~ m/[)\]]/) {     # closing bracket: must match
                die not_valid($token, $expr, $pos) if $st->is_empty();
                my $prev = $st->pop();
                die not_valid($token, $expr, $pos)
                    unless (($token eq ')' && $prev eq '(')
                         or ($token eq ']' && $prev eq '['));
            }
            $pos++;
        }
        # an opening bracket left on the stack was never closed
        die not_valid($st->top(), $expr, $pos) unless $st->is_empty();
        return 1;
    }

    sub not_valid {
        my ($token, $expr, $pos) = @_;
        my $ptr = ... x 19 ... x $pos ...
        ...

...    new()    $queue->is_empty()    $queue->is_full()

... we are merely renaming Stack's push() function rather than inheriting it:

    # $queue->enqueue($item) shoves $item into queue
    sub enqueue { shift->push(shift) }

    # $queue->dequeue() removes and returns front of queue
    sub dequeue {
        my $self = shift;
        if ($self->is_empty()) {
            carp ref($self), " is empty:";
            return;
        }
        return splice(@$self, 1, 1);    # slot 0 still holds the size limit
    }

    # $queue->front() returns front of queue (doesn't remove)
    sub front {
        my $self = shift;
        if ($self->is_empty()) {
            carp ref($self), " is empty:";
            return;
        }
        return $self->[1];
    }

    1;
    __END__

The only thing really different here is the carp() statement. Rather than saying "Queue is empty", it says ref($self) is empty. Remember that the ref() function on an object returns the object's class name. We have to go back to the Stack module and change those carp() statements to also use this technique instead of hard-coding the word stack in the messages, because sometimes the $self being used in Stack's functions is no longer a stack but a Queue. After saving the example above in Queue.pm and making the minor changes to the Stack class, we can now run the following test script:

    #!/usr/bin/perl -w
    use strict;
    use Queue;

    my $q = Queue->new(4);
    for (3, 2, 4, 9, 11) {
        $q->enqueue($_);
    }

    print $q->dequeue(), "***\n";
    $q->enqueue(42);
    print $q->front(), "***\n";

    for (1..5) {
        print $q->dequeue(), "*\n";
    }
    print $q->front(), "\n";

which produces the following output (the asterisks are only to help differentiate the different print statements):

    Queue is full: at queue.pl line 7.
    3***
    2***
    2*
    4*
    9*
    42*
    Queue is empty: at queue.pl line 15.
    Queue is empty: at queue.pl line 17.
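The ref($self) technique, and the inheritance it supports, can be seen in a condensed single-file sketch. It assumes Queue gets its methods through @ISA (declared here with our @ISA). The status() method is a hypothetical stand-in for the carp() messages: it returns the text rather than warning, which makes the result easy to check.

```perl
#!/usr/bin/perl -w
use strict;

# Condensed sketch: Queue inherits from Stack, and the message built
# from ref($self) names whichever class the object really belongs to.
package Stack;

sub new {
    my $class = shift;
    return bless [shift], $class;    # slot 0 holds the size limit
}

sub is_empty { my $self = shift; return !$#$self }

# hypothetical stand-in for the carp() calls: returns the message text
sub status {
    my $self = shift;
    return ref($self) . ($self->is_empty() ? ' is empty' : ' has items');
}

package Queue;
our @ISA = ('Stack');                # Queue inherits new, is_empty, status

package main;

my $s = Stack->new(4);
my $q = Queue->new(4);
print $s->status(), "\n";            # Stack is empty
print $q->status(), "\n";            # Queue is empty
```

Because Queue->new(4) calls the inherited constructor with 'Queue' as the class name, the object is blessed into Queue. That is why ref($self) yields the subclass name, even inside code written in Stack.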

18.4.3 Linked lists

It is hard to appreciate all that Perl gives you for free within the built-in data types. Linked lists are another freebie. In many languages, structures such as arrays are static in nature. You state at the beginning of the program how big you want your array, and that's the only array you get to use. What if you aren't sure how big your array will need to be? In Perl, this isn't a problem because Perl itself takes care of this for you by providing you with a dynamic array that can grow and shrink as needed, at least within the memory limits of your computer.

As long as the language is able to allocate new memory on demand, linked lists can be used to solve such dynamic problems by providing a way of pointing to this new memory. Obviously, Perl can allocate new memory on demand during runtime. You can always create new variables in Perl, and Perl can point to this new memory through references to new variables, or anonymous structures. So, while not often needed in most Perl programs, you can create linked lists in Perl readily. Also, while the linked list structure may not be that useful, the same techniques can be used to create other structures such as trees.

So what is a linked list?

Consider that you have to read in a variable number of simple inventory records (part-number:name:quantity) and you want to be able to search the list for certain fields. Easy, right? Well, let's say that now you have to do it without using an array or hash to hold the entire list, although you can still use small anonymous arrays or hashes to hold individual records, as in:

    { p_num => 144, name => 'pencil', quantity => 7, }   # record as hash
    # or
    [ 144, 'pencil', 7, ]                                 # record as array

Graphically, these records could be represented as in figure 18.4, where we simply assume you have some storage device with slots to hold the different fields, regardless of whether you also store the field names (keys) along with them.

Figure 18.4  Graphical representation of inventory record (a generic record with part_number, part_name, and quantity slots; a specific record holding 144 and pencil)

The slots that contain the field data could hold any scalar data.

A reference to one of these structures is just a scalar value. So, if we can have three fields in each structure, why not another, or even two more? In figure 18.5, each structure holds two additional fields, one containing a reference to the next structure (if any) and one containing a reference to the previous structure (if any). The small black circles are just our reference depiction from chapter 8, and the arrows are meant to point to the whole next (or previous) structure, not just one field.

Figure 18.5  Records with additional pointer fields (a generic record with part_number, part_name, quantity, Next, and Prev slots; specific records for pencil and pen linked to each other, with undef at the open ends). Each node, or individual record-structure, contains a "link" (a reference) to its next node and its previous node.

If we had a reference to the first node, perhaps stored in a variable, then we also have access to its next node by following this reference, and to that node's next node, and so on, until we hit a node whose "next" field is undefined. We could also work backwards following the references in each node's "previous" field. The references are the only thing holding the different records together as a list: they link them. Hence, this is called a linked list. Actually, this list is a double-linked list because it has links in each direction.
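Here is a minimal sketch of building and walking a double-linked list like the one in figure 18.5. The hash field names next and prev, and the pen and eraser records with their part numbers, are assumptions made for illustration.

```perl
#!/usr/bin/perl -w
use strict;

# Sketch of a double-linked list of inventory records. The field names
# next/prev and the pen and eraser records are illustrative assumptions.
my @records = ([144, 'pencil', 7], [201, 'pen', 11], [305, 'eraser', 4]);

my ($first, $last);
foreach my $rec (@records) {
    my $node = {
        part_number => $rec->[0],
        part_name   => $rec->[1],
        quantity    => $rec->[2],
        prev        => $last,      # undef for the first node
        next        => undef,      # filled in when the next node arrives
    };
    if ($last) { $last->{next} = $node } else { $first = $node }
    $last = $node;
}

# Forward: follow "next" until it is undefined.
my @forward;
for (my $n = $first; $n; $n = $n->{next}) { push @forward, $n->{part_name} }

# Backward: start at the last node and follow "prev".
my @backward;
for (my $n = $last; $n; $n = $n->{prev}) { push @backward, $n->{part_name} }

print "forward:  @forward\n";     # forward:  pencil pen eraser
print "backward: @backward\n";    # backward: eraser pen pencil
```

Each pair of neighbouring nodes refers to the other, so a double-linked list forms reference cycles. Perl's reference counting will not reclaim these cycles on its own. The weaken() function from the core Scalar::Util module is the usual way to break them in long-running programs.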

In the above description, I've only described creating a linked list of structures that contain the data itself. A more general approach would be to use a key field, to hold just a key you associate with your data, and a data field, which would hold your data, most likely as an anonymous hash or array representing your record. The key field is then your search field when looking for particular items in the list:

    A node in the linked list, represented with a hash:
    {
        previous => (reference to previous item),
        key      => (your key),
        data     => (your data),
        next     => (reference to next item),
    }

In some cases, the key itself might be the only data, but in many cases, you will have a key and a record to go with it. Such a record is called the satellite data for the key. Satellite data was not pertinent previously because our stacks and queues did not need to support a search operation.

The following package is a partial implementation of a linked list. It supports methods to insert new nodes into the list, search the list for a given key, test for emptiness, and return the data of the stored records (i.e., dump the list). Since we do not need to traverse the list in both directions, this package just implements a singly linked list (say that five times fast). Each node knows its next node, but no node knows its previous node.

    package Llist;

    sub new {
        my $type  = shift;
        my $class = ref($type) || $type;
        my $self  = {};
        return bless $self, $class;
    }

All our new() function does is create a completely empty node (i.e., an anonymous hash), which we bless and return. This empty node will be the base node for our list. The real work is done by the insert method. One line in our insertion routine that you haven't seen before looks like this:

    my $data = defined($_[0]) ? shift : $key;

This ?: operator is called the ternary operator, and it operates rather like an if/else clause in an expression. Its general form is:

    (condition) ? (this expression) : (that expression)

    returns: this expression if condition was true
    returns: that expression if condition was false

The above line of code tests if $_[0] is defined. If so, we shift it and assign that value to $data. If not, we assign $key to $data instead. We do not simply use

    my $data = shift || $key;

because it is possible that the value we are shifting is 0, which would evaluate as false, and we would wind up assigning $key instead of the perfectly valid data value of 0. In the ternary conditional, we test to see if the argument is defined, not to see if it is true. Thus we avoid this potential problem.
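The difference between testing defined() and testing truth is easy to demonstrate. The two subroutines below are hypothetical helpers that isolate the two ways of choosing $data. Only the first matches the line used in insert().

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helpers contrasting the two ways of picking $data.
sub pick_with_defined { my $key = shift; return defined($_[0]) ? shift : $key }
sub pick_with_or      { my $key = shift; return shift || $key }

print pick_with_defined('k', 0), "\n";   # 0  (valid data kept)
print pick_with_or('k', 0), "\n";        # k  (0 is false, so the key wins)
print pick_with_defined('k'), "\n";      # k  (no data argument given)
```

Perl 5.10 and later also provide the defined-or operator //. The expression $value // $default performs the same defined() test as a single operator.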

To do our insertion, we need to consider two cases. In the first, our first node, $self, is empty (thus, the list is empty), and we need only copy our key and data into it. In the second, our first node is not empty. Here we insert by creating a new node and copying the first node's key, data, and next pointer into this new node. Then, we add our new key and data into the first node and stick the new node into our first node's next field. In this way, the list grows like a reversed stack, with new elements being pushed onto the front of the list. Each node actually contains its next node object:

    sub insert {
        my $self = shift;
        my $key  = shift;
        my $data = defined($_[0]) ? shift : $key;
        if ($self->is_empty()) {
            $self->{key}  = $key;
            $self->{data} = $data;
            $self->{next} = undef;
        } else {
            # move the current first node's contents into a new node
            my $node = $self->new();
            $node->{key}  = $self->{key};
            $node->{data} = $self->{data};
            $node->{next} = $self->{next};
            # then put the new key and data at the front
            $self->{key}  = $key;
            $self->{data} = $data;
            $self->{next} = $node;
        }
        return 1;
    }

Testing if the list is empty requires only that we test if the first node is empty. We do this by testing to see if the key field is not defined.

    sub is_empty { return !defined shift->{key} }

The search and dump_list methods are similar because they both must be able to traverse the list. The difference between the two methods is that search will end and return the data associated with the key for which it is looking if it finds that key. The dump method must continue through the whole list:

    sub search {
        my $self = shift;
        my $key  = shift;
        return 0 unless defined $key;
        return 0 if $self->is_empty();
        while (1) {
            return $self->{data} if $self->{key} eq $key;
            last unless defined $self->{next};
            $self = $self->{next};
        }
        return 0;
    }

    sub dump_list {
        my $self = shift;
        my @list = ($self->{data});
        while (1) {
            last unless $self->{next};
            $self = $self->{next};
            push @list, $self->{data};
        }
        return @list;
    }

    1;
    __END__
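To try these methods without a separate Llist.pm file, they can be condensed into one script. This sketch keeps the logic of the package above, but it rewrites the field copying with hash slices and walks the list with a for loop. The part number 201 for the pen is made up. The run shows that the newest item ends up at the front of the list and that search() returns the first match it meets.

```perl
#!/usr/bin/perl -w
use strict;

# Condensed, single-file version of the Llist package, for experimenting.
package Llist;

sub new { my $type = shift; return bless {}, ref($type) || $type }

sub is_empty { my $self = shift; return !defined $self->{key} }

sub insert {
    my $self = shift;
    my $key  = shift;
    my $data = defined($_[0]) ? shift : $key;
    if ($self->is_empty()) {
        @$self{qw(key data next)} = ($key, $data, undef);
    } else {
        my $node = $self->new();                      # new second node...
        @$node{qw(key data next)} = @$self{qw(key data next)};
        @$self{qw(key data next)} = ($key, $data, $node);  # ...new data up front
    }
    return 1;
}

sub search {
    my ($self, $key) = @_;
    return 0 if !defined $key or $self->is_empty();
    for (my $n = $self; $n; $n = $n->{next}) {
        return $n->{data} if $n->{key} eq $key;       # first match wins
    }
    return 0;
}

sub dump_list {
    my $self = shift;
    my @list;
    for (my $n = $self; $n; $n = $n->{next}) { push @list, $n->{data} }
    return @list;
}

package main;

my $list = Llist->new();
$list->insert(144, 'pencil');
$list->insert(201, 'pen');          # 201 is a made-up part number
$list->insert(144, 'red pencil');   # a duplicate key is not rejected

print join(', ', $list->dump_list()), "\n";   # red pencil, pen, pencil
print $list->search(144), "\n";               # red pencil
```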

Notice that, unlike with a hash, there is nothing preventing us from sticking new data in with keys we have already used. Our search method will only return the first found item. We could change our insert routine to disallow identical keys (by searching to see if the key already exists first), or to replace the data of the key if that key already exists in the list rather than adding a new node. (Again, we would have to search the list to find the right node, if any.) The following exercises allow you to add further functionality to this object.

18.5 Exercises

1  Use perldoc perltoot to read Tom Christiansen's excellent tutorial on object-oriented programming with Perl (or view the HTML page if that's what you prefer to use to read the documentation).

2  Also take a look at the perlobj and perlbot pod-pages.

3  The linked list object could benefit from a dump_keys() method that dumps the keys of the list rather than the values. Add a dump_keys() method to this object.

4  The main thing missing from our linked list is a method to delete a node. For this, you first need to search the list to find the node. Then you must splice it out of the list. This requires that you keep track of the previous node, because to splice the current node out you need to set the previous node's next field to the current node's next field.

CHAPTER 19

More OOP examples

19.1  The heap as an abstract data structure    316
19.2  Grades: an object example    320
19.3  Exercises    330

In this chapter, we begin by returning to our heap data structure to implement a heap class, and we discuss some of its applications. We then turn to a more practical example of creating a few classes. From these classes, we can revisit and extend the example we used at the beginning of chapter 8, building a system to query and make reports from a database of student assignment and exam grades.

19.1 The heap as an abstract data structure

Before tackling this section, you may wish to familiarize yourself with the heap structure and the algorithms we used in chapter 17 to create a heap and pull items off the top of the heap.

We will again use an array to store our heap. For our Heap class, we will also use the first element of the array to store additional information: in this case, the size of the heap, as well as an anonymous subroutine we can use to compare two keys in the heap. We begin by defining our package and setting up a private hash of comparison functions.

    package Heap;
    use strict;

    my %comp = (
        str  => sub { return $_[0] cmp $_[1] },
        rstr => sub { return $_[1] cmp $_[0] },
        num  => sub { return $_[0] <=> $_[1] },
        rnum => sub { return $_[1] <=> $_[0] },
    );

This hash is private to the Heap package; no other package can access it. We expect the caller of our new() constructor to supply an argument indicating which comparison routine to use. The str and rstr routines are for string comparisons in normal and reverse sorted order, and similarly num and rnum are for numeric comparisons. If no routine is specified, we default to using numeric comparisons:

    sub new {
        my $this  = shift;
        my $class = ref($this) || $this;
        my $comp  = shift || 'num';
        my $self  = [ { size => 0, comp => $comp{$comp} }, ];
        return bless $self, $class;
    }

316    CHAPTER 19    MORE OOP EXAMPLES

This constructor sets up our initial empty heap as an anonymous array with a hash reference in its first position. This hash reference contains the size of the heap and a reference to the anonymous subroutine for comparisons.
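The comparison closures in %comp read their two keys from @_ rather than from sort's $a and $b, and this is what lets the heap call them directly as $comp->($x, $y). The sketch below copies the same four closures into a script and, as an illustration, wraps two of them in sort blocks. The sample names are the ones used later in the heap test script.

```perl
#!/usr/bin/perl -w
use strict;

# The same four closures as the Heap package's %comp hash. They take
# their keys from @_, so sort needs a small wrapper block around them.
my %comp = (
    str  => sub { return $_[0] cmp $_[1] },
    rstr => sub { return $_[1] cmp $_[0] },
    num  => sub { return $_[0] <=> $_[1] },
    rnum => sub { return $_[1] <=> $_[0] },
);

my @names = qw(Brad Andrew Susanna John);
my $cmp   = $comp{rstr};
my @rev   = sort { $cmp->($a, $b) } @names;
my @nums  = sort { $comp{num}->($a, $b) } (10, 9, 100);

print "@rev\n";    # Susanna John Brad Andrew
print "@nums\n";   # 9 10 100
```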

Unlike our heap model in chapter 17, we wish to be able to insert items into the heap on demand, rather than building a heap out of an existing list of items. We will first create a node out of the arguments passed in (a key and a data element) and then call another method to do the actual insertion:

    sub insert {
        my $self = shift;
        my $key  = shift;
        my $data = defined($_[0]) ? shift : $key;
        my $node = { key => $key, data => $data };
        $self->_insert_node($node);
    }

Using an underscore to start a method name is just a convention to mark some functions as private to the class. Such methods should never be called from outside the package. Our method to do the actual insertion works as follows. We first increase the size of the heap, which has the effect of adding a new leaf node to our binary tree. We then start from this final leaf node and move up the tree, copying each parent node down one level if it is smaller than the current key (according to the heap's comparison routine). When the next parent node is not smaller than the current node, we have found the place in the heap to insert the node, which we do.

    sub _insert_node {
        my ($self, $node) = @_;
        my $i    = ++$self->[0]{size};
        my $comp = $self->[0]{comp};
        while ($i > 1 and $comp->($self->[parent($i)]{key}, $node->{key}) < 0) {
            $self->[$i] = $self->[parent($i)];    # move the parent down
            $i = parent($i);
        }
        $self->[$i] = $node;
    }

... our ... don't put the top element at the ... We still need to put something in the top position, so we take the last element of the heap, put it in the top position, and push it down using the push-down routine from chapter 17.

    sub extract_top {
        my $self = shift;
        return if $self->[0]{size} < 1;     # empty heap: nothing to extract
        my $top  = $self->[1]{data};
        my $node = pop @$self;              # remove the last element
        $self->[0]{size}--;
        if ($self->[0]{size} > 0) {
            $self->[1] = $node;             # move it to the top ...
            $self->_pushdown(1);            # ... and push it back down
        }
        return $top;
    }

The routine to push an element down the heap is the same as the one we used in chapter 17, only modified to use the comparison routine.

    sub _pushdown {
        my ($self, $i) = @_;
        my $size = $self->[0]{size};
        my $comp = $self->[0]{comp};
        while ($i <= $size / 2) {                  # while node $i has a child
            my $kid = left($i);
            $kid = right($i) if $kid < $size       # pick the larger child
                and $comp->($self->[$kid]{key}, $self->[$kid + 1]{key}) < 0;
            last if $comp->($self->[$i]{key}, $self->[$kid]{key}) >= 0;
            ($self->[$i], $self->[$kid]) = ($self->[$kid], $self->[$i]);
            $i = $kid;
        }
    }

All three functions we use to return the parent, left, and right indices are extremely simple one-liners:

    sub parent { return int($_[0] / 2) }
    sub left   { return $_[0] * 2 }
    sub right  { return $_[0] * 2 + 1 }

    1;
    __END__

THE HEAP AS AN ABSTRACT DATA STRUCTURE    319

The following is a simple test script that reads in a few lines of colon-separated data and inserts anonymous hashes of these records into a heap, using the name field as the key. We want to process these records in alphabetical order. Our heap normally puts the largest element on the top, so we need to reverse this idea and put the smallest element on top to get proper alphabetical ordering. We therefore pass the heap constructor the string rstr to tell it to compare using reverse alphabetical order:

    #!/usr/bin/perl -w
    use strict;
    use Heap;

    my $heap = Heap->new('rstr');
    while (<DATA>) {
        chomp;
        my ($name, $age, $beer) = split /:/;
        $heap->insert($name, { name => $name, age => $age, beer => $beer });
    }

    while ($_ = $heap->extract_top()) {
        foreach my $key (keys %$_) {
            print "$key: $_->{$key}\n";
        }
        print "\n";
    }

    __DATA__
    Brad:37:ale
    Andrew:35:ale
    Susanna:40:lager
    John:33:stout
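If you want to experiment without a separate Heap.pm, the class above can be condensed into a single script. This sketch folds _insert_node() and _pushdown() into insert() and extract_top(). It assumes, as the test script does, that extract_top() returns a false value once the heap is empty. It is a compact restatement for experimenting, not a replacement for the module.

```perl
#!/usr/bin/perl -w
use strict;

# Condensed single-file sketch of the chapter's Heap class.
package Heap;

my %comp = (
    str  => sub { $_[0] cmp $_[1] },
    rstr => sub { $_[1] cmp $_[0] },
    num  => sub { $_[0] <=> $_[1] },
    rnum => sub { $_[1] <=> $_[0] },
);

sub parent { return int($_[0] / 2) }
sub left   { return $_[0] * 2 }

sub new {
    my $this = shift;
    my $comp = shift || 'num';            # default to numeric comparisons
    my $self = [ { size => 0, comp => $comp{$comp} } ];
    return bless $self, ref($this) || $this;
}

sub insert {
    my $self = shift;
    my $key  = shift;
    my $data = defined($_[0]) ? shift : $key;
    my $node = { key => $key, data => $data };
    my $i    = ++$self->[0]{size};
    my $comp = $self->[0]{comp};
    # copy parents down while they compare lower than the new key
    while ($i > 1 and $comp->($self->[parent($i)]{key}, $key) < 0) {
        $self->[$i] = $self->[parent($i)];
        $i = parent($i);
    }
    $self->[$i] = $node;
    return 1;
}

sub extract_top {
    my $self = shift;
    return if $self->[0]{size} < 1;       # empty heap
    my $top  = $self->[1]{data};
    my $last = pop @$self;
    if (--$self->[0]{size} > 0) {
        $self->[1] = $last;               # move last element to the top
        my ($i, $size, $comp) = (1, $self->[0]{size}, $self->[0]{comp});
        while ($i <= $size / 2) {         # push it down while it has a child
            my $kid = left($i);
            $kid++ if $kid < $size
                and $comp->($self->[$kid]{key}, $self->[$kid + 1]{key}) < 0;
            last if $comp->($self->[$i]{key}, $self->[$kid]{key}) >= 0;
            @$self[$i, $kid] = @$self[$kid, $i];
            $i = $kid;
        }
    }
    return $top;
}

package main;

my $heap = Heap->new('rstr');             # reverse string order: smallest on top
$heap->insert($_) for qw(Susanna Brad John Andrew);
my @out;
while (defined(my $item = $heap->extract_top())) {
    push @out, $item;
}
print "@out\n";    # Andrew Brad John Susanna
```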

A heap can be used to implement what is known as a priority queue, where you can store a list of things to process according to their priority. These queues are usually dynamic; new elements are inserted while elements from the queue are processed. In the real world, you could consider a hospital emergency room as using a priority queue to process patients. Patients are not treated in order of arrival, but according to the severity of their physical condition. In the computing world, you may have heard of priority queues being used to manage job scheduling on a multi-user computer, or as the means of holding jobs in a printer queue. Different users may have different priorities on the system, and their tasks are entered into the queues based on their priority level.

19.2 Grades: an object example

In this section, we will no longer be dealing with standard computer science types of data structures. Instead, we will formulate our own structures to use as objects for a more familiar kind of data processing task. Recall the problem given at the beginning of chapter 8. We needed to create a report of assignment grades for each student in a class. Here we will consider a more generalized version of the same problem. We want to be able to store grades for students in a given course and retrieve current summary information for each student in the course.

As before, we want to be able to enter a student's score for a particular assignment in a plain text file just as these assignments come in and are graded. (Perhaps the exams are performed on the web and are automatically graded and appended to this file.) We also want to be able to use our program for more than one course. In this example, we consider two courses: Math-101 and Math-201.

To begin, we define a configuration file format that can be used for each course. The format will specify the total score possible for each assignment and the value of each assignment's contribution to the final course grade. For example, in our Math-101 course, we may decide to give three assignments and one exam. Each assignment is scored out of 50 but counts for only 25 percent of the final mark. The exam is marked out of 75 and counts for 25 percent of the final mark. We create our configuration file, math-101.cfg, as colon-separated records, one for each type of graded work. Each record contains fields for the type of graded work (Assign or Exam), the assignment or exam number, the raw score it is marked out of, and the amount it contributes toward the final grade (expressed simply as a number out of 100). Thus, our math-101.cfg file looks like this:

    Assign:1:50:25
    Assign:2:50:25
    Assign:3:50:25
    Exam:1:75:25

When we record the grades in our grade file for the course (named here math-101), we need to record the student name, the assignment number (1, 2, or 3 for assignments, or E1 for the first exam), and the raw score the student obtained. So part of the data file for this class might look like this:

    Bill Jones:2:35
    Anne Smith:3:41
    Sara Tims:2:45
    Sara Tims:3:39
    Bill Jones:1:42
    Bill Jones:E1:72
    Anne Smith:1:42
    Sara Tims:1:41
    Anne Smith:2:47
    Bill Jones:3:41
    Anne Smith:E1:69

Now, let's look at the types of things we have in our problem statement: we have courses, students, and assignments (or exams). Let's consider the most general category first, the course category. A given course has a course name, and it also contains a list of its students. Thus, we also create a file, with a .std extension, containing the names of all the students in a given class. So, for our Math-101 class, we have the file math-101.std containing:

    Bill Jones
    Anne Smith
    Sara Tims
    Frank Worza

GRADES: AN OBJECT EXAMPLE    321

So, how might we want to use this course class in our reporting program? Well, keeping things simple and assuming we just want to create a report for the whole class, we can imagine a program that is invoked with the course name as an argument:

    #!/usr/bin/perl -w
    use strict;
    use Course;

    @ARGV || die "You must supply a course name.\n";
    my $class = Course->new($ARGV[0]);
    while (<>) {
        chomp;
        $class->add_student_record(split /:/);
    }
    $class->print_report();
    __END__

Remember, our data file of grades has the same name as the course, so the data file name is in $ARGV[0] when the program is called. This program creates a new course, iterates through the data file adding student records to the course, and then prints out a report of the course grades. Presumably, the Course class knows how to add student records and print the report. Let's make our Course class use a Student class and have it store a list of student objects, one for each name in the student list. We begin building our Course class:

    package Course;
    use Student;
    use strict;

    sub new {
        my $type   = shift;
        my $class  = ref($type) || $type;
        my $course = shift;
        my $self = {
            course   => $course,
            number   => 0,
            students => {},
        };
        bless $self, $class;
        $self->_configure_course();
        return $self;
    }

This constructor creates a Course object that contains fields for the name of the course and the number of students, as well as a field holding a reference to an initially empty hash that maps student names to student objects. Much of the real work is done within the configuration routine we call near the end of the constructor. This routine is responsible for reading in the configuration file for the given course name. It builds a configuration hash that holds the information about each assignment or exam. Then the routine reads the student list and creates a new student object for each student in the list.

    sub _configure_course {
        my $self = shift;
        my $cfg_file = $self->{course} . '.cfg';
        my %cfg;
        open(CFG, $cfg_file) || die "can't open $cfg_file: $!";
        while (<CFG>) {
            chomp;
            my ($type, @data) = split /:/;
            $cfg{$type}[$data[0]] = [@data[1,2]];   # [out of, percent of final]
            $cfg{$type . '_no'}++;                   # count of each type
        }
        close CFG;

        my $stud_file = $self->{course} . '.std';
        open(STD, $stud_file) || die "can't open $stud_file: $!";
        while (<STD>) {
            chomp(my $name = $_);
            $self->{students}{$name} = Student->new(\%cfg, $name);
            $self->{number}++;
        }
        close STD;
    }
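The parsing step inside _configure_course() can be tried in isolation. This sketch feeds it the four math-101.cfg records from an in-memory list instead of opening the file, and it prints a couple of entries to show the shape of the resulting %cfg hash.

```perl
#!/usr/bin/perl -w
use strict;

# The parsing loop from _configure_course(), run on an in-memory copy
# of the math-101.cfg records instead of the file itself.
my @lines = ("Assign:1:50:25\n", "Assign:2:50:25\n",
             "Assign:3:50:25\n", "Exam:1:75:25\n");
my %cfg;
foreach (@lines) {
    chomp;
    my ($type, @data) = split /:/;
    $cfg{$type}[$data[0]] = [@data[1, 2]];   # [out of, percent of final]
    $cfg{$type . '_no'}++;                   # count of each type
}

print "assignments: $cfg{Assign_no}, exams: $cfg{Exam_no}\n";  # assignments: 3, exams: 1
print "exam 1 is out of $cfg{Exam}[1][0]\n";                   # exam 1 is out of 75
```

Because the assignment number is used directly as the array index, slot 0 of each array is simply left undefined.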

We can now add a few accessor type functions to retrieve the data in our course object. These are pretty straightforward:

    # course() returns the course name
    sub course {
        my $self = shift;
        return $self->{course};
    }

    # number() returns the number of students in the course
    sub number {
        my $self = shift;
        return $self->{number};
    }

    # student(name) returns the student object associated with the given name
    sub student {
        my $self = shift;
        my $name = shift;
        return $self->{students}{$name} || undef;
    }

    # list() returns the sorted list of student names
    sub list {
        my $self = shift;
        my @list = map  { $_->[0] }
                   sort { $a->[2] cmp $b->[2] }
                   map  { [$_, split] } keys %{ $self->{students} };
        return @list;
    }
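The map-sort-map chain in list() is the idiom often called the Schwartzian transform. The sketch below runs it on the names from math-101.std in a scrambled order. It assumes, as list() does, that every name has exactly two words, so element 2 of each anonymous array is the last name.

```perl
#!/usr/bin/perl -w
use strict;

# The map-sort-map idiom from list(), applied to the math-101.std names.
my @students = ('Sara Tims', 'Frank Worza', 'Bill Jones', 'Anne Smith');

my @sorted = map  { $_->[0] }                  # 3. keep the full name
             sort { $a->[2] cmp $b->[2] }      # 2. sort on the last name
             map  { [$_, split] } @students;   # 1. [full, first, last]

print join(', ', @sorted), "\n";
# Bill Jones, Anne Smith, Sara Tims, Frank Worza
```

A name entered with a middle name would put the middle name at element 2, and the sort would use that instead. Sorting on the last element of the split list would avoid this.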

We also need to add the functionality we saw in our main program; namely, we need to be able to add student records and print a class report. For these functions, we assume that the student objects themselves have functions for adding assignments and printing their own reports:

    sub add_student_record {
        my $self = shift;
        my $name = shift;
        my $student = $self->student($name);
        $student->add_assignment(@_);
    }

    sub print_report {
        my $self   = shift;
        my $course = $self->course();
        my $number = $self->number();
        print "Class Report: course = $course: students = $number\n";
        foreach my $name ($self->list()) {
            $self->student($name)->print_report();
        }
    }

    1;
    __END__

Don't forget to end your module with a true statement, such as the 1; shown above. At this point, we know that we will be passing the configuration hash to the Student class constructor. We also know that this class will need a method for adding assignments to the student and a method for printing a report on the student. Just as our Course class used the Student class, our Student class makes use of an Assignment class, which we will define later:

    package Student;
    use Assignment;
    use strict;

    sub new {
        my $type  = shift;
        my $class = ref($type) || $type;
        my $cfg   = shift;
        my $name  = shift;
        my $self  = {
            config      => $cfg,     # reference to the configuration hash
            name        => $name,    # the student's name
            assignments => 0,        # number of records added so far
        };
        bless $self, $class;
        return $self;
    }

There is not much new in this constructor. We have fields for a reference to the configuration hash and for the student's name, both passed in when each new student is created. We also have a field to hold the number of assignments this student has completed.

The method to add an assignment creates new fields in the object: one holds an anonymous array of assignments, and the other holds an anonymous array of exams. These arrays are populated with new Assignment objects, and the assignment count is incremented:

    sub add_assignment {
        my $self = shift;
        my ($num, $score) = @_;
        my $cfg = $self->{config};
        my $type;
        # exam numbers carry an "E" prefix (E1, E2, ...); strip it off
        if ($num =~ s/^E//) {
            $type = 'Exam';
        } else {
            $type = 'Assign';
        }
        $self->{$type}[$num] = Assignment->new($cfg, $type, $num, $score);
        $self->{assignments}++;
    }

We also add a couple of accessor functions to this class to return lists of assignment objects (either assignments or exams). We use the grep() function to discard possibly empty elements in either array:

    sub get_assigns {
        my $self = shift;
        return grep { $_ } @{ $self->{Assign} };
    }

    sub get_exams {
        my $self = shift;
        return grep { $_ } @{ $self->{Exam} };
    }
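add_assignment() stores each object at the index given by its assignment number, and numbering starts at 1. As a result, element 0 and any skipped number are left undefined. The quick standalone illustration below shows the effect. Plain strings stand in for Assignment objects here, which is an assumption for brevity.

```perl
use strict;
use warnings;

my @assigns;
$assigns[1] = 'Assignment 1';    # index 0 is never filled in
$assigns[3] = 'Assignment 3';    # assignment 2 has no record yet

print scalar(@assigns), " slots\n";        # the length includes the gaps
my @present = grep { $_ } @assigns;        # keep only true elements
print scalar(@present), " records: @present\n";
```

grep { $_ } would also drop a false value such as 0 or the empty string. That is harmless here, because a blessed object reference is always true.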

Our function for printing a student's report is longer, but not complicated. It assumes that a given assignment knows how to print its own report. The function itself only needs to maintain a running total of each assignment's contribution toward the final score:

    sub print_report {
        my $self = shift;
        my $cfg  = $self->{config};
        print $self->{name}, ":\n";
        unless ($self->get_exams() || $self->get_assigns()) {
            print "\tNo records for this student\n";
            return;
        }
        my ($ftotal, $a_count, $e_count) = (0, 0, 0);
        foreach my $assign ($self->get_assigns()) {
            print "\t";
            $assign->print_report();
            $ftotal += $assign->fscore();   # running total of the final grade
            $a_count++;
        }
        foreach my $assign ($self->get_exams()) {
            print "\t";
            $assign->print_report();
            $ftotal += $assign->fscore();
            $e_count++;
        }
        # only give a final grade if every configured item has a score
        if ($cfg->{Assign_no} == $a_count and $cfg->{Exam_no} == $e_count) {
            print "\tFinal Course Grade: $ftotal/100\n";
        } else {
            print "\tIncomplete Record\n";
        }
        print "\n";
    }

    1;
    __END__

Finally, we have our Assignment class. The constructor for this class is fairly simple, passing off the hard work to its own _assign() function:

    package Assignment;
    use strict;

    sub new {
        my $type  = shift;
        my $class = ref($type) || $type;
        my $cfg   = shift;
        my $self  = { config => $cfg };
        bless $self, $class;
        $self->_assign(@_);
        return $self;
    }

    ...

    1;
    __END__
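The body of _assign() is not reproduced above. Neither are the fscore() and print_report() methods that Student relies on. The reports later in this section do pin down the arithmetic. A raw score of 42 out of 50 on an assignment worth 25 percent is reported as "Adjusted = 21.00/25". So the adjusted score is raw / out_of * weight, rounded to two decimal places. The hypothetical sketch below is written under that assumption. The hash keys (type, num, raw, out_of, weight, fscore) and the exact report layout are guesses for illustration, not the book's code.

```perl
use strict;
use warnings;

package Assignment;

# Hypothetical reconstruction: the field names are assumptions.
sub new {
    my $type  = shift;
    my $class = ref($type) || $type;
    my $cfg   = shift;
    my $self  = { config => $cfg };
    bless $self, $class;
    $self->_assign(@_);
    return $self;
}

sub _assign {
    my ($self, $type, $num, $score) = @_;
    my ($out_of, $weight) = @{ $self->{config}{$type}[$num] };
    @{$self}{qw(type num raw out_of weight)} =
        ($type, $num, $score, $out_of, $weight);
    # contribution to the final grade, rounded to 2 decimal places
    $self->{fscore} = sprintf "%.2f", $score / $out_of * $weight;
}

sub fscore { return $_[0]{fscore} }

sub print_report {
    my $self  = shift;
    my $label = $self->{type} eq 'Exam' ? 'Exam' : 'Assignment';
    print "$label $self->{num}: raw = $self->{raw}/$self->{out_of}: ",
          "Adjusted = $self->{fscore}/$self->{weight}\n";
}

package main;

my %cfg = (Assign => [undef, [50, 25]], Exam => [undef, [75, 20]]);
my $a1  = Assignment->new(\%cfg, 'Assign', 1, 42);
my $e1  = Assignment->new(\%cfg, 'Exam',   1, 70);
$a1->print_report;   # Assignment 1: raw = 42/50: Adjusted = 21.00/25
$e1->print_report;   # Exam 1: raw = 70/75: Adjusted = 18.67/20
```

Rounding when the score is stored matches the Math-201 report below. There, Anne Smith's total of 90.47 is the sum of the already-rounded adjusted scores.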

We can now run our original program given at the start of this section. We assume we save the program in a file named report.pl, in the same directory as all three of the modules we have created. We also assume we already have the data files math-101.cfg and math-101.std, as well as the data file with the same name as the course. Now we pass it the name of our course, math-101:

    $ perl report.pl math-101
    Class Report: course = math-101: students = 4
    Frank Howza:
        No records for this student
    Bill Jones:
        Assignment 1: raw = 42/50: Adjusted = 21.00/25
        Assignment 2: raw = 35/50: Adjusted = 17.50/25
        Assignment 3: raw = 41/50: Adjusted = 20.50/25
        Exam 1: raw = 72/75: Adjusted = 24.00/25
        Final Course Grade: 83/100

    Anne Smith:
        Assignment 1: raw = 42/50: Adjusted = 21.00/25
        Assignment 2: raw = 47/50: Adjusted = 23.50/25
        Assignment 3: raw = 41/50: Adjusted = 20.50/25
        Exam 1: raw = 69/75: Adjusted = 23.00/25
        Final Course Grade: 88/100

    Sara Tims:
        Assignment 1: raw = 41/50: Adjusted = 20.50/25
        Assignment 2: raw = 45/50: Adjusted = 22.50/25
        Assignment 3: raw = 39/50: Adjusted = 19.50/25
        Incomplete Record

You'll notice that the student named Frank Howza has no records. He appears in our class list file, but there were no assignments or exams for him in the data file. Similarly, Sara Tims has no record for her exam, so her report is marked as incomplete. Suppose we had another course with a completely different set of assignments (and perhaps different students). We could simply create a new configuration file and class list file for that course's data and run the same program on it.

For example, our Math-201 course might have only two assignments, both out of 50 points, but with the first one making up 25 percent and the second only 10 percent of the final grade. There are also two exams, both out of 75, with the first contributing 20 percent and the second exam contributing 45 percent to the final grade. The configuration file would look like this:

    Assign:1:50:25
    Assign:2:50:10
    Exam:1:75:20
    Exam:2:75:45
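The four weights in this file (25, 10, 20, and 45) add up to 100, which is what the hard-coded "/100" in the student report assumes. If the weights are ever edited, a small check like the following sketch could catch a total that no longer adds up. total_weight() is a hypothetical helper. It assumes the colon-separated Type:number:out_of:weight layout of these .cfg lines, where out_of and weight are descriptive labels.

```perl
use strict;
use warnings;

# Hypothetical helper: adds up the weight (last field) of each config line.
sub total_weight {
    my $total = 0;
    for my $raw (@_) {
        my $line = $raw;
        chomp $line;
        next unless length $line;              # ignore blank lines
        my $weight = (split /:/, $line)[3];    # Type:number:out_of:weight
        $total += $weight;
    }
    return $total;
}

my $total = total_weight("Assign:1:50:25", "Assign:2:50:10",
                         "Exam:1:75:20",   "Exam:2:75:45");
print "weights add up to $total\n";
warn "warning: weights do not add up to 100\n" unless $total == 100;
```

To check a real file, you could open it with open(my $fh, '<', 'math-201.cfg') and pass its lines in as total_weight(<$fh>).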

For simplicity's sake, we assume the same class list now applies. The data file of grades appears as:

    Bill Jones:1:43
    Sara Tims:2:32
    Sara Tims:1:44
    Anne Smith:1:44
    Anne Smith:2:39
    Bill Jones:E1:75
    Sara Tims:E1:69
    Anne Smith:E1:70
    Bill Jones:2:40
    Bill Jones:E2:75
    Sara Tims:E2:69
    Anne Smith:E2:70

Running our same program with an argument of math-201 now produces a full report for this new course:

    $ perl report.pl math-201
    Class Report: course = math-201: students = 4
    Frank Howza:
        No records for this student
    Bill Jones:
        Assignment 1: raw = 43/50: Adjusted = 21.50/25
        Assignment 2: raw = 40/50: Adjusted = 8.00/10
        Exam 1: raw = 75/75: Adjusted = 20.00/20
        Exam 2: raw = 75/75: Adjusted = 45.00/45
        Final Course Grade: 94.5/100

    Anne Smith:
        Assignment 1: raw = 44/50: Adjusted = 22.00/25
        Assignment 2: raw = 39/50: Adjusted = 7.80/10
        Exam 1: raw = 70/75: Adjusted = 18.67/20
        Exam 2: raw = 70/75: Adjusted = 42.00/45
        Final Course Grade: 90.47/100

    Sara Tims:
        Assignment 1: raw = 44/50: Adjusted = 22.00/25
        Assignment 2: raw = 32/50: Adjusted = 6.40/10
        Exam 1: raw = 69/75: Adjusted = 18.40/20
        Exam 2: raw = 69/75: Adjusted = 41.40/45
        Final Course Grade: 88.2/100

Because the reporting functionality is built into the objects themselves, we can modify the program to provide an interactive query for individual students, or even individual assignments, if we so desired. Here is a version that allows you to query the data for individual student reports:

    #!/usr/bin/perl -w
    use strict;
    use Course;

    @ARGV || die "You must supply a course name.\n";

    # the course name is also the name of the data file read by <>
    my $class = Course->new($ARGV[0]);
    while (<>) {
        chomp;
        $class->add_student_record(split /:/);
    }

    # <> has used up @ARGV, so read the queries from STDIN
    while (1) {
        print "Enter a student name [or 'l' for list; 'q' to quit]: ";
        my $name = <STDIN>;
        last unless defined $name;    # stop at end of input
        chomp $name;
        last if $name =~ m/^q$/;
        print join("\n", $class->list), "\n" and next if $name eq 'l';
        if ($class->student($name)) {
            $class->student($name)->print_report();
        } else {
            print "no student by that name\n";
        }
    }

By using a configuration file to define the marking scheme for the assignments and exams, rather than hard-coding it into a script, we gain two things. We achieve the generality to use these programs with other courses, and we can also change our course configuration. Occasionally, a teacher will decide that a particular assignment shouldn't count for as much of the final grade as originally intended. With this system, the teacher can simply reduce the contribution amount for that assignment and increase the values of one or more of the others accordingly in the configuration file. The teacher can then simply rerun the reporting program to see the new results.

19.3 Exercises

1 Another handy feature to have in our grade-tracking system would be the ability to read in data on multiple courses. The program could then query the database to retrieve and print all the records for a given student for all the courses in which that student is registered. Try adding this feature.

2 Examine any programs you may have written while working through this book to see if your data could be modeled in a more object-oriented fashion. Create classes to represent the data and behaviors, and rewrite your program to use these objects.


CHAPTER 20

What's left?

A lot! You have nearly reached the end of this book, but you're not at the end of the road by any means. As Larry Wall quotes in Perl's own source code, "The road goes ever on and on." As stated in the introductory chapter, Perl is a large language. Although we have covered a great deal in a short space, there is quite a bit more Perl to discover.

One thing we never discussed is how to use Perl as a command line tool. But we haven't left that out altogether. Appendix A provides a brief overview of some of the more common command line switches, with a few examples of running Perl from the command line. Appendix B provides a brief reference on a few of Perl's special built-in variables. (See the perlvar pod-page for a complete listing.)

We did a little bit of network programming using the LWP::Simple module in chapter 14. Perl has low-level socket programming with the socket() function and several other network-related functions. It also offers a more convenient socket programming interface via the IO::Socket module. To find out more about network programming, you can see the perlipc pod-page (ipc for inter-process communication).

Did you ever want to have your program read in raw source code during run-time and then execute it within your program? Perl's eval() function can be used to do this sort of thing. It can also be used to trap errors raised by other subroutines. Read about eval() in the perlfunc pod-page. Advanced Perl Programming by Sriram Srinivasan¹ has a very good chapter on the uses of eval().
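The sketch below gives a brief taste of both uses of eval() mentioned above. It evaluates a string of Perl code at run-time, then uses the block form of eval to trap an error raised by a subroutine. divide() is a hypothetical example function.

```perl
use strict;
use warnings;

# String eval: compile and run source code that is built at run-time.
my $code   = '6 * 7';
my $answer = eval $code;
print "6 * 7 = $answer\n";

# Block eval: trap an error raised (with die) by another subroutine.
sub divide {
    my ($x, $y) = @_;
    die "division by zero\n" if $y == 0;
    return $x / $y;
}

my $result = eval { divide(1, 0) };
if ($@) {
    print "caught: $@";       # $@ holds the error message from die
} else {
    print "result: $result\n";
}
```

The program keeps running after the failed division. Without the surrounding eval, the die would have ended it.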

Perl can also do more than just open, read, and close files. With Perl, you can:

- rename() and unlink() (delete) files,
- change their permissions (chmod()) and ownership (chown()),
- create and remove directories (mkdir() and rmdir()),
- fork children processes (if your system supports forking),
- use DBM databases (several different flavors),
- and even more.

And, as I keep stressing, CPAN contains hundreds of additional modules providing all manners of additional functionality. Never be afraid to look to CPAN for easy solutions. Indeed, if I tried to catalogue everything I left out, that list would be notable only for what it left out as well.

To help you further your learning, Appendix C provides a brief list of other Perl resources. This is not an exhaustive list, merely a few signposts in a vast landscape that I hope you will enjoy exploring as much as I do. Larry Wall once said, while writing about the natural language aspect of Perl, "You don't learn a natural language even once, in the sense that you never stop learning it."

My original plan was to leave off by reviewing some of what I think are important rules or guidelines in the design, style, and coding of your programs. But I know I've broken a few rules in this book that others might consider sacred, and Perl isn't about following the rules and rigidly adhering to the conventions of others. So if I have any wisdom at all to impart in my final remarks, it is just to do whatever works for you, and to have fun doing it.

¹ Srinivasan, Sriram. Advanced Perl Programming. Sebastopol, CA: O'Reilly and Associates, 1997.


APPENDIX A

Command line switches

You are already familiar with the most important command line switch, the -w switch, which turns on warnings. Quite a few additional command line switches exist, many of which are designed to facilitate using Perl as a command line tool. By command line tool, I mean invoking Perl directly from the command line of the shell and supplying one or more statements to perform. Consider the following one liner:

    $ perl -pi.bak -e 's/red/blue/g' filenames

This combines several switches and replaces every occurrence of red with blue in all the filenames given. It also saves the original files with a .bak extension. In this appendix, we will briefly describe the most useful command line switches. (See perldoc perlrun for additional switches and further information.)

-c
When used, Perl will compile the script first and check the syntax, but will not run the program. This is a useful first step in testing your program for syntax errors without actually running it.

    $ perl -c programname

-e commandline
Takes commandline, which may be several statements, as the script to run.

    $ perl -e '$blah = shift @ARGV; print "$blah\n"' argument

-p
This switch causes Perl to construct the following loop around your script (whether your script is in a file or given in a -e argument):

    while (<>) {
        # your script
    } continue {
        print;
    }

Hence, the following three examples are all the same:

    $ perl -p -e 's/red/blue/g' filename

    #!/usr/bin/perl -p
    s/red/blue/g;

    #!/usr/bin/perl
    while (<>) {
        s/red/blue/g;
    } continue {
        print;
    }

-n
Like -p except without the continue block.

    $ perl -n -e 'print if m/foo/' filename

This prints all lines containing the pattern foo in the file filename.

-i[extension]
This means that all the files given on the command line are to be edited in place. In other words, changes are written to the files themselves. If given the optional extension, then the original files are saved with that extension:

    $ perl -p -i.bak -e 's/red/blue/g' filenames

or with the switches combined:

    $ perl -pi.bak -e 's/red/blue/g' filenames
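The same in-place editing is also available from inside a script. You set the special variable $^I, which holds the -i extension, and load the files to edit into @ARGV before reading with <>. The sketch below wraps this in a subroutine. red_to_blue() is a hypothetical name.

```perl
use strict;
use warnings;

# Edit files in place from inside a script, like
#   perl -pi.bak -e 's/red/blue/g' filenames
sub red_to_blue {
    my @files = @_;
    local $^I   = '.bak';    # backup extension, as with -i.bak
    local @ARGV = @files;    # the files that <> will read and rewrite
    while (<>) {
        s/red/blue/g;
        print;               # while $^I is set, print writes to the edited file
    }
}
```

For example, calling red_to_blue('colors.txt') would rewrite colors.txt and leave the original contents in colors.txt.bak. Using local means $^I and @ARGV go back to their previous values when the subroutine returns.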

-a
Used with -n or -p to turn on autosplit mode. This causes a split(' ') to be performed as the first statement in the implied loop, and the results of the split are assigned to the array @F. The following three examples are equivalent and will print the second field in each line of data in the given file:

    $ perl -a -n -e 'print "$F[1]\n"' filename
    $ perl -ane 'print "$F[1]\n"' filename

    #!/usr/bin/perl
    while (<>) {
        @F = split ' ';
        print "$F[1]\n";
    }

-Fpattern
Allows you to supply an alternate pattern to split on in autosplit mode (when using -a). To split each line on colons instead of the default whitespace:

    $ perl -an -F/:/ -e 'print "$F[1]\n"' filename
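Written out as an ordinary loop, -an -F/:/ amounts to splitting each line on the given pattern and storing the fields in @F. The sketch below does the same work on lines held in memory. second_fields() is a hypothetical name. The sample lines follow the name:number:score layout of the grades data file from chapter 19.

```perl
use strict;
use warnings;

# Roughly what  perl -an -F/:/ -e 'print "$F[1]\n"'  does to each line
sub second_fields {
    my @out;
    for my $line (@_) {
        my @F = split /:/, $line;   # -F/:/ supplies this pattern to -a
        push @out, $F[1];
    }
    return @out;
}

print "$_\n" for second_fields("Bill Jones:1:43", "Sara Tims:E1:69");
```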

-d
This invokes the Perl debugger on the script (see chapter 15).

-Mmodule
Allows you to use a module from the command line:

    $ perl -MCPAN -e 'shell'

This invokes Perl and calls use CPAN; before executing the statement shell, which, under the CPAN.pm module, puts you into an interactive shell mode.

-v
Prints the Perl version information.

-V
Prints a detailed summary of the configuration details used when compiling Perl and prints out the value of the @INC array.


APPENDIX B

Special variables

Table B.1 summarizes a few of Perl's special variables. (See perlvar for further information.)

Table B.1 Special variables

    Variable   Description
    $_         Default variable for input and pattern matching.
    $.         The current line number of the current or last file handle read.
    $/         Input record separator. Default is a newline.
    $\         Output record separator. Default is undefined, so print
               appends nothing.
    $"         List separator. Value printed between items of an array
               when it is interpolated in a double-quoted string. Default
               is a space.
    $0         The current program name.
    $^W        Current warning value. You can set this within a script to
               turn warnings off and on for particular blocks of code.
    $ARGV      Current file being read from <>.
    @ARGV      Command line arguments.
    @INC       Search paths for use and require statements.
    @F         The autosplit array.
    %ENV       Hash of current environment variables; may be set to change
               the environment for spawned processes.
    %SIG       Hash of signal handlers. (See perldoc perlipc.)
    ARGV       File handle that iterates over @ARGV; also specified with
               the empty input operator: <>
    STDIN      Standard input file handle.
    STDOUT     Standard output file handle.
    DATA       Special file handle referring to data following an __END__
               or __DATA__ token.
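The following short demonstration shows two of the separators in the table: $" (list separator) and $\ (output record separator). It uses local so that both changes are undone at the end of the enclosing block. The output goes to an in-memory file handle so it can be inspected afterward.

```perl
use strict;
use warnings;

my @names  = ('Bill', 'Anne', 'Sara');
my $output = '';
open my $fh, '>', \$output or die "can't open in-memory file: $!";

{
    local $" = ', ';     # list separator used when interpolating arrays
    local $\ = "\n";     # output record separator appended by print
    print $fh "Students: @names";
}
print $fh "no separator appended here";   # $\ is back to its default

close $fh;
print $output, "\n";
```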

APPENDIX C

Additional resources

Your first line of investigation should be the documentation and FAQs that are included with your distribution of Perl, but here are a few additional resources for learning more about Perl.

Newsgroups

comp.lang.perl.misc. The primary forum for discussions and questions regarding the Perl language.

comp.lang.perl.modules. A forum for discussions and questions relating to the copious existing modules as well as issues surrounding creating your own modules.

comp.lang.perl.moderated. A moderated forum for Perl discussions.

comp.lang.perl.tk. Discussions involving using Perl with the Tk graphical interface.

comp.lang.perl.announce. Announcements relevant to the Perl community.

Web pages

www.perl.com. Your starting place for exploring the world of Perl. From here you can find links to the Perl documentation, CPAN, and lists of other resources.

reference.perl.com. A reference list of modules, tutorials, and other Perl-related things.

www.perl.org. The Perl Institute's homepage, another good place to find Perl news and information. At the time of this writing, the Perl Institute had just dissolved and was being passed on to the Perl Mongers, but the web address will probably remain the same. (If not, check the Perl Mongers page given below.)

theory.uwinnipeg.ca/search/cpan-search.html. A search engine for searching the CPAN archives.

www.pm.org. The Perl Mongers homepage. Visit here to find a Perl Mongers group in your area or to start your own if one doesn't exist near you.

Books and magazines

Christiansen, Tom, Randal Schwartz, and Larry Wall. Programming Perl, 2nd ed. Sebastopol, CA: O'Reilly and Associates, 1996. Also known as "the camel book" (because of the animal on the cover), this is the reference book on the Perl language. It is somewhat out of date (current to Perl version 5.003), and much of the text is duplicated in the included documentation.

Christiansen, Tom, and Nathan Torkington. The Perl Cookbook. Sebastopol, CA: O'Reilly and Associates, 1998. This book is chock-full of recipes and examples and is an asset for any Perl programmer's library.

Cormen, Thomas, Charles Leiserson, and Ronald Rivest. Introduction to Algorithms. Cambridge, MA: MIT Press, 1990. This book is an outstanding introduction to algorithms and data structures. If you aren't interested in the analysis of algorithms, the data structures presented later are still easily understood.

The Perl Journal. This quarterly journal is an excellent resource with articles ranging from beginner to advanced to the simply whimsical. See http://tpj.com/ for subscription details and contents of previous issues.

APPENDIX D

Numeric formats

Our standard number system uses base-10 numbers. If you consider a number such as 42, it really means 4 tens and 2 ones. The rightmost position counts ones, and each position leftward counts units 10 times (or base times) greater than the previous position. Thus, 342 is 3 hundreds and 4 tens and 2 ones.

There are three other bases commonly used for numeric data in computer science: binary, octal, and hexadecimal (usually just called hex). These represent base 2, base 8, and base 16, respectively. Hexadecimal is a little odd because we cannot count up to 15 using single digits, so we use letters instead: the letters "a" to "f" stand for the numbers 10 to 15. The following table shows several numbers written in each of these base formats.

Table D.1 Numeric formats

    Decimal (base 10)   Binary (base 2)   Octal (base 8)   Hex (base 16)
    0                   0                 0                0
    1                   1                 1                1
    2                   10                2                2
    3                   11                3                3
    4                   100               4                4
    5                   101               5                5
    6                   110               6                6
    7                   111               7                7
    8                   1000              10               8
    9                   1001              11               9
    10                  1010              12               a
    11                  1011              13               b
    12                  1100              14               c
    13                  1101              15               d
    14                  1110              16               e
    15                  1111              17               f
    16                  10000             20               10
    17                  10001             21               11
    32                  100000            40               20
    42                  101010            52               2a
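Perl can convert between these formats directly. sprintf() formats a number in binary (%b), octal (%o), or hex (%x), and oct() and hex() convert the other way. The following quick check reproduces the last row of the table.

```perl
use strict;
use warnings;

my $n   = 42;
my $bin = sprintf "%b", $n;    # "101010"
my $oct = sprintf "%o", $n;    # "52"
my $hex = sprintf "%x", $n;    # "2a"
print "$n = $bin (binary) = $oct (octal) = $hex (hex)\n";

# Converting back: oct() also understands "0b" and "0x" prefixes.
print oct("0b$bin"), " ", oct($oct), " ", hex($hex), "\n";   # 42 42 42

# Numeric literals can be written in these bases too.
print 0b101010 + 052 + 0x2a, "\n";                          # 126
```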

Glossary

absolute path. The full, unadulterated directory path to a file, beginning with the root of the file system.

alias. When one variable represents another variable, it is said to be an alias for that variable.

anchor. In regular expressions, an anchor is a special character, or metacharacter, that matches a particular location in a string as opposed to matching a particular character.

argument. What you give your spouse (or child, or parent) when telling that person what to do. Similarly, an argument is data that you give to a program or subroutine that tells the program or subroutine what to work with and how to proceed.

array. A type of variable that holds an ordered list of data.

autovivification. When something comes into existence automatically just by being used as if it has always been there. For example:

    my $foo;
    $foo->{bar} = 'baz';

In the second statement, $foo is used as if it holds a reference to a hash. Since it did not, an anonymous hash is automatically created and stored in $foo so that we can assign a key and value in that hash.

backreference. Used within a regular expression, a backreference is a special sequence consisting of a backslash followed by an integer that stands for the text matched by a preceding set of capturing parentheses of the same number.
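Here is a small example of a backreference. The pattern captures one word character and then uses \1 to require that the same character appear again immediately, so it finds doubled letters. first_double() is a hypothetical helper name.

```perl
use strict;
use warnings;

# Return the first doubled character in a word, or undef if there is none.
sub first_double {
    my $word = shift;
    return $word =~ /(\w)\1/ ? $1 : undef;   # \1 = text matched by the first ( )
}

for my $word (qw(bookkeeper letter perl)) {
    my $d = first_double($word);
    print defined $d ? "$word: first doubled letter is '$d'\n"
                     : "$word: no doubled letters\n";
}
```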

bless. More like a baptism really, blessing is the act of dubbing a reference as belonging to a particular package, providing a basis for Perl's object-oriented capabilities.

block. A structural segment of a program consisting of one or more statements that are delimited in Perl by curly braces.

Boolean. A two-valued (usually true or false) property or variable. Perl does not have Boolean variables, but it does evaluate some expressions (such as conditional expressions) in a Boolean context. This means that, while Perl evaluates the expression to its real value, the condition itself is concerned with only whether the expression is true or false.

byte. An 8-bit piece of binary data, often representing a character in the ASCII character set.

byte offset. As used with the seek() and tell() functions, a position in a file measured in bytes relative to another position in the file (often the beginning or the previous position).

child class. A class that inherits functionality from a parent class. Also called a derived class.

chunk. Hardcore technical jargon from the noweb literate programming tool. Documentation and code are presented in relatively small segments (usually a paragraph or two, or a handful of lines of code) called chunks.

class. A module that defines data and methods that can be used as an object.

closure. An anonymous subroutine that is deeply bound to the lexical environment in which it was generated.

command line argument. Data explicitly passed to a program when it is invoked from the command line.

concatenation. Joining two or more things, usually strings or files.

constructor. Something used to build or create something else. A class method used to create and return an object is called a constructor. The [ ] and { } used to create anonymous arrays and hashes can also be called constructors (or composers).

coupling. The degree of interdependency among components (data structures, functions) within a program and/or its modules.
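The closure entry can be illustrated with make_counter(), a hypothetical example. It returns an anonymous subroutine that keeps its own private copy of the lexical $count, even after make_counter() has returned.

```perl
use strict;
use warnings;

sub make_counter {
    my $count = shift;                # lexical variable captured below
    return sub { return $count++ };   # closure over this particular $count
}

my $c1 = make_counter(5);
my $c2 = make_counter(10);
print $c1->(), " ", $c1->(), " ", $c2->(), "\n";   # 5 6 10
```

Each call to make_counter() creates a fresh $count, so $c1 and $c2 count independently.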

debug. De Volkswagen Beetle is sometimes called de bug. In programming, errors in syntax or logic are referred to as bugs. Debugging is the task of isolating the problems and fixing the code.

declare/declaration. Telling the compiler/interpreter about something, such as a variable or subroutine, rather than telling it to do something (i.e., with a statement).

delimiter. Something, usually a character or string, that demarks the beginning and end of something else, such as a record, a field, or a string. The quotation marks you use around a string in Perl are that string's delimiters.

dereference. To undo the reference in order to follow it to where it points.

dispatch table. A hash table of subroutine references that can be called upon (i.e., dispatched) when the corresponding key is used.

encapsulation. Wrapping up data and functions (or methods) into a neat little package that has a simple interface compared with the code that actually does the work.

evaluate. The act of computing the result of an expression.

expression. Any literal, variable, subroutine, operator, or combination of these that evaluates to a value.

flag. A marker or switch that can be set to cause different actions to take place depending on its value.

function. A named piece of code defined in one place that can be called from someplace else (or many places) in the program. Also called subroutines, especially if the code in question isn't used for its return value.

hash. A type of variable that holds an unordered list of key/value pairs.

heap. A data structure that dynamically maintains a partial ordering of the data it contains.

here-document (here-doc). A form of quoting large, multi-line blocks of text.

inheritance. In object-oriented programming, acquiring characteristics (data and/or methods) from a parent class.

initialize. The act of giving an initial value to a variable.

input. Data that comes into a program or subroutine.
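As an example of a dispatch table, consider a tiny command interpreter that maps command names to code references. The command names add, diff, and mul, and the run_command() helper, are hypothetical.

```perl
use strict;
use warnings;

my %dispatch = (
    add  => sub { return $_[0] + $_[1] },
    diff => sub { return $_[0] - $_[1] },
    mul  => sub { return $_[0] * $_[1] },
);

sub run_command {
    my ($name, @args) = @_;
    my $handler = $dispatch{$name}
        or return "unknown command: $name";
    return $handler->(@args);        # call the subroutine stored under $name
}

print run_command('add', 2, 3), "\n";   # 5
print run_command('mul', 4, 5), "\n";   # 20
print run_command('pow', 2, 8), "\n";   # unknown command: pow
```

Adding a new command means adding one hash entry rather than another branch to an if/elsif chain.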

interpolate. Replacing a variable with its value within double-quoted strings.

interpret. In terms of programs, an interpreter reads a program (or a compiled form of a program) and executes the statements contained therein. In terms of creating strings, backslash interpretation is the act of reading certain special sequences of characters as standing for some other (usually nonprintable) character.

iterate. To sequentially step through a list of values (or a list of key/value pairs).

key. In a hash, the key is what you use to look up values.

keyword. Any named language built-in function (such as print, index, open) or other construct (such as while or if) in the Perl language.

literate programming (LP). A method or style of programming that places emphasis on source code documentation. Various tools exist to assist an author in writing more human-readable programs.

loop. A flow of control construct that can cause a statement or series of statements (i.e., a block) to be repeated some number of times.

method. A function that is part of a class.

module. A package, defined in its own file, that provides data or functions (or objects and methods) to a program that uses it.

operator. A built-in function, often represented by one or more special symbols.

output. Data that comes out of a program or function.

package. A namespace where you can define variables and functions that won't interfere with the main:: program's namespace or other packages/modules that it uses.

parameter. See argument.

parent class. A class, usually designed to be general in form, that provides basic functionality to its children classes.

POD. The standard markup syntax for embedding documentation within Perl programs. POD stands for plain old documentation.

polymorphism. The ability to be different. Child classes can redefine behaviors inherited from a parent class. Thus, a parent class can have children of many different shapes (i.e., they are polymorphic).

A

quantifier.

regular expression term referring to the special symbols that

denote some number (possibly zero) of occurrences of the previous character or subexpression.

A

queue.

data structure providing a

rather like a line at the

tion,

something in terms of

then recursion should be thought of as a

means

recursively

to

first

(FIFO) processing order,

itself

makes

spiral definition.

rigidly define at least

nism that can be repeatedly applied

one

for a circular defini-

Defining something

special case

to turn other cases into

and then a mecha-

one of the

special cases.

A pointer to the real data. Or the address where some piece of data

reference.

is

memory.

stored in

regular expression. a variety

first-out

DMV, except you'll want to process your queue faster than that.

If defining

recursion.

first-in,

A way of specifying a set of strings

of pattern-describing symbols and

The

final resulting

satellite data.

The

organization of chunks of data a key.

value of an expression or function.

A

type of value that

string, or a single character.

The

scope.

is

is

often based

on

just a small

The remaining chunk of data may be

to as the satellite data for that key.

scalar.

by using

text.

return value.

segment or piece of data called

(or substrings)

referred

.

singular in nature, like a single number, a single

A type of variable in Perl that holds a scalar value.

range or limits within which a variable

is

active

(i.e.,

exists).

Also

referred to as the visibility of the variable.

sentinel value.

when

A value

used

as a flag

or switch that can be used to determine

a process or loop should stop iterating.

shebang. Common name for the #! character pair, also called pound bang. The first line of a Perl script, shell script, or other interpreted script that contains the path to the executable interpreter is often referred to as the shebang line.

slice. A (possibly non-contiguous) subset of a list of values.

stack. A data structure providing a first-in, last-out processing order, like a stack of dishes next to your sink. Presumably they were stacked there one at a time on top of each other, and you will then wash them one at a time starting from the top of the pile. If you are like me, however, your dishwashing structure may resemble something more like a heap or just a pile.

standard input. The input stream for a program, accessible in Perl via the STDIN file handle.
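The shebang, slice, and stack entries can be seen together in one small script. The interpreter path `/usr/bin/perl` on the first line is an assumption, since it differs between systems. The dish names are only illustrative.

```perl
#!/usr/bin/perl
# The line above is the shebang line; /usr/bin/perl is an assumed path.
use strict;
use warnings;

# Stack (first-in, last-out): push onto the top, pop off the top.
my @dishes;
push @dishes, 'plate', 'bowl', 'cup';  # 'cup' is stacked last...
my @washed;
while (@dishes) {
    push @washed, pop @dishes;         # ...so it is washed first
}
print "@washed\n";                     # prints: cup bowl plate

# Slice: a (possibly non-contiguous) subset of a list.
my @letters = ('a' .. 'f');
my @picked  = @letters[0, 2, 5];       # elements 0, 2 and 5
print "@picked\n";                     # prints: a c f
```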

standard output. The output stream for a program, accessible in Perl via the STDOUT file handle.

statement. An instruction telling the computer to do something, such as add two numbers, compare two values, or print a string.

string. A sequence of character data, possibly empty (e.g., the null string).

subroutine. See function.

subscript. A syntactic construct for accessing or referring to elements of an array or hash. $array[2] uses a subscript of 2 to refer to the value stored in the third element of the array (subscripts count from zero). $hash{key} uses a subscript of key to refer to the value stored under that key in the hash.

substitution operator. Like the match operator but also able to replace matched portions of the target string with replacement text.
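This is a short Perl sketch of the subscript and substitution operator entries. The array, hash, and sample sentence are made up for illustration.

```perl
use strict;
use warnings;

# Subscripts: array subscripts count from zero; a hash subscript is a key.
my @array = ('zero', 'one', 'two', 'three');
my %hash  = (key => 'stored under key');
print "$array[2]\n";                        # prints: two (the third element)
print "$hash{key}\n";                       # prints: stored under key

# Substitution: like a match, but the matched text is replaced.
my $text = 'Perl is fun. Perl is handy.';
(my $copy = $text) =~ s/Perl/perl/;         # replaces only the first match, in a copy
my $count = ($text =~ s/Perl/Perl 5/g);     # /g replaces every match and returns the count
print "$copy\n";                            # prints: perl is fun. Perl is handy.
print "$text\n";                            # prints: Perl 5 is fun. Perl 5 is handy.
print "$count\n";                           # prints: 2
```

Copying into `$copy` before substituting leaves the original string untouched. The second substitution then modifies `$text` in place.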

syntax. How various symbols can legitimately be put together in a given language.

tangle. In literate programming, this is the process of extracting and assembling the program code into a format ready to be compiled and/or interpreted.

variable. A named location in memory where data may be stored and retrieved by name.

weave. In literate programming, this is the process of creating the formal typeset program documentation, including any cross-reference information specified in the source file.



Elements of Programming with Perl
Andrew L. Johnson

PROGRAMMING LANGUAGES, PERL

Here's a complete introduction to programming using Perl, written so it's accessible to those learning Perl as their first programming language. With examples ranging from a useful utility to search the Perl FAQs to a web client for tracking and charting stock quotes to an object-oriented student grading system, this book offers you a practical, hands-on approach to learning programming the Perl way.

What's inside

- Style and design issues and the software development cycle
- Full descriptions of Perl's data types, variables, operators, and control structures
- In-depth coverage of Perl's regular expressions
- References and nested data structures
- Documentation and Literate Programming
- Text and list manipulation
- Debugging techniques
- Using and writing modules
- Object-oriented programming and abstract data structures

"...straightforward enough for use by the casual reader but complete enough to stand alone as an excellent first learning tool."
Jim Esten, Lead Developer, WebDynamic

"I found myself saying, 'Aha, THAT'S what that damn Perl doc was trying so obscurely to say!'"
Brad Fenwick, Software Engineer, Xview Solutions

"...very useful to a newcomer to both Perl and programming."
Randy Kobes, Professor of Physics, University of Winnipeg

"Johnson is very clear, and that sets this book apart from the others."
Patrick Gardella, Vice President, Whetstone Logic, Inc.

"...the only book I've seen so far that teaches programming through Perl. I would certainly recommend it to a newcomer to programming."
Brad Murray, Senior Software Analyst, Alcatel Canada

Andrew L. Johnson has more than 15 years of programming experience. He has published articles in scientific journals and the Linux Journal, and has taught and tutored students in both beginning and advanced programming topics. Andrew has a master's degree in anthropology.

Author responds on the Web to questions from our readers
Source code available online
www.manning.com/johnson

MANNING
$34.95 US/$50.95 Canada
ISBN 1-884777-80-5