Computer Applications in Chemistry 9789350243114



English Pages 405 Year 2009


COMPUTER APPLICATIONS IN CHEMISTRY

Himalaya Publishing House
MUMBAI • DELHI • NAGPUR • BANGALORE • HYDERABAD

© Authors
No part of this book shall be reproduced, reprinted or translated for any purpose whatsoever without prior permission of the Publisher in writing.

ISBN: 978-93-5024-311-4

REVISED EDITION: 2010

Published by

Mrs. Meena Pandey for HIMALAYA PUBLISHING HOUSE, "Ramdoot", Dr. Bhalerao Marg, Girgaon, Mumbai - 400 004. Phones: 2386 01 70 / 2386 38 63, Fax: 022-2387 71 78, Email: [email protected], Website: www.himpub.com

Branch Offices
Delhi: "Pooja Apartments", 4-B, Murari Lal Street, Ansari Road, Darya Ganj, New Delhi - 110 002. Phone: 2327 03 92, Fax: 011-23256286
Nagpur: Kundanlal Chandak Industrial Estate, Ghat Road, Nagpur - 440 018. Phone: 272 12 16, Telefax: 0712-272 12 15
Bangalore: No. 16/1 (Old 1211), 1st Floor, Next to Hotel Highlands, Madhava Nagar, Race Course Road, Bangalore - 560 001. Phones: 2281541, 2385461, Telefax: 080-2286611
Hyderabad: No. 2-2-1167/2H, 1st Floor, Near Railway Bridge, Tilak Nagar, Main Road, Hyderabad - 500 044. Phone: 55501745, Fax: 040-27560041

Printed by
Globe Offset, New Delhi

LEGENDS FOR FIGURES
Fig. 1.1 Integrated Circuit
Fig. 1.2 Micro Processor
Fig. 1.3 3½" Floppy Disk
Fig. 1.4 Hard Disk
Fig. 1.5 CD-ROM
Fig. 1.6 Scanner
Fig. 1.7 Visual Display Unit: (a) Cathode Ray Tube (b) Flat Panel (c) Laptop with LCD Monitor
Fig. 1.8 Representation of Pixels (640 x 480) on the VDU in Graphics Mode
Fig. 1.9 Mouse
Fig. 1.10 Laser Printer: (a) Mono (b) Colour
Fig. 1.11 LCD Projector
Fig. 1.12 Modem
Fig. 1.13 Mother Board
Fig. 1.14 Desktop Icons in Windows OS
Fig. 1.15 Object in Microsoft PowerPoint
Fig. 1.16 Microsoft Excel
Fig. 1.17 SPSS
Fig. 1.18 Neural Networks
Fig. 1.19 Data Acquisition with UNICAM Spectrophotometer
Fig. 1.20 MATLAB
Fig. 1.21 Microsoft Word
Fig. 2.1.1 Memory Allocations for Variables
Fig. 2.2.1 Flow Chart Symbol of IF
Fig. 2.2.2 Absolute Value of a Variable
Fig. 2.2.3 IF Statement with Null Process
Fig. 2.2.4 Range of a Variable
Fig. 2.2.5 Comparison of Two Numbers
Fig. 2.2.6 Nested DO Loop
Fig. 2.2.7 Multilevel Nesting
Fig. 2.2.8 Expert System Approach for Comparison of Numbers
Fig. 2.2.9 IF Statements using Logical Variables
Fig. 2.3.1 DO Statement for Summation of Three Numbers
Fig. 2.6.1 Execution Profile in a Function Subroutine
Fig. 2.6.2 Correspondence of Variables of Mainline and Function Subroutine
Fig. 2.6.3 Program Execution with Subroutines
Fig. 3.1 Geometric Representation of Two Vectors
Fig. 3.2 Minimum of Two Numbers
Fig. 4.1 Roots of a Quadratic Equation
Fig. 4.2 Calculation of pH in a Titration of Strong Acid vs. Strong Base
Fig. 4.3 Real Roots of Quadratic Equation
Fig. 4.4 Flow Chart for QUAD4.FOR
Fig. 4.5 Exhaustive Testing of Coefficients of Quadratic Equation
Fig. 5.1 Profiles of Functions and Derivatives
Fig. 5.2 Traces of Functions with the Independent Variable x
Fig. 6.1 Linear Interpolation
Fig. 6.2 Choice of Data Points for Quadratic Interpolation
Fig. 6.3 Errors Due to Piece Wise Linear Interpolation of a Non-linear Function
Fig. 6.4 Zooming Effect on the Path of Pathological Function in Different X Ranges: (a) Smooth [3.12 to 3.16] (b) Valley [3.135 to 3.145] (c) Breakdown [3.141 to 3.142]
Fig. 6.5 Effect of Data Range on the Interpolated Value
Fig. 7.1 Chromatographic Elution Profile
Fig. 7.2 Area under the Curve of y = x²
Fig. 7.3 Integration by Trapezoidal Rule of a Function: (a) Non-linear (b) Linear (c) Constant
Fig. 7.4 Effect of Step Size on the Error in Area by Trapezoidal Rule
Fig. 7.5 Profiles of Test Functions and Areas in the Range 0 and 1
Fig. 8.1 Collinear Vectors
Fig. 8.2 Eigen Ellipse for Non-orthogonal Vectors
Fig. 8.3 Eigen Vectors for Orthogonal Vectors
Fig. 8.4 (a) Absorption Spectra (b) Variation of Absorbance with pH for Methyl Red
Fig. 8.5 Chloranilic Acid: (a) 3D Surface of Chloranilic Acid (b) Contour Plot (c) Spectra at Different Concentrations (d) Spectra at Different Wavelengths
Fig. 8.6 Hf-Chloranilic Acid: (a) 3D Surface of Hf-chloranilic Complex (b) Contour Plot (c) Spectra at Different Concentrations (d) Spectra at Different Wavelengths
Fig. 9.1 Demonstration of Precision and Accuracy
Fig. 9.2 (a) Replicate Measurements of Weight of a Substance (b) Potentiometric Titration (c) Conductometric Titration (d) Absorption Spectrum
Fig. 9.3 Statistical Distributions: (a) uniform distribution (b) normal distribution (c) log normal distribution (d) exponential distribution (e) t-distributions (f) F-distribution (g) χ²-distribution (h) normal distribution with different means (i) normal distribution with different variances
Fig. 9.4 Skewness is (a) Greater than zero (b) Less than zero (c) Zero
Fig. 9.5 Kurtosis: (a) Leptokurtic (b) Mesokurtic (c) Platykurtic
Fig. 9.6 Confidence Interval for Univariate Data
Fig. 9.7 Rejection and Acceptance Regions for Standard Normal (z) Variable
Fig. 9.8 Calling Sequence of Subprograms in UNIVAR.FOR
Fig. 9.9 Box Plots (a) With and (b) Without an Outlier
Fig. 9.10 Comparison of Means of Two Samples
Fig. 10.1 Pictorial Representation of Correlation Coefficient with Typical Data Sets
Fig. 10.2 Effect of Data Range on Correlation Coefficient: (a) Smooth Function (b) Transcendental Functions
Fig. 10.3 Modular Development of Least Squares Algorithm
Fig. 10.4 Least Squares Fit and Residuals of a Simulated Data Set
Fig. 10.5 Least Squares Fit and Residuals of a Data Set with Noise
Fig. 10.6 Least Squares Fit of Kinetic Data Set and its Residuals
Fig. 10.7 Least Squares Fit of Spectrophotometric Data
Fig. 10.8 Hammett Model for Variation of Rate Data
Fig. 10.9 Linear and Quadratic Fit of Weakly Quadratic Data
Fig. 10.10 Linear and Quadratic Fit of Strongly Quadratic Data
Fig. 10.11 Polynomial Fit of Log Activity with π
Fig. 10.12 Effect of Position of Outlier on Ligand LMS Regression Lines and the Residuals in LMS Fit
Fig. 11.1 Optimization of Numerical Factor Space in Experimental Design
Fig. 11.2 Two Factor-FuFD with (a) 2-level, (b) 3-level, (c) 4-level, (d) 5-level
Fig. 11.3 Response of 2 factor-2 level Simplex: (a) Factor space (b) Response with Simultaneous Variation of F1 and F2 (c) 3-D Response Surface (d) 2D Contour of (c)
Fig. 11.4 Central Composite Designs
Fig. 11.5 Non-Central Composite Design
Fig. 11.6 Coordinates and Geometries of a 3-factor-3-level Design
Fig. 11.7 Factor Space of (a) 1-factor (b) 2-factor and (c) 3-factor Simplices; Response profiles of (d) 1-factor and (e) 2-factor designs
Fig. 11.8 Progress of (a) Factor Value and (b) Response in One-factor Simplex
Fig. 11.9 Progress of 2-factor Simplex
Fig. 11.10 Oscillation of 2-factor Simplex

CONTENTS

Chapter 1 Hardware and Software
1.1 Hardware

Chapter 2 FORTRAN Statements
2.1 Sequence: Assignment & Replacement; Precision in Arithmetic Operations
2.2 Transfer Control Statements: Conditional IF THEN ELSE, Logical IF; Unconditional GOTO
2.3 DO Statement
2.4 Input/Output: READ/WRITE Statement; List Directed I/O; Formatted I/O; Input/Output through External Files; Files: Sequential & Random Access, Formatted & Unformatted; Statements: OPEN, CLOSE, BACKSPACE, REWIND, INQUIRE
2.5 Dimension: Type Statement: REAL, INTEGER
2.6 Subprogram: Intrinsic/Library Function; SUBROUTINE
2.7 DATA
2.8 STOP, PARAMETER

Chapter 3 Software Method Base
3.1 Matrix Operations
3.2 Sorting
3.3 Permutations and Combinations

Chapter 4 Roots of an Equation
4.1 Quadratic Equation: Hydrogen Ion Concentration; Strong Acid-base Titration
4.2 Roots of Cubic Equation: Weak Mono-protic Acid; Van der Waals Equation

Chapter 5 Optimization
5.1 Minimization Algorithm: Taylor's Infinite Series; Gauss Newton Method; Newton Raphson Method
5.2 Gauss-Newton Method for Two Variable Functions: Simulation of Alkalimetric Titrations; Distribution of Iodine between Water and Organic Solvent

Chapter 6 Numerical Interpolation
6.1 Linear and Quadratic Methods
6.2 Lagrange and Modified Procedures

Chapter 7 Numerical Integration
7.1 Trapezoidal Rule
7.2 Simpson's 1/3 Rule
7.3 Gauss Legendre Quadrature
7.4 Integration of Ordinary Differential Equations: Euler Method; Runge-Kutta Method

Chapter 8 Eigen Analysis
8.1 Eigen Values
8.2 Eigen Vectors

Chapter 9 Univariate Analysis
9.1 Errors
9.2 Statistical Distributions
9.3 Moments of Data
9.4 Robust Methods
9.5 Comparison of Univariate Data Sets
9.6 Analysis of Variance

Chapter 10 Bivariate Analysis
10.1 Covariance, Correlation Coefficient: Linear Least Squares; First Order Rate Equation; Beer's Law; Hammett Equation
10.2 Polynomial Regression
10.3 Robust Regression: Least Median Squares
10.4 Residual Analysis: χ² Test for Analysis; Exner and Ehreusen Parameters; Crystallographic R-test
10.5 Multiple Linear Regression: Taft Equation; Hansch Model

Chapter 11 Experimental Design
11.1 One Variable at a Time
11.2 Parallel Designs: Factorial Designs
11.3 Sequential Designs: Simplex Design

For Further Reading
Advanced Reading
References
Appendices
Appendix 1: 8-bit ASCII Characters and their Decimal Values
Appendix 2: Object Oriented Representation of Hardware
Appendix 3: Z-table
Appendix 4: One-tail t-table
Appendix 5: Two-tail t-table
Appendix 6: F-table
Appendix 7: χ²-table

A computer system consists of hardware and software. The hardware can be felt by touch and consists of physical components like the monitor, keyboard, CD-ROM drive etc. Software, on the other hand, performs several tasks but is invisible, like a song recorded on an audio recorder that can be heard but not seen. Hardware and software technology has reached a mature state, and dealing with it breadth- or depth-wise is a formidable task. An attempt is made in this book to introduce the vocabulary in a nutshell. This will be useful in choosing a computer configuration and in reducing the communication gap during discussions with information technologists.

Origin of Computer
Number representation dates back to the ninth century, and Leonardo Fibonacci popularised the Hindu-Arabic numerals in Europe in 1202. In the nineteenth century Charles Babbage depicted on paper his Analytical Engine, built from gear wheels with ten positions each. Ada Augusta, Countess of Lovelace, who worked with Babbage, showed that such a machine could be programmed with a sequence of instructions to produce results, and she is considered the world's first programmer. The concept of an automatic computer was introduced around 1888, but Babbage's programmable machine was realised only about 100 years after him. The first electronic digital computer, ENIAC (Electronic Numerical Integrator And Computer), was unveiled in 1946 at the University of Pennsylvania. It had 18,000 vacuum tubes, weighed 30 tonnes and occupied the space of two garages. It failed every seven minutes on average and cost about half a million dollars at 1946 prices.


John Bardeen, Walter Brattain and William Shockley at Bell Laboratories developed the first transistor in 1947. About a decade later, Texas Instruments mounted six transistors on the same substrate material (base) and connected them without wires. This was the birth of the integrated circuit (IC), which revolutionised the computer industry. ICs (Fig. 1.1) are classified from medium scale (MSI) to ultra large scale (ULSI) integrated circuits depending upon the number of transistors (Chart 1.1).
Fig. 1.1: Integrated Circuit

Chart 1.1
Abbreviation   Expansion                              Number of transistors
MSI            Medium Scale Integrated Circuit        10²
LSI            Large Scale Integrated Circuit         10³
VLSI           Very Large Scale Integrated Circuit    10⁵
ULSI           Ultra Large Scale Integrated Circuit   10⁶

The computer of the 1950s was considered to be an obedient servant and was used to relieve the drudgery of repetitive tasks. This led to the master-slave frame for explaining the function of a computer. But the computer available today is equipped with multi-fold capabilities and intelligence.

1.1 HARDWARE
Everyone knows that a computer contains a Pentium processor, megabytes of RAM, gigabytes of hard disk and multimedia. Information can be fed through a keyboard or a voice or video recorder. Technical details are necessary in configuring a system, but manufacturing details are of little interest. The science that promoted the technology is highly involved and is worth knowing only for research scientists.

Classification of Computers
Computers have been classified under several categories based on mode of processing, size, memory, components employed, architecture and speed as
• Analog, digital and hybrid computers
• Micro, mini, workstation, mainframe, super computers and super micros
• I, II, III, IV and V generation computers
• Von Neumann machines, pipeline and parallel architecture

Analog Computer
A computer is called an analog computer if it measures physical parameters like pressure and temperature as voltages in analog mode.

Digital Computer
In digital computers, input and processing are in digital mode. They are more accurate than analog computers.


Hybrid Computer
Computers having both analog and digital processes are called hybrid computers and are useful in process technology. They receive analog signals and convert the data into digital form. After numerical calculations, the digital output is converted into an analog signal and a mechanical operation is performed.

Mainframe Computer
The IBM 360 and 370 were sophisticated computing systems of the 1960s and 1970s and occupied rooms full of humming metal boxes, switches etc. Air conditioning and utmost care were prerequisites. Digital Equipment Corporation (DEC), International Computers Limited (ICL), Control Data Corporation (CDC) and Norsk Data (ND) were other popular mainframe manufacturers. Upper case text with a few special characters was the input, through punched cards, magnetic tapes and hard disks. Hard copy, a printout with the same character set on a line printer, was the standard output. It was not possible to get even simple line graphs; special Tektronix terminals were used for graphical display. A monitor program and Job Control Language (JCL), later called the operating system, were used to run the computer and the individual jobs.

Microcomputer
The microcomputer era brought a renaissance in Electronic Data Processing (EDP). In 1971 Intel Corporation produced a microprocessor containing the central processing unit on a chip. It contained the equivalent of 2300 transistors with 640 bytes of memory, which is a little more than a single-spaced page. The IBM personal computer (IBM PC), released in 1981, was the result of the concerted efforts of the different manufacturers listed in Table 1.1. The components of a microcomputer are the Central Processing Unit, memory and input/output (I/O) devices.

Table 1.1: Manufacturers of Modules of IBM PC
Component            Manufacturer
Floppy drive         Tandon Magnetics
Dot matrix printer   Epson
Monitor              Tatung
PC DOS               Microsoft

Central Processing Unit (CPU)
The Central Processing Unit (CPU) is the heart of the computer. It performs arithmetic operations (addition, subtraction, multiplication and division) and logical operations (OR, AND, NOT). These operations are essential for system programs to control the computer. Application programs written in high-level languages perform these operations on very large and very small numbers. In microcomputers a microprocessor takes care of these tasks.

Micro Processor
A microprocessor is a large-scale integrated circuit (Fig. 1.2) on a silicon chip consisting of tens of thousands to millions of transistors. It is used as the CPU in PCs and workstations. Microprocessors manufactured by different companies are listed in Chart 1.2.

Fig. 1.2: Micro Processor


Chart 1.2
Manufacturer               Processor
Intel                      8086, 8088, 80286, 80386, 80486, Pentium
Cyrix                      6x86
Zilog                      Z80, Z8000
Motorola                   68000, 68020, 68030
National Semiconductor     NS

Within each family, these chips have upward compatibility: a program developed for a lower-end processor runs on all higher versions, but not the other way round. If a program developed for a higher-end processor runs even on all lower versions, this is referred to as downward compatibility (for example, the Cyrix 6x86 chip). The numbers of transistors in different Intel microprocessor chips (Table 1.2a) and systems using Motorola processors (Table 1.2b) are given below.
Table 1.2a: No. of Transistors in Intel Processors

Micro Processor   Number of Transistors   MIPS*
8080              6,000                   0.64
8086              29,000                  0.33
80286             134,000                 1.2
80386DX           275,000                 6.0
80486DX           1,250,000               20.0
80486SX           1,185,000               16.5

* MIPS: Million Instructions Per Second

Table 1.2b: Computer Systems with 6502 and Motorola Micro Processors
Micro Processor   Number of Bits            Speed (MHz)   Memory   System
6502              8                         1             64K      Apple II
68000             32 (data), 16 (address)   8             16MB     Macintosh, HP LaserJet printers
68020             32                        16 to 33      4GB      Macintosh II
68030             32                        20 to 50      4GB      Macintosh II

Coprocessor
A coprocessor is a hardware device dedicated to floating-point work: transcendental and trigonometric calculations, addition, subtraction, multiplication and division. It performs calculations concurrently with the regular CPU operations. The coprocessors 8087 and 80287 perform all calculations (Table 1.3) with 80-bit precision on seven different data types: (a) three integer types, (b) one decimal (packed BCD) type and (c) three floating-point types. The range of 64-bit floating-point numbers extends to about 10³⁰⁸.
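The limits quoted above can be inspected from a program. The following is a minimal sketch, assuming a modern compiler that supports Fortran 2008 (for example gfortran) rather than the MS-Fortran used later in this chapter. It queries the standard inquiry functions HUGE, TINY, PRECISION and RANGE for the 32- and 64-bit REAL kinds named in the intrinsic module ISO_FORTRAN_ENV; the 80-bit internal format of the coprocessor is compiler-dependent and is not examined here.

```fortran
! Sketch: limits of the single (real32) and double (real64) precision
! REAL kinds, queried with standard inquiry intrinsics.
program real_limits
  use, intrinsic :: iso_fortran_env, only: real32, real64
  implicit none
  real(real32) :: s   ! only the kind is inspected, the value is never used
  real(real64) :: d

  ! HUGE: largest finite value       TINY: smallest positive normal value
  ! PRECISION: decimal digits        RANGE: decimal exponent range
  print '(a, es12.4e3, a, es12.4e3)', ' real32  huge =', huge(s), '  tiny =', tiny(s)
  print '(a, i3, a, i4)', '         precision =', precision(s), '  range =', range(s)
  print '(a, es12.4e3, a, es12.4e3)', ' real64  huge =', huge(d), '  tiny =', tiny(d)
  print '(a, i3, a, i4)', '         precision =', precision(d), '  range =', range(d)
end program real_limits
```

On IEEE 754 hardware the 64-bit kind typically reports 15 decimal digits of precision and a decimal exponent range of 307, in line with a largest value of about 1.8 × 10³⁰⁸.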


Table 1.3: Coprocessors for Intel Micro Processors
Processor   Coprocessor
8086        8087
80286       80287
80386       80387
80486       Built in
Pentium     Built in

Today, personal computers with an Intel 486 or a lower processor have become obsolete. The Pentium is comparable to the 1988-vintage CRAY Y-MP supercomputer. Pentium II, III and 4 are intended for high-performance servers and multimedia systems.

Array Processor
A group of microprocessors controlled by another CPU is called an array processor. It is used to speed up 3D graphics, video or mathematical calculations.

Parallel Processor
In parallel processor computers, several microprocessors run simultaneously and another CPU coordinates the results. Here, Multiple Instruction Multiple Data (MIMD) and Single Instruction Multiple Data (SIMD) types of computation are performed. Example: CRAY X-MP/48 (four processors).

Workstation
Workstations are used for high computational power, graphics and animation. Their speed, memory and throughput are higher than those of PCs. A few typical systems are described in Table 1.4.

Table 1.4: Features of a Few Workstations
Name                     Processor                                Memory
IBM IntelliStation       Pentium III Xeon (866 MHz)               18.2 GB
HP Kayak XM 600          Pentium III 733 MHz (dual processors)    1 GB RAM
Professional 3D AGP 4X   Pentium III                              32 MB, 48X IDE DVD-ROM

On-line System

An instrument connected to a computer, or an instrument with a microprocessor inside, is a computerised system. A system with in-situ processing and real-time response is called an on-line system.

Special Purpose Computers
Dedicated computer systems developed for a specific application are called special purpose computers. Some special purpose computer systems and their fields of application are given in Table 1.5.

Table 1.5: Typical Special Purpose Computer Systems
System             Application
LISP Machine       Artificial Intelligence
Database System    Electronic Data Processing
CAD/CAM Machine    Engineering Design
LISP: LISt Processing; CAD: Computer Aided Design; CAM: Computer Aided Manufacturing


High Performance Computing
It is required for desktop vi


[Figure: flow chart symbols (continued), showing Input (pseudo code A=1, B=2, C=1), Output (Print), Print to File, Magnetic Tape, Hard Disk, off-page Connectors and Termination (End)]

Edit-Compile-Run Cycle
Step 1: Text editors are used to key in data or a Fortran program. Edit, Notepad, NE and NE2 are some of the popular editors. The sequence of user actions and displays is as follows. At the DOS prompt key in Edit (C:\>Edit) and press the ENTER key. The editor screen appears and the program is keyed in. The contents are saved in a file named ASIGN001.FOR as follows.


1.2 Software

Click the mouse on File and a pull-down menu appears. Click on Save As, enter the file name ASIGN001.FOR and press OK. Clicking on Exit in the File pull-down menu returns control to the DOS prompt. The contents of the file can be displayed on the monitor by the command
C:\MSF>TYPE ASIGN001.FOR

*
*     ASIGN001.FOR
*
      ABSORB = 0.356
      CONC = 0.000178
      EPSI == ABSORB/CONC
      WRITE (*,951)ABSORB,CONC,EPSI
  951 FORMAT (1X,F25.18)
      STOP
      END

Step 2: Translation of a Fortran source code into an executable form comprises checking the code for syntax errors, conversion to a binary object module and linking with the Fortran library. The command
C:\MSF>FOR1 ASIGN001.FOR
prompts
Object filename [asign001.OBJ]:
Source listing [NUL.LST]:
Object listing [NUL.COD]:
If there are errors, they are displayed as
***** Error 50, line 5 -- invalid symbol in expression
Pass One 1 Errors Detected
12 Source Lines
The editor is invoked again and the required correction (removing one "=" sign) is made. On giving the FOR1 command again,
Pass One No Errors Detected
12 Source Lines
is displayed. The compiler creates the intermediate files PASIBF.SYM and PASIBF.BIN. The second pass is then run with
C:\MSF>FOR2
An object module named ASIGN001.OBJ is stored in the current directory and the message
Code Area Size = #00AC (172)
Cons Area Size = #0014 (20)
Data Area Size = #002C (44)
Pass Two No Errors Detected
is displayed on the monitor.


C:\MSF>LINK8086
Microsoft 8086 Object Linker
Version 3.… (C) Copyright Microsoft Corp 1983, 1984, 1985
Object Modules [.OBJ]: ASIGN001
Run File [ASIGN001.EXE]:
List File [NUL.MAP]:
Libraries [.LIB]:

The extension .EXE is assumed by default for the run file; another name can be chosen here. The path of the Fortran library is needed; pressing the ENTER key assumes that the libraries are in the current (default) directory. ASIGN001.EXE is now stored on the disk.

Running an .EXE Program
ASIGN001.FOR does not require any input, and the result is displayed by either of the following command lines
C:\MSF>ASIGN001
C:\MSF>ASIGN001.EXE
which give the results
.356000000000000000
.000178000000000000
2000.000000000000000000
Stop - Program terminated.
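ASIGN001 appears to compute a molar absorptivity, EPSI, from Beer's law, A = εcl, with the path length l taken as 1 cm. A minimal free-form sketch of the same calculation follows, assuming a modern compiler such as gfortran (compiled, for example, with gfortran asign001.f90 -o asign001). The module BEER_LAW, the function MOLAR_ABSORPTIVITY and its path-length argument are hypothetical additions for illustration, and double precision replaces the original default REAL.

```fortran
! Sketch: molar absorptivity from Beer's law, A = epsilon * c * l.
! The path-length argument is a hypothetical addition; the original
! ASIGN001.FOR divides by the concentration only (i.e. l = 1 cm).
module beer_law
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
contains
  pure function molar_absorptivity(absorb, conc, pathl) result(epsi)
    real(dp), intent(in) :: absorb  ! absorbance (dimensionless)
    real(dp), intent(in) :: conc    ! concentration, mol/L
    real(dp), intent(in) :: pathl   ! path length, cm
    real(dp) :: epsi                ! L mol**-1 cm**-1
    epsi = absorb / (conc * pathl)
  end function molar_absorptivity
end module beer_law

program asign001
  use beer_law
  implicit none
  real(dp) :: absorb, conc, epsi
  absorb = 0.356_dp
  conc   = 0.000178_dp
  epsi   = molar_absorptivity(absorb, conc, 1.0_dp)
  ! Same layout as FORMAT (1X,F25.18): one value per line
  write (*, '(1x, f25.18)') absorb, conc, epsi
end program asign001
```

Because 0.356 has no exact binary representation, the trailing digits printed with F25.18 need not match the output shown above; only about the first sixteen significant digits of a double precision value are meaningful.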

The program PHHCL.EXE requires numerical values of variables. They can be given through the keyboard on demand or through a text file containing the data.

Procedure to Run the Program (PHHCL)
Step 1: At the DOS prompt C:\> key in cd MSF-CAC and press the ENTER key; the prompt changes to C:\MSF-CAC>.
Step 2: To execute the program PHHCL, key in PHHCL and press ENTER:
C:\MSF-CAC>PHHCL
Step 3: The program prompts
Give conc of HCl


A concentration of 10⁻² M should be entered as 1.0E-02. Then another prompt is displayed:
Give value pKw:
Enter 14.00 and the pH of the solution is output:
pH = 2.000
The above sequence can be viewed as user actions and output on the screen (Table 1.13).
Table 1.13

Step     User Action    Output on Screen
Step 0                  C:\>
Step 1   cd MSF-CAC     C:\MSF-CAC>
Step 2   PHHCL          C:\MSF-CAC>PHHCL
                        Give HCl-con :
         0.02           Give HCl-con : 0.02
                        Give pKw :
         14             Give pKw : 14
                        HCl-con = .2000E-01  pKw = 14.00  pH = 1.699
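The source of PHHCL is not part of this section. As a hypothetical sketch of the kind of calculation it performs, the function PH_STRONG_ACID below assumes that the hydrogen ion concentration of a strong acid of concentration C follows from the charge balance [H+] = C + Kw/[H+], that is [H+] = (C + √(C² + 4Kw))/2 with Kw = 10^(-pKw). Instead of prompting with READ, the demonstration program loops over the concentrations used above plus a very dilute case; all names are illustrative, not the book's.

```fortran
! Hypothetical sketch of a PHHCL-like calculation (not the book's listing).
! Charge balance for HCl of concentration c:  [H+] = c + Kw/[H+]
!   =>  [H+]**2 - c*[H+] - Kw = 0  =>  [H+] = (c + sqrt(c**2 + 4*Kw)) / 2
module strong_acid
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
contains
  pure function ph_strong_acid(conc, pkw) result(ph)
    real(dp), intent(in) :: conc   ! HCl concentration, mol/L
    real(dp), intent(in) :: pkw    ! -log10(Kw), 14.00 at 25 C
    real(dp) :: ph
    real(dp) :: kw, h
    kw = 10.0_dp**(-pkw)
    h  = 0.5_dp * (conc + sqrt(conc**2 + 4.0_dp*kw))   ! positive root
    ph = -log10(h)
  end function ph_strong_acid
end module strong_acid

program phhcl
  use strong_acid
  implicit none
  ! Concentrations of the runs above, plus a very dilute illustrative case
  real(dp), parameter :: concs(3) = [1.0e-2_dp, 2.0e-2_dp, 1.0e-8_dp]
  real(dp), parameter :: pkw = 14.0_dp
  integer :: i
  do i = 1, size(concs)
    write (*, '(a, es10.4, a, f6.2, a, f6.3)') ' HCl-con = ', concs(i), &
         '  pKw = ', pkw, '  pH = ', ph_strong_acid(concs(i), pkw)
  end do
end program phhcl
```

The printed pH values, 2.000 for 0.01 M and 1.699 for 0.02 M, agree with the runs above. The quadratic form matters only for very dilute acid: for 1.0E-08 M it gives a pH of about 6.98 rather than the naive 8.00.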

Batch File
One or more DOS commands can be stored in a file with the extension .BAT. It is like a shell script in UNIX. For example, the compilation and execution of ASIGN001.FOR can be saved as ASIGN1.BAT:
REM ASIGN1.BAT
FOR1 ASIGN001.FOR;
FOR2
LINK8086 ASIGN001,ASIGN001,,D:\F77-UGC\
ASIGN001
When ASIGN1 is given at the DOS prompt, the result is obtained.
C:\MSF-CAC>ASIGN1
.356000000000000000
.000178000000000000
2000.000000000000000000
Stop - Program terminated.

2.1 SEQUENCE STATEMENTS
Assignment and replacement statements belong to sequence statements. Their components are constants, variables and operators. Program execution starts from the top and each statement is executed sequentially up to END.

Constants
Numerical constants are an integral part of chemical calculations and are used to estimate several other values of chemical significance. Constants can be of different types based on their accuracy and their variation with ambient conditions. The values of the Avogadro number, the Faraday, the ideal gas constant and atomic weights are known accurately. On the other hand, density, dielectric constant and auto-ionization constant are determined precisely, but they change with temperature. Similarly, the precision of equilibrium and rate constants and of extinction coefficients depends on the equipment, the method of calculation and external factors. A constant is invariant in a computer program. It may be literal, numerical or logical. A literal constant is a character or a string of characters used in arithmetic operations, keywords, continuation of a FORTRAN statement over more than one line, carriage control of a printer and input/output statements. The numerical constants are subdivided into integer, real and complex.

Integer Constant
An integer constant is a whole number, with or without a sign, including zero. The maximum value of an integer constant is not infinity in computer terminology. For 16- and 32-bit integers it is 2¹⁵ - 1 (32767) and 2³¹ - 1 (2147483647, about 2.147 × 10⁹), respectively. The plus sign is optional, but the minus sign is mandatory for a negative number. Some examples are 5, -25675 and +30675. Zeros preceding the first significant digit (for example, 0025) are ignored. The number of ionizable protons in oxalic acid (2), the number of electrons exchanged in the reduction of ceric to cerous ion (1), the charge on sulfate (-2) and the number of neutrons in carbon-12 (6) are some examples of integer constants. The stoichiometric coefficients [-1, -2, 1] of the chemical reaction Cu(II) + 2 en → Cu(en)₂ are also integers; the coefficients are positive for products and negative for reactants. Some don'ts are given in Table 2.1.
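The integer limits above can be checked with the standard HUGE inquiry function. A minimal sketch, assuming a Fortran 2008 compiler that provides the INT16 and INT32 kinds of ISO_FORTRAN_ENV (gfortran does; the module sets a kind to a negative value when it is unsupported). It also stores the stoichiometric coefficients of the copper-ethylenediamine reaction as an integer array.

```fortran
! Sketch: largest integers of the 16- and 32-bit kinds, and the
! stoichiometric coefficients of Cu(II) + 2 en -> Cu(en)2 as integers.
program integer_limits
  use, intrinsic :: iso_fortran_env, only: int16, int32
  implicit none
  ! Reactants negative, product positive
  integer, parameter :: nu(3) = [-1, -2, 1]

  print '(a, i0)', ' largest 16-bit integer: ', huge(0_int16)   ! 2**15 - 1
  print '(a, i0)', ' largest 32-bit integer: ', huge(0_int32)   ! 2**31 - 1
  print '(a, 3i4)', ' coefficients Cu(II), en, Cu(en)2:', nu
end program integer_limits
```

With the usual two's-complement representation this prints 32767 and 2147483647.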


Table 2.1: Don'ts of Integer Constants
Invalid   Reason
21,475    Comma
2M        M, the unit of concentration, is a character
37A       A, a character

Real Constant
A real constant is a signed or unsigned number with a decimal point (Table 2.2). It is convenient to express these numbers in exponent notation. A constant 0.025 is normally expressed as 0.25 × 10⁻¹. In the exponent notation, E replaces "× 10" of the normalized form. The mantissa and exponent are the numbers before and after E, respectively (Table 2.3). The value of a constant is calculated by multiplying the number preceding E by 10 raised to the exponent.
Table 2.2: Real Constants in Chemistry

Real Constant        Parameter
0.025                Absorbance
0.0000472            Rate constant
0.00000000000001     Ionic product of water
47632.365            Epsilon
800000.0             Beta

Table 2.3: Real Constants in Different Forms
Constant      Normalized Form        Exponent Form
0.025         0.25 × 10⁻¹            0.25E-01
0.0000472     0.472 × 10⁻⁴           0.472E-04
476.32365     0.47632365 × 10³       0.47632365E03
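The normalized and exponent forms of Table 2.3 can be reproduced with Fortran edit descriptors. In this sketch, which assumes a modern Fortran compiler and uses the hypothetical names EXPONENT_NOTATION and EXPAND_E, the E descriptor writes a mantissa between 0.1 and 1, the ES descriptor writes the scientific form with a mantissa between 1 and 10, and EXPAND_E multiplies a mantissa by 10 raised to the exponent, as described above.

```fortran
module exponent_notation
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
contains
  ! Value of a constant written as  mantissa E expo
  pure function expand_e(mantissa, expo) result(val)
    real(dp), intent(in) :: mantissa
    integer,  intent(in) :: expo
    real(dp) :: val
    val = mantissa * 10.0_dp**expo
  end function expand_e
end module exponent_notation

program exponent_forms
  use exponent_notation
  implicit none
  real(dp), parameter :: consts(3) = [0.025_dp, 0.0000472_dp, 476.32365_dp]
  character(len=20) :: text
  real(dp) :: x
  integer :: i

  ! Fixed, normalized exponent (E) and scientific (ES) forms of each constant
  do i = 1, size(consts)
    print '(f12.7, 2x, e15.8, 2x, es15.8)', consts(i), consts(i), consts(i)
  end do

  ! A constant typed in exponent form, read back with a list-directed read
  text = '0.472E-04'
  read (text, *) x
  print '(a, es12.4)', ' read back:          ', x
  print '(a, es12.4)', ' expand_e(0.472, -4):', expand_e(0.472_dp, -4)
end program exponent_forms
```

E15.8 typically writes 0.025 as 0.25000000E-01, the exponent form of Table 2.3, while ES15.8 writes it as 2.50000000E-02.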

The computer stores a real constant in normalized form up to a fixed number of significant digits: about seven to eight in single precision and about fifteen to sixteen in double precision (Table 2.4). The rules for invalid constants are depicted in Chart 2.1.

Table 2.4: Real Constants in Single and Double Precision
Real Constant      Single Precision     Double Precision
0.025              0.25000000E-01       0.2500000000000000E-01
0.123456789012     0.12345679E00        0.1234567890120000E00
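The difference between the two columns of Table 2.4 can be observed directly. A minimal sketch, assuming a compiler whose default REAL is 32-bit and whose DOUBLE PRECISION is 64-bit (true of common compilers unless options change the default kinds): the same constant is written once as an E-type, default real constant and once with a D exponent, which makes it a double precision constant.

```fortran
! Sketch: one constant stored in single precision (default REAL)
! and in double precision (D exponent), as in Table 2.4.
program precision_demo
  implicit none
  real             :: s   ! default REAL, assumed 32-bit
  double precision :: d   ! assumed 64-bit

  s = 0.123456789012       ! E-type constant: default real, about 7 digits kept
  d = 0.123456789012d0     ! D exponent: double precision constant
  print '(a, e20.12)', ' single:     ', s
  print '(a, d20.12)', ' double:     ', d
  ! Rounding error of the single precision copy
  print '(a, es10.2)', ' difference: ', dble(s) - d
end program precision_demo
```

The single precision copy agrees with the constant only to about seven or eight significant digits; the printed difference is of the order of 10⁻⁹.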


Chart 2.1
If any character other than E or D is present, then invalid [0.1 x (10)-23]
If there is more than one decimal point, then invalid [6.0.3E+23]
If there are more than three digits after E or D, then invalid [1.00E6725]
If a decimal point occurs after E or D, then invalid [3.0 E 10.0]
If any character other than + or - follows E or D, then invalid [1.4E-1,0]
If any character other than + or - precedes the first digit, then invalid [$1.2E+10]
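The rules of Chart 2.1 can be applied mechanically, one character at a time. The function VALID_REAL_CONSTANT below is a hypothetical sketch of such a checker, not part of any library. Blanks are ignored, as in fixed-form FORTRAN, and besides the chart's rules it requires at least one digit before E or D and a decimal point or an exponent (otherwise the string would be an integer constant). It does not cover the full FORTRAN syntax of constants.

```fortran
module real_constant_rules
  implicit none
contains
  ! Hypothetical checker for the rules of Chart 2.1.
  ! Returns .true. when STR (blanks ignored) passes every rule.
  logical function valid_real_constant(str) result(ok)
    character(len=*), intent(in) :: str
    character(len=len(str)) :: s
    character :: c
    integer :: i, n, npoint, nmant, nexp
    logical :: in_exp

    n = 0                                  ! copy STR without blanks into S(1:n)
    s = ' '
    do i = 1, len(str)
      if (str(i:i) /= ' ') then
        n = n + 1
        s(n:n) = str(i:i)
      end if
    end do

    ok = .false.
    npoint = 0; nmant = 0; nexp = 0
    in_exp = .false.
    do i = 1, n
      c = s(i:i)
      select case (c)
      case ('0':'9')                       ! count mantissa and exponent digits
        if (in_exp) then
          nexp = nexp + 1
        else
          nmant = nmant + 1
        end if
      case ('.')
        if (in_exp) return                 ! decimal point after E or D
        npoint = npoint + 1
        if (npoint > 1) return             ! more than one decimal point
      case ('+', '-')                      ! sign: first, or right after E/D
        if (i > 1) then
          if (.not. in_exp .or. index('EDed', s(i-1:i-1)) == 0) return
        end if
      case ('E', 'D', 'e', 'd')
        if (in_exp .or. nmant == 0) return ! second exponent, or no digit before it
        in_exp = .true.
      case default
        return                             ! any other character, e.g. x ( ) , $
      end select
    end do

    if (nmant == 0) return                          ! no digits at all
    if (npoint == 0 .and. .not. in_exp) return      ! an integer constant
    if (in_exp .and. (nexp == 0 .or. nexp > 3)) return  ! 1 to 3 exponent digits
    ok = .true.
  end function valid_real_constant
end module real_constant_rules

program check_constants
  use real_constant_rules
  implicit none
  character(len=16), parameter :: samples(8) = [character(len=16) :: &
       '0.25E-01', '0.472E-04', '0.1 x (10)-23', '6.0.3E+23', &
       '1.00E6725', '3.0 E 10.0', '1.4E-1,0', '$1.2E+10']
  integer :: i
  do i = 1, size(samples)
    print '(a16, 2x, l1)', samples(i), valid_real_constant(samples(i))
  end do
end program check_constants
```

Only the first two samples are reported as valid (T); each of the other six breaks one rule of the chart.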

Variable
An alphanumeric character is a letter (A-Z) or a numeral (0-9). A variable name is a letter optionally followed by alphanumeric characters, with a maximum length of six. The compiler used in this book truncates any name longer than six characters; for example, EXTINCTION is truncated to EXTINC and ABSORBANCE to ABSORB. The choice of variable name is arbitrary. However, it is advisable to choose a relevant name so that anybody in a given discipline can easily comprehend its meaning. For a data set of wavelength versus absorbance, the variable names WAVEL and ABSORB are more appropriate than Z1 and Z2. Similarly, for volume versus pH data, PH and VOLUME are preferable to X23 or Z4671.
Do's of VARIABLE
✓ Keywords can be used as variable names

*
*     VAR001.FOR
*
      DO = 0.1234567890123456789
      IF = 987654321
      WRITE(*,*)DO,IF
      END

VAR001.FOR output
1.234568E-001   987654321


*
*     VAR002.FOR
*
      STOP = 0879
      END = -123
      REAL = 12.23
      INTEGER = 2
      DATA = 122
      DOUBLE PRECISION = 1.D-3
      WRITE(*,*)STOP,END,REAL,INTEGER
      END

VAR002.FOR output
879.0000000   -123.0000000   12.2300000   2

✓ Library function names (SIN, COS, ALOG, EXP, SQRT, ATAN) can be used as variable names (VAR003.FOR)
*
*     VAR003.FOR
*
      DIMENSION SIN(200)
      DATA SIN(180)/180.0/,SIN(90)/34./
      WRITE(*,*)SIN(180),SIN(90)
      END

Don'ts of VARIABLE
✗ Special characters are not valid in variable names (VAR101.FOR)

* *

VAR101.FOR

* BETA' 1.0e8 N ? N +1 END

* Unrecognizable statement (Error 89, Lines 5,6) * The variable & constant are to be separated by


✗ The hyphen "-" is an invalid character (VAR102.FOR)

* * *

VAR102.FOR NO'S '" 17 NMBE-NEMF A"2 2*A*C C**C END

2

4

6 9

✗ The first character of a variable name must not be a digit. Some typical invalid variables given in VAR103.FOR are explained in Table 2.5.

*
*     VAR103.FOR
*
      3MKCl = 2.9897
      EXTINCTION-COEFFICIENT = 16900
      p-Cl methane = 345.0
      END

Table 2.5 : Reasons for the Invalidity of Variables

Variable                   Reason
3MKCl                      The first character is a numeric figure
EXTINCTION-COEFFICIENT     Special character '-'
p-Cl methane               Special character '-'

Integer Variable By default, a variable whose name starts with any of the letters I, J, K, L, M or N is an integer variable. Examples of valid integer variables are KVOLTS, NEXP, NP, NSAMP, LANDA, etc. A variable name starting with A-H or O-Z can be made an integer variable with a type statement such as INTEGER WAVEL, OUT, EMF

Real Variable By default, a variable whose name starts with any of the letters A through H or O through Z is a real variable. A few valid real variable names are PH, ABSORB, EPSI, RATE, WAVEL, BETAS, etc. A variable starting with any of the letters I to N, which is integer by default, can be made real, if desired, by declaring it as REAL KW, K12, LANDA, MU
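The naming and default-typing rules above can be checked mechanically. The following is a minimal sketch in modern free-form Fortran rather than the book's fixed-form dialect; the module `f77_names` and its functions are hypothetical helpers. Unlike the compiler described above, which truncates long names, this sketch applies the strict FORTRAN 77 rule and simply rejects names longer than six characters.

```fortran
! Hypothetical helpers illustrating FORTRAN 77 naming and implicit typing rules
module f77_names
  implicit none
contains
  ! True if ch is a letter A-Z or a-z (ASCII collating sequence assumed)
  logical function is_letter(ch)
    character, intent(in) :: ch
    is_letter = (ch .ge. 'A' .and. ch .le. 'Z') .or. (ch .ge. 'a' .and. ch .le. 'z')
  end function is_letter

  ! True if NAME is a letter followed by letters/digits, six characters at most
  logical function is_valid_f77_name(name)
    character(len=*), intent(in) :: name
    integer :: i, n
    character :: ch
    is_valid_f77_name = .false.
    n = len_trim(name)
    if (n .lt. 1 .or. n .gt. 6) return          ! empty or too long
    if (.not. is_letter(name(1:1))) return      ! first character must be a letter
    do i = 2, n
      ch = name(i:i)
      if (.not. (is_letter(ch) .or. (ch .ge. '0' .and. ch .le. '9'))) return
    end do
    is_valid_f77_name = .true.
  end function is_valid_f77_name

  ! Default type: first letter I-N gives INTEGER, anything else REAL
  ! (assumes NAME is not blank)
  function implicit_type(name) result(t)
    character(len=*), intent(in) :: name
    character(len=7) :: t
    if (index('IJKLMNijklmn', name(1:1)) .gt. 0) then
      t = 'INTEGER'
    else
      t = 'REAL'
    end if
  end function implicit_type
end module f77_names
```

For example, `is_valid_f77_name('WAVEL')` is true, while `'3MKCl'` and `'p-Cl'` are rejected for the reasons given in Table 2.5, and `implicit_type('LANDA')` returns `'INTEGER'`.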


Assignment Statement ASSIGNMENT is an executable statement and performs three tasks, viz.: (1) evaluation of the expression on the RHS of the '=' sign, (2) conversion of the numerical value to the type of the variable on the LHS and (3) storing it in that variable. It should not appear before type, array specification and DATA statements. Some typical assignment statements are given in ASIGN001.FOR.

*
*     ASIGN001.FOR
*
      ABSORB = 0.356
      CONC = 0.000178
      EPSI = ABSORB/CONC
      WRITE(*,951)ABSORB,CONC,EPSI
  951 FORMAT (1X,F25.18)
      STOP
      END

 Name      Class    Type    Offset
 ABSORB             REAL    20
 CONC               REAL    24
 EPSI               REAL    28

 Name      Type     Size    Class
 MAIN                       PROGRAM

 Pass One    No Errors Detected    11 Source Lines

Functioning of Assignment In the computer, memory locations are numbered and the numerical values are stored in them. The compiler establishes a one-to-one correspondence between variable names (Chart 2.2) and a sequence of memory locations (Fig. 2.1.1).

Fig. 2.1.1 : Memory Allocations for Variables (ABSORB: locations 20-23, value 0.356; CONC: 24-27, value 0.000178; EPSI: 28-31, value 2000)

Chart 2.2
Step 01  A numerical value of 0.356 is placed in the memory locations 20 to 23.
Step 02  A value of 0.178000E-3 is placed in the memory locations 24 to 27. Now the values of absorbance and concentration are available to the CPU.
Step 03  The ALU performs the division and the result is placed in the memory allocated for EPSI, i.e., 28 to 31.
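The three tasks of an assignment, and the four-byte spacing of ABSORB, CONC and EPSI in Fig. 2.1.1, can be sketched in modern Fortran as follows. The module and function names are hypothetical, and the byte count assumes a compiler whose default REAL occupies four bytes (true of gfortran unless an option such as -fdefault-real-8 is used). `storage_size` is a Fortran 2008 intrinsic that returns the size in bits.

```fortran
! Hypothetical sketch of ASIGN001.FOR: evaluate the RHS, store it in the LHS
module assign_demo
  implicit none
contains
  ! EPSI = ABSORB/CONC (molar extinction coefficient)
  pure real function extinction_coefficient(absorb, conc)
    real, intent(in) :: absorb, conc
    extinction_coefficient = absorb / conc
  end function extinction_coefficient

  ! Bytes occupied by one default REAL; with a 4-byte REAL this is the
  ! stride 20, 24, 28 between ABSORB, CONC and EPSI in Fig. 2.1.1
  integer function default_real_bytes()
    default_real_bytes = storage_size(0.0) / 8
  end function default_real_bytes
end module assign_demo
```

With the data of ASIGN001.FOR, `extinction_coefficient(0.356, 0.000178)` gives approximately 2000.0.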


Replacement Statement Replacement is an executable statement. Its flow chart symbol, position and functions are the same as those of the ASSIGNMENT statement. In a REPLACEMENT statement, the variable on the LHS also appears on the RHS. A listing of REPL001.FOR and its algorithm (Chart 2.3) follow.

*
*     REPL001.FOR
*
      LANDA = 360
      WRITE(*,951)LANDA
      LANDA = LANDA + 10
      WRITE(*,951)LANDA
  951 FORMAT(' LANDA = ',I4)
      STOP 'REPL_01'
      END

 LANDA =  360
 LANDA =  370

Chart 2.3
Step 01  The integer constant 360 is placed in the memory locations (16 to 19) corresponding to the variable LANDA. This is an assignment statement.
Step 02  The WRITE statement displays the value of LANDA as LANDA = 360 according to FORMAT 951.
Step 03  The value of LANDA (360) is added to the integer constant 10 and the result, 370, is placed in the same memory locations 16 to 19. The old value (Step 01) is thus replaced by the new value (Step 03); at this stage the value 360 is no longer available. Thus LANDA = LANDA + 10 is a REPLACEMENT statement.
Step 04  The WRITE statement displays the current value of LANDA: LANDA = 370
Step 05  The STOP statement terminates the execution of the program with the message: STOP - Program terminated.

The advantages of the REPLACEMENT statement are a smaller number of variables and a saving of program memory.

Exchange of Values Stored in Two Variables In the program EXC001.FOR, the numerical values 370.0 and 0.35 are wrongly assigned to absorbance and wavelength, respectively.


*
*     EXC001.FOR
*
      ABSORB = 370.
      WAVEL = 0.35
      WRITE(*,951)WAVEL,ABSORB
  951 FORMAT(/' WAVE LENGTH = ',F10.3,5X,'ABSORBANCE = ',F10.3)
      STOP
      END

 WAVE LENGTH =       .350     ABSORBANCE =    370.000

To assign 370 correctly to WAVEL and 0.35 to ABSORB, the contents of the memory locations corresponding to ABSORB and WAVEL must be exchanged. A set of algorithms (Chart 2.4) and the corresponding FORTRAN program follow.

Chart 2.4

                          ABSORB    WAVEL
Initial state             370.0     0.35
Expected final state      0.35      370.0

Algorithm 1 (XAL01)       ABSORB    WAVEL
ABSORB = WAVEL            0.35      0.35
WAVEL = ABSORB            0.35      0.35
As the values of both ABSORB and WAVEL are 0.35, it is an incorrect algorithm.

Algorithm 2 (XAL02)       ABSORB    WAVEL
WAVEL = ABSORB            370.0     370.0
ABSORB = WAVEL            370.0     370.0
The value of WAVEL is correct but ABSORB is wrong. It is also an incorrect algorithm.

Algorithm 3 (XAL03)       ABSORB    WAVEL    TEMP1    TEMP2
TEMP1 = ABSORB            370.0     0.35     370.0
TEMP2 = WAVEL             370.0     0.35     370.0    0.35
ABSORB = TEMP2            0.35      0.35     370.0    0.35
WAVEL = TEMP1             0.35      370.0    370.0    0.35
The expected result is obtained using two temporary memory locations (TEMP1, TEMP2).

Algorithm 4 (XAL04)       ABSORB    WAVEL    TEMP1
TEMP1 = ABSORB            370.0     0.35     370.0
ABSORB = WAVEL            0.35      0.35     370.0
WAVEL = TEMP1             0.35      370.0    370.0


*
*     EXCHANGE OF CONTENTS OF MEMORY ALLOCATIONS (VARIABLES) ABSORB, WAVEL
*
      ABSORB = 370.
      WAVEL = 0.35
C     INCORRECT ALGORITHM (XAL01)
      ABSORB = WAVEL
      WAVEL = ABSORB
      WRITE(*,901)
      WRITE(*,951)WAVEL,ABSORB
*
C     INCORRECT ALGORITHM (XAL02)
      ABSORB = 370.
      WAVEL = 0.35
      WAVEL = ABSORB
      ABSORB = WAVEL
      WRITE(*,901)
      WRITE(*,951)WAVEL,ABSORB
*
C     CORRECT ALGORITHM (XAL03)
      ABSORB = 370.
      WAVEL = 0.35
      TEMP1 = ABSORB
      TEMP2 = WAVEL
      ABSORB = TEMP2
      WAVEL = TEMP1
      WRITE(*,902)
      WRITE(*,951)WAVEL,ABSORB
*
C     CORRECT ALGORITHM (XAL04)
      ABSORB = 370.
      WAVEL = 0.35
      TEMP1 = ABSORB
      ABSORB = WAVEL
      WAVEL = TEMP1
      WRITE(*,902)
      WRITE(*,951)WAVEL,ABSORB
  901 FORMAT(/' INCORRECT ALGORITHM ')
  902 FORMAT(/,10X,'CORRECT ALGORITHM ')
  951 FORMAT(' WAVE LENGTH = ',F10.3,5X,'ABSORBANCE = ',F10.3)
      STOP
      END


 INCORRECT ALGORITHM
 WAVE LENGTH =       .350     ABSORBANCE =       .350

 INCORRECT ALGORITHM
 WAVE LENGTH =    370.000     ABSORBANCE =    370.000

           CORRECT ALGORITHM
 WAVE LENGTH =    370.000     ABSORBANCE =       .350

           CORRECT ALGORITHM
 WAVE LENGTH =    370.000     ABSORBANCE =       .350

Algorithms 3 and 4 both achieve the final target. However, algorithm 4 is preferable to algorithm 3 from the software engineering point of view, as it uses only one temporary variable (TEMP1). Although algorithms 1 and 2 do not have any syntactic errors, they output undesired results. Such errors are known as logical errors.
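Algorithm 4 is the classic three-assignment swap. A minimal sketch of it as a reusable subroutine in modern Fortran follows; the module and subroutine names are hypothetical.

```fortran
! Hypothetical module wrapping algorithm 4 (XAL04)
module exchange_demo
  implicit none
contains
  ! Exchange two REAL values using a single temporary variable
  subroutine swap_real(absorb, wavel)
    real, intent(inout) :: absorb, wavel
    real :: temp1
    temp1 = absorb     ! save ABSORB before it is overwritten
    absorb = wavel
    wavel = temp1
  end subroutine swap_real
end module exchange_demo
```

Applied to the values of EXC001.FOR, `call swap_real(absorb, wavel)` leaves ABSORB = 0.35 and WAVEL = 370.0; calling it a second time restores the original values.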

Expression An expression is formed by the combination of operands and operators. An operand is a constant, a variable, a function reference or an expression enclosed in a set of parentheses. The general form of an expression is

[Operator] Operand [Operator Operand] [Operator Operand]

An expression cannot stand alone as a statement; it occurs in the predicate of an IF statement or on the RHS of the '=' sign in assignment and replacement statements. There are three types of expressions, viz., arithmetic, relational and logical.

Arithmetic Expression Arithmetic expressions are formed when arithmetic operands and arithmetic operators are written in juxtaposition. The arithmetic operators used in FORTRAN are +, -, *, / and **, where the two stars in succession are treated as a single operator representing exponentiation. The simplest expression is a constant, a variable or a function reference [e.g., 3, X, SIN(0.5), ABS(-5)]. A unary (monadic) operator has the form OPERATOR OPERAND, examples being -3 and +X. An operator sandwiched between two operands [OPERAND1 OPERATOR OPERAND2] is a dyadic operator.

Examples: NIP + NAP, SD * SD, B - CF, SD ** 3, MOLWT/NEC

Evaluation of a valid arithmetic expression results in a numerical value. The FORTRAN arithmetic operators, their equivalents in the human (algebraic) domain and their functions are given in Chart 2.5.


Chart 2.5

Arithmetic Domain    Computer Domain    Algebraic Operation
A + B                A + B              Addition
A - B                A - B              Subtraction
A x B, A.B           A * B              Multiplication
A/B                  A / B              Division
A^B                  A ** B             Exponentiation

Don'ts of ARITHMETIC EXPRESSION
✗ Two or more operators in succession are invalid:
      PH = B + - CF
      VAR = SD *** 2
      B = A ** -1
  The correct forms of the above expressions are:
      PH = B - CF
      VAR = SD ** 2
      B = A ** (-1)

Addition and Subtraction The algebraic sum (addition or subtraction) of two operands involves comparison of the exponents, shifting of the mantissa, summation of the mantissas and normalization to a prefixed number of digits. The algorithm and examples for addition and subtraction of two operands are given in Charts 2.6 and 2.7, respectively. In FORTRAN they are performed as binary operations.

Chart 2.6

Step 1  Express the operands in normalized form.
Step 2  If   the exponents of the two operands are equal
        Then the mantissas are added & the exponent of the result is equal to that of the augend/addend.
Step 3  If   the exponents of the two operands are not equal
        Then the mantissa of the operand with the lower exponent is rewritten so that its exponent is equal to that of the other operand & the mantissas are added only to the prefixed number of digits & the exponent of the result is equal to that of the operand with the larger exponent.
Finally, the result is normalized to the prefixed number of digits.


Chart 2.7 : Illustration of Addition/Subtraction

AUGEND            ADDEND            RESULT
.123456800E-06    .876543200E-06    .100000000E-05
.123456800E+34    .876543200E+14    .123456800E+34
.123456800E+06    .876543200E+14    .876543200E+14
.123456800E+06    .210456800E+00    .123457000E+06
.123456800E+06    .543210800E+03    .124000000E+06

MINUEND           SUBTRAHEND        DIFFERENCE
.100000000E+36    .888888900E+05    .100000000E+36
.100000000E+36    .888888900E+34    .911111200E+35
.100000000E+36    .888888900E+30    .999991100E+35
.100000000E+36    .888888900E+25    .100000000E+36
.100000000E+36    .888888900E-06    .100000000E+36
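Several rows of Chart 2.7 show an operand whose exponent is much smaller than that of the other being shifted out of the mantissa entirely, so the result equals the larger operand. The sketch below, in modern Fortran using the `real32`/`real64` kinds of the intrinsic module `iso_fortran_env`, tests whether an addend is absorbed in this way; the module and function names are hypothetical, and IEEE single and double precision are assumed.

```fortran
! Hypothetical sketch: is SMALL lost when added to BIG?
module fp_sum_demo
  use, intrinsic :: iso_fortran_env, only: real32, real64
  implicit none
contains
  ! Single precision: the sum is stored in a real32 before comparison
  logical function absorbed_sp(big, small)
    real(real32), intent(in) :: big, small
    real(real32) :: s
    s = big + small
    absorbed_sp = (s .eq. big)
  end function absorbed_sp

  ! Double precision version of the same test
  logical function absorbed_dp(big, small)
    real(real64), intent(in) :: big, small
    real(real64) :: s
    s = big + small
    absorbed_dp = (s .eq. big)
  end function absorbed_dp
end module fp_sum_demo
```

With these kinds, the addend .876543200E+14 is absorbed by .123456800E+34 (second row of the chart), and 1.0E-4 added to 1.0E6 is lost in single precision but retained in double precision.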

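Multiplication and Division These operations are common in all chemical computations. For example, the normality of a solution (NOR) is the product of its molarity (MOLAR) and the number of ionizable protons (NIP):

      NOR = MOLAR * NIP

Since NOR and MOLAR begin with N and M, they are integer by default and must be declared, e.g., REAL NOR, MOLAR, to hold fractional values.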


Similarly, the extinction coefficient (EPSI) is the ratio of the absorbance (ABSORB) of a coloured compound to its molar concentration (CONC) (see ASIGN001.FOR). Division by zero results in infinity, and an overflow error is displayed in FORTRAN.

Exponentiation The variance (VAR) of ungrouped data can be calculated from the standard deviation (SD) as shown in the following box.

  101   VAR = SD ** 2
  102   VAR = SD ** 2.0
  106   SD = VAR ** 0.5

The statements 101 and 102 appear to be the same, but they are calculated by separate algorithms, as given in Chart 2.8.

Chart 2.8
If   the exponentiation symbol is followed by an integer constant/variable
Then the expression is calculated as successive multiplication: SD * SD
If   the exponentiation symbol is followed by a real constant/variable
Then the expression is calculated in logarithmic mode: EXP(2.0 * ALOG(SD))

Obviously, statement 106 can only be evaluated as EXP(0.5 * ALOG(VAR)), where ALOG is the natural logarithm. Because ALOG is undefined for a zero or negative argument, a negative base cannot be raised to a real power. (ALOG10(SD) is the FORTRAN equivalent of log10 SD.)
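The two algorithms of Chart 2.8 can be written out explicitly. This modern-Fortran sketch (hypothetical names) contrasts repeated multiplication for an integer exponent with the logarithmic mode EXP(Y*LOG(X)) for a real exponent. The standard prohibits raising a negative real to a real power, although some compilers quietly evaluate a constant exponent such as 2.0 as a multiplication; the sketch does not rely on that.

```fortran
! Hypothetical sketch of the two exponentiation algorithms of Chart 2.8
module power_demo
  implicit none
contains
  ! Integer exponent: successive multiplication (any sign of X; assumes N .ge. 0)
  pure real function power_int(x, n)
    real, intent(in) :: x
    integer, intent(in) :: n
    integer :: i
    power_int = 1.0
    do i = 1, n
      power_int = power_int * x
    end do
  end function power_int

  ! Real exponent: logarithmic mode, defined only for X .gt. 0
  pure real function power_log(x, y)
    real, intent(in) :: x, y
    power_log = exp(y * log(x))
  end function power_log
end module power_demo
```

For instance, `power_int(-3.0, 2)` returns exactly 9.0, whereas `power_log` with a negative first argument would need the logarithm of a negative number.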

Arithmetic Operations on Heterogeneous Constants/Variables When the two operands of an expression are not of the same type, i.e., one is an integer and the other real, evaluation is not trivial. The integer is considered the weaker (simpler) object and the real the stronger (Chart 2.9). The rule is to convert the weaker into the stronger; thus, the result of the arithmetic operation is of the stronger type. The type of the result of arithmetic operations on integer (I), real (R) and complex (C) variables is given in Table 2.6.

Chart 2.9
If   both operands are of the same type
Then the arithmetic operation is performed directly
If   the operands are of different types
Then convert the weaker operand into the stronger & perform the arithmetic operation
If   the variable on the LHS and the constant on the RHS are of the same type
Then the constant is assigned to the variable
If   the variable on the LHS and the constant on the RHS are of different types
Then transform the constant to the type of the variable & assign it to the variable

Let us see the calculation of the approximate value of π (PI = 22/7) in real and integer mode. In real mode (PI = 22.0/7.0), the arithmetic operation results in 3.142857. On the other hand, execution of the statement PI = 22/7 results in 3.0000, since the division is performed in integer mode, 22 and 7 being integer constants. So the fractional part of the result is truncated, and the integer value 3 is to be assigned to the real variable PI.


Hence the integer 3 (fixed value) is floated (converted into a real number), resulting in 3.0000. The other two possible expressions are PI = 22.0/7 and PI = 22/7.0. One of the operands (constants) is real and the other is an integer. Real being the stronger, the integer constant is floated: 7 and 22 become 0.7000E01 and 0.2200E02, respectively. Now the division is performed in real mode, resulting in the correct value 3.142857. In another example, X = 1/3, both operands are integers and the integer division results in integer zero. The variable on the LHS of "=" is real, so the integer constant zero is transformed into the real 0.0000000E0 and assigned to X. Thus, the value of X is 0.0 and not 0.3333333.

Table 2.6

                 Operand II
Operand I        I     R     C
I                I     R     C
R                R     R     C
C                C     C     C
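The PI and X = 1/3 examples above, and the I-with-R entries of Table 2.6, can be reproduced with a short modern-Fortran sketch. The module and function names are hypothetical; some compilers warn that the integer division is truncated, which is exactly the behaviour being shown.

```fortran
! Hypothetical sketch of integer-mode versus mixed-mode division
module mixed_mode_demo
  implicit none
contains
  ! PI = 22/7: integer division first (gives 3), then 3 is floated to 3.0
  real function pi_integer_mode()
    pi_integer_mode = 22/7
  end function pi_integer_mode

  ! PI = 22.0/7: the integer 7 is floated and real division follows
  real function pi_mixed_mode()
    pi_mixed_mode = 22.0/7
  end function pi_mixed_mode

  ! X = 1/3: integer division gives 0, which is then floated to 0.0
  real function one_third_integer()
    one_third_integer = 1/3
  end function one_third_integer
end module mixed_mode_demo
```

Writing either operand as a real constant (22.0/7 or 22/7.0) is enough to move the whole division into real mode.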

Relational and Logical Expressions A relational or logical expression, or a combination of these, results in either true or false. The symbols of operation in the human domain and those used in FORTRAN are given in Table 2.7. All the relational operators (.EQ., .NE., .GT., .GE., .LT., .LE.) have the same precedence, while the hierarchy of the logical operators is .NOT., .AND. and .OR., in that order.

Table 2.7a : Relational Operators

Human Domain (English Form)    Symbol    FORTRAN Domain
Is equal to                    =         .EQ.
Not equal to                   ≠         .NE.
Greater than                   >         .GT.
Greater than or equal to       ≥         .GE.
Less than                      <         .LT.
Less than or equal to          ≤         .LE.

Table 2.7b : Logical Operators

Human Domain    Symbol    FORTRAN Domain
And             &         .AND.
Or              |         .OR.
Not             ~         .NOT.
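Because relational operators bind more tightly than logical ones, and .AND. more tightly than .OR., range tests and combined predicates need no extra parentheses. A small modern-Fortran sketch with hypothetical names:

```fortran
! Hypothetical sketch of relational/logical operator precedence
module logic_demo
  implicit none
contains
  ! Relational operators are evaluated before .AND.
  logical function in_range(x, lo, hi)
    real, intent(in) :: x, lo, hi
    in_range = x .ge. lo .and. x .le. hi
  end function in_range

  ! .AND. binds tighter than .OR., so this is A .OR. (B .AND. C)
  logical function or_and(a, b, c)
    logical, intent(in) :: a, b, c
    or_and = a .or. b .and. c
  end function or_and
end module logic_demo
```

For example, `in_range(7.0, 0.0, 14.0)` is true for a pH of 7 on a 0 to 14 scale, and `or_and(.true., .true., .false.)` is true, whereas strict left-to-right evaluation would give false.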


Arithmetic Operations with more than Two Operands When more than two operators are present in a valid expression, rules governing the order of operations are mandatory; otherwise, different end results are obtained depending on the order in which the expression is evaluated. In algebraic expressions, parentheses ( ), braces { } and square brackets [ ] are used to pinpoint the hierarchy of operations. In FORTRAN, however, only parentheses are used, to any level of nesting. Consider the calculation of a root of a quadratic equation:

      ROOT1 = (-B + SQRT(B**2-4.*A*C))/(2.*A)

The expression on the RHS consists of three sets of parentheses, the square root function, exponentiation, multiplication, division, addition and subtraction. The rules to perform the hierarchy of operations are given in Chart 2.10.

Chart 2.10
* When more than one set of parentheses occurs, the expression in the innermost parentheses is evaluated first, then that in the next innermost, and so on.
* Library functions/function subprograms are evaluated from left to right.
* Exponentiation is performed next. If there is more than one exponentiation, the order of calculation is from right to left. For example, A**B**(C**D) is calculated as A**B**E and then as A**F, where E = C**D and F = B**E.
* Multiplication and division have the same hierarchy and are evaluated from left to right.
* Addition and subtraction have the same hierarchy and are performed from left to right.
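Two consequences of Chart 2.10 can be checked directly: ROOT1 evaluates exactly as written, and chained exponentiation groups from right to left. The following modern-Fortran sketch uses hypothetical names and assumes that A is nonzero and that B**2 - 4AC is not negative.

```fortran
! Hypothetical sketch of the hierarchy rules of Chart 2.10
module hierarchy_demo
  implicit none
contains
  ! ROOT1 = (-B + SQRT(B**2-4.*A*C))/(2.*A)
  pure real function root1(a, b, c)
    real, intent(in) :: a, b, c
    root1 = (-b + sqrt(b**2 - 4.0*a*c)) / (2.0*a)
  end function root1

  ! Exponentiation groups right to left: 2**3**2 is 2**(3**2) = 512, not 64
  pure integer function chained_power()
    chained_power = 2**3**2
  end function chained_power
end module hierarchy_demo
```

For the equation x**2 - 3x + 2 = 0, `root1(1.0, -3.0, 2.0)` returns 2.0.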

Considering the expression in the innermost parentheses of the above statement, B**2-4.0*A*C, the constant (4.0) and the variables A, B, C are all in real mode. The arithmetic operations are performed in the order exponentiation, multiplication and subtraction. Since there are two multiplication operations, the expression is scanned from left to right, and thus 4.0*A is followed by the multiplication of the result with C. The order of the arithmetic operations, the intermediate results and the corresponding expressions in the human domain are given in Table 2.8. The calculation of the standard deviation (SD) from the variance (VAR) using apparently similar formulae is implemented in VAR2SD.FOR.


Table 2.8 : Hierarchy of Operations

Variable    FORTRAN       Algebra
C11         B**2          $b^2$
C12         4.*A          $4a$
C13         C12*C         $4ac$
C14         C11-C13       $b^2 - 4ac$
C21         SQRT(C14)     $\sqrt{b^2 - 4ac}$
C22         -B+C21        $-b + \sqrt{b^2 - 4ac}$
C31         2.*A          $2a$
C41         C22/C31       $(-b + \sqrt{b^2 - 4ac})/2a$

*
*     VAR2SD.FOR
*
      DIMENSION SD(6)
      DO 10 I = 1,3
      VAR = I
  501 SD(1) = VAR ** (1./2)
  502 SD(2) = VAR ** (1/2.)
  503 SD(3) = VAR ** 0.5
  504 SD(4) = SQRT(VAR)
  505 SD(5) = VAR ** (1/2)
  506 SD(6) = VAR ** 1/2
      WRITE(*,901)VAR
  901 FORMAT(/6X,'VAR =',F8.3)
      WRITE(*,902)(SD(J),J=1,6)
  902 FORMAT(2X,3F10.4/1X,3F10.3)
   10 CONTINUE
      END

      VAR =   1.000
    1.0000    1.0000    1.0000
    1.000     1.000      .500

      VAR =   2.000
    1.4142    1.4142    1.4142
    1.414     1.000     1.000

      VAR =   3.000
    1.7321    1.7321    1.7321
    1.732     1.000     1.500

Statements 501 to 504 give the same, correct result. However, statements 505 and 506 produce incorrect values because of the type-conversion and hierarchy rules. In statement 505, 1/2 is an integer division that yields 0, so VAR ** 0 gives 1.0 irrespective of the value of VAR. In statement 506, exponentiation is performed before division, so VAR ** 1/2 is evaluated as (VAR ** 1)/2, which is always half the variance. The program SD2VAR.FOR converts the standard deviation back into the variance using different formulae, and the end result is the same in every case.


*
*     SD2VAR.FOR
*
      DIMENSION VAR(4)
      DO 40 I = 1,3
      SD = SQRT(FLOAT(I))
      VAR(1) = SD * SD
      VAR(2) = SD**2
      AL10 = ALOG(10.)
      VAR(3) = SD**2.
      VAR(4) = 10.**(2*ALOG(SD)/AL10)
      WRITE(*,901)SD
  901 FORMAT(/5X,'SD =',F8.3)
      WRITE(*,902)(VAR(K),K=1,4)
  902 FORMAT(2X,3F10.4/1X,3F10.3)
   40 CONTINUE
      END

     SD =   1.000
    1.0000    1.0000    1.0000
    1.000

     SD =   1.414
    2.0000    2.0000    2.0000
    2.000

     SD =   1.732
    3.0000    3.0000    3.0000
    3.000
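The explanation of statements 505 and 506 of VAR2SD.FOR can be confirmed with a sketch in modern Fortran. The function names are hypothetical and simply wrap the expressions from the listing.

```fortran
! Hypothetical wrappers around the SD formulae of VAR2SD.FOR
module var2sd_demo
  implicit none
contains
  ! Statements 501-504: correct square root
  pure real function sd_correct(var)
    real, intent(in) :: var
    sd_correct = var ** 0.5
  end function sd_correct

  ! Statement 505: 1/2 is integer division, giving 0, so VAR**0 = 1.0
  pure real function sd_int_exponent(var)
    real, intent(in) :: var
    sd_int_exponent = var ** (1/2)
  end function sd_int_exponent

  ! Statement 506: exponentiation first, so this is (VAR**1)/2
  pure real function sd_no_parens(var)
    real, intent(in) :: var
    sd_no_parens = var ** 1/2
  end function sd_no_parens
end module var2sd_demo
```

With VAR = 3.0, the last two functions return 1.0 and 1.5, matching the last row of the VAR2SD.FOR output.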

Precision in Arithmetic Operations The word precision in arithmetic operations denotes the number of digits retained for a real constant expressed in normalized form. In FORTRAN, real and integer variables/constants are in single precision by default. The maximum value of a two-byte integer (INTEGER*2) is 32767. A real constant occupies two 16-bit words (four 8-bit bytes). In ASIGN002.FOR, ABSORB and CONC are stored as 0.35600000E00 and 0.17800000E-03 and EPSI as 0.20000000E04 in memory locations 20-23, 24-27 and 28-31, respectively.

*
*     ASIGN002.FOR
*
      REAL*4 ABSORB,CONC,EPSI
      ABSORB = 0.356D0
      CONC = 0.000178D0
      EPSI = ABSORB/CONC
      WRITE(*,951) ABSORB,CONC,EPSI
  951 FORMAT (1X,F25.18)
      STOP
      END

 Name      Type    Offset
 ABSORB    REAL    20
 CONC      REAL    24
 EPSI      REAL    28

        .356000000000000000
        .000178000000000000
     2000.000000000000000000

Explicitly, the variables can be declared as single precision using the statement REAL*4 ABSORB,CONC,EPSI


These real variables are declared as double precision before the first executable statement as DOUBLE PRECISION ABSORB,CONC,EPSI or REAL*8 ABSORB,CONC,EPSI. The program ASIGN003.FOR implements ASIGN002.FOR in double precision. The variables ABSORB, CONC and EPSI are each allotted eight 8-bit bytes of memory (20-27, 28-35 and 36-43).

*
*     ASIGN003.FOR
*
      DOUBLE PRECISION ABSORB,CONC,EPSI
      ABSORB = 0.356D0
      CONC = 0.000178D0
      EPSI = ABSORB/CONC
      WRITE(*,951) ABSORB,CONC,EPSI
  951 FORMAT (1X,F25.18)
      STOP
      END

 Name      Type      Offset
 ABSORB    REAL*8    20
 CONC      REAL*8    28
 EPSI      REAL*8    36

        .356000000000000000
        .000178000000000000
     2000.000000000000000000

A real constant in D notation (e.g., 0.356D0 or 0.000178D0) is in double precision. Looking at the results of the two programs, one finds that there is no difference between the values of EPSI. At this stage, one should not hastily conclude that double precision calculations always yield the same results as those in single precision. A notion prevalent among some chemists is that double precision is not necessary, as the readability of a burette is 0.1 or 0.01 mL and chemicals are weighed on a balance accurate to four or six decimal places. But an insight into the effect of cumulative round-off and truncation errors during a series of complicated calculations establishes the need for double precision. In single precision, the compiler stores real constants to about eight digits only, irrespective of the number of digits given in the program. It is usual practice to supply some constants to the program after calculating them manually, and the number of digits given is then mostly based on prejudice; such constants are better computed in the program itself. For example, ASIGN004.FOR computes 22/7, a common approximation of π, in single and double precision. The digits beyond the eighth and sixteenth places in single and double precision, respectively, are insignificant (shaded in the original output) and have no meaning.

*
*     ASIGN004.FOR
*
      DOUBLE PRECISION PIDP
      PIDP = 22.D0/7.D0
      PISP = 22.0/7.0
      WRITE(*,951) PISP,PIDP
  951 FORMAT (1X,F25.18)
      STOP
      END

 Name      Type      Offset
 PIDP      REAL*8    20
 PISP      REAL      28

 3.14285700 (remaining digits insignificant)
 3.1428571428571430 (remaining digits insignificant)


Even if the values of X and PI are given to 30 or more places in the assignment statements (ASIGN005.FOR), they are stored only up to sixteen places in double precision.

*
*     ASIGN005.FOR
*
      REAL*8 X,PI
      X = 0.6666 6666 6666 6666 6666 6666 6666 6666 6666D0
      PI = 3.1428 5714 2857 1430D0
      WRITE(*,951)X,PI
  951 FORMAT (1X,'X =',F40.30/'PI =',F40.30)
      END

 X =          .666666666666666600000000000000
 PI =        3.142857142857143000000000000000

In numerical analysis, it is good practice to use double precision to reduce round-off errors. The library function SNGL transforms a double precision variable or constant into a single precision one, while DBLE converts a single precision entity into a double precision one.
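In modern Fortran, single- and double-precision kinds are usually written with the named constants `real32` and `real64` from the intrinsic module `iso_fortran_env`, and `real(x, real32)` and `real(x, real64)` play the roles of SNGL and DBLE. The sketch below (hypothetical module name, IEEE arithmetic assumed) repeats the 22/7 comparison of ASIGN004.FOR; the intrinsic `precision` reports the number of reliable decimal digits of each kind.

```fortran
! Hypothetical sketch of ASIGN004.FOR with explicit kinds
module precision_demo
  use, intrinsic :: iso_fortran_env, only: real32, real64
  implicit none
contains
  ! 22/7 in single precision (PISP)
  real(real32) function ratio_sp()
    ratio_sp = 22.0_real32 / 7.0_real32
  end function ratio_sp

  ! 22/7 in double precision (PIDP = 22.D0/7.D0)
  real(real64) function ratio_dp()
    ratio_dp = 22.0_real64 / 7.0_real64
  end function ratio_dp
end module precision_demo
```

Converting the single-precision result with `real(ratio_sp(), real64)` does not recover the digits already lost; it still differs from `ratio_dp()` at roughly the eighth significant digit.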

2.2 TRANSFER CONTROL STATEMENTS


The normal sequential execution of statements in a program is altered through transfer control statements. Statements of this category, viz., IF, GOTO and computed GOTO, transfer control to a statement chosen by the user.

If Statement It is an executable statement and can occur anywhere after the specification statements, except as the last statement of a DO domain. The flow chart symbol (Fig. 2.2.1) and pseudo code are given below.

Pseudo code
IF    Predicate
Then  Process a
Else  Process b
Endif

Fig. 2.2.1 : Flow Chart Symbol of IF

Selection Between Two Alternate Processes Many mathematical, statistical and chemical tasks require selection between alternate decisions, calculations or sets of procedures. Some widely employed tasks include obtaining the absolute value of a variable, calculating the sum of the elements of a vector, comparing two real numbers, sorting positive and negative numbers, and calculating the pH of a strong acid on successive dilution.


Example 2.2.1 The algorithm to take the absolute value of a variable (X) involves a test to decide whether it is negative or not. The relational expression (X .LT. 0) is called the predicate, and it results in either YES or NO (TRUE or FALSE; ON or OFF; 1 or 0). If X is negative, it is multiplied by -1 to get the absolute value (Fig. 2.2.2). On the other hand, when X is zero or positive, nothing is to be done. Depending upon the answer, the corresponding process is performed. The pseudo code and the FORTRAN program (IF001.FOR) to implement the absolute value of a variable are given below.


Pseudo code (Case 1) X=-5

Predicate (XB Then A is greater than B Else A is less than or equal to B

Step 3

STOP


Steps 1 and 3 are sequential, while Step 2 is a selection statement. The flow chart (Fig. 2.2.5) and program are as follows:

Fig. 2.2.5 : Comparison of Two Numbers

*
*     IF005.FOR
*
      WRITE(*,951)
  951 FORMAT(5X,' GIVE A AND B : ',\)
      READ(*,*)A,B
C
      IF (A .GT. B) THEN
         WRITE(*,901)A,B
      ELSE
         WRITE(*,903)A,B
      ENDIF
C
  901 FORMAT(/' A = ',F5.2,' IS GREATER THAN B = ',F5.2)
  903 FORMAT(/' A = ',F5.2,' IS LESS THAN OR EQUAL TO B ',F5.2)
      END

      GIVE A AND B :
 A =  2.00 IS LESS THAN OR EQUAL TO B  4.00

      GIVE A AND B :
 A =  6.00 IS GREATER THAN B =  4.00

      GIVE A AND B :
 A =  4.00 IS LESS THAN OR EQUAL TO B  4.00


In order to distinguish between the two cases A = B and A .LT. B, another selection is needed for the false condition, i.e., on the LHS of the A .GT. B decision box. When A .GT. B is false, control is transferred to the ELSE branch; on entering it, A .LE. B is known to be true. The predicate used in this second decision box is A = B. When it is true, the message A = B is displayed; for the false condition, A .LT. B is output. This results in another IF THEN ELSE structure within the first IF THEN ELSE statement (Fig. 2.2.6). One or more IF THEN ELSE structures within an IF THEN ELSE statement is called nesting of IF statements (IF006.FOR).

Fig. 2.2.6 : Nested DO Loop

*
*     IF006.FOR
*
      WRITE(*,951)
  951 FORMAT(5X,' GIVE A AND B : ',\)
      READ(*,*)A,B
C
      IF (A .GT. B) THEN
         WRITE(*,901)A,B
      ELSE
         IF (A .EQ. B) THEN
            WRITE(*,902)A,B
         ELSE
            WRITE(*,903)A,B
         ENDIF
      ENDIF
C
  901 FORMAT(/' A = ',F5.2,' IS GREATER THAN B = ',F5.2)
  902 FORMAT(/' A = ',F5.2,' IS EQUAL TO B = ',F5.2)
  903 FORMAT(/' A = ',F5.2,' IS LESS THAN B = ',F5.2)
      END

      GIVE A AND B :
 A =  4.00 IS EQUAL TO B =  4.00

      GIVE A AND B :
 A =  2.00 IS LESS THAN B =  4.00

      GIVE A AND B :
 A =  6.00 IS GREATER THAN B =  4.00

There is, in principle, no limit to the depth of nesting (Fig. 2.2.7), but the readability and understandability of a program diminish rapidly as nesting increases. Expert systems, i.e., computer programs that mimic the knowledge of experts in decision making, contain hundreds to thousands of rules and demand readability and understandability rather than condensed source code. Therefore, nesting is avoided by using a series of IF-THEN statements, as shown in IF007.FOR (Fig. 2.2.8), to implement the algorithm of IF006.FOR.

Fig. 2.2.7 : Multilevel Nesting
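The flat, rule-per-line style recommended above for expert systems can be sketched in modern Fortran as a function that returns +1, 0 or -1 when A is greater than, equal to or less than B. The names are hypothetical, and each rule is an independent IF rather than a nested one; this is an illustration of the style, not a reproduction of IF007.FOR.

```fortran
! Hypothetical sketch: IF006.FOR logic as a series of independent rules
module compare_demo
  implicit none
contains
  pure integer function compare(a, b)
    real, intent(in) :: a, b
    compare = 0                     ! default, also returned for unordered (NaN) input
    if (a .gt. b) compare = 1       ! rule 1: A is greater than B
    if (a .eq. b) compare = 0       ! rule 2: A is equal to B
    if (a .lt. b) compare = -1      ! rule 3: A is less than B
  end function compare
end module compare_demo
```

Adding a rule means adding one line, without reworking an existing ELSE branch.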


3 is false. So, control is transferred again to the beginning of the DO loop: SUM = SUM + X(3), i.e., SUM = 6, and I = I + INC = 4.

Since I .GT. 3 (4 .GT. 3) is now true, control exits the domain of the DO loop.

Generalization of Summation of Linear Array Whenever the number of data points to be summed changes, the DIMENSION and ASSIGNMENT statements have to be edited. Instead, the value of the integer variable NP can be supplied by the user, and thus the program can be generalized (DO003.FOR). Further, inputting the values of X through a DO loop eliminates NP ASSIGNMENT statements. The two DO loops can be combined in this case, as in DO004.FOR.

*
*     DO003.FOR
*
      PARAMETER (MAX = 1000)
      DIMENSION X(1000)
      CALL IXO(NP)
      DO 10 I = 1,NP
      READ(*,*)X(I)
   10 CONTINUE
      SUM = 0.
      DO 20 J = 1,NP
      SUM = SUM + X(J)
   20 CONTINUE
      WRITE(*,*)SUM
      END

*
*     DO004.FOR
*
      DIMENSION X(1000)
      READ(*,*)NP
      SUM = 0
      DO 10 I = 1,NP
      READ(*,*)X(I)
      SUM = SUM + X(I)
   10 CONTINUE
      WRITE(*,*)SUM
      END
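The accumulation pattern of DO003.FOR and DO004.FOR can be written as a reusable function in modern Fortran. This is a minimal sketch with hypothetical names; NP is passed explicitly, as in the book, rather than taken from the array size.

```fortran
! Hypothetical sketch of SUM = SUM + X(I), I = 1..NP
module sum_demo
  implicit none
contains
  real function sum_array(x, np)
    integer, intent(in) :: np
    real, intent(in) :: x(np)
    integer :: i
    sum_array = 0.0                  ! initialize the accumulator once
    do i = 1, np
      sum_array = sum_array + x(i)   ! replacement statement inside the loop
    end do
  end function sum_array
end module sum_demo
```

For X = 1.0, 2.0, 3.0 and NP = 3, the function returns 6.0, as in the trace of the loop above. Modern Fortran also provides the intrinsic `sum(x)`, which gives the same result.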

2.3 Do Statement


Summation is used in calculating standard deviation, linear least squares, products of matrices, etc. Similarly, the product of the elements of a linear array, $\prod_{I=1}^{N} X_I$, can be calculated as in DO005.FOR.

*
*     DO005.FOR
*
      PARAMETER (MAX = 50)
      DIMENSION X(MAX)
      READ(*,*)N
      PROD = 1.
      DO 10 I = 1,N
      READ(*,*)X(I)
      PROD = PROD * X(I)
   10 CONTINUE
      WRITE(*,*)PROD
      STOP
      END

A special case is the product of the first N natural numbers, PROD(N) = N!, the factorial of N. This algorithm (DO006.FOR) is used in the nCr and nPr programs.

*
*     DO006.FOR
*
      CALL IXO(N)
      PROD = 1.
      DO 10 I = 1,N
      PROD = PROD * I
   10 CONTINUE
      WRITE(*,*)PROD
      END
$INCLUDE:'IXO.FOR'
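The factorial loop of DO006.FOR, and the nCr and nPr formulae built on it, can be sketched in modern Fortran as below. The names are hypothetical. As in DO006.FOR, PROD is accumulated in a real variable (here double precision); with a four-byte INTEGER, 13! = 6227020800 would already overflow.

```fortran
! Hypothetical sketch of N!, nCr and nPr
module factorial_demo
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  ! N! by repeated multiplication, accumulated in a REAL
  pure real(real64) function factorial(n)
    integer, intent(in) :: n
    integer :: i
    factorial = 1.0_real64
    do i = 1, n
      factorial = factorial * i
    end do
  end function factorial

  ! nCr = n! / (r! (n-r)!), assuming 0 .le. r .le. n
  pure real(real64) function ncr(n, r)
    integer, intent(in) :: n, r
    ncr = factorial(n) / (factorial(r) * factorial(n - r))
  end function ncr

  ! nPr = n! / (n-r)!, assuming 0 .le. r .le. n
  pure real(real64) function npr(n, r)
    integer, intent(in) :: n, r
    npr = factorial(n) / factorial(n - r)
  end function npr
end module factorial_demo
```

For example, `ncr(5, 2)` gives 10 and `npr(5, 2)` gives 20. For large N, cancelling terms before multiplying avoids the very large intermediate factorials.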

Implied DO Loop It is an abridged form of the DO loop, and its use is restricted to input/output lists (READ and WRITE statements) and DATA statements. Its general format is

      READ(*,*) (Var(Intvar), Intvar = init, ifinal [, inc])
      WRITE(*,*) (Var(Intvar), Intvar = init, ifinal [, inc])

The input part of DO003.FOR, which obtains the SUM of the elements of a linear array, is given in DO007.FOR; the remaining part of the program is unchanged. The X array can be displayed as WRITE(*,*) (X(I), I = 1, NP). The programs DO007.FOR and DO003.FOR then give the same results.


*
*     DO007.FOR
*
      DIMENSION X(1000)
      CALL IXO(NP)
      READ(*,*) (X(I), I = 1,NP)
      SUM = 0
      DO 20 J = 1,NP
      SUM = SUM + X(J)
   20 CONTINUE
      WRITE(*,*) (X(I), I = 1,NP)
      WRITE(*,*)SUM
      END
$INCLUDE:'\F77\FOR\IXO.FOR'
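An implied DO list is not limited to the screen. The sketch below (modern Fortran, hypothetical names) writes elements INIT to IFINAL in steps of INC into a character variable, an internal file, which makes the effect of the list easy to inspect.

```fortran
! Hypothetical sketch: implied DO list in a WRITE to an internal file
module implied_do_demo
  implicit none
contains
  ! Formats V(INIT), V(INIT+INC), ... up to IFINAL, four columns per value
  function list_elements(v, init, ifinal, inc) result(line)
    integer, intent(in) :: v(:), init, ifinal, inc
    character(len=80) :: line
    integer :: i
    line = ' '
    write(line, '(20I4)') (v(i), i = init, ifinal, inc)
  end function list_elements
end module implied_do_demo
```

With V = 10, 20, 30, 40, 50, the list `(v(i), i = 1, 5, 2)` produces `  10  30  50`, and a negative increment (5, 1, -2) lists the same elements in reverse order.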

Infinite Repetition Structure A repetition structure is a boon, but infinite repetition (an endless loop) is a curse. It arises when conditional termination of the loop is not planned (DO008.FOR), when an unconditional transfer control statement (GOTO) is used (DO009.FOR), or when the variable in the predicate or the index variable of the DO loop is not changed (DO010.FOR). The only way to interrupt the process is to reboot the computer or, in multiprocessing systems, to kill the job.

* * * 501

DO008.FOR N = 0 N = N + 1 WRITE(*,*)N IF (N.LE.10) GOTO 501 END

* * * 501

502

DO010.FOR N = 0 IF(N.LE.10)THEN N = N + 1 WRITE(*,*)N GOTO 501 ELSE GOTO 502 ENDIF CONTINUE END

* * * 501

502

Output of DO009.FOR

DO009.FOR

1 2 3

N = 0 N = N + 1 WRITE(*,*)N GO TO 501 CONTINUE END

25 26 2490

Output of DO010.FOR

1 1 1

1


Do's of DO
✓ The last statement of a DO loop can be an executable statement, including CONTINUE (the exceptions are listed under Don'ts of DO).
✓ A negative increment (decrement) of the index variable is valid (DO011.FOR).

*
*     DO011.FOR
*
      DO 691 L = 21,18,-1
      LL = L*L
      WRITE(*,901)L,LL
  901 FORMAT(' L = ',I3,3X,'L SQUARE = ',I4)
  691 CONTINUE
      END

 L =  21   L SQUARE =  441
 L =  20   L SQUARE =  400
 L =  19   L SQUARE =  361
 L =  18   L SQUARE =  324

✓ A dummy DO loop, whose domain contains nothing but CONTINUE, does nothing except while away computer time (DO012.FOR). It is useful for developing time-delay loops, although an optimizing compiler may remove such a loop altogether.

*
*     DO012.FOR   DUMMY DO LOOP
*
      DO 100 I = 1,100
  100 CONTINUE
      END

✓ Integer expressions are valid for the initial value, final value and increment of the index variable (DO013.FOR).

*
*     DO013.FOR
*
      INIT = 1
      N = 3
      PROD = 1
      DO 675 K = INIT*2,N**2,2*INIT
      PROD = PROD*K
  675 WRITE(*,*)K,PROD
      END

           2    2.0000000
           4    8.0000000
           6   48.0000000
           8  384.0000000
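The number of passes through a DO loop is fixed before the first pass. For DO I = M1, M2, M3, FORTRAN 77 computes it as MAX(INT((M2 - M1 + M3)/M3), 0). The sketch below (modern Fortran, hypothetical names) compares this formula with an actual loop for the parameters of DO011.FOR (21, 18, -1) and DO013.FOR (2, 9, 2).

```fortran
! Hypothetical sketch of the DO-loop iteration count
module do_trip_demo
  implicit none
contains
  ! Iteration count from the FORTRAN 77 rule; M3 must not be zero
  pure integer function trip_count(m1, m2, m3)
    integer, intent(in) :: m1, m2, m3
    trip_count = max((m2 - m1 + m3) / m3, 0)   ! integer division truncates, like INT
  end function trip_count

  ! Counts the passes of a real DO loop with the same parameters
  pure integer function loop_passes(m1, m2, m3)
    integer, intent(in) :: m1, m2, m3
    integer :: i
    loop_passes = 0
    do i = m1, m2, m3
      loop_passes = loop_passes + 1
    end do
  end function loop_passes
end module do_trip_demo
```

Both loops in the listings above make four passes, and a loop such as DO I = 5, 1 makes none, because the count is computed as zero before the first pass.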

✓ Premature exit from a DO loop is valid. It is useful, for example, to count the negative numbers in an array sorted in ascending order (DO014.FOR).

*
*     DO014.FOR
*
      DIMENSION X(121)
      DATA X(1),X(2),X(3),X(4),X(5),X(6) /-2,-1,0,1,2,3/
      NN = 0
      DO 4 L = 1,6
      IF (X(L) .LT. 0) NN = NN+1
      IF (X(L) .GE. 0) GOTO 5
    4 CONTINUE
    5 WRITE(*,*)NN
      END


✓ The two logical IF statements (DO014.FOR) and the block IF (DO015.FOR) perform the same function.

*
*     DO015.FOR
*
      DIMENSION X(121)
      DATA X(1),X(2),X(3),X(4),X(5),X(6) /-2,-1,0,1,2,3/
      NN = 0
      L = 1
  501 IF (X(L) .LT. 0) THEN
         NN = NN+1
      ELSE
         GOTO 5
      ENDIF
      L = L+1
      GO TO 501
    5 WRITE(*,*)NN
      END

As long as the current element of X is negative, NN is increased by one, so NN counts the negative values. When X(L) .GE. 0, the counting is stopped. Testing the remaining numbers is unnecessary because X is an ordered array, so the loop is terminated with the transfer control statement (GOTO). The premature exit saves execution time when working with very large arrays or databases. Program DO016.FOR gives the same result, except that all six X values are tested; this approach works even when the array is not sorted.

* D0016.FOR
      DIMENSION X(121)
      DATA X(1),X(2),X(3),X(4),X(5),X(6)/-2,-1,0,1,2,3/
      NN = 0
      DO 4 L = 1,6
         IF (X(L) .LT. 0) THEN
            NN = NN+1
         ENDIF
    4 CONTINUE
    5 WRITE(*,*)NN
      END
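In Fortran 90 and later the same premature exit is usually written with the EXIT statement, which leaves the loop without a GOTO and also copes with an array that is entirely negative. The following is a minimal free-form sketch, not the book's FORTRAN 77 code; the module name sorted_count and the function count_negatives are hypothetical.

```fortran
! Sketch in free-form Fortran 90+; module and function names are hypothetical.
module sorted_count
  implicit none
contains
  ! Counts the leading negative values of an array sorted in ascending order.
  integer function count_negatives(x) result(nn)
    real, intent(in) :: x(:)
    integer :: l
    nn = 0
    do l = 1, size(x)
      if (x(l) >= 0.0) exit   ! premature exit: no later element can be negative
      nn = nn + 1
    end do
  end function count_negatives
end module sorted_count
```

For the data of D0014.FOR, count_negatives([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]) returns 2.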

Don'ts of DO
✗ The DO label must not precede the DO statement, because a DO loop extends forward from the DO statement, not backward (D0101.FOR).

* D0101.FOR
   10 WRITE(*,*)IZ
      DO 10 IZ = 1,10
      END

Compiler listing:

   10 WRITE(*,*)IZ
      DO 10 IZ = 1,10
***** Error 107 - DO label must follow DO statement
      END
***** Error 132 - DO or IF block not terminated


FORTRAN STATEMENTS

✗ The index variable of a DO loop must not be redefined within the range of the loop (D0102.FOR).

* D0102.FOR
      N = 6
      DO 100 I = 1,10
         I = N*1
         WRITE(*,*) I
  100 CONTINUE
      END

Compiler listing:

      N = 6
      DO 100 I = 1,10
         I = N*1
***** Error 811 - assignment to DO index variable
         WRITE(*,*) I
  100 CONTINUE
      END

Comment: The index variable is modified by the statement I = N*1.

✗ The last statement of a DO loop must not be a transfer-of-control statement (D0103.FOR).

* D0103.FOR
      DIMENSION R1(20),R2(20),R3(20)
      CALL RREAD(R1,R2)
   20 DO 10 I = 1,20
         R3(I) = R1(I)+R2(I)
   10 GOTO 20
      END

Compiler listing:

      DIMENSION R1(20),R2(20),R3(20)
      CALL RREAD(R1,R2)
   20 DO 10 I = 1,20
         R3(I) = R1(I)+R2(I)
   10 GOTO 20
***** Error 120 - GOTO not allowed here
      END

Comment: GOTO 20, an unconditional transfer of control, is used where CONTINUE is expected.

✗ A FORMAT statement cannot be the last statement of a DO loop, because FORMAT is not executable (D0104.FOR).

* D0104.FOR
      DO 6 M = 1,4
         WRITE(*,6)M
    6 FORMAT(I5)
      END

Compiler listing:

      DO 6 M = 1,4
         WRITE(*,6)M
***** Error 165 - label already used as FORMAT
    6 FORMAT(I5)
***** Error 134 - FORMAT label already referenced
      END
***** Error 132 - DO or IF block not terminated


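✗ The statement number in the DO statement should not be omitted (D0105.FOR). This is a rule of standard FORTRAN 77; the unlabelled DO ... ENDDO form shown here is a common extension and became standard in Fortran 90.

* D0105.FOR
      N = 0
      DO I = 1,10
         N = N+1
      ENDDO
      END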

✗ Control must not be transferred directly into the range of a DO loop from outside it (D0106.FOR).

* D0106.FOR
      NP = 4
      IF (NP .GT. 2) GOTO 41
      DO 10 I = 1,6
   41    CONTINUE
         WRITE(*,*)N
   10 CONTINUE
      END

Compiler listing:

      NP = 4
      IF (NP .GT. 2) GOTO 41
      DO 10 I = 1,6
   41    CONTINUE
***** Error 102 - jump into block not allowed
         WRITE(*,*)N
   10 CONTINUE
      END

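✗ A subscript must not lie outside the declared bounds of the array (D0201.FOR). The error is not detected: X is declared with two elements, so X(3) to X(6) print whatever happens to be stored in the adjacent memory locations.

* D0201.FOR
      DIMENSION X(2)
      DATA X/1.,2./
      DO 10 I = 1,6
         WRITE(*,*)X(I)
   10 CONTINUE
      END

Output:

   1.0000000
   2.0000000
   4.203895E-045
   6.678588E-042
    .0000000
   9.612743E-026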

✗ A block IF that begins inside a DO loop must also end inside it (D0107.FOR).

* D0107.FOR
      DO 10 I = 1,4
         IF (I .GT. 2) THEN
            WRITE(*,*)I
   10 CONTINUE
         ENDIF
      END

Compiler listing:

      DO 10 I = 1,4
         IF (I .GT. 2) THEN
            WRITE(*,*)I
   10 CONTINUE
***** Error 113 - improperly nested DO or ELSE block
         ENDIF
      END


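✗ The index variable should not be a real variable (D0108.FOR). Rounding errors in a real increment can change the number of times the loop is executed; real DO variables were declared obsolescent in Fortran 90 and deleted from the language in Fortran 95.

* D0108.FOR
      SUM = 0
      DO 1 X = 1, 3.0, 0.5
         SUM = SUM + X
         WRITE(*,*)X,SUM
    1 CONTINUE
      END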
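The usual remedy is to drive the loop with an integer counter and compute the real value from it, so that rounding cannot change the number of passes. This is a minimal free-form Fortran 90+ sketch; sum_range and its arguments are hypothetical names, and the number of steps is obtained with NINT on the assumption that the range is a whole number of increments.

```fortran
! Sketch in free-form Fortran 90+; names are hypothetical.
module real_steps
  implicit none
contains
  ! Sums x0, x0+dx, ..., x1 with an integer loop counter; the real value
  ! is recomputed from the counter, so rounding cannot alter the trip count.
  real function sum_range(x0, x1, dx) result(total)
    real, intent(in) :: x0, x1, dx
    integer :: i, nsteps
    nsteps = nint((x1 - x0)/dx)   ! assumes (x1 - x0) is a whole number of steps
    total = 0.0
    do i = 0, nsteps
      total = total + (x0 + dx*real(i))
    end do
  end function sum_range
end module real_steps
```

For the values of D0108.FOR, sum_range(1.0, 3.0, 0.5) adds 1.0, 1.5, 2.0, 2.5 and 3.0, giving 10.0.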

Nested DO Loops
A DO loop placed within the range of another DO loop forms a first-level nested DO loop. The general form is

      DO Stno1 IndVar1 = Init, Ifinal [,inc]
         Process 1
         DO Stno2 IndVar2 = Init, Ifinal [,inc]
            Process 2
Stno2    CONTINUE
         Process 3
Stno1 CONTINUE

Stno1, Stno2 : Two different or identical statement numbers
IndVar1, IndVar2 : Two different scalar integer variables

A first-level nested DO loop is used for input/output, arithmetic operations other than matrix multiplication, and logarithmic and trigonometric operations on matrices. Multiplication of two matrices, or input/output of a third-order tensor, requires a two-level nested DO loop. In view of the mathematical and statistical applications of multi-way data structures, DO loops may be nested up to 25 levels (an implementation limit); however, many real-life problems do not require nesting beyond the fourth level.
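As an illustration of the two-level nest needed for matrix multiplication, the sketch below computes C = AB with three loops, the innermost accumulating the dot product of row I of A and column J of B. It is a minimal Fortran 90+ sketch with hypothetical names; the intrinsic MATMUL does the same job and is used here only to check the result.

```fortran
! Sketch in free-form Fortran 90+; module and subroutine names are hypothetical.
module matmul_loops
  implicit none
contains
  ! C(M,N) = A(M,L) * B(L,N) using a two-level nested DO loop (three loops).
  subroutine mat_mult(a, b, c)
    real, intent(in)  :: a(:,:), b(:,:)
    real, intent(out) :: c(:,:)
    integer :: i, j, k
    do i = 1, size(a, 1)
      do j = 1, size(b, 2)
        c(i, j) = 0.0
        do k = 1, size(a, 2)          ! innermost loop: dot product
          c(i, j) = c(i, j) + a(i, k)*b(k, j)
        end do
      end do
    end do
  end subroutine mat_mult
end module matmul_loops
```

With A = [1 2 3; 4 5 6] and B = [7 8; 9 10; 11 12], the element C(1,1) is 1·7 + 2·9 + 3·11 = 58.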

Do's of NESTED DO Loops
✓ The same variable may be used for the initial or final value of the index variables of different loops.
✓ The DO loops of a nest can share the same last executable statement (D0017.FOR).


* D0017.FOR
      PARAMETER (MAX=5)
      DIMENSION A(MAX,MAX,MAX,MAX)
      ZERO = 0.0
      DO 10 I1 = 1,MAX
      DO 10 I2 = 1,MAX
      DO 10 I3 = 1,MAX
         DO 20 I4 = 1,MAX
            A(I1,I2,I3,I4) = ZERO
   20    CONTINUE
         WRITE(*,951) (A(I1,I2,I3,I4),I4 = 1,MAX)
  951    FORMAT(1X,5F8.2)
   10 CONTINUE
      END

Output (each of the 5 x 5 x 5 = 125 lines reads):

     .00     .00     .00     .00     .00

✓ The same statement number can end all the DO loops of a nest, provided the labelled statement is a single executable statement (D0018.FOR).

* D0018.FOR
      PARAMETER (MAX=5)
      DIMENSION ZERO(MAX,MAX,MAX,MAX)
      DO 10 I1 = 1,MAX
      DO 10 I2 = 1,MAX
      DO 10 I3 = 1,MAX
      DO 10 I4 = 1,MAX
   10 ZERO(I1,I2,I3,I4) = 0.
      END


Don'ts of NESTED DO Loops
✗ A DO block must not be left unterminated (D0109.FOR): the statement number of the last executable statement of the range must be the DO label. In D0109.FOR no statement carries the label 316.

* D0109.FOR
      DO 316 I1 = 1,7,3
      DO 316 I2 = 1,4
         WRITE(*,*)I1,I2
      END

✗ Nesting must not be improper (D0110.FOR): the range of the inner DO loop must end before that of the outer one.

* D0110.FOR
      DIMENSION A(20,20),B(20,20),C(20,20)
      CALL RREAD(A,B)
      DO 10 I = 1,20
      DO 20 J = 1,20
         C(I,J) = A(I,J)+B(I,J)
   10 CONTINUE
   20 CONTINUE
      END

Comment: 20 CONTINUE should come before 10 CONTINUE.

✗ The DO label must be the statement number of the last statement of the DO range; the two must not differ (D0111.FOR).

* D0111.FOR
      DIMENSION UMAT(10,10)
      DO 10 I = 1,10,1
      DO 10 J = 1,10
         UMAT(I,J) = 1.
  100 CONTINUE
      END


✗ Nested DO loops must not use the same index variable (D0112.FOR).

* D0112.FOR
      DO 10 I = 1,4
      DO 20 I = 1,2
         WRITE(*,*)I
   20 CONTINUE
   10 CONTINUE
      END

Compiler listing:

      DO 10 I = 1,4
      DO 20 I = 1,2
***** Error 811 - assignment to DO index variable
         WRITE(*,*)I
   20 CONTINUE
   10 CONTINUE
      END

Applications of NESTED DO Loops
• Printing of multiplication tables (D0019.FOR).

* D0019.FOR
      DO 10 I = 11,13
      DO 20 J = 1,10
         IJ = I*J
         WRITE(*,*)I,J,IJ
   20 CONTINUE
   10 CONTINUE
      END

Output (first table; the tables of 12 and 13 follow):

 11    1    11
 11    2    22
 11    3    33
 11    4    44
 11    5    55
 11    6    66
 11    7    77
 11    8    88
 11    9    99
 11   10   110

• Input of matrices of size m × n: reading a matrix row-wise (D0020.FOR) and column-wise (D0021.FOR). Because each execution of READ(*,*) starts a new input record, every element must be typed on a separate line.

* D0020.FOR
      DIMENSION A(20,20)
      READ(*,*)M,N
      DO 50 K = 1,M
      DO 40 L = 1,N
         READ(*,*)A(K,L)
   40 CONTINUE
   50 CONTINUE
      END

* D0021.FOR
      DIMENSION A(10,10)
      READ(*,*)M,N
      DO 15 I = 1,N
      DO 15 J = 1,M
         READ(*,*)A(J,I)
   15 CONTINUE
      END


• Generation of unit (all elements one, ONES.FOR), zero (ZEROS.FOR) and identity (EYE.FOR) matrices.

* ONES.FOR
      SUBROUTINE ONES(A,ROWA,COLA)
      INTEGER ROWA,COLA
      DIMENSION A(ROWA,COLA)
      ONE = 1.0
      DO 20 I = 1,ROWA
      DO 20 J = 1,COLA
         A(I,J) = ONE
   20 CONTINUE
      END

* ZEROS.FOR
      SUBROUTINE ZEROS(A,M,N)
      PARAMETER (MAX = 20)
      DIMENSION A(MAX,MAX)
      ZERO = 0.0
      DO 10 I = 1,M
      DO 20 J = 1,N
         A(I,J) = ZERO
   20 CONTINUE
   10 CONTINUE
      END

* EYE.FOR
      SUBROUTINE EYE(A,ROWA,COLA)
      INTEGER ROWA,COLA
      DIMENSION A(ROWA,COLA)
      ZERO = 0.0
      ONE = 1.0
      DO 10 I = 1,ROWA
         DO 50 J = 1,COLA
            A(I,J) = ZERO
   50    CONTINUE
*        Set the diagonal element only if it exists (ROWA may exceed COLA)
         IF (I .LE. COLA) A(I,I) = ONE
   10 CONTINUE
      RETURN
      END


Output (3 × 3 identity matrix):

  1.00   .00   .00
   .00  1.00   .00
   .00   .00  1.00

Another way of producing an identity matrix is shown in EYE02.FOR.

* EYE02.FOR
      PARAMETER (MAX = 20)
      DIMENSION EYE(MAX,MAX)
      CALL IXO(N)
      DO 10 I = 1,N
      DO 20 J = 1,N
         IF (I .EQ. J) THEN
            EYE(I,J) = 1
         ELSE
            EYE(I,J) = 0
         ENDIF
   20 CONTINUE
   10 CONTINUE
      CALL WX2(N,N,EYE,MAX)
      END
$INCLUDE: '\MSF-CAC\STATEMENTS\IXO.FOR'
$INCLUDE: '\MSF-CAC\STATEMENTS\WX2.FOR'

2.4 INPUT/OUTPUT

Read Statement
READ is an executable statement. Its syntax is

      READ (unit-number, S [,ERR = stno] [,END = stno2]) list

unit-number : Logical device/unit number (an asterisk, *, denotes the standard input device, the keyboard)
S : Format specification (an asterisk, *, requests list-directed input)
list : Set of variables, array elements or array(s)
stno : Label of the statement to which control is transferred if an error occurs during the READ operation
END : stno2 is the label of the statement to which control is transferred when the end of the file is reached. When data are read from the keyboard, this branch is taken only if an end-of-file is signalled (Ctrl+Z under DOS); otherwise it is never executed, and its presence causes no error during execution.


READ(*,*)PH causes a group of characters representing the value of pH to be read from the keyboard and assigns the value to PH. If the number keyed in is 1.690, the effect is the same as that of the assignment statement PH = 1.690. READ is used instead of an assignment when the value changes from run to run, so that the program (READ001.FOR) is independent of the numerical values.

* READ001.FOR
*     NAME must hold at least 11 characters for 'Oxalic acid'
      CHARACTER*20 NAME
      WRITE(*,901)
  901 FORMAT(' Give name of compound,NEC,WBT')
      READ(*,*)NAME,NEC,WBT
      WRITE(*,*)NAME,NEC,WBT
      END

The compound name (NAME), the number of electrons exchanged (NEC) and the molecular weight of the substance (WBT) are read from the keyboard. The values can be keyed in as

'Oxalic acid', 2, 126.03    or    'Oxalic acid' 2 126.03

The first value, 'Oxalic acid', is assigned to NAME, 2 to NEC and 126.03 to WBT, so the READ is equivalent to three assignment statements. Quotes enclosing character data are mandatory; here they are also needed because the name contains a blank, which would otherwise act as a separator. A prompt on the screen, as in READ001.FOR, avoids the need to remember the sequence and type of the variables (compare READ002.FOR, which has no prompt). An error condition occurs in a READ operation (READ003.FOR) when a constant contains a syntax error, such as 2.123. for a real number; the program also halts if the input ends before all the variables have received values. To avoid an abrupt halt, the ERR option is used in the READ statement (READ004.FOR).

* READ002.FOR
      CHARACTER*1 CHEMIS(10)
      READ(*,*) CHEMIS
      WRITE(*,*)CHEMIS
      END

* READ003.FOR
      READ(*,*)EMF
      WRITE(*,*)EMF
      END

* READ004.FOR
      READ(*,*,ERR=100)WT
      WRITE(*,*)WT
      STOP
  100 WRITE(*,*) 'ERROR IN READING WT'
      END

C:\>READ004
2.123.
 ERROR IN READING WT
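Besides ERR=, FORTRAN 77 also offers the IOSTAT= specifier, which stores a status code instead of jumping to a label: zero on success, a positive value on an error and a negative value at the end of the data. This is a minimal Fortran 90+ sketch, assuming the text has first been placed in a character variable and is read with an internal READ; read_real is a hypothetical name.

```fortran
! Sketch in free-form Fortran 90+; module and function names are hypothetical.
module safe_read
  implicit none
contains
  ! Reads one real value from TEXT with a list-directed internal READ.
  ! Returns .true. on success; on an error (IOSTAT > 0) or end of data
  ! (IOSTAT < 0) it returns .false. and sets VALUE to zero.
  logical function read_real(text, value) result(ok)
    character(len=*), intent(in) :: text
    real, intent(out) :: value
    integer :: ios
    read (text, *, iostat=ios) value
    ok = (ios == 0)
    if (.not. ok) value = 0.0
  end function read_real
end module safe_read
```

The calling program can then print its own message and ask for the value again instead of stopping.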


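An implied DO loop can be used in a READ statement, as in READ005.FOR, which reads NP points of a UV-visible spectrum (wavelength and absorbance pairs). With FORMAT(6F10.3) each data line holds three pairs in fields 10 columns wide; when the format is exhausted, reading continues on the next line until all NP pairs have been read.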

* READ005.FOR
      DIMENSION WAVEL(100),ABSORB(100)
      READ(*,*)NP
      READ(*,801) (WAVEL(I),ABSORB(I),I = 1,NP)
  801 FORMAT(6F10.3)
      WRITE(*,901) (WAVEL(I),ABSORB(I),I = 1,NP)
  901 FORMAT(1X,6F10.3)
      END

The number of data points read from each line can also be controlled explicitly (READ006.FOR), so that no variable is assigned a zero value when the format specification contains more fields than the data supply.

* READ006.FOR
      DIMENSION WAVEL(100),ABSORB(100)
      READ(*,*)NP
      NOVAR = 2
      N2 = 0
*     Each pass reads at most NOVAR+1 = 3 pairs from one line
  501 N1 = N2+1
      N2 = N1+NOVAR
      IF (N2 .GT. NP) N2 = NP
      READ(*,801) (WAVEL(I),ABSORB(I),I = N1,N2)
  801 FORMAT(6F10.3)
      WRITE(*,901) (WAVEL(I),ABSORB(I),I = N1,N2)
      IF (N2 .LT. NP) GOTO 501
  901 FORMAT(1X,6F10.3)
      END

Do's of READ/WRITE Statement
✓ The list of a WRITE statement may be empty (WRITE001.FOR).

* WRITE001.FOR
      WRITE(*,901)
  901 FORMAT(' LIST OF ARGUMENTS IS NULL')
      END


✓ The list can consist of an array element, a set of array elements or an entire array (READ007.FOR).

* READ007.FOR
      DIMENSION VOL(4)
      READ(*,*)VOL(1),VOL(2)
      READ(*,*)VOL(3),VOL(4)
      READ(*,*)PH
      END

✓ Variables of different types can be read by a single statement. In READ008.FOR, NAME is a character string, NEC an integer and WBT a real variable. A blinking cursor indicates that data are to be entered from the keyboard. The READ statement accepts three values: the first a character string, the second and third numerical values.

* READ008.FOR
*     NAME must be declared CHARACTER to match the A10 descriptor
      CHARACTER*10 NAME
      READ(*,901)NAME,NEC,WBT
  901 FORMAT(A10,I2,F8.3)
      END

Don'ts of READ/WRITE Statement
✗ Do not put a comma after the last variable of the list or before the first one (READ101.FOR).

* READ101.FOR
      READ(*,*)VOL,PH
      WRITE(*,*) ,VOL,PH
      END

List-directed Input/Output (I/O)
List-directed I/O statements transfer data between an I/O device and the CPU using an implicit format. Two data items are separated by a comma, a blank or a group of blanks; additional blanks are ignored. List-directed I/O is convenient for debugging and quick results. Once the algorithm has been tested with different data sets, the output is converted to formatted mode.

Formatted I/O
Although reading values without a format (free format) is preferred, data recorded by instruments or compiled by organizations often have to be read as they are. It was earlier practice not to leave a space or insert a comma between successive data items, because the computer could separate the items by means of a format.


In formatted I/O the variables are listed in the READ/WRITE statement, and the spacing between values is spelled out explicitly in the FORMAT statement. Formatted I/O gives application programs a professional appearance and makes the data visually attractive and easy to read (READ009.FOR).

* READ009.FOR
      CHARACTER*1 HEAD1(3),HEAD2(3)
      READ(*,851)HEAD1,HEAD2,VOL,PH
  851 FORMAT(3A1,3A1,F5.2,F7.2)
      WRITE(*,851)HEAD1,HEAD2,VOL,PH
      END

To make the best use of punched cards, data were laid out with compact, unusual format specifications, and in many cases the format had to be respecified to run different programs. The READ and FORMAT statements together tell the computer in which columns each data item lies. A conflict between the type of a variable and the data results in an I/O error during execution.

FORMAT Statement
FORMAT is a non-executable statement. It consists of a statement label in columns 1 to 5, followed by the keyword FORMAT and a specification (Table 2.4.1), which is a list of edit descriptors enclosed in parentheses.

Table 2.4.1 : Edit descriptors in format statement

Variable type           Format             Symbol      Edit descriptor
Integer                 Fixed              I           aIw
Real                    Floating           F           aFw.d
Real                    Exponential        E           aEw.d
Real                    Double precision   D           aDw.d
Numerical               General            G           aGw.d
Character (or) string   String             A           aAw
Blanks/spaces           Hollerith          X (or) H    aX (or) nH
Logical                 Logical            L           aLw

a: repetition factor; w: field width; d: number of digits after the decimal point; n: number of characters.

The syntax is

      Stno FORMAT([S1][,S2][,S3]...)

where S1, S2, ... are edit descriptors; sometimes the list is empty. The format can be given in a FORMAT statement, as in READ008.FOR, or stored in a character variable, as in READ010.FOR.

* READ010.FOR
      INTEGER ATNO
      CHARACTER*20 FMT1,NAME
      FMT1 = '(1X,A10,I2,F8.3)'
      WRITE(*,951)
  951 FORMAT(1X,'GIVE NAME NEC WBT')
      READ(*,*)NAME,NEC,WBT
      WRITE(*,FMT1)NAME,NEC,WBT
      END
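Because a format held in a character variable is an ordinary string, it can also be assembled at run time, for example to widen the name field for a long compound name. This is a minimal Fortran 95+ sketch (I0 editing is used to write the width into the format); format_record and namewidth are hypothetical names. On output, Aw right-justifies a shorter string, so the 11-character 'Oxalic acid' gets one leading blank in a 12-column field.

```fortran
! Sketch in free-form Fortran 95+; module and function names are hypothetical.
module runtime_format
  implicit none
contains
  ! Builds a format such as '(A12,I3,F9.3)' at run time and writes one
  ! record (name, number of electrons, molecular weight) with it.
  function format_record(name, nec, wbt, namewidth) result(line)
    character(len=*), intent(in) :: name
    integer, intent(in) :: nec, namewidth
    real, intent(in) :: wbt
    character(len=80) :: line
    character(len=30) :: fmt
    write (fmt, '(A,I0,A)') '(A', namewidth, ',I3,F9.3)'   ! e.g. (A12,I3,F9.3)
    write (line, fmt) name, nec, wbt
  end function format_record
end module runtime_format
```

With namewidth = 12 the record is ' Oxalic acid  2  126.030'.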


Do's of FORMAT Statement
✓ A FORMAT statement can be referenced by any number of READ or WRITE statements.
✓ A FORMAT statement can follow or precede the I/O statements that use it.
✓ All or some of the FORMAT statements can be grouped together and placed either just before END or after the first executable statement.

Integer Input
The general form of the data specification for integers is aIw, where a is the repetition factor and w the field width. It indicates that the next w input columns contain a right-justified integer. The variable in the READ statement and the data should both be of integer type. Leading blanks are ignored, but, depending on how the compiler treats blanks, trailing blanks may be interpreted as zeros. The factor a is omitted when it is one, for all numerical fields (integer, real, complex) of any precision. A negative sign occupies one column, and in a real field the decimal point occupies another. In output, the plus sign of a positive value is left blank; in input, the user can omit it. Different widths are used depending on the magnitude of the values. For example, the number of ionisable protons (NIP) does not exceed nine, while the number of experimental points (NP) may run into hundreds, so the specifications I1 and I3 are appropriate for them (READ011.FOR).

* READ011.FOR
      READ(*,180)NIP,NP
  180 FORMAT(I1,I3)
      WRITE(*,*)NIP,NP
      END

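If the data are given as 2036, NIP is read as 2 and NP as 036, that is 36. If, on the other hand, the data are given as 236, NIP is taken as 2 but NP as 360, because this compiler interprets the trailing blank in the I3 field as a zero. (Under the FORTRAN 77 default, BLANK='NULL', such blanks are ignored and NP would be 36; the BZ edit descriptor, or BLANK='ZERO' in the OPEN statement, selects the blanks-as-zeros behaviour.)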
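The effect of blanks can be checked with an internal READ from a character string. In this Fortran 90+ sketch (hypothetical names) the BN and BZ edit descriptors select, for a single READ, blanks ignored and blanks read as zeros respectively; the string '236 ' stands for the record 236 followed by a blank.

```fortran
! Sketch in free-form Fortran 90+; module and subroutine names are hypothetical.
module blank_handling
  implicit none
contains
  ! Reads NIP (column 1) and NP (columns 2-4) with FORMAT (I1,I3).
  ! zeros = .true.  applies BZ (blanks read as zeros);
  ! zeros = .false. applies BN (blanks ignored).
  subroutine read_nip_np(record, zeros, nip, np)
    character(len=*), intent(in) :: record
    logical, intent(in) :: zeros
    integer, intent(out) :: nip, np
    if (zeros) then
      read (record, '(BZ,I1,I3)') nip, np
    else
      read (record, '(BN,I1,I3)') nip, np
    end if
  end subroutine read_nip_np
end module blank_handling
```

With BZ the record '236 ' gives NP = 360, as described above; with BN it gives NP = 36.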

F Format
The F format is used for input/output of floating point (real) values. The general form of the F specification is aFw.d, where d is the number of digits after the decimal point. The number of digits after the decimal point in the output is chosen according to the accuracy of the input data. For example, the visible spectrum of KMnO4 was recorded at a total concentration of 0.000486; it can be displayed as 0.000486 with the F8.6 format, which specifies six digits after the decimal point in a total width of 8. The EMF of electrochemical cells is measured to an accuracy of 0.1 mV over the range -1000 to +2000 mV, so the minimum width is 7 columns: one for the sign, four for the integer part, one for the decimal point and one for the fractional part (F7.1). The READ statement in READ012.FOR inputs the atomic weights of six elements, and the list-directed WRITE statement outputs them with the spacing chosen by the compiler. The I/O operations take place in the order ATWT(1), ATWT(2), ..., ATWT(6).
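The width rules above can be checked by writing into a character variable. This minimal Fortran 90+ sketch (hypothetical names) formats a value with a given edit descriptor; a field too narrow for the value, such as F6.1 for -1000.0, is filled with asterisks. Whether the optional zero before the decimal point of 0.000486 is printed is left to the compiler, so the check accepts both forms.

```fortran
! Sketch in free-form Fortran 90+; module and function names are hypothetical.
module f_widths
  implicit none
contains
  ! Writes X into a character string using the edit descriptor in FMT.
  function fmt_real(x, fmt) result(s)
    real, intent(in) :: x
    character(len=*), intent(in) :: fmt
    character(len=20) :: s
    write (s, fmt) x
  end function fmt_real
end module f_widths
```

For example, fmt_real(-1000.0, '(F7.1)') gives '-1000.0', which uses all seven columns.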


* READ012.FOR
      DIMENSION ATWT(6)
      READ(*,*)ATWT
      WRITE(*,*)ATWT
      END

Because the number of points in a titration is not known in advance, the maximum possible size, say 10, is declared in the DIMENSION statement (READ013.FOR). Although 10 memory locations are reserved, not all of them need be used. The data points are counted as each pair of volume (VOL) and conductance (COND) values is read; a pair in which both values are zero or negative marks the end of the data. If the count exceeds the size declared in the DIMENSION statement, the message 'NUMBER of data points exceed dimension' is printed and the job is terminated.

* READ013.FOR
      DIMENSION VOL(10),COND(10),CCOND(10)
      MAX = 10
      NP = 0
      V0 = 50.
  501 NP = NP + 1
      READ(*,*)A,B
      IF (A .LE. 0. .AND. B .LE. 0.) GOTO 502
      IF (NP .LE. MAX) THEN
         VOL(NP) = A
         COND(NP) = B
         CCOND(NP) = COND(NP)*(V0 + A)/V0
         WRITE(*,*)VOL(NP),COND(NP),CCOND(NP)
         GOTO 501
      ELSE
         WRITE(*,951)
  951    FORMAT('NUMBER of data points exceed dimension')
      ENDIF
  502 STOP
      END
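The correction applied in READ013.FOR, CCOND = COND·(V0 + VOL)/V0, compensates for the increase in volume as titrant is added to the initial volume V0. The sketch below isolates it as an elemental function (a hypothetical name, Fortran 95+) so that it can be checked on its own and applied to whole arrays at once.

```fortran
! Sketch in free-form Fortran 95+; module and function names are hypothetical.
module dilution
  implicit none
contains
  ! Dilution correction used in READ013.FOR: CCOND = COND*(V0 + V)/V0,
  ! where V0 is the initial solution volume and V the titrant added.
  elemental real function corrected_cond(cond, v, v0)
    real, intent(in) :: cond, v, v0
    corrected_cond = cond*(v0 + v)/v0
  end function corrected_cond
end module dilution
```

Adding 5 mL of titrant to 50 mL of solution multiplies the measured conductance by 55/50 = 1.1.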

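Since the number of points changes from titration to titration, the number of points can also be supplied through another READ statement. The data array can then be read with a DO loop (READ014.FOR) or an implied DO loop (READ015.FOR). With the DO loop each value must be on a separate line, because every execution of READ starts a new record; with the implied DO loop the NP values may be typed on one or more lines.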


* READ014.FOR
      DIMENSION X(20)
      READ(*,*)NP
      DO 10 I = 1,NP
         READ(*,*)X(I)
         WRITE(*,*)I,X(I)
   10 CONTINUE
      AVE2 = AVE(NP,X,20)
      WRITE(*,*)AVE2
      END
$INCLUDE: 'AVE.FOR'
$INCLUDE: 'SUMX1.FOR'

* READ015.FOR
      DIMENSION X(20)
      READ(*,*)NP
      READ(*,*)(X(I),I = 1,NP)
*     Print only the NP elements that were read
      WRITE(*,*)(X(I),I = 1,NP)
      AVE2 = AVE(NP,X,20)
      WRITE(*,*)AVE2
      END
$INCLUDE: 'AVE.FOR'
$INCLUDE: 'SUMX1.FOR'

The READ statement of IFMT001.FOR expects an input record with the structure

Column 1          first number (NEXP)
Columns 2 - 4     second number (NP)
Columns 5 - 8     third number (IEMF)

Any input characters beyond the 8th column are ignored.

* IFMT001.FOR
      READ(*,81) NEXP,NP,IEMF
   81 FORMAT(I1,I3,I4)
      WRITE(*,*) NEXP,NP,IEMF
      END
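The column layout of IFMT001.FOR can be tried on a character string standing in for the input record. In this Fortran 90+ sketch (hypothetical names) the characters after column 8 are present in the string but are never read.

```fortran
! Sketch in free-form Fortran 90+; module and subroutine names are hypothetical.
module column_layout
  implicit none
contains
  ! Reads NEXP (column 1), NP (columns 2-4) and IEMF (columns 5-8) with
  ! FORMAT (I1,I3,I4); anything after column 8 is not read.
  subroutine read_header(record, nexp, np, iemf)
    character(len=*), intent(in) :: record
    integer, intent(out) :: nexp, np, iemf
    read (record, '(I1,I3,I4)') nexp, np, iemf
  end subroutine read_header
end module column_layout
```

The record '2125-375 ignored' gives NEXP = 2, NP = 125 and IEMF = -375.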

Do's of F Format
✓ If the format is Fw.0 and there is no decimal point in the data, the value is read as a whole number:

  901 FORMAT(F5.0)

The input 2671 is interpreted as 2671.0.

✓ If the format is Fw.d and there is no decimal point in the data, a decimal point is inserted d digits from the right end:

      READ(*,17) EMF
   17 FORMAT(F5.1)

A decimal point is inserted one digit from the right, and the input -3762 is interpreted as -376.2.

✓ An explicit decimal point in the data overrides the format specification for real variables; leading and trailing blanks then have no effect.

✓ If the format is Fw.d and the input data contain a decimal point, the Fw.d specification is overridden:

      READ(*,18) EMF
   18 FORMAT(F5.1)

For the input -37.25, the FORMAT specification alone would place the decimal point before the digit 5, but the value of EMF is taken as -37.25.

E Specification
The E format is used for data in exponential form. The rules for the E specification are similar to those for F. The width w is the number of digits after the decimal point plus eight: one position each for the sign, the digit before the decimal point and the decimal point, and five positions for the signed exponent.

If the input has more digits than the machine can hold, the computer stores the number as accurately as possible.
If the exponent reaches or exceeds the limit of the software, exponent overflow or underflow occurs.
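For example, E12.4 leaves room for four digits after the decimal point together with the sign, the leading zero, the decimal point and an exponent such as E+04. This is a minimal Fortran 90+ sketch with hypothetical names; the leading zero is optional in the standard, so the check accepts the value with or without it.

```fortran
! Sketch in free-form Fortran 90+; module and function names are hypothetical.
module e_format
  implicit none
contains
  ! Returns X written with the E12.4 edit descriptor, left-justified.
  function efmt(x) result(s)
    real, intent(in) :: x
    character(len=12) :: s
    write (s, '(E12.4)') x
    s = adjustl(s)
  end function efmt
end module e_format
```

The value 1234.56 is written as 0.1235E+04: the mantissa is rounded to four digits.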

Space Control
The horizontal spacing in input/output is specified by X, in the general form aX. In input, the columns skipped by X may be blank or may contain extraneous characters. In output, the X format controls the layout of tables. With the format of WRITE002.FOR there is no gap between the two data items, WAVEL and ABSORB. For legibility, a horizontal space of three columns is inserted with the X format in WRITE003.FOR, simply by changing FORMAT statement 901.

* WRITE002.FOR
      INTEGER WAVEL(4)
      DIMENSION ABSORB(4)
      DATA WAVEL/360,500,600,700/
      DATA ABSORB/1.052,1.125,1.425,1.156/
      NP = 4
      DO 10 I = 1,NP
         WRITE(*,901)WAVEL(I),ABSORB(I)
  901    FORMAT(I4,F4.2)
   10 CONTINUE
      END


* WRITE003.FOR
      INTEGER WAVEL(4)
      DIMENSION ABSORB(4)
      DATA WAVEL/360,500,600,700/
      DATA ABSORB/1.052,1.125,1.425,1.156/
      NP = 4
      DO 10 I = 1,NP
         WRITE(*,901)WAVEL(I),ABSORB(I)
  901    FORMAT(I4,3X,F5.3)
   10 CONTINUE
      END

Column headings improve the readability of the output, so another WRITE statement with a quoted heading is added (WRITE004.FOR).

* WRITE004.FOR
      INTEGER WAVEL(4)
      DIMENSION ABSORB(4)
      DATA WAVEL/360,500,600,700/
      DATA ABSORB/1.052,1.125,1.425,1.156/
      NP = 4
      WRITE(*,951)
  951 FORMAT(' WAVE LENGTH',3X,'ABSORBANCE')
      DO 10 I = 1,NP
         WRITE(*,901)WAVEL(I),ABSORB(I)
  901    FORMAT(I4,3X,F5.3)
   10 CONTINUE
      END

The numerical values are not centred under their column headings. Counting the columns by hand, or using the T format, gives the desired layout.

T Format
The T format tells the computer to 'tab' to a specified column. The general form is Tn, where n is a positive integer giving the column required. The T specification is useful for reading data in an order different from that in which they were entered (READ016.FOR).

* READ016.FOR
      READ(*,171)EMF,VOL
  171 FORMAT(T11,F10.3,T1,F10.3)
      END

The number starting in the 11th column is read first and assigned to EMF; then the number starting in the 1st column is read and assigned to VOL. In the data file the values are actually entered in the order VOL, EMF.
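The same T-format READ can be exercised on a string that stands in for one line of the data file. In this Fortran 90+ sketch (hypothetical names), VOL occupies columns 1-10 and EMF columns 11-20, an assumed layout consistent with FORMAT 171.

```fortran
! Sketch in free-form Fortran 90+; module and subroutine names are hypothetical.
module tab_read
  implicit none
contains
  ! FORMAT (T11,F10.3,T1,F10.3): tab to column 11 and read EMF first,
  ! then tab back to column 1 and read VOL.
  subroutine read_emf_vol(record, emf, vol)
    character(len=*), intent(in) :: record
    real, intent(out) :: emf, vol
    read (record, '(T11,F10.3,T1,F10.3)') emf, vol
  end subroutine read_emf_vol
end module tab_read
```

For the line '    25.000   512.300' the call returns EMF = 512.3 and VOL = 25.0.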

Printing Bivariate Data Column-wise
UV-visible spectra and electrometric titration data are paired (bivariate) data sets. WRITE005.FOR modifies WRITE004.FOR to give better output, and a further modification of the formats gives a formal, ruled table (WRITE006.FOR).

* WRITE005.FOR
      INTEGER WAVEL(4)
      DIMENSION ABSORB(4)
      DATA WAVEL/360,500,600,700/
      DATA ABSORB/1.052,1.125,1.425,1.156/
      NP = 4
      WRITE(*,951)
  951 FORMAT(T1,'WAVE LENGTH',T15,'ABSORBANCE')
      DO 10 I = 1,NP
         WRITE(*,901)WAVEL(I),ABSORB(I)
  901    FORMAT(T5,I3,T17,F5.3)
   10 CONTINUE
      END

* WRITE006.FOR
      INTEGER WAVEL(4)
      DIMENSION ABSORB(4)
      DATA WAVEL/360,500,600,700/
      DATA ABSORB/1.052,1.125,1.425,1.156/
      NP = 4
      WRITE(*,952)
  952 FORMAT(T5,31(1H-))
      WRITE(*,951)
  951 FORMAT(T5,'WAVE LENGTH',T22,1H|,T25,'ABSORBANCE',T35,1H|)
      WRITE(*,952)
      DO 10 I = 1,NP
         WRITE(*,901)WAVEL(I),ABSORB(I)
  901    FORMAT(T5,1H|,T10,I3,T22,1H|,T27,F5.3,T35,1H|)
   10 CONTINUE
      WRITE(*,952)
      END


Input/Output through External Files
I/O operations can also be performed through disk files. Some terms relevant to disk files are given below.

Field
The datum corresponding to a variable is called a field; it represents a unit of information. MOLWT and CHEMIS are the fields in the following programs.

* FILE001.FOR
      CHARACTER*1 CHEMIS(40)
      OPEN(1,FILE='DAT')
      READ(1,801,ERR=2000,END=2000)CHEMIS
      WRITE(*,*)CHEMIS
  801 FORMAT(40A1)
 2000 STOP
      END

* FILE002.FOR
      REAL MOLWT
      OPEN(1,FILE='MOLWT')
      READ(1,*)MOLWT
      WRITE(*,*)MOLWT
      END

Record
A record is a group of related fields. Records may be of the same or of different types, and the fields may be placed in any order, provided the READ statement lists the variables in the same order.

* FILE003.FOR
$DEBUG
      CHARACTER*10 IFILE,OFILE
      CHARACTER*10 CHEMIS
      OPEN(3,FILE='TEMP1',STATUS='NEW')
      READ(*,*)CHEMIS,NEC,TVOL
      WRITE(3,*,ERR=4000)CHEMIS,NEC,TVOL
  801 FORMAT(1X,10A1,I2,F10.3)
 4000 STOP
      END

File
A program can be executed from the DOS prompt with the input data typed at the keyboard and the output displayed on the terminal. A better way of supplying data is to create an ASCII file (DAT), a collection of records pertaining to an object. The data from this file can then be supplied to the program (D0007.FOR) using the following DOS command.


C:\> D0007 <DAT >RESULT

This command reads the input from DAT and writes the result to a disk file named RESULT. The files DAT and RESULT can be edited and deleted. FORTRAN itself can create, update and delete ASCII data files, which may be input, intermediate, scratch or output files. Files are classified as external and internal.

External File
An external file is an ASCII data file stored on a floppy disk, fixed disk or CD-ROM. External files are subclassified into sequential and random access files according to how their records are stored and accessed on the auxiliary memory devices.

Sequential File
In a sequential file the records are accessed one by one, from the first to the last, without skipping. This organisation suits a large volume of information that rarely changes. The entire data set is read every time the program is executed, so it is processed as a single unit. Editing a record of a sequential file requires accessing all the preceding records, and the file is generally updated by creating a new version under another name: records are copied, new records inserted and unwanted ones deleted. The original and new files are called the master and working (or mother and daughter) files. The disadvantage of a sequential file thus lies in the updating and retrieval processes. The keyboard, monitor and printer are sequential devices, and the software treats them as files.

Random Access File
In a random access (or direct) file, records are numbered sequentially from one to the maximum number declared by the user, and they can be accessed in any order. The software maintains on the disk a list of the physical positions of the records. The only restriction is that all the records of a random access file must have the same length.

Do's of Random Access File
✓ The contents of a record can be overwritten.

Don'ts of Random Access File
✗ A record cannot be deleted.
✗ Reading a record that has not been written results in an error.
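A direct-access file can be tried out with a scratch file. In this minimal sketch, records are written out of order, one is overwritten, and any record can then be read back by its number. It assumes Fortran 2008 (NEWUNIT= in OPEN) and uses INQUIRE(IOLENGTH=) to obtain the record length for one REAL, because the unit of RECL is processor dependent; the module and function names are hypothetical.

```fortran
! Sketch in free-form Fortran 2008; module and function names are hypothetical.
module direct_access_demo
  implicit none
contains
  ! Writes pH values to records 3, 1 and 2 of a scratch direct-access file,
  ! overwrites record 3, and returns the value stored in record K.
  real function read_record(k) result(ph)
    integer, intent(in) :: k
    integer :: u, rlen
    real :: sample
    sample = 0.0
    inquire (iolength=rlen) sample       ! record length for one REAL
    open (newunit=u, status='scratch', access='direct', &
          form='unformatted', recl=rlen)
    write (u, rec=3) 9.2
    write (u, rec=1) 4.0
    write (u, rec=2) 7.0
    write (u, rec=3) 9.5                 ! a record can be overwritten
    read (u, rec=k) ph
    close (u)                            ! a scratch file is deleted on closing
  end function read_record
end module direct_access_demo
```

read_record(3) returns 9.5, the value that replaced the original 9.2.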

Formatted and Unformatted Files
External files are classified as formatted and unformatted files. Formatted files are organized as a stream of characters, each record terminated by an end-of-line marker; the characters are converted to and from the internal representation as they are transmitted between the file and the program during an I/O operation. In unformatted files, data are stored in the system-dependent internal (binary) form, so I/O operations are faster than with formatted files because no conversion is needed.

Don'ts of Unformatted File
✗ Formatted and unformatted records cannot be mixed in the same file.


OPEN
OPEN is an executable statement and can be placed anywhere in the program. It establishes a logical connection between a file on a device (magnetic tape, floppy or fixed disk) and the FORTRAN program. The statement serves several purposes, namely to

• connect an existing external file to the specified unit
• create a file that is pre-connected
• create a file and connect it to the unit, and
• change certain attributes of the connection.

The syntax of the OPEN statement with all the options is

      OPEN([UNIT=]intconst1 [,ERR=intconst2] [,STATUS=stat] [,FILE=fname]
     +     [,ACCESS=acc] [,BLANK=blnk] [,FORM=fmt] [,IOSTAT=intvar]
     +     [,RECL=intconst3])

OPEN : Keyword. By default the file is opened for sequential access; the default FORM is 'FORMATTED' for a sequential file and 'UNFORMATTED' for a random access file.
UNIT : Logical unit number for the file; intconst1 is a non-negative integer constant or expression.
ERR : Label of the statement to which control is transferred if an error occurs during the OPEN operation.
STATUS : Indicates whether the file is a new or an old one. For an old file (stat = 'OLD'), OPEN only makes the logical connection between the file and the program. For a new file (stat = 'NEW'), the file is created, connected to the unit and opened. The options are 'NEW', 'OLD', 'SCRATCH' and 'UNKNOWN'; the default is 'UNKNOWN'. A SCRATCH file is deleted by the software when it is closed or when the program ends.
FILE : Name of the file (fname) to be associated with the unit number.
ACCESS : The two valid options are 'DIRECT' (random access) and 'SEQUENTIAL'.
BLANK : 'NULL' (blanks in numeric input fields are ignored; the default) or 'ZERO' (blanks are treated as zeros).
RECL : Length of each record of a random access file; RECL is used only for random access files.
FORM : The options are 'FORMATTED' and 'UNFORMATTED'.
IOSTAT : Indicates successful completion or an error condition during the opening of the file. The integer variable intvar is set to zero when the file is opened successfully and to a positive value when an error occurs.

Do's of OPEN
✓ OPEN statements can be collected together at one place, like FORMAT statements.
✓ A scratch file can be opened, as in FILE004.FOR.


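* FILE004.FOR
      N = 6
      OPEN(24,STATUS='SCRATCH',FILE='JUNK')
      WRITE(24,*)N
      CLOSE(24)
      OPEN(24,STATUS='OLD',FILE='JUNK')
      WRITE(24,*)N
      END

Note: standard FORTRAN 77 does not allow a FILE= name together with STATUS='SCRATCH'. Since a scratch file is deleted when it is closed, the second OPEN with STATUS='OLD' succeeds only if a file named JUNK already exists.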

Don'ts of OPEN
✗ The same unit number should not be used in a second OPEN statement while it is still connected to another file (FILE005.FOR).

* FILE005.FOR
$DEBUG
      OPEN(15,FILE='TEMP1',STATUS='NEW',ERR=2001)
      OPEN(15,FILE='TEMP2',STATUS='NEW',ERR=2002)
      N = 4
      WRITE(15,*)N
      CLOSE(15)
      OPEN(15,FILE='TEMP2',STATUS='OLD')
      READ(15,*)X
      WRITE(*,*)X
      STOP
 2001 WRITE(*,951)
  951 FORMAT(' Error in opening file TEMP1')
      STOP
 2002 WRITE(*,952)
      STOP
  952 FORMAT(' Error in opening file TEMP2')
      END

The first OPEN statement creates the new file TEMP1 and connects it to unit 15. The second OPEN statement tries to connect another file, TEMP2, to the same unit while TEMP1 is still open.

CLOSE
CLOSE is an executable statement that disconnects a file from its FORTRAN unit number. FORTRAN has a built-in, fail-safe facility that automatically closes all open files when the program terminates normally; the CLOSE statement lets the programmer close a specified file explicitly, namely the file opened with that unit number. It can occur anywhere in the program after the corresponding OPEN statement. The syntax of the CLOSE statement is


      CLOSE([UNIT=]intconst1 [,ERR=intconst2] [,STATUS=stat] [,IOSTAT=intvar])

CLOSE : Keyword
STATUS : The two options are 'KEEP' and 'DELETE'. The default is 'KEEP', except for a scratch file, which is always deleted.

Do's of CLOSE Statement
✓ A CLOSE statement for a unit that is not connected to any file has no effect (FILE006.FOR).

* FILE006.FOR
      CLOSE(10)
      CLOSE(30)
      STOP
 2001 FORMAT('ERROR IN CLOSING FILE 30 ')
      END

ENDFILE
ENDFILE is an executable statement used with a sequential file after it has been opened. It writes an end-of-file mark. The syntax is

      ENDFILE u    or    ENDFILE([UNIT=]u [,ERR=stno] [,IOSTAT=intvar])

ENDFILE : Keyword

The statement can be followed by a CLOSE statement. When the last operation on a sequential file was a WRITE, an end-of-file mark is also written automatically whenever BACKSPACE, REWIND or CLOSE is executed, and at normal termination of the program.

Consider the situation where all WRITE operations on a sequential file are performed in one program and the retrieval operations in another program (FILE007.FOR, FILE008.FOR).

* FILE007.FOR
      OPEN(20,FILE='DAT',STATUS='NEW')
      N = 20
      WRITE(20,*)N
      END

* FILE008.FOR
      OPEN(20,FILE='DAT',STATUS='OLD')
      READ(20,*)N
      WRITE(*,*)N
      END


The END statement in the first program closes the file automatically after writing the end-of-file mark. Within a single program the same effect is achieved with REWIND, which writes the end-of-file mark before repositioning the file at its first record (FILE009.FOR).

* FILE009.FOR
      OPEN(20,FILE='DAT',STATUS='NEW')
      N = 20
      WRITE(20,*)N
      REWIND (20)
      READ(20,*)N
      WRITE(*,*)N
      END

The end-of-file mark can also be written explicitly with ENDFILE before the file is rewound, without closing it (FILE010.FOR).

* FILE010.FOR
      OPEN(20,FILE='DAT',STATUS='NEW')
      N = 20
      WRITE(20,*)N
      ENDFILE(20)
      REWIND (20)
      READ(20,*)N
      WRITE(*,*)N
      END

BACKSPACE It is an executable statement that positions the file at the beginning of the preceding record. It can occur anywhere after a file is opened. The BACKSPACE command is useful to overwrite a record, or to read a record again when an error occurs because of a type or format specification conflict. It is costly in terms of computer resources, so it should be used sparingly.

Do's of BACKSPACE
✓ It can be used even when the file is at the beginning (FILE011.FOR).

* FILE011.FOR
$DEBUG
      PH = 7.5
      OPEN(16756,FILE='PH',STATUS='NEW')
      BACKSPACE 16756
      WRITE(16756,*)PH
      CLOSE(16756)
      OPEN(16756,FILE='PH')
      READ(16756,*)PHZ
      WRITE(*,*)PHZ
      END


✓ If BACKSPACE is executed n times after the last record has been written, the file is positioned before the nth record counting back from the end, i.e., the (n-1)th record preceding the last one written. In FILE012.FOR, three BACKSPACE statements after six records position the file before the fourth record, so the READ statement returns 4.
✓ If the file is positioned after the end-of-file mark, two BACKSPACE statements are needed to position the file before the last data record, because the first one moves back over the end-of-file mark itself (FILE013.FOR).

* FILE012.FOR
      DIMENSION VOL(6)
      DATA VOL/1,2,3,4,5,6/
      OPEN(3,FILE='VOL',STATUS='NEW')
      DO 10 I = 1,6
         WRITE(3,*)VOL(I)
   10 CONTINUE
      BACKSPACE (3)
      BACKSPACE (3)
      BACKSPACE (3)
      READ(3,*)X
      WRITE(*,*)X
      END

* FILE013.FOR
$DEBUG
      DIMENSION VOL(6)
      DATA VOL/1.,2.,3.,4.,5.,6./
      OPEN(3,FILE='VOLUME',STATUS='NEW')
      DO 10 I = 1,6
         WRITE(3,*)VOL(I)
   10 CONTINUE
      ENDFILE(3)
      BACKSPACE (3)
      BACKSPACE (3)
      READ(3,*)X
      WRITE(*,*)X
      END

When BACKSPACE occurs within a record the pointer is positioned before the current record.
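The positioning rules above can be checked with a short sketch. It is written in modern free-form Fortran rather than the FORTRAN 77 of the listings and assumes a Fortran 2008 compiler (for NEWUNIT=); the module and function names are hypothetical. A scratch file and an explicit format are used, since the standard forbids backspacing over list-directed records.

```fortran
! Sketch: repeated BACKSPACE on a sequential (scratch) file.
! Hypothetical names; assumes a Fortran 2008 compiler for NEWUNIT=.
module backspace_demo
  implicit none
contains
  ! Writes the records 1.0, 2.0, ..., 6.0, optionally an end-of-file mark,
  ! then executes BACKSPACE nback times and reads the record found there.
  real function nth_from_end(nback, with_endfile)
    integer, intent(in) :: nback
    logical, intent(in) :: with_endfile
    integer :: u, i
    real :: x
    open(newunit=u, status='scratch', form='formatted', action='readwrite')
    do i = 1, 6
      write(u, '(F8.2)') real(i)    ! explicit format, not list-directed
    end do
    if (with_endfile) endfile(u)    ! end-of-file mark after record 6
    do i = 1, nback
      backspace(u)                  ! back over one record (or the mark)
    end do
    read(u, '(F8.2)') x
    close(u)                        ! a scratch file is deleted on CLOSE
    nth_from_end = x
  end function nth_from_end
end module backspace_demo

program demo_backspace
  use backspace_demo
  implicit none
  print *, nth_from_end(3, .false.)  ! like FILE012.FOR: record 4
  print *, nth_from_end(2, .true.)   ! like FILE013.FOR: record 6
end program demo_backspace
```

With three BACKSPACE statements and no explicit end-of-file mark the sketch reads 4.0; with ENDFILE it needs two BACKSPACE statements to reach the last data record, 6.0.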


Don'ts of BACKSPACE
✗ It is not possible to BACKSPACE the record of a file that does not exist (FILE101.FOR).

* FILE101.FOR
$DEBUG
      BACKSPACE (4)
      END

REWIND It is an executable statement used in I/O operations with a file. The command takes its name from computers that used tape (magnetic or paper) files. It positions the file so that the next record read or written is the first record of the file. When a file is opened, it is automatically positioned at the first record. In the case of magnetic tape, the REWIND command physically rewinds the tape. The syntax is
REWIND ([UNIT =] intconst1 [, ERR = intconst2] [, IOSTAT = intvar3])
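A minimal sketch of the usual write, REWIND, read sequence is given below, again in free-form Fortran with a scratch file. The names are hypothetical and NEWUNIT= assumes a Fortran 2008 compiler.

```fortran
! Sketch: write one record, REWIND, and read it back (hypothetical names).
module rewind_demo
  implicit none
contains
  real function write_rewind_read(value)
    real, intent(in) :: value
    integer :: u
    real :: x
    open(newunit=u, status='scratch', form='formatted', action='readwrite')
    write(u, '(F10.3)') value   ! file is now positioned after this record
    rewind(u)                    ! reposition at the first record
    rewind(u)                    ! a second REWIND has no further effect
    read(u, '(F10.3)') x         ! reads back the record just written
    close(u)
    write_rewind_read = x
  end function write_rewind_read
end module rewind_demo

program demo_rewind
  use rewind_demo
  implicit none
  print *, write_rewind_read(77.0)   ! D = 77. as in FILE014.FOR
end program demo_rewind
```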

Do's of REWIND
✓ It has no effect if the file is already at the beginning (FILE014.FOR).

* FILE014.FOR
      OPEN(25,FILE='SOL_DATA',STATUS='NEW')
      D = 77.
      WRITE(25,*)D
      CLOSE(25)
      OPEN(UNIT = 26,FILE='SOL_DATA')
      REWIND 26
      REWIND 26
      READ(26,*)DI
      WRITE(*,*)DI
      END

Don'ts of REWIND
✗ It should not be used when a sequential file is first created (FILE015.FOR).

* FILE015.FOR
      ATWT = 12.
      ATNO = 6.
      OPEN(161,FILE='ELEMENTS',STATUS='NEW')
      REWIND 161
      WRITE(161,*)ATNO,ATWT
      END


The error arises because no end-of-file mark has been written when the file is first opened. The corrected version of the program is FILE016.FOR.

* FILE016.FOR
      ATWT = 12.
      ATNO = 6.
      OPEN(161,FILE='ELEMENTS',STATUS='NEW')
      ENDFILE (161)
      REWIND 161
      WRITE(161,*)ATNO,ATWT
      END

✗ It is not possible to backspace over a record written by a list-directed output statement (FILE102.FOR).

* FILE102.FOR
$DEBUG
      EMF = 263.5
      IVOL = 10
      OPEN(15,FILE='VOLEMF',STATUS='NEW')
      WRITE(15,*)EMF,IVOL
      BACKSPACE (15)
      READ(15,*)X,IX
      WRITE(*,*)X,IX
      WRITE(15,*)IVOL,EMF
      END

INQUIRE INQUIRE is an executable statement. It is used to ascertain (1) whether a file exists on the disk, (2) whether a file is opened, and (3) the name of the file opened. The syntax is

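INQUIRE ({FILE = fname | UNIT = intconst}, EXIST = lvar, OPENED = lvar, NAMED = lvar, NAME = cvar, NUMBER = ivar, ACCESS = cvar, SEQUENTIAL = cvar, DIRECT = cvar, FORM = cvar, FORMATTED = cvar, UNFORMATTED = cvar, RECL = ivar, NEXTREC = ivar, BLANK = cvar, IOSTAT = ivar, ERR = label)
INQUIRE: Keyword
lvar: Logical variable
cvar: Character variable
ivar: Integer variable
Only the specifiers that are needed are written, and exactly one of FILE and UNIT is given.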
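The two most common uses of INQUIRE, testing whether a file exists on the disk and whether a unit is connected, can be sketched as follows. The code is free-form Fortran; the helper names FILE_EXISTS and UNIT_IS_OPEN and the file name are hypothetical.

```fortran
! Sketch: INQUIRE by file name and by unit (hypothetical helper names).
module inquire_demo
  implicit none
contains
  ! .TRUE. if a file with this name exists
  logical function file_exists(fname)
    character(len=*), intent(in) :: fname
    logical :: ex
    inquire(file=fname, exist=ex)
    file_exists = ex
  end function file_exists

  ! .TRUE. if unit u is currently connected to a file
  logical function unit_is_open(u)
    integer, intent(in) :: u
    logical :: op
    inquire(unit=u, opened=op)
    unit_is_open = op
  end function unit_is_open
end module inquire_demo

program demo_inquire
  use inquire_demo
  implicit none
  integer :: u
  open(newunit=u, file='inquire_demo.tmp', status='replace')
  print *, unit_is_open(u), file_exists('inquire_demo.tmp')   ! T T
  close(u, status='delete')
  print *, file_exists('inquire_demo.tmp')                    ! F
end program demo_inquire
```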


Do's of INQUIRE
✓ A file can be inquired by unit number or by file name; FILE017.FOR uses both forms to keep track of the same file. Note that with UNIT = the EXIST specifier reports whether the unit number exists, not whether a file is present on the disk.

* FILE017.FOR
$DEBUG
      LOGICAL LVAR1
      COND = 2312
      OPEN(16,FILE='COND',STATUS='NEW')
      WRITE(16,*)COND
      CLOSE(16)
      INQUIRE(UNIT=16,EXIST=LVAR1)
      WRITE(*,951)LVAR1
  951 FORMAT(' FILE EXIST: ',L4)
      INQUIRE(FILE='COND',EXIST=LVAR1)
      WRITE(*,951)LVAR1
      END

Don'ts of INQUIRE
✗ A variable used as a specifier in INQUIRE should not appear in another specification statement (FILE103.FOR).

* FILE103.FOR
$DEBUG
      LOGICAL LVAR1,LVAR2
      K = 1E4
      OPEN(16,FILE='TEST',STATUS='NEW')
      WRITE(16,*)K
      LOGK = LOG(K)
      WRITE(16,*)LOGK
      ENDFILE (16)
      BACKSPACE (16)
      INQUIRE (UNIT=16,NAMED=LVAR1,OPENED=LVAR2,ERR=100)
      WRITE(*,*)LVAR1,LVAR2
      STOP
  100 WRITE(*,201)
  201 FORMAT('ERROR IN INQUIRE STATEMENT')
      END

LVAR1 and LVAR2 are declared as logical variables. They are used in the INQUIRE statement as NAMED = LVAR1 and OPENED = LVAR2.


2.5 DIMENSION

Type Statement A TYPE statement is non-executable and precedes the first executable statement. It is used to declare the type of a variable explicitly, to override the default convention for real and integer variables, and to define logical, complex or character variables. The default convention (names beginning with I to N are integer, all others real) dates back to the first FORTRAN compilers of the late nineteen fifties, when only real and integer types were defined; logical and character variables were introduced later. In any case, explicit declaration makes the reader aware of the type of each variable. A natural name for molecular weight is MOLWT. However, MOLWT begins with M and is therefore an integer variable by default, so the statement MOLWT = 126.06 stores the integer value 126. To retain the digits after the decimal point, MOLWT is to be declared as real (TYPE001.FOR).

* TYPE001.FOR
      REAL MOLWT
      MOLWT = 126.06
      WRITE(*,*)MOLWT
      END

The statement REAL MOLWT overrides the default convention that MOLWT is an integer, and its value is therefore stored as the real number 0.12606000E+03. Two kinds of declaration are available: the IMPLICIT statement, which sets the type of all variables whose names begin with specified letters, and explicit type statements (REAL, INTEGER, etc.), which set the type of individual variables. The rules by which the compiler interprets the type of a variable when IMPLICIT and explicit statements are both present in a program are given below.

(1) If the variable is absent from IMPLICIT type declarations and absent from explicit type declarations, then the default convention prevails (TYPE002.FOR).

* TYPE002.FOR
      MOLWT = 126.06
      WRITE(*,*)MOLWT
      END

(2) If the variable is present in an explicit type statement, then the variable belongs to that {TYPE}: the explicit type declaration overrides the default convention (TYPE003.FOR and TYPE004.FOR).

* TYPE003.FOR
      LANDA = 420
      WRITE(*,*)LANDA
      END

* TYPE004.FOR
      REAL LANDA
      LANDA = 420
      WRITE(*,*)LANDA
      END

Comment: LANDA is an integer variable by default. However, the explicit type statement REAL LANDA overrides the default assumption, and LANDA is now real.

(3) If the first character (C) of the variable is present in IMPLICIT {TYPE1} and there is no explicit type statement, then all variables starting with the character C belong to {TYPE1}, where {TYPE1} is INTEGER, REAL, COMPLEX, CHARACTER or LOGICAL (TYPE005.FOR).

* TYPE005.FOR
      IMPLICIT INTEGER*2 (A-Z)
      LANDA = 370.5
      WRITE(*,*)LANDA
      END

(4) If the first character of the variable is present in IMPLICIT {TYPE1} and the variable (VAR1) is present in an explicit {TYPE2} statement, then VAR1 belongs to {TYPE2} and all other variables belong to {TYPE1}: the explicit type statement overrides the implicit type statement (TYPE006.FOR).

* TYPE006.FOR
      IMPLICIT REAL*4 (A-Z)
      MOLWT = 126.06
      WRITE(*,*)MOLWT
      END

IMPLICIT REAL*4 (A-Z) declares that all variables in TYPE006.FOR are real, so the default convention is overridden. However, an explicit specification that LANDA is INTEGER makes LANDA alone an integer, while all the other variables, even those starting with L (such as LOGBETA), remain real.
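The four rules can be tried out in a small sketch in free-form Fortran. The function names and the value given to LOGBETA are hypothetical, and IMPLICIT NONE is deliberately omitted so that the default I to N convention applies.

```fortran
! Sketch of the typing rules; IMPLICIT NONE is deliberately omitted so that
! the default convention applies (names beginning with I to N are INTEGER).
module typing_demo
contains
  ! Rule (1): no declaration, so MOLWT is INTEGER and 126.06 becomes 126
  integer function molwt_default()
    molwt = 126.06
    molwt_default = molwt
  end function molwt_default

  ! Rule (2): an explicit REAL statement overrides the default
  real function molwt_declared()
    real molwt
    molwt = 126.06
    molwt_declared = molwt
  end function molwt_declared

  ! Rules (3) and (4): IMPLICIT REAL makes every name real, except LANDA,
  ! which the explicit INTEGER statement keeps integer (370.5 -> 370)
  real function logbeta_plus_landa()
    implicit real (a-z)
    integer landa
    landa = 370.5
    logbeta = 4.25
    logbeta_plus_landa = logbeta + landa
  end function logbeta_plus_landa
end module typing_demo

program demo_typing
  use typing_demo
  print *, molwt_default(), molwt_declared(), logbeta_plus_landa()
end program demo_typing
```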

Do's of TYPE Statement
✓ More than one variable can be declared in a TYPE statement, each separated by a comma:
      REAL KW, MOLWT, N, NAP, NIP
✓ There can be more than one TYPE statement of a category (TYPE007.FOR).

* TYPE007.FOR
      REAL KW
      REAL MOLWT,N
      DATA KW,MOLWT,N,NAP,NIP/13.99,126.06,6.03E23,2,3/
      WRITE(*,*)KW,MOLWT,N,NAP,NIP
      END


✓ A variable that is real by default can also be declared real explicitly and/or implicitly (TYPE008.FOR).

* TYPE008.FOR
      REAL ABSORB
      ABSORB = 0.356
      WRITE(*,951)ABSORB
  951 FORMAT(1X,F25.18)
      STOP
      END

Don'ts of TYPE Statement
✗ A variable should not be specified in more than one type statement (TYPE101.FOR).

* TYPE101.FOR
      REAL KW
      REAL*8 KW, PKW
      PKW = 14.0
      KW = 10. ** (-PKW)
      WRITE(*,*)KW,PKW
      END

    1 *
    2 * TYPE101.FOR
    3 *
    4       REAL KW
    5       REAL*8 KW, PKW
***** Error 33 - identifier already has type
    6       PKW = 14.0
    7       KW = 10. ** (-PKW)
    8       WRITE(*,*)KW,PKW
    9       END

Dimension Statement Wave numbers in the fingerprint region of an IR spectrum, chemical shifts in an NMR spectrum or m/e values in a mass spectrum are sets of values, each called an array or subscripted variable. An array is a collection of variables that have the same name and belong to the same type. Each element of the array is a scalar. The number of elements in an array is called the size of the array. An array can have one, two or more subscripts and is then called a one-, two- or multi-dimensional array. If there are four peaks in the IR spectrum of a compound, separate assignment statements for the wave numbers are not preferred, since they require as many lines of code as there are values. The number of variables increases with the number of data items, and the calculations become cumbersome because the same step has to be repeated for each variable. The numerical values of the wave numbers can then be coded as given in ARRAY001.FOR or read as given in ARRAY002.FOR.

* ARRAY001.FOR
      DATA NUIR1, NUIR2, NUIR3, NUIR4 / 1650, 3320, 2315, 870 /
      WRITE (*,*) NUIR1, NUIR2,
     C            NUIR3, NUIR4
      END


* ARRAY002.FOR
      READ (*,*) NUIR1, NUIR2, NUIR3, NUIR4
      WRITE (*,*) NUIR1, NUIR2, NUIR3, NUIR4
      END

In mathematical or statistical literature, a one-dimensional variable is written with a subscript indicating the element (NUIRᵢ, i = 1 to 4). As subscripts and superscripts cannot be written in FORTRAN, the DIMENSION statement is introduced. The two ways of representing the elements of a vector (a set of values) in the human and computer domains are given in Table 2.5.1.

Table 2.5.1: Representation of Subscripted Variable in Human and Computer Domains

Human Domain                     Computer Domain
NUIR₁, NUIR₂, NUIR₃, NUIR₄       NUIR1, NUIR2, NUIR3, NUIR4
NUIRᵢ, i = 1 to 4                DIMENSION NUIR(4)
                                 NUIR(1), NUIR(2), NUIR(3), NUIR(4)

DIMENSION is a non-executable statement. It is used to specify the maximum number of elements in an array. It must precede the first executable statement (ARRAY003.FOR) and follow any IMPLICIT statement (ARRAY004.FOR).

* ARRAY003.FOR
      DIMENSION MBYE(100)
      MBYE(1) = 46
      WRITE(*,*)MBYE(1)
      END

* ARRAY004.FOR
      IMPLICIT REAL*4(A-Z)
      DIMENSION MBYE(100)
      MBYE(1) = 46
      WRITE(*,*)MBYE(1)
      END

The advantage of the DIMENSION statement is its concise syntax. It makes array manipulation obvious and reduces the source code drastically. The two programs ARRAY005.FOR and ARRAY006.FOR perform the same task of reading four values of DELNMR from the keyboard.

* ARRAY005.FOR
      DIMENSION DELNMR(4)
      READ(*,*) DELNMR(1)
      READ(*,*) DELNMR(2)
      READ(*,*) DELNMR(3)
      READ(*,*) DELNMR(4)
      WRITE(*,*)DELNMR(1),DELNMR(2),DELNMR(3),DELNMR(4)
      END


* ARRAY006.FOR
      DIMENSION DELNMR(4)
      DO 10 I = 1,4
         READ(*,*) DELNMR(I)
   10 CONTINUE
      WRITE(*,*)DELNMR
      END

Functioning of Dimension Statement The DIMENSION statement allocates a group of consecutive memory locations to the array elements. In other words, one chunk of memory is allocated under a single variable name and divided into as many slices as the declared number of elements; the size of each slice depends on the precision (four bytes for a single precision real). Each element occupies one slice. An array element is referred to in the program by the name of the array followed by the number of the element in parentheses. For example, DELNMR(3) is the third element of the array DELNMR. DELNMR in the above programs is a single precision real array with four elements, DELNMR(1), DELNMR(2), DELNMR(3) and DELNMR(4). They are stored in successive memory locations, bytes 20 to 35, as shown below.

DELNMR(1): bytes 20-23
DELNMR(2): bytes 24-27
DELNMR(3): bytes 28-31
DELNMR(4): bytes 32-35
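The byte ranges above follow from a simple rule. Taking b as the address of the first element and s as the number of bytes per element (s = 4 for a single precision real), element i occupies:

$$
\text{first byte of } \mathrm{DELNMR}(i) = b + (i-1)\,s, \qquad
\text{last byte of } \mathrm{DELNMR}(i) = b + i\,s - 1
$$

With b = 20 and s = 4, DELNMR(3) occupies bytes 20 + 2 × 4 = 28 to 20 + 3 × 4 − 1 = 31, in agreement with the layout shown.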

Do's of DIMENSION
✓ Only some elements of the array may be used while processing (ARRAY008.FOR).

* ARRAY008.FOR
      DIMENSION DELNMR(4), TOWNMR(4)
      DELNMR(1) = 3.5
      DELNMR(2) = 6.0
      TOWNMR(1) = 10. * DELNMR(1)
      TOWNMR(2) = 10. * DELNMR(2)
      WRITE(*,*)DELNMR,TOWNMR
      END

✓ More than one DIMENSION statement is valid (ARRAY009.FOR).
✓ More than one array can be defined in a single DIMENSION statement (ARRAY009.FOR).


* ARRAY009.FOR
      DIMENSION NUIR(100)
      DIMENSION DELNMR(20), TOWNMR(25)
      DIMENSION TRCON(9), WEIGHT(12)
      END

✓ The numerical value of an element of an array can be passed to and from subprograms, for example
      DIMENSION WAVEL(20), CONC(10)
      CALL SUB1(WAVEL(6), C)
✓ The subscript used to refer to an element of an array is an integer constant, variable or arithmetic expression.
✓ A real constant, variable or expression can be used to select an element after conversion with INT, which truncates the value by discarding its fractional part (ARRAY0110.FOR).

* ARRAY0110.FOR
      DIMENSION VOL(50)
      DO 10 I = 1,10
         VOL(I) = I
   10 CONTINUE
      X = 1.6
      V1 = VOL(2*INT(X))
      V2 = VOL(INT(4.6))
      V3 = VOL(INT(X))
      WRITE(*,*)V1,V2,V3
      END

Here INT(1.6) = 1 and INT(4.6) = 4, so V1, V2 and V3 are VOL(2), VOL(4) and VOL(1), i.e., 2.0, 4.0 and 1.0.
✓ The size of an array in a DIMENSION statement can be given by an integer name, provided its value is defined in a PARAMETER statement:
      PARAMETER (NP = 10)
      DIMENSION EMF(NP)

Don'ts of DIMENSION
✗ The size of a dummy array in a subprogram should not be greater than the size of the actual array in the mainline.
      DIMENSION CONC(10), ABSORB(10)
      CALL BEER(CONC, ABSORB)
      END
      SUBROUTINE BEER(X,Y)
      DIMENSION X(20), Y(20)
      RETURN
      END
✗ An array name should not be declared more than once.
      DIMENSION DELNMR(20), DELNMR(10)


✗ Declaring the dimensions of an array twice, for example in a DIMENSION statement and again in a type statement, is invalid (DIM1101.FOR).

* DIM1101.FOR
      DIMENSION NUIR(20)
      REAL NUIR(20)
      INTEGER WAVEL(36)
      DIMENSION WAVEL(36)
      NUIR(16) = 2000
      WAVEL(2) = 370
      END

    1 *
    2 * DIM1101.FOR
    3 *
    4       DIMENSION NUIR(20)
    5       REAL NUIR(20)
***** Error 30 - array already dimensioned
    6       INTEGER WAVEL(36)
    7       DIMENSION WAVEL(36)
***** Error 30 - array already dimensioned
    8       NUIR(16) = 2000
    9       WAVEL(2) = 370
   10       END

✗ A subscript value outside the range declared in the DIMENSION statement is invalid (DIM1102.FOR).

* DIM1102.FOR
      DIMENSION VOL(10)
      VOL(12) = 12
      VOL(0) = 0
      VOL(-5) = -5
      WRITE(*,*) VOL(12)
      WRITE(*,*) VOL(0)
      WRITE(*,*) VOL(-5)
      END

Comment: The subscripts 12, 0 and -5 are outside the declared range 1 to 10.
✗ A name should not be used as an array bound in a main program unless its value is defined in a PARAMETER statement (DIM1103.FOR).

* DIM1103.FOR
      DIMENSION EMF(NP)
      END

    1 *
    2 * DIM1103.FOR
    3 *
    4       DIMENSION EMF(NP)
***** Error 66 - adjustable size declarations only for dummy arrays
***** Error 68 - adjustable bound must be parameter or in COMMON
    5       END

Comment: The value of NP is not known at the time the DIMENSION statement is analyzed.

✗ The number of subscripts in the array declaration and in an array reference should not be different (ARRAY101.FOR).

* ARRAY101.FOR
      DIMENSION pH(10)
      WRITE(*,*) pH(1,4)
      END

    1 *
    2 * ARRAY101.FOR
    3 *
    4       DIMENSION pH(10)
    5       WRITE(*,*) pH(1,4)
***** Error 56 - too many subscripts
    6       END

Comment: pH is referred to as a two-dimensional array element in the WRITE statement, while it is declared as one-dimensional. In ARRAY102.FOR, VOL is not declared as an array but is used as one.

* ARRAY102.FOR
      VO = 50.0
      V = VO+VOL(J)
      WRITE(*,*)V
      END

Name    Type    Size    Class
MAIN                    PROGRAM
VOL     REAL            FUNCTION

Pass One    No Errors Detected    7 Source Lines

Microsoft 8086 Object Linker Version 3.02
(C) Copyright Microsoft Corp 1983, 1984, 1985
Unresolved externals: VOL in file(s):

Comment: The compiler interprets VOL(J) as a reference to a function subprogram VOL with the argument J, so there is no compilation error. However, at the linking stage the error message "Unresolved externals" is displayed, because no function named VOL exists.


Two Dimensional Arrays A two-dimensional array, known as a matrix, contains rows and columns. In the data matrix of methyl orange (absorbance at different pH values), the rows are the spectra at different pH values and the columns are the absorbances at different wavelengths. The elution profile obtained from HPLC is a vector of size NTIME × 1. If the UV-vis spectrum is recorded at each elution time instead of the absorbance at a single wavelength, a matrix of size NTIME × NLANDA is produced. In chemical kinetics, the absorbance at a single wavelength (the absorbance maximum) is monitored as a function of time to estimate the rate constant of a reaction. If the full spectrum is recorded with time, the data are called a kinetic spectrum, of size NTIME × NLANDA. These examples clearly establish the need for two-dimensional arrays. Many statistical computations on bivariate (X and Y) data also involve matrix arithmetic.
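As a sketch, a kinetic spectrum ABSOR(NTIME,NLANDA) can be held in a two-dimensional array, with a kinetic trace being one column and a spectrum one row. The code is free-form Fortran; the module and function names and the numbers stored are hypothetical. Fortran stores the elements column by column, which RESHAPE makes visible.

```fortran
! Sketch: a kinetic spectrum as a two-dimensional array.
! Rows = time points, columns = wavelengths (hypothetical sizes and values).
module kinspec
  implicit none
contains
  ! Absorbance-vs-time trace at wavelength index j: one column
  function kinetic_trace(absor, j) result(trace)
    real, intent(in) :: absor(:, :)
    integer, intent(in) :: j
    real :: trace(size(absor, 1))
    trace = absor(:, j)
  end function kinetic_trace

  ! Spectrum recorded at time index i: one row
  function spectrum_at(absor, i) result(spec)
    real, intent(in) :: absor(:, :)
    integer, intent(in) :: i
    real :: spec(size(absor, 2))
    spec = absor(i, :)
  end function spectrum_at
end module kinspec

program demo_kinspec
  use kinspec
  implicit none
  integer, parameter :: ntime = 3, nlanda = 4
  real :: absor(ntime, nlanda)
  integer :: k
  ! RESHAPE fills the matrix column by column (column-major order):
  ! absor(1,1)=1, absor(2,1)=2, absor(3,1)=3, absor(1,2)=4, ...
  absor = reshape([(real(k), k = 1, ntime*nlanda)], [ntime, nlanda])
  print *, 'trace at wavelength 2:', kinetic_trace(absor, 2)   ! 4 5 6
  print *, 'spectrum at time 3:  ', spectrum_at(absor, 3)     ! 3 6 9 12
end program demo_kinspec
```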

Higher Dimensional Arrays Recent chemical literature shows that three-dimensional data arrays (tensors) are obtained from hyphenated instruments. The fluorescence at different excitation and emission wavelengths, recorded as a function of time, is a third-order tensor of size NEX × NEM × NT. Table 2.5.2 describes the data of different orders obtained from chemical instruments.

Table 2.5.2: Order of Data Obtained from Different Instruments

Chemical Variable                               Number of     FORTRAN Variable
                                                Subscripts
Absorbance of a number of samples               1             ABSORB(NP)
NIR spectrum of a number of samples             2             ABNIR(NP,NNU)
Spectrochromatograms                            2             RESP(NT,NLANDA)
Excitation-emission fluorescence spectrum       3             FLURES(NEX,NEM,NT)
Stability constants of ligands, metals and      4             BETA(NL,NM,NSOL,NT)
  solvents at different temperatures
Kinetic spectrum                                2             ABSOR(NTIME,NLANDA)

2.6 SUBPROGRAM

Intrinsic/Library Functions Statistical, mathematical and chemical computations repeatedly use several procedures. Procedures ranging from the absolute value of a scalar to the solution of a set of differential equations are needed at some stage of implementing an algorithm. The calculation of numerical values of hyperbolic and ordinary trigonometric functions, logarithms and antilogarithms requires numerical methods of analysis. Because of their extensive use, such functions are included as intrinsic functions in the library of the computer language (for example, MATH.LIB and ALTMATH.LIB in FORTRAN) as a substitute for lookup tables. The results of chemical experiments are validated through statistical analysis. Algorithms for statistical parameters and mathematical procedures are available as subprograms in packages such as Numerical Recipes, the Numerical Algorithms Group (NAG) library, the Statistical Package for the Social Sciences (SPSS) and the Quantum Chemistry Program Exchange. Estimation of rate/equilibrium constants, curve resolution of overlapping spectra/chromatograms and prediction of the concentrations of the components of non-interacting multicomponent mixtures are complex tasks. They demand a set of user-written subprograms.


Function Subprogram A function subprogram is a separate program unit. It is edited and saved like the main program. It avoids repetition of the source code for similar calculations. Once a function subprogram has been tested, the user is assured of its results and can concentrate on the logic of the main program. A function subprogram is referred to, or called, in an expression: it appears as one of the operands on the RHS of an assignment or replacement statement, and the value it returns becomes an operand of that expression. The name of the function must be assigned a value at least once in the function subprogram, and it must have a value when RETURN is executed. The input/output statements and the call of SUMX1.FOR are collected in a driver program (SUMX1.DEM) known as the mainline.

* SUMX1.FOR
      FUNCTION SUMX1(N,X,MAX)
      DIMENSION X(MAX)
      SUMX1 = 0.
      IF(N .GT. MAX)PAUSE 'X DIMENSION EXCEEDS MAXIMUM VALUE'
      DO 10 I = 1,N
         SUMX1 = SUMX1 + X(I)
   10 CONTINUE
      RETURN
      END

* WX1.FOR
      SUBROUTINE WX1(NP,X,MAX)
      DIMENSION X(MAX)
      WRITE(*,952) (I,X(I),I = 1,NP)
  952 FORMAT(100(' X(',I2,')=',G11.4,'; X(',I2,')=',G11.4,
     *       '; X(',I2,')=',G11.4,'; X(',I2,')=',G11.4/))
      RETURN
      END

* SUMX1.DEM
      DIMENSION Y(3)
      DATA Y/1.5,2.6,3.8/
      NP = 3
      CALL WX1(NP,Y,3)
      SUM = SUMX1(NP,Y,3)
      WRITE(*,*)SUM
      END
$INCLUDE: 'SUMX1.FOR'
$INCLUDE: 'WX1.FOR'


Working details of FUNCTION Subprogram Execution of the main program starts at the first executable statement and proceeds from the top down. When a function subprogram is encountered in an assignment or replacement statement:
• The program counter remembers the current statement.
• Control is transferred to the function subprogram. The dummy arguments of the function now stand for the actual arguments of the mainline.
• The function subprogram is executed up to RETURN. The value of the function name is its current value.
• Control is transferred back to the statement at which execution of the mainline was halted.
• Execution of the mainline resumes.
• The value of the function name is now used in the assignment or replacement statement.
• Execution of the main program continues till the STOP statement.
These steps are shown pictorially in Fig. 2.6.1. Thus the function subprogram directly processes the arguments of the mainline through its dummy arguments. The consequences of the type of the function name, in the function subprogram and in the mainline, follow in an expert system (IF-THEN) mode.

Fig. 2.6.1: Execution Profile in a Function Subroutine (in SUMX1.DEM, SUM = SUMX1(NP,Y,3) transfers control to FUNCTION SUMX1(N,X,MAX); RETURN transfers it back)

(1) If there is no TYPE in the FUNCTION statement, then the TYPE of the result is the same as that of the variable used as the function name (ISUM1.DEM).

* ISUM1.DEM
      DIMENSION X(500)
      MAX = 500
      X(1) = 1.5
      X(2) = 3.8
      X(3) = 2.6
      NP = 3
      CALL WX1(NP,X,3)
      IZ = ISUM1(NP,X)
      WRITE(*,*)IZ
      END
$INCLUDE: 'ISUM1.FOR'
$INCLUDE: 'WX1.FOR'

* ISUM1.FOR
      FUNCTION ISUM1(NP,X)
      PARAMETER (MAX = 500)
      DIMENSION X(MAX)
      ISUM1 = 0.
      DO 10 I = 1,NP
         ISUM1 = ISUM1 + X(I)
   10 CONTINUE
      RETURN
      END


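Comment: ISUM1 is an integer variable by default and there is no TYPE in the FUNCTION statement. Hence the result ISUM1 is an integer. Because ISUM1 itself is used as the accumulator, the fractional part is discarded at every step of the loop (1, 4, 6), so IZ is 6 rather than 7.
(2) If the TYPE of the expression in the mainline is different from the TYPE of the function name, then the value is converted according to the rules of the assignment statement (ISUM2.DEM).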

* ISUM2.DEM
      DIMENSION X(500)
      MAX = 500
      X(1) = 1.5
      X(2) = 3.8
      X(3) = 2.6
      NP = 3
      CALL WX1(NP,X,3)
      SUM = ISUM2(NP,X)
      WRITE(*,*)SUM
      END
$INCLUDE: 'ISUM2.FOR'
$INCLUDE: 'WX1.FOR'

* ISUM2.FOR
      FUNCTION ISUM2(NP,X)
      PARAMETER (MAX=500)
      DIMENSION X(MAX)
      SUM2 = 0.
      DO 10 I = 1,NP
         SUM2 = SUM2 + X(I)
   10 CONTINUE
      ISUM2 = SUM2
      RETURN
      END

Comment: ISUM2 is calculated in integer mode in the function (the real sum 7.9 becomes 7), while SUM in the mainline is a real variable. So the value of ISUM2 is converted to real (7.0) when it is assigned to SUM.
(3) If there is a TYPE in the FUNCTION statement, then the default convention for the function name is overridden (ISUM3.DEM).

* ISUM3.DEM
      DIMENSION X(500)
      REAL ISUM3
      MAX = 500
      X(1) = 1.5
      X(2) = 3.8
      X(3) = 2.6
      NP = 3
      CALL WX1(NP,X,3)
      SUM = ISUM3(NP,X)
      WRITE(*,*)SUM
      END
$INCLUDE: 'ISUM3.FOR'
$INCLUDE: 'WX1.FOR'

* ISUM3.FOR
      REAL FUNCTION ISUM3(NP,X)
      PARAMETER (MAX=500)
      DIMENSION X(MAX)
      ISUM3 = 0.
      DO 10 I = 1,NP
         ISUM3 = ISUM3 + X(I)
   10 CONTINUE
      RETURN
      END


Comment: ISUM3 is an integer by default, but REAL FUNCTION ISUM3 overrides the default convention, and ISUM3 is a real variable (it is also declared REAL in ISUM3.DEM).
(4) The IMPLICIT statement in the calling program does not alter the type of an intrinsic function.

* SIN103.FOR
      IMPLICIT REAL*8 (A-Z)
      PI = 22.D0/7.D0
      DEG = 30.D0
      RAD = PI/180.0D0 * DEG
      X1 = SIN(RAD)
      X2 = DSIN(RAD)
      WRITE(*,951)DEG,X1,X2
  951 FORMAT(1X,'DEG =',F8.2,5X,'SIN(DEG) =',F15.10/5X,
     *       'DSIN(DEG) =',F25.20)
      END

 DEG =   30.00     SIN(DEG) =   .5001825022
     DSIN(DEG) =     .50018250219966980000

Comment: All variables are double precision real. DSIN is the specific double precision sine; SIN is a generic name, so with the double precision argument RAD it also returns a double precision result. The IMPLICIT statement does not change the type of either intrinsic, and X1 and X2 agree to all ten decimal places printed by F15.10. (The result differs from 0.5 because PI is approximated as 22/7.)

Do's of FUNCTION Subprogram
✓ A variable with the function subprogram name can occur any number of times in assignment/replacement statements within the function subprogram.
✓ A function subprogram need not have any argument (LF103.FOR).

* LF103.FOR
      LOGICAL X,ERROR
      X = ERROR()
      WRITE(*,*)X
      END

      LOGICAL FUNCTION ERROR()
      WRITE(*,951)
  951 FORMAT(' Error in program')
      ERROR = .TRUE.
      RETURN
      END


✓ A function can work internally in a precision different from that of its arguments, after suitable conversion inside the function (ROOT1.FOR).

* ROOT1.FOR
      X1 = 1.
      X2 = -1E-9
      X3 = -1.E-14
      X = ROOT1(X1,X2,X3)
      WRITE(*,*)X
      PH = -LOG10(X)
      WRITE(*,*)PH
      END

      FUNCTION ROOT1(A1,B1,C1)
      DOUBLE PRECISION X,A,B,C
      A = DBLE(A1)
      B = DBLE(B1)
      C = DBLE(C1)
      X = (-B + DSQRT(B*B - 4.0*A*C))/(2.0*A)
      ROOT1 = SNGL(X)
      RETURN
      END

Output of ROOT1.FOR
 1.005012E-007
 6.9978280

The actual arguments in the mainline are X1, X2 and X3, and the corresponding dummy arguments in ROOT1 are A1, B1 and C1. Both sets have three elements and are single precision real variables (Fig. 2.6.2). Since the root of the quadratic equation is to be calculated in double precision, the arguments A1, B1 and C1 are converted to double precision with the intrinsic function DBLE. One root of the equation (X) is calculated in double precision. As ROOT1 is single precision, X is converted to single precision with the intrinsic function SNGL.

✓ More than one RETURN statement is valid.
✓ A function subprogram can also pass back the values of variables through its arguments (SUMX4.FOR).


Fig. 2.6.2: Correspondence of Variables of Mainline and Function Subroutine (mainline X1, X2, X3 correspond to A1, B1, C1 in the function; each single precision variable occupies four bytes)

* SUMX4.FOR
      DIMENSION X(100)
      NP = 4
      ERR = 0
      SUM = SUMX4(NP,X,ERR)
      WRITE(*,951)ERR
  951 FORMAT(' ERROR: ',F5.0)
      END

      FUNCTION SUMX4(NP,X,ERR)
      DIMENSION X(2)
      SUMX4 = 0.
      IF(NP .GT. 2)THEN
         ERR = 1
         RETURN
      ELSE
         DO 10 I = 1,NP
            SUMX4 = SUMX4 + X(I)
   10    CONTINUE
      ENDIF
      RETURN
      END

✓ The variable names and statement numbers in a subprogram may be the same as those used in the mainline.

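The idea of ROOT1.FOR, single precision arguments with double precision working arithmetic, can be sketched in free-form Fortran as below; REAL(x, dp) and REAL(x) play the roles of DBLE and SNGL, and the module name is hypothetical. The coefficients in ROOT1.FOR appear to come from the hydrogen ion balance [H⁺]² − C[H⁺] − K_w = 0 for a 10⁻⁹ M strong acid with K_w = 10⁻¹⁴; that reading is an interpretation, not stated in the text.

```fortran
! Sketch: single precision interface, double precision working arithmetic,
! following the idea of ROOT1.FOR (module name is illustrative).
module root_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Larger root of a*x**2 + b*x + c = 0 (assumes a /= 0 and a real root)
  real function root1(a1, b1, c1)
    real, intent(in) :: a1, b1, c1
    real(dp) :: a, b, c, x
    a = real(a1, dp)              ! like DBLE(A1)
    b = real(b1, dp)
    c = real(c1, dp)
    x = (-b + sqrt(b*b - 4.0_dp*a*c)) / (2.0_dp*a)
    root1 = real(x)               ! like SNGL(X): back to default real
  end function root1
end module root_demo

program demo_root
  use root_demo
  implicit none
  real :: h, ph
  h = root1(1.0, -1.0e-9, -1.0e-14)
  ph = -log10(h)
  print *, h, ph                  ! about 1.005012E-07 and 6.99783
end program demo_root
```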

Don'ts of FUNCTION Subprogram
✗ The number of arguments in the mainline and in the subprogram should not be different (FSUB301.FOR).

* FSUB301.FOR
      DIMENSION X(100)
      X(1) = 5.
      SUM = SUMX1(X)
      END
$INCLUDE: 'SUMX1.FOR'

✗ There should not be any TYPE or precision conflict between the actual arguments in the mainline and the dummy arguments in the subprogram (FSUB302.FOR).

* FSUB302.FOR
      DIMENSION X(100)
      X(1) = 5.
      SUM = SUMX3(1,X)
      WRITE(*,*)SUM
      END

* SUMX3.FOR
      INTEGER FUNCTION SUMX3(NP,IX)
      PARAMETER (MAX = 100)
      DIMENSION IX(MAX)
      SUM = 0.
      DO 10 I = 1,NP
         SUM = SUM + IX(I)
   10 CONTINUE
      SUMX3 = SUM
      RETURN
      END

✗ The number of dimensions and the size of an array in the main program and in the subprogram should not be different (FSUB201.FOR).

* FSUB201.FOR
      DIMENSION X(100)
      NP = 2
      X(1) = 1
      X(2) = 3
      SUMY = SUMX(NP,X)
      END

      FUNCTION SUMX(NP,X)
      PARAMETER (NP = 20)
      DIMENSION X(NP)
      RETURN
      END


✗ The subprogram name should not conflict with any local name in the calling program (FSUB202.FOR).

* FSUB202.FOR
      LOGICAL X,ERROR
      X = ERROR
      WRITE(*,*)X
      END

      LOGICAL FUNCTION ERROR
      WRITE(*,951)
  951 FORMAT(' Error in program')
      ERROR = .TRUE.
      RETURN
      END

✗ A function should not call itself; recursion is not permitted in FORTRAN 77 (FSUB203.FOR).

* FSUB203.FOR
      FUNCTION SUMX(NP,X)
      DIMENSION X(NP)
      N1 = 1
      N2 = N1 + 1
      SUM = SUMX(N1,N2,X)
      RETURN
      END


✗ A dummy argument should not appear in a COMMON statement (FSUB204.FOR).

* FSUB204.FOR
      DIMENSION X(100)
      COMMON NP
      NP = 2
      X(1) = 1
      X(2) = 3
      SUMY = SUMX(NP,X)
      END

      FUNCTION SUMX(NP,X)
      COMMON NP
      DIMENSION X(NP)
      RETURN
      END

Comment: NP is both in COMMON and a dummy argument.
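Rules (1) to (3) for the type of the function name can be compared in one sketch, using the data of ISUM1.DEM to ISUM3.DEM. It is free-form Fortran without IMPLICIT NONE, so names beginning with I to N are integer by default; the module name is hypothetical and the arrays use adjustable dimensions instead of PARAMETER (MAX=500).

```fortran
! Sketch contrasting ISUM1, ISUM2 and ISUM3: how the type of the function
! name affects the result. No IMPLICIT NONE, so I-N names default to INTEGER.
module isum_demo
contains
  function isum1(np, x)            ! untyped: ISUM1 is INTEGER by default
    dimension x(np)
    isum1 = 0.
    do i = 1, np
      isum1 = isum1 + x(i)         ! fraction dropped at every step
    end do
  end function isum1

  function isum2(np, x)            ! INTEGER result, REAL accumulator
    dimension x(np)
    sum2 = 0.
    do i = 1, np
      sum2 = sum2 + x(i)
    end do
    isum2 = sum2                   ! fraction dropped once, at the end
  end function isum2

  real function isum3(np, x)       ! explicit REAL overrides the default
    dimension x(np)
    isum3 = 0.
    do i = 1, np
      isum3 = isum3 + x(i)
    end do
  end function isum3
end module isum_demo

program demo_isum
  use isum_demo
  dimension y(3)
  data y /1.5, 3.8, 2.6/
  print *, isum1(3, y), isum2(3, y), isum3(3, y)   ! 6, 7, about 7.9
end program demo_isum
```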

Subroutine Subprogram If there is more than one output argument, a subroutine subprogram is used instead of a function subprogram. For example, the least squares fit of a straight line to bivariate (X, Y) data outputs the slope, the intercept and the correlation coefficient. Nowadays the preference is to develop subprograms of 50 to 100 source lines, to increase clarity and to follow the implementation of the algorithm step by step. In spite of the availability of several subroutine packages, there is always a need to develop subprograms for specific tasks. A suite of programs for chemical tasks in chemical kinetics, equilibrium chemistry, quantitation, etc., using mathematical and statistical procedures, is given from Chapter 3 onwards.
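A sketch of such a subroutine with three output arguments is given below in free-form Fortran. LLS_FIT is a hypothetical name and is not the book's LLS1, whose last argument is not described here; the sketch assumes at least two distinct X values and Y values that are not all equal.

```fortran
! Sketch: a subroutine returning three results through its arguments.
! LLS_FIT is hypothetical (not the book's LLS1).
module lls_demo
  implicit none
contains
  ! Least squares line y = slope*x + cept and correlation coefficient cc.
  ! Assumes n >= 2, not all x equal and not all y equal.
  subroutine lls_fit(x, y, n, slope, cept, cc)
    integer, intent(in) :: n
    real, intent(in) :: x(n), y(n)
    real, intent(out) :: slope, cept, cc
    real :: sx, sy, sxx, syy, sxy, dxx, dyy, dxy
    sx = sum(x)
    sy = sum(y)
    sxx = sum(x*x)
    syy = sum(y*y)
    sxy = sum(x*y)
    dxx = n*sxx - sx*sx      ! n times the sum of squared x deviations
    dyy = n*syy - sy*sy
    dxy = n*sxy - sx*sy
    slope = dxy / dxx
    cept = (sy - slope*sx) / n
    cc = dxy / sqrt(dxx*dyy)
  end subroutine lls_fit
end module lls_demo

program demo_lls
  use lls_demo
  implicit none
  real :: x(5), y(5), slope, cept, cc
  x = [1.0, 2.0, 3.0, 4.0, 5.0]
  y = 2.0*x + 1.0                 ! exact straight line
  call lls_fit(x, y, 5, slope, cept, cc)
  print *, slope, cept, cc        ! 2.0, 1.0, 1.0
end program demo_lls
```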

Working Details of Subroutine Subprogram The mainline program is executed from the top down to the statement at which the subroutine is called (Fig. 2.6.3).
• The current statement is remembered by storing the program counter.
• Execution of the mainline is halted and control is transferred to the subroutine. The arguments whose values are known on entry are the input arguments.
• The subroutine subprogram is executed up to the RETURN statement. The values of the output arguments are then available through the corresponding variables in the mainline.
• Control is transferred back to the mainline.
• Execution of the mainline resumes at the statement following the CALL statement.
• The mainline continues till the STOP statement.


Fig. 2.6.3: Program Execution with Subroutines (the main line calls SUBROUTINE LXY1(N,X,Y,MAX) and then SUBROUTINE LLS1(X,Y,N,SLOPE,CEPT,CC,LLS); each RETURN transfers control back to the main line, which ends at STOP)

Features of Subprogram The actual arguments in the main program can be constants, variables, arithmetic expressions, array elements or intrinsic functions. The values of the actual arguments are not passed to the dummy arguments in the subroutine; instead, the locations (addresses) of the actual arguments in memory are transferred to the subroutine. Thus only the addresses of memory locations are available in the subprogram, and the values of the corresponding dummy arguments are fetched from those locations. The arguments in the CALL statement and those in the subprogram should therefore match in number, order, TYPE and precision. The normal mode of exit from a subprogram is the execution of a RETURN statement. A STOP statement in a subprogram not only stops the execution of the subprogram but also terminates the mainline. Subprograms developed with adjustable array dimensions, in which the actual size of the array is passed as an argument, are less prone to run-time errors.
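The effect of passing addresses can be seen in a short sketch in free-form Fortran with hypothetical names: an assignment to a dummy argument changes the caller's variable. Strictly, the Fortran standard specifies this effect rather than the mechanism, but passing the address is the usual implementation. The second subroutine also shows an adjustable dimension with the size passed as an argument.

```fortran
! Sketch: assignments to dummy arguments change the caller's variables.
! Hypothetical names.
module byref_demo
  implicit none
contains
  ! Converts a volume from litres to millilitres in place
  subroutine to_millilitres(vol)
    real, intent(inout) :: vol
    vol = vol * 1000.0
  end subroutine to_millilitres

  ! Adjustable dimension: the actual size n is passed as an argument
  subroutine scale_array(a, n, factor)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    real, intent(in) :: factor
    a = a * factor
  end subroutine scale_array
end module byref_demo

program demo_byref
  use byref_demo
  implicit none
  real :: v, conc(3)
  v = 0.025
  call to_millilitres(v)
  conc = [0.1, 0.2, 0.3]
  call scale_array(conc, 3, 10.0)
  print *, v, conc          ! about 25.0 and 1.0 2.0 3.0
end program demo_byref
```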

2.7 DATA

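DATA Statement
DATA is a non-executable statement that assigns initial values to variables before the program starts executing; it is placed after the specification (DIMENSION and type) statements. It is an abridged form of several assignment statements: DATA001.FOR and ASIGN006.FOR both set four sums to zero. Unlike assignment statements, however, DATA sets the values only once, at the start of the run.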

*
*     DATA001.FOR
*
      DATA SUMX,SUMY,SUMXY,SUMXX/4*0.0/
      WRITE(*,*)SUMX,SUMY,SUMXY,SUMXX
      END


*
*     ASIGN006.FOR
*
      SUMX  = 0
      SUMY  = 0
      SUMXY = 0
      SUMXX = 0
      WRITE(*,*)SUMX,SUMY,SUMXY,SUMXX
      END

The DATA statement is also used to initialize zero, unit and identity matrices (DATA002.FOR).

*
*     DATA002.FOR
*
      DIMENSION UMAT(5,5)
      DATA UMAT/25*0.0/
*     Five values per record, so each row of UMAT is printed on one line
      WRITE(*,5)((UMAT(I,J),J=1,5),I=1,5)
    5 FORMAT(1H0,15X,5F5.1)
      STOP
      END

1.0000000

.0000000

.0000000

.0000000
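An identity matrix can likewise be set up entirely in a DATA statement, because the elements of an array are filled in column order. In the sketch below (hypothetical module `datademo` and function `eye5`, fixed at order 5), each diagonal 1.0 is followed by five zeros before the next diagonal element: 1 + 5 + 1 + 5 + 1 + 5 + 1 + 5 + 1 = 25 values.

```fortran
module datademo
  implicit none
contains
  ! Returns a 5 x 5 identity matrix built from a DATA-initialized array.
  ! Elements are stored column by column, so the pattern
  ! 1, five zeros, 1, five zeros, ... places the ones on the diagonal.
  function eye5() result(e)
    real :: e(5,5)
    real :: ident(5,5)
    data ident /1.0, 5*0.0, 1.0, 5*0.0, 1.0, 5*0.0, 1.0, 5*0.0, 1.0/
    e = ident
  end function eye5
end module datademo
```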

The data statement is convenient for variables whose values are unchanged throughout the program. Such variables include logical record numbers of sequential/random access files and chemical/physical constants (DATA003.FOR and DATA004.FOR).

*
*     DATA003.FOR
*
      REAL N
      DATA N,C/6.03E23,1.0E10/
      WRITE(*,*)N,C
      END

 6.030000E+023  1.000000E+010

*
*     DATA004.FOR
*
      REAL MWOX
      DATA MWOX/126.06/,NIH/2/
      DATA PKW,DH20/13.987,77.2/
      WRITE(*,*)MWOX,NIH,PKW,DH20
      END

 126.0600000  2  13.9870000  77.2000000


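Do's of DATA
✓ If more than one variable has the same value, they can be initialized in one DATA statement with a repeat count (for example 4*0.0).
✓ If all the variables except one or a few have the same value, the entire set is initialized first and the variables with different values are then redefined (DATA005.FOR). Instead of two DATA statements, they can be combined into one (DATA006.FOR). Note that standard FORTRAN 77 does not allow an element to be initialized more than once, so both listings rely on the compiler used here; a portable form lists the values in order, DATA VOL/2*3.0,3.1,2.9,3.0/.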

*
*     DATA005.FOR
*
      DIMENSION VOL(5)
      DATA VOL/5*3.0/
      DATA VOL(4),VOL(3)/2.9,3.1/
      WRITE(*,*)VOL
      END

*
*     DATA006.FOR
*
      DIMENSION VOL(5)
      DATA VOL/5*3.0/, VOL(4),VOL(3)/2.9,3.1/
      WRITE(*,*)VOL
      END

 3.0000000  3.0000000  3.1000000  2.9000000  3.0000000

✓ Variables of different modes (types) can be initialized in the same DATA statement.

*
*     DATA007.FOR
*
      DATA SUMX/49.5/,NP/10/
      AVE = SUMX/NP
      WRITE(*,117)SUMX,AVE,NP
  117 FORMAT(1H0,10X,F10.5,F10.2,I3)
      END

            49.50000      4.95 10

✓ Hexadecimal values can be assigned to integer variables. With the compiler used here they are written as the letter Z followed by one to four hexadecimal digits.

*
*     DATA008.FOR
*
      DATA IA,IN,IC/Z0016,ZABB,Z001A/
      WRITE(*,162)IA,IN,IC
  162 FORMAT(//' IA = Z0016 =',I5//' IN = ZABB  =',I5//
     *       ' IC = Z001A =',I5)
      END

Numbers
Hexadecimal: 0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
Decimal:     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
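The table can be checked with a short function that accumulates digit values in base 16; for example ZABB corresponds to 10*16**2 + 11*16 + 11 = 2747. `hexval` is a hypothetical helper, not a library routine, and it assumes upper-case digits without the leading Z.

```fortran
module hexdemo
  implicit none
contains
  ! Converts a string of upper-case hexadecimal digits to an integer.
  ! Returns -1 if a character is not a hexadecimal digit.
  integer function hexval(str)
    character(len=*), intent(in) :: str
    character(len=16), parameter :: digits = '0123456789ABCDEF'
    integer :: i, d
    hexval = 0
    do i = 1, len_trim(str)
      d = index(digits, str(i:i)) - 1   ! position in the table = digit value
      if (d < 0) then
        hexval = -1
        return
      end if
      hexval = hexval * 16 + d
    end do
  end function hexval
end module hexdemo
```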

Don'ts of DATA Statement
✗ Dummy arguments (the formal arguments of a subroutine or function) must not appear in a DATA statement.

*
*     DATA101.FOR
*
      DATA TNP, AVE! 6.0,0.2/,TOT/43.21
      SUMX = TNP * AVE
      END

***** Error 77 - constant expected
***** Error 38 - "/" expected
***** Error 38 - "/" expected

The variable TOT is not used anywhere in the program except in the DATA statement. TOT is not, however, a dummy argument in the FORTRAN sense, and an unused variable may be initialized; the errors reported here arise from the malformed delimiters in the DATA statement, whose value lists must be enclosed between two slashes.

✗ The number of variables must be equal to the number of constants.

*
*     DATA102.FOR
*
      DATA ABSOR,EPSI,CONC/0.567,6376.0/
      END

***** Error 79 - number of variables does not match


Comment: There are three variables but only two constants.

✗ A constant whose type does not match that of its variable may be rejected.

*
*     DATA103.FOR
*
      DATA ABSOR/1/
      DATA NIP/2.0/
***** Error 833 - cannot convert constant
      WRITE(*,*)NIP,ABSOR
      END

The variable name and the specified constant do not agree in mode. ABSOR is a real variable and 1 is an integer constant; NIP is an integer variable and 2.0 is a real constant. As the listing shows, this compiler accepts the integer constant for the real variable but reports an error for the real constant given to the integer variable.

✗ A DATA statement must not be followed by a DIMENSION or type specification statement.

*
*     DATA104.FOR
*
      DATA VOL/5*50./,PH/3*2.0,3.,2*4.00/
      DIMENSION VOL(5),PH(6)
      WRITE(*,*)VOL(2),PH(2)
      END

***** Error 79 - number of variables does not match
***** Error 79 - number of variables does not match
***** Error 100 - statement order

When the DATA statement is compiled, VOL and PH have not yet been declared as arrays, so each is treated as a single variable. The DIMENSION (or type) statement must therefore come before the DATA statement.


✗ Variables appearing in blank COMMON cannot be initialized by a DATA statement. In standard FORTRAN 77, variables in a named COMMON block may be initialized only in a BLOCK DATA subprogram.

*
*     DATA105.FOR
*
      COMMON VO
      DATA VO/50.0/
      TV = VO
      END
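A named COMMON block is initialized in standard FORTRAN through a BLOCK DATA subprogram. A minimal sketch, assuming a block named /PHYS/; the initial volume 50.0 follows DATA105.FOR, while the second variable TK and its value are illustrative only.

```fortran
! BLOCK DATA subprogram: the place where variables of the named
! COMMON block /PHYS/ are given initial values with DATA.
block data consts
  implicit none
  real :: vo, tk
  common /phys/ vo, tk
  data vo, tk / 50.0, 298.15 /
end block data consts
```

Any program unit that declares COMMON /PHYS/ VO, TK then starts with VO = 50.0 and TK = 298.15.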

2.8 STOP, PARAMETER

2.8.1 STOP
STOP is an optional executable statement. It can appear anywhere among the executable statements of a program. The syntax of the statement is

      STOP

A FORTRAN program runs and exits normally to the operating system even without a STOP statement. STOP is used to terminate the execution, and thus represents the logical end of the program.

Do's of STOP Statement
✓ A STOP statement can optionally be followed by a number of up to five digits, or by a character constant.

*
*     STOP001.FOR
*
  101 STOP 989
  203 STOP 'ABC'
      STOP 12345
      END

✓ If   an argument is present in the STOP statement
  Then it is displayed on the screen when the program terminates
  Else the message 'STOP - Program terminated' is displayed.

STOP002.FOR is a complete FORTRAN program, but it does nothing except display its message.

*
*     STOP002.FOR
*
  101 STOP 'ERROR IN INPUT'
      END

✓ More than one STOP statement is permissible in a program (STOP003.FOR).

*
*     STOP003.FOR
*
      STOP 111
      STOP 99999
      STOP 'ABC'
      END

✓ A STOP statement in a subroutine or function subprogram terminates the whole job, not just the subprogram. It is generally used to abort the run when a fatal error is encountered.

Don'ts of STOP Statement
✗ A number of more than five digits after STOP is invalid.

*
*     STOP101.FOR
*
    2 STOP 'ABC'
  145 STOP 9999999
***** Error 13 - too many digits in constant
  345 STOP 'DSJSDJFSJFSJ99999,
   23 STOP 'JLLHL
***** Error 15 - character constant not closed
   56 STOP, 367
***** Error 77 - constant expected
      END

Besides the seven-digit number, the listing shows two further errors: character constants that are not closed, and a comma between STOP and its argument.

✗ The quotes around the character constant are mandatory; a character variable such as CH cannot follow STOP.

*
*     STOP102.FOR
*
      CHARACTER*10 CH
      CH = 'ABC'
      STOP CH
***** Error 77 - constant expected
  101 STOP 'ERROR IN INPUT'
      END

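✗ STOP should not be placed after RETURN in a subprogram.

*
*     STOP103.FOR
*
      RETURN
***** Error 127 - RETURN not allowed here
  341 STOP
      END

Explanation: RETURN passes control back to the calling program, so an unlabelled statement immediately after it can never be executed. Note that STOP103.FOR has no SUBROUTINE or FUNCTION statement, so the compiler treats it as a main program and reports the RETURN itself as an error (compare RET101.FOR).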

RETURN
RETURN is an executable statement used in subroutine and function subprograms. It instructs the computer to go back to the program unit that invoked or called the subprogram. Thus the normal way of terminating the processing of a subprogram is through RETURN.

Do's of RETURN Statement
✓ More than one RETURN statement in a subprogram is valid.
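For instance, a function can leave early when its argument is out of range and return normally otherwise. The sketch below (hypothetical module `retdemo` and function `hconc`) computes the hydrogen-ion concentration [H+] = 10**(-pH) and uses two RETURN statements; the accepted pH range of 0 to 14 and the flag value -1.0 are assumptions made for illustration.

```fortran
module retdemo
  implicit none
contains
  ! Hydrogen-ion concentration (mol/L) from pH.
  ! First RETURN: early exit with a flag value for an out-of-range pH.
  ! Second RETURN: normal exit after the calculation.
  real function hconc(ph)
    real, intent(in) :: ph
    if (ph < 0.0 .or. ph > 14.0) then
      hconc = -1.0
      return
    end if
    hconc = 10.0**(-ph)
    return
  end function hconc
end module retdemo
```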

Don'ts of RETURN Statement
✗ RETURN is not permitted in the mainline.

*
*     RET101.FOR
*
      T = 273.16
      WRITE(*,*)T
      RETURN
      END

END
The END statement occurs exactly once in every program unit (mainline, subroutine or function subprogram) and is its last physical statement; STOP, by contrast, marks the logical end of execution. The syntax is

      END

It tells the compiler that there are no more FORTRAN statements in the program unit to be translated into machine code. Although END acts mainly as this signal to the compiler, the FORTRAN 77 standard treats it as executable: reaching END in a main program terminates execution, and reaching it in a subprogram has the same effect as RETURN.

Do's of END Statement
✓ The shortest possible program is END001.FOR.

*
*     END001.FOR
*
      END

Don'ts of END Statement
✗ Continuation lines for the END statement are invalid.

*
*     END101.FOR
*
      VOL = 50
      END
     *'END OF PROGRAM'

***** Error 23 - extra characters at end of statement

✗ Statements after the first END do not belong to the program unit that END closes; the compiler starts a new program unit with them. In END102.FOR this creates a second main program, which is an error.

*
*     END102.FOR
*
      PH = 5.5
      END
      FH = 10.**(-PH)
      END

***** Error 34 - identifier already declared
***** Error 70 - more than one main program

✗ A statement label on END is not accepted by the compiler used here.

*
*     END103.FOR
*
  101 END

***** Error 89 - unrecognizable statement
***** Error 91 - missing END statement

2.8.2 PARAMETER
PARAMETER is a non-executable specification statement. It must come after any IMPLICIT statement, and a name whose type is declared explicitly must be typed before it appears in the PARAMETER statement (as INTEGER*4 NP precedes PARAMETER in PAR102.FOR). When there is no type or DIMENSION statement in the program, it should come before the first executable statement. It assigns fixed numeric or logical values to symbolic names (named constants) that are used frequently in the program; these values cannot be changed during execution. The general format is

      PARAMETER (NAME1 = CONST1 [,NAME2 = CONST2] [,NAME3 = CONST3] ...)

The advantage of the PARAMETER statement is that whenever the size of the arrays is changed, only the values in the PARAMETER statement need to be altered, which avoids editing errors. It is used to declare the size of arrays in the main program (PAR001.FOR); to declare physical or chemical constants that cannot be obtained by simple number crunching (PAR002.FOR); to declare the polynomial coefficients of Maclaurin series used to calculate transcendental functions such as logarithms; and to specify the tolerance, maximum number of iterations etc. in mathematical computations (PAR003.FOR).


*
*     PAR001.FOR
*
      PARAMETER (MAX = 100)
      DIMENSION ABSORB(MAX), CONC(MAX)
      ABSORB(1) = 0.50
      CONC(1) = 0.5E-03
      WRITE(*,*)ABSORB(1),CONC(1)
      END

*
*     PAR002.FOR
*     SI units: m/s, J s, 1/mol, C/mol
*
      PARAMETER (C = 2.99792458E8)
      PARAMETER (PLANK = 6.63E-34, AVGAD = 6.022E23, FARAD = 96500)
      WRITE(*,*)C,PLANK,AVGAD,FARAD
      END

*
*     PAR003.FOR
*
      PARAMETER (TOL = 1.0E-6, MAXIT = 100)
      WRITE(*,*)TOL,MAXIT
      END

Do's of PARAMETER
✓ More than one constant can be declared in a single PARAMETER statement (PAR002.FOR).
✓ More than one PARAMETER statement can be used (PAR002.FOR).
✓ Logical constants can be declared by a PARAMETER statement (PAR005.FOR).

*
*     PAR005.FOR
*
      LOGICAL QUAD,CUBIC
      PARAMETER (QUAD = .TRUE., CUBIC = .FALSE.)
      WRITE(*,*)QUAD,CUBIC
      END
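Named constants such as TOL and MAXIT (PAR003.FOR) usually control an iterative calculation. As a sketch, Newton's method for the square root of a positive number A, X(new) = (X + A/X)/2, stops when two successive estimates agree to within the relative tolerance TOL or after MAXIT iterations. The module and function names are hypothetical, and double precision is assumed.

```fortran
module newtdemo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: tol = 1.0e-10_dp   ! relative convergence tolerance
  integer, parameter  :: maxit = 100        ! maximum number of iterations
contains
  ! Square root of a > 0 by Newton's method. niter returns the number
  ! of iterations used (maxit if the tolerance was never met).
  real(dp) function newton_sqrt(a, niter)
    real(dp), intent(in) :: a
    integer, intent(out) :: niter
    real(dp) :: x, xnew
    x = max(a, 1.0_dp)                      ! starting guess >= sqrt(a)
    do niter = 1, maxit
      xnew = 0.5_dp * (x + a / x)
      if (abs(xnew - x) <= tol * xnew) then
        newton_sqrt = xnew
        return
      end if
      x = xnew
    end do
    niter = maxit
    newton_sqrt = x
  end function newton_sqrt
end module newtdemo
```

Changing TOL or MAXIT in one place alters the behaviour of every routine that uses them, which is the advantage of PARAMETER described above.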


Don'ts of PARAMETER
✗ The compiler used here rejects the negative real constant in PAR101.FOR with an "integer constant expected" error, although real constants are accepted in PAR002.FOR and PAR003.FOR.

*
*     PAR101.FOR
*
      PARAMETER (A1 = -10.0)
***** Error 22 - integer constant expected
      WRITE(*,*)A1
      END

✗ A real constant such as 1E3 is not converted for the integer named constant NP by this compiler (PAR102.FOR), and the DIMENSION statement that uses NP then fails as well.

*
*     PAR102.FOR
*
      INTEGER*4 NP
      PARAMETER (NP = 1E3)
***** Error 833 - cannot convert constant
      DIMENSION X(NP)
***** Error 22 - integer constant expected
***** Error 26 - ")" expected
      X(1) = 112
      END

✗ The compiler used here does not accept an arithmetic expression in a PARAMETER statement (PAR103.FOR), although standard FORTRAN 77 allows constant expressions built from previously defined named constants, such as NEXP*NP1.

*
*     PAR103.FOR
*
      PARAMETER (NEXP = 5, NP1 = 50)
      PARAMETER (NP = NEXP*NP1)
***** Error 89 - unrecognizable statement
      WRITE(*,*)NEXP,NP1,NP
      END

3.1 MATRIX OPERATIONS
Matrix algebra provides powerful tools to implement many mathematical and statistical procedures whose results are of chemical significance. The basic subroutines required to implement matrix operations are described here.

Length of a Vector
The Euclidean norm (‖·‖), or length, of a vector is equal to the square root of the sum of squares of the elements of the vector. It is calculated as the square root of the product of the transpose of the vector and the vector itself:

$$V = [V_1\ V_2\ V_3\ \cdots]^T, \qquad \|V\| = \sqrt{V^T V}$$

A point in 2D space is represented by [V1, V2], and the length of the vector is

$$\|V\|^2 = [V_1\ V_2]\begin{bmatrix} V_1 \\ V_2 \end{bmatrix} = V_1^2 + V_2^2; \qquad \|V\| = \sqrt{V_1^2 + V_2^2}$$

Calculation of the Euclidean norm is implemented in NORM.FOR.

*
*     NORM.FOR
*
      REAL FUNCTION NORM(V,N)
      DIMENSION V(20)
      ZNORM = 0.
      DO 10 I = 1,N
         ZNORM = ZNORM + V(I) * V(I)
   10 CONTINUE
      NORM = SQRT(ZNORM)
      END

*
*     NORM.DEM
*
      REAL NORM
      DIMENSION V(20)
      V(1) = 1.
      V(2) = 2.
      V(3) = 3.
      ZNORM = NORM(V,3)
      WRITE(*,*)ZNORM
      END
$INCLUDE : 'NORM.FOR'


SOFTWARE METHOD BASE

Dot Product of Two Vectors
The dot product of two column vectors V1 and V2 (each of size N x 1) is a scalar equal to the sum of the products of the corresponding elements of V1 and V2 (DPV1V2.FOR):

$$V_1 = [V_{11}\ V_{21}\ V_{31}\ V_{41}\ \cdots\ V_{N1}]^T, \qquad V_2 = [V_{12}\ V_{22}\ V_{32}\ V_{42}\ \cdots\ V_{N2}]^T$$

$$DPV1V2 = V_1 \cdot V_2 = V_1^T V_2 = \sum_{i=1}^{N} V_{i1}\, V_{i2}$$

*
*     DPV1V2.FOR
*
      SUBROUTINE DPV1V2(V1,V2,V3,N)
      PARAMETER (MAX = 20)
      DIMENSION V1(MAX),V2(MAX)
      V3 = 0.
      DO 10 I = 1,N
         V3 = V3 + V1(I) * V2(I)
   10 CONTINUE
      END

*
*     DPV1V2.DEM
*
      DIMENSION V1(20),V2(20)
      V1(1) = 1.
      V1(2) = 2.
      V1(3) = 3.
      V2(1) = 2
      V2(2) = 4
      V2(3) = 6
      CALL DPV1V2(V1,V2,V3,3)
      WRITE(*,*)V3
      END
$INCLUDE : 'DPV1V2.FOR'

Cross Product of Two Vectors
The cross product (V3) of two three-dimensional vectors V1 and V2 is

$$V_3 = V_1 \times V_2 = \begin{bmatrix} V_1(2)\,V_2(3) - V_1(3)\,V_2(2) \\ V_1(3)\,V_2(1) - V_1(1)\,V_2(3) \\ V_1(1)\,V_2(2) - V_1(2)\,V_2(1) \end{bmatrix}$$

Generalizations of the cross product to higher-dimensional spaces are more complicated. The geometric interpretation (Fig. 3.1) of the dot and cross products of vectors can be understood from the relationships

$$V_1 \cdot V_2 = \|V_1\|\,\|V_2\|\cos\theta, \qquad V_1 \times V_2 = \|V_1\|\,\|V_2\|\sin\theta\ \hat{Z}$$

where Ẑ is a unit vector (of length 1) perpendicular to the plane containing the two vectors.


*
*     CPV1V2.DEM
*
      DIMENSION V1(20),V2(20),V3(20)
      V1(1) = 1.
      V1(2) = 2.
      V1(3) = 3.
      V2(1) = 2
      V2(2) = 4
      V2(3) = 6
      CALL CPV1V2(V1,V2,V3)
      WRITE(*,*)(V3(I),I=1,3)
      END
$INCLUDE : 'CPV1V2.FOR'

*
*     CPV1V2.FOR
*
      SUBROUTINE CPV1V2(V1,V2,V3)
      PARAMETER (MAX = 20)
      DIMENSION V1(MAX),V2(MAX),V3(MAX)
      V3(1) = V1(2)*V2(3) - V1(3)*V2(2)
      V3(2) = V1(3)*V2(1) - V1(1)*V2(3)
      V3(3) = V1(1)*V2(2) - V1(2)*V2(1)
      RETURN
      END

Fig. 3.1 : Geometric Representation of Two Vectors (V1, V2 and the unit normal Ẑ)


Angle Between Two Vectors
The angle (θ) between two vectors V1 and V2 is related to the dot product and the lengths of the vectors by the formulae

$$\cos\theta = \frac{V_1 \cdot V_2}{\|V_1\|\,\|V_2\|}, \qquad \cos\theta = \frac{\|V_1\|^2 + \|V_2\|^2 - \|V_1 - V_2\|^2}{2\,\|V_1\|\,\|V_2\|}$$

*
*     ANGV1V2.FOR
*
      SUBROUTINE ANGV1V2(V1,V2,N,MAX)
      REAL NORM
      DIMENSION V1(MAX),V2(MAX)
      DIMENSION V1MV2(100)
      V1NORM = NORM(V1,N)
      V2NORM = NORM(V2,N)
      DO 10 I = 1,N
         V1MV2(I) = V1(I) - V2(I)
   10 CONTINUE
      V1V2NORM = NORM(V1MV2,N)
*     Angle from the law of cosines
      COSTH1 = (V1NORM**2 + V2NORM**2 - V1V2NORM**2)/
     *         (2*V1NORM*V2NORM)
      THETA1 = ACOS(COSTH1)
*     Angle from the dot product
      CALL DPV1V2(V1,V2,V3,N)
      COSTH2 = V3/V1NORM/V2NORM
      THETA2 = ACOS(COSTH2)
      WRITE(*,*)THETA1,THETA2
      WRITE(*,*)COSTH1,COSTH2
      END

*
*     ANGV1V2.DEM
*
$DEBUG
      DIMENSION V1(20),V2(20),V3(20)
      V1(1) = 1.
      V1(2) = 2.
      V1(3) = 3.
      V2(1) = 2
      V2(2) = 4
      V2(3) = 6
*     The fourth argument is the declared array size MAX
      CALL ANGV1V2(V1,V2,3,20)
      END
$INCLUDE : 'ANGV1V2.FOR'
$INCLUDE : 'NORM.FOR'
$INCLUDE : 'DPV1V2.FOR'

   .0000000  3.452670E-004
  1.0000000  9.999999E-001
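With these test vectors, which are parallel, one of the routes gives a cosine of 0.9999999 and an angle of 3.45E-4 rad instead of exactly 0, because of single-precision rounding; if rounding pushes a computed cosine slightly above 1.0, ACOS fails altogether. A commonly used alternative, swapped in here rather than taken from this book, is θ = ATAN2(‖V1 × V2‖, V1·V2), which stays accurate for nearly parallel vectors. A sketch for three-dimensional vectors, with the hypothetical function name `angle3`:

```fortran
module angdemo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Angle (radians) between two 3-D vectors from
  !   theta = atan2(|v1 x v2|, v1 . v2)
  ! Unlike ACOS of a rounded cosine, the arguments of ATAN2 can never
  ! fall outside its domain, and small angles are resolved accurately.
  real(dp) function angle3(v1, v2)
    real(dp), intent(in) :: v1(3), v2(3)
    real(dp) :: c(3)
    c(1) = v1(2)*v2(3) - v1(3)*v2(2)     ! cross product, as in CPV1V2
    c(2) = v1(3)*v2(1) - v1(1)*v2(3)
    c(3) = v1(1)*v2(2) - v1(2)*v2(1)
    angle3 = atan2(sqrt(sum(c**2)), dot_product(v1, v2))
  end function angle3
end module angdemo
```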


Transpose of Matrix
The transpose of A, written A^T or A', is obtained by making the rows of A the columns of A^T. The (I,J) element of A^T is equal to A(J,I) (MTRANS.FOR).

*
*     MTRANS.FOR
*
      SUBROUTINE MTRANS(A,ROWA,COLA,AT)
      INTEGER ROWA,COLA
      PARAMETER (MAX = 20)
      DIMENSION A(20,20),AT(20,20)
      DO 20 I = 1,ROWA
         DO 10 J = 1,COLA
            AT(J,I) = A(I,J)
   10    CONTINUE
   20 CONTINUE
      END

*
*     MTRANS.DEM
*
      DIMENSION A(20,20),AT(20,20),ATT(20,20)
      A(1,1) = 1.1
      A(2,1) = 2.1
      A(1,2) = 1.2
      A(2,2) = 2.2
      A(3,1) = 3.1
      A(3,2) = 3.2
      M = 3
      N = 2
      CALL MPRIN(A,M,N)
      CALL MTRANS(A,M,N,AT)
      CALL MPRIN(AT,N,M)
      CALL MTRANS(AT,N,M,ATT)
      CALL MPRIN(ATT,M,N)
      END
$INCLUDE : 'MTRANS.FOR'
$INCLUDE : 'MPRIN.FOR'

  1.1  1.2
  2.1  2.2
  3.1  3.2

  1.1  2.1  3.1
  1.2  2.2  3.2

  1.1  1.2
  2.1  2.2
  3.1  3.2


*
*     MPRIN.FOR
*
      SUBROUTINE MPRIN(A,ROWA,COLA)
      CHARACTER*20 FMT
      INTEGER ROWA,COLA
      DIMENSION A(20,20)
      FMT = '(1X,8G8.2)'
      WRITE(*,951)
      DO 30 I = 1,ROWA
         WRITE(*,FMT)(A(I,J),J=1,COLA)
   30 CONTINUE
  951 FORMAT(/)
      END

The transpose of a transposed matrix, (A^T)^T, is the original matrix; thus (A^T)^T - A is a zero matrix of the same size. For a skew-symmetric matrix A^T = -A, so the sum of its transpose and the original matrix is a zero matrix. The transpose of the product of two matrices A and B is equal to the transpose of B post-multiplied by the transpose of A:

$$(A * B)^T = B^T * A^T$$

If X is a rectangular matrix, both X^T*X and X*X^T are square matrices; they are called the information matrix and the dispersion matrix, respectively. They are also called row-wise or column-wise covariance matrices and are useful in factor analysis. Nowadays, to avoid the round-off errors introduced in forming these products, the Singular Value Decomposition of X is used in least squares, orthogonal polynomials, eigenvector analysis etc.

Diagonal of Matrix
The diagonal of a square matrix A is a vector of size equal to the order of A; its Ith element is A(I,I) (DIAGX.FOR).

*
*     DIAGX.FOR
*
      SUBROUTINE DIAGX(A,M,N,DIAG)
      PARAMETER (MAX = 20)
      DIMENSION A(MAX,MAX),DIAG(MAX)
      IF (M .NE. N) THEN
         WRITE(*,951)
         STOP
      ENDIF
      DO 10 I = 1,N
         DIAG(I) = A(I,I)
   10 CONTINUE
  951 FORMAT(' It is not a square matrix'/1X,
     *       ' So, Diagonal is not possible'/)
      END


*
*     DIAGX.DEM
*
      DIMENSION A(20,20),DIAG(20)
      A(1,1) = 1.1
      A(2,1) = 2.1
      A(1,2) = 1.2
      A(2,2) = 2.2
      A(3,1) = 3.1
      A(3,2) = 3.2
      CALL MPRIN(A,2,2)
      CALL DIAGX(A,2,2,DIAG)
      CALL MPRIN(DIAG,2,1)
      CALL MPRIN(A,3,2)
      CALL DIAGX(A,3,2,DIAG)
      END
$INCLUDE : 'MPRIN.FOR'
$INCLUDE : 'DIAGX.FOR'

  1.1  1.2
  2.1  2.2

  1.1
  2.2

  1.1  1.2
  2.1  2.2
  3.1  3.2
 It is not a square matrix
  So, Diagonal is not possible

Stop - Program terminated.

Trace of Matrix
The trace of a square matrix is equal to the sum of its diagonal elements.

*
*     TRACE.DEM
*
      DIMENSION A(20,20)
      A(1,1) = 1.1
      A(2,1) = 2.1
      A(1,2) = 1.2
      A(2,2) = 2.2
      A(3,1) = 3.1
      A(3,2) = 3.2
      CALL MPRIN(A,2,2)
      T = TRACE(A,2,2)
      CALL MPRIN(T,1,1)
      CALL MPRIN(A,3,2)
      T = TRACE(A,3,2)
      END
$INCLUDE : 'TRACE.FOR'
$INCLUDE : 'MPRIN.FOR'
$INCLUDE : 'DIAGX.FOR'
$INCLUDE : 'SUM1.FOR'

*
*     TRACE.FOR
*
      FUNCTION TRACE(A,M,N)
      PARAMETER (MAX = 20)
      DIMENSION A(MAX,MAX),DIAG(MAX)
      CALL DIAGX(A,M,N,DIAG)
      TRACE = SUM1(DIAG,N,20)
      RETURN
      END

  1.1  1.2
  2.1  2.2

  3.3

  1.1  1.2
  2.1  2.2
  3.1  3.2
 It is not a square matrix
  So, Diagonal is not possible

Stop - Program terminated.


*
*     SUM1.FOR
*
      FUNCTION SUM1(X,N,MAX)
      DIMENSION X(MAX)
      SUM1 = 0.
      DO 10 I = 1,N
         SUM1 = SUM1 + X(I)
   10 CONTINUE
      RETURN
      END

Determinant of Matrix
The determinant of a square matrix is useful to test for linear dependence among its rows or columns. A matrix with a zero determinant is called singular, and it has no inverse. However, in many real data sets the determinant is not exactly zero but a very small quantity, making the matrix nearly singular. For such matrices the inverse is calculable but unreliable, and the regression coefficients calculated from this inverse matrix are subject to large errors.

*
*     DET.DEM
*
      DIMENSION A(20,20)
      A(1,1) = 1
      A(2,1) = 0
      A(1,2) = 0
      A(2,2) = 1
      A(3,1) = 0
      A(3,2) = 0
      A(1,3) = 0
      A(2,3) = 0
      A(3,3) = 1
      DET1 = DET(A,1,1)
      DET2 = DET(A,2,2)
      DET3 = DET(A,3,3)
      CALL MPRIN(A,1,1)
      WRITE(*,*)DET1
      CALL MPRIN(A,2,2)
      WRITE(*,*)DET2
      CALL MPRIN(A,3,3)
      WRITE(*,*)DET3
      CALL MPRIN(A,2,3)
      DET4 = DET(A,2,3)
      WRITE(*,*)DET4
      END
$INCLUDE : 'DET.FOR'
$INCLUDE : 'MPRIN.FOR'

  1.0
  1.0000000

  1.0  .00
  .00  1.0
  1.0000000

  1.0  .00  .00
  .00  1.0  .00
  .00  .00  1.0
  1.0000000

  1.0  .00  .00
  .00  1.0  .00
 NOT A SQUARE MATRIX
Stop - Program terminated.


The determinant of a 1 x 1, 2 x 2 or 3 x 3 matrix is calculated in DET.FOR.

*
*     DET.FOR
*
      FUNCTION DET(A,NC,NR)
      PARAMETER (MAX = 20)
      DIMENSION A(MAX,MAX)
      IF (NR .NE. NC) THEN
         WRITE(*,951)
         STOP
      ENDIF
      IF (NR .EQ. 1) THEN
         DET = A(1,1)
      ENDIF
      IF (NR .EQ. 2) THEN
         DET = A(1,1)*A(2,2) - A(1,2)*A(2,1)
      ENDIF
      IF (NR .EQ. 3) THEN
*        Cofactor expansion along the first row
         DET =       A(1,1)*(A(2,2)*A(3,3) - A(2,3)*A(3,2))
         DET = DET - A(1,2)*(A(2,1)*A(3,3) - A(2,3)*A(3,1))
         DET = DET + A(1,3)*(A(2,1)*A(3,2) - A(2,2)*A(3,1))
      ENDIF
  951 FORMAT(' NOT A SQUARE MATRIX ')
      RETURN
      END
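DET.FOR is limited to matrices of order 3 or less. For larger matrices the determinant is usually obtained by Gaussian elimination with partial pivoting: it is the product of the pivots, with the sign changed once for every row interchange. The sketch below is that standard technique, not a routine from this book; `detgauss` is a hypothetical name, and the matrix is copied so that the caller's A is not overwritten.

```fortran
module detdemo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Determinant of an n x n matrix by Gaussian elimination with
  ! partial pivoting. The determinant is the product of the pivots,
  ! with its sign changed once for every row interchange.
  real(dp) function detgauss(a, n)
    integer, intent(in)  :: n
    real(dp), intent(in) :: a(n, n)
    real(dp) :: w(n, n), tmp(n), f
    integer  :: i, k, p
    w = a
    detgauss = 1.0_dp
    do k = 1, n
      p = k - 1 + maxloc(abs(w(k:n, k)), dim=1)   ! row of the largest pivot
      if (w(p, k) == 0.0_dp) then
        detgauss = 0.0_dp                          ! singular matrix
        return
      end if
      if (p /= k) then
        tmp = w(k, :); w(k, :) = w(p, :); w(p, :) = tmp
        detgauss = -detgauss
      end if
      detgauss = detgauss * w(k, k)
      do i = k + 1, n                              ! eliminate below the pivot
        f = w(i, k) / w(k, k)
        w(i, k:n) = w(i, k:n) - f * w(k, k:n)
      end do
    end do
  end function detgauss
end module detdemo
```

For a nearly singular matrix the product of pivots becomes very small, which is the situation described above in which the inverse is unreliable.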

Initialization of Matrix
Initialization of a third-order tensor (a three-dimensional array) to zero is given in ZERO3.FOR. Two-dimensional zero, unit and identity matrices can be generated using the subprograms ZEROS.FOR, ONES.FOR and EYE.FOR, respectively (Chapter 2.3).

*
*     ZERO3.FOR
*
      SUBROUTINE ZERO3(A,I,J,K,Z3)
      PARAMETER (MAX = 20)
      DIMENSION A(MAX,MAX,MAX)
*     I, J and K are the extents supplied by the calling program
      ZERO = 0.0
      DO 10 I1 = 1,I
         DO 20 I2 = 1,J
            DO 30 I3 = 1,K
               A(I1,I2,I3) = ZERO
   30       CONTINUE
   20    CONTINUE
   10 CONTINUE
      END


Addition of Matrices
Two matrices of the same size can be added or subtracted. Elements of the resulting matrix are the sums or differences of the corresponding elements of the augend and addend matrices. MADD1.FOR is a subroutine subprogram for the addition of two matrices.

*
*     MADD1.FOR
*
      SUBROUTINE MADD1(A,B,C,ROWA,COLA,ROWB,COLB)
      INTEGER ROWA,COLA,ROWB,COLB
      DIMENSION A(20,20),B(20,20),C(20,20)
      DO 10 I = 1,ROWA,1
         DO 20 J = 1,COLA
            C(I,J) = A(I,J) + B(I,J)
   20    CONTINUE
   10 CONTINUE
      RETURN
      END

*
*     MADD1.DEM
*
      DIMENSION A(20,20),B(20,20),C(20,20)
      A(1,1) = 1.1
      A(2,1) = 2.1
      A(1,2) = 1.2
      A(2,2) = 2.2
      A(1,3) = 1.3
      A(2,3) = 2.3
      B(1,1) = 8.8
      B(2,1) = 7.8
      B(1,2) = 8.7
      B(2,2) = 7.7
      B(1,3) = 8.6
      B(2,3) = 7.6
      CALL MPRIN(A,2,3)
      CALL MPRIN(B,2,3)
      CALL MADD1(A,B,C,2,3,2,3)
      CALL MPRIN(C,2,3)
      END
$INCLUDE : 'MADD1.FOR'
$INCLUDE : 'MPRIN.FOR'

A   1.1  1.2  1.3
    2.1  2.2  2.3

B   8.8  8.7  8.6
    7.8  7.7  7.6

C   9.9  9.9  9.9
    9.9  9.9  9.9


MADD1.DEM is a driver routine to test the addition operation. MADD1.FOR does not check the sizes of the matrices, so the program fails when they are not compatible for addition. In MADD2.FOR the rows and columns of the addend and augend matrices are tested for equality.

*
*     MADD2.FOR
*
      SUBROUTINE MADD2(A,B,C,ROWA,COLA,ROWB,COLB)
      INTEGER ROWA,COLA,ROWB,COLB
      LOGICAL ROWCOMP,COLCOMP
      DIMENSION A(20,20),B(20,20),C(20,20)
      ROWCOMP = .FALSE.
      COLCOMP = .FALSE.
      IF (ROWA .EQ. ROWB) THEN
         ROWCOMP = .TRUE.
      ELSE
         WRITE(*,1001)
      ENDIF
      IF (COLA .EQ. COLB) THEN
         COLCOMP = .TRUE.
      ELSE
         WRITE(*,1002)
      ENDIF
      IF (ROWCOMP .AND. COLCOMP) THEN
*        Addition loop, as in MADD1.FOR
         DO 106 I = 1,ROWA,1
            DO 108 J = 1,COLA
               C(I,J) = A(I,J) + B(I,J)
  108       CONTINUE
  106    CONTINUE
      ELSE
         WRITE(*,1000)
         RETURN
      ENDIF
 1000 FORMAT(1H0,' SO MATRICES ARE NOT COMPATIBLE FOR ADDITION')
 1001 FORMAT(1H0,' ROWS OF A ARE NOT EQUAL TO ROWS OF B')
 1002 FORMAT(1H0,' COLUMNS OF A ARE NOT EQUAL TO COLUMNS OF B')
      RETURN
      END


Error messages such as

 ROWS OF A ARE NOT EQUAL TO ROWS OF B
 COLUMNS OF A ARE NOT EQUAL TO COLUMNS OF B

are displayed in case of incompatibility. Otherwise the matrix addition (as in MADD1) is performed. This is an example of a tiny knowledge-based numerical program.

Multiplication of Two Matrices
Matrix multiplication is similar to the vector dot product. Two matrices A and B are compatible for multiplication only when the number of columns of A is equal to the number of rows of B. The product A*B is read as "A is post-multiplied by B" or "B is pre-multiplied by A". The element Cij of the matrix C resulting from A*B is the inner product of the ith row of A with the jth column of B (Chart 3.1).

Chart 3.1 : Matrix Multiplication

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} * \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix} = \begin{bmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \end{bmatrix}$$

$$C_{ij} = \sum_{k=1}^{ACOL} a_{ik}\, b_{kj} = a(i,:) * b(:,j), \qquad ACOL = 3$$

$$\begin{aligned}
C_{11} &= a_{11} b_{11} + a_{12} b_{21} + a_{13} b_{31} = a(1,:) * b(:,1) \\
C_{12} &= a_{11} b_{12} + a_{12} b_{22} + a_{13} b_{32} = a(1,:) * b(:,2) \\
C_{13} &= a_{11} b_{13} + a_{12} b_{23} + a_{13} b_{33} = a(1,:) * b(:,3) \\
C_{21} &= a_{21} b_{11} + a_{22} b_{21} + a_{23} b_{31} = a(2,:) * b(:,1) \\
C_{22} &= a_{21} b_{12} + a_{22} b_{22} + a_{23} b_{32} = a(2,:) * b(:,2) \\
C_{23} &= a_{21} b_{13} + a_{22} b_{23} + a_{23} b_{33} = a(2,:) * b(:,3)
\end{aligned}$$

The multiplication of two vectors can also be performed using MMUL1.FOR: a column vector is treated as a matrix of ROWV rows and one column, and a row vector as a matrix of one row and COLV columns. The normal equations in linear least squares for bivariate data can also be calculated from matrix operations. If X and Y are column vectors of bivariate data and ONE is a vector of the same size containing 1.0, then the design matrix is [ONE, X].
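The normal equations can be sketched directly from this description: form D = [ONE, X], obtain D^T*D and D^T*Y by matrix multiplication, and solve the 2 x 2 system (D^T*D)*b = D^T*Y, here by Cramer's rule. The module and routine names are hypothetical, the intrinsics MATMUL and TRANSPOSE stand in for the book's MMUL1 and MTRANS, and at least two distinct X values are assumed so that D^T*D is nonsingular.

```fortran
module normdemo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Straight-line fit Y = b(1) + b(2)*X from the normal equations
  ! (D^T D) b = D^T Y, where the design matrix is D = [ONE, X].
  subroutine normal_fit(x, y, n, b)
    integer, intent(in)   :: n
    real(dp), intent(in)  :: x(n), y(n)
    real(dp), intent(out) :: b(2)
    real(dp) :: d(n, 2), dtd(2, 2), dty(2), det
    d(:, 1) = 1.0_dp                   ! column of ones
    d(:, 2) = x
    dtd = matmul(transpose(d), d)      ! [ n  sum(x) ; sum(x)  sum(x**2) ]
    dty = matmul(transpose(d), y)      ! [ sum(y) ; sum(x*y) ]
    det = dtd(1,1)*dtd(2,2) - dtd(1,2)*dtd(2,1)
    b(1) = (dty(1)*dtd(2,2) - dtd(1,2)*dty(2)) / det   ! intercept
    b(2) = (dtd(1,1)*dty(2) - dtd(2,1)*dty(1)) / det   ! slope
  end subroutine normal_fit
end module normdemo
```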

*
*     MMUL.DEM
*
      DIMENSION A(20,20),B(20,20),C(20,20)
      INTEGER ROWA,COLA,ROWB,COLB
      WRITE(*,*) 'ROWA,COLA,ROWB,COLB'
      READ(*,*)ROWA,COLA,ROWB,COLB
      READ(*,*)((A(I,J),J=1,COLA),I=1,ROWA)
      READ(*,*)((B(I,J),J=1,COLB),I=1,ROWB)
      CALL MPRIN(A,ROWA,COLA)
      CALL MPRIN(B,ROWB,COLB)
*     The product C = A*B has ROWA rows and COLB columns
      CALL MMUL1(A,B,C,ROWA,COLA,ROWB,COLB)
      CALL MPRIN(C,ROWA,COLB)
      CALL MMUL2(A,B,C,ROWA,COLA,ROWB,COLB)
      CALL MPRIN(C,ROWA,COLB)
      CALL MMUL3(A,B,C,ROWA,COLA,ROWB,COLB)
      CALL MPRIN(C,ROWA,COLB)
      END
$INCLUDE : 'MMUL1.FOR'
$INCLUDE : 'MMUL2.FOR'
$INCLUDE : 'MMUL3.FOR'
$INCLUDE : 'MPRIN.FOR'

A   1.0  2.0
    3.0  4.0

B   5.0  6.0
    7.0  8.0

C   19.  22.
    43.  50.

    19.  22.
    43.  50.

    19.  22.
    43.  50.

*
*     MMUL1.FOR
*
      SUBROUTINE MMUL1(A,B,C,ROWA,COLA,ROWB,COLB)
      INTEGER ROWA,COLA,ROWB,COLB
      DIMENSION A(20,20),B(20,20),C(20,20)
      IF (COLA .NE. ROWB) THEN
         WRITE(*,952)
         STOP
      ENDIF
      DO 10 I = 1,ROWA
         DO 20 J = 1,COLB
            C(I,J) = 0.0
            DO 30 K = 1,COLA
               C(I,J) = C(I,J) + A(I,K)*B(K,J)
   30       CONTINUE
   20    CONTINUE
   10 CONTINUE
  952 FORMAT(1H0,'MATRICES A AND B ARE NOT COMPATIBLE FOR',
     *       ' MULTIPLICATION'/' AS COLUMNS OF A ARE NOT',
     *       ' EQUAL TO ROWS OF B')
      RETURN
      END

*
*     MMUL2.FOR
*
      SUBROUTINE MMUL2(A,B,C,ROWA,COLA,ROWB,COLB)
      INTEGER ROWA,COLA,ROWB,COLB
      DIMENSION A(20,20),B(20,20),C(20,20)
      IF (COLA .NE. ROWB) THEN
         WRITE(*,952)
         RETURN
      ENDIF
      DO 50 I = 1,ROWA
      DO 50 J = 1,COLB
   50 C(I,J) = 0.
      DO 10 J = 1,COLB
         DO 20 K = 1,COLA
            DO 30 I = 1,ROWA
               C(I,J) = C(I,J) + A(I,K)*B(K,J)
   30       CONTINUE
   20    CONTINUE
   10 CONTINUE
  952 FORMAT(1H0,'MATRICES A AND B ARE NOT COMPATIBLE FOR',
     *       ' MULTIPLICATION'//' AS COLUMNS OF A ARE NOT',
     *       ' EQUAL TO ROWS OF B')
      RETURN
      END

*
*     MMUL3.FOR
*
      SUBROUTINE MMUL3(A,B,C,ROWA,COLA,ROWB,COLB)
      INTEGER ROWA,COLA,ROWB,COLB
      DIMENSION A(20,20),B(20,20),C(20,20)
      IF (COLA .NE. ROWB) THEN
         WRITE(*,952)
         STOP
      ENDIF
      DO 50 I = 1,ROWA
      DO 50 J = 1,COLB
   50 C(I,J) = 0.
      DO 10 K = 1,COLA
         DO 20 I = 1,ROWA
            DO 30 J = 1,COLB
               C(I,J) = C(I,J) + A(I,K)*B(K,J)
   30       CONTINUE
   20    CONTINUE
   10 CONTINUE
  952 FORMAT(1H0,'MATRICES A AND B ARE NOT COMPATIBLE FOR',
     *       ' MULTIPLICATION'/' AS COLUMNS OF A ARE NOT',
     *       ' EQUAL TO ROWS OF B')
      RETURN
      END

MMUL1.FOR calculates the product of two matrices with the inner-product (I-J-K) loop order. MMUL2.FOR and MMUL3.FOR use the middle-product (J-K-I) and outer-product (K-I-J) orders, respectively. All these programs give identical numerical results, but their execution speeds can differ widely depending upon the architecture of the computer.
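Two of these loop orders are compared in the sketch below (hypothetical names). FORTRAN stores arrays column by column, so in the J-K-I order of MMUL2.FOR the innermost loop steps through C and A in consecutive memory locations; whether that is actually faster depends on the machine and compiler. For a given element C(I,J) both routines add the products in the same order of K, so their results agree exactly.

```fortran
module mmuldemo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Inner-product (I-J-K) order, as in MMUL1.FOR.
  subroutine mmul_ijk(a, b, c, m, l, n)
    integer, intent(in)   :: m, l, n
    real(dp), intent(in)  :: a(m, l), b(l, n)
    real(dp), intent(out) :: c(m, n)
    integer :: i, j, k
    do i = 1, m
      do j = 1, n
        c(i, j) = 0.0_dp
        do k = 1, l
          c(i, j) = c(i, j) + a(i, k) * b(k, j)
        end do
      end do
    end do
  end subroutine mmul_ijk

  ! Middle-product (J-K-I) order, as in MMUL2.FOR: the innermost loop
  ! runs down a column, matching Fortran's column-major storage.
  subroutine mmul_jki(a, b, c, m, l, n)
    integer, intent(in)   :: m, l, n
    real(dp), intent(in)  :: a(m, l), b(l, n)
    real(dp), intent(out) :: c(m, n)
    integer :: i, j, k
    c = 0.0_dp
    do j = 1, n
      do k = 1, l
        do i = 1, m
          c(i, j) = c(i, j) + a(i, k) * b(k, j)
        end do
      end do
    end do
  end subroutine mmul_jki
end module mmuldemo
```

With A = (1 2; 3 4) and B = (5 6; 7 8), as in MMUL.DEM, both routines give C = (19 22; 43 50).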

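Application of Matrix Multiplication
In a two-dimensional coordinate system, point rotation keeping the axes fixed and axis rotation keeping the point fixed are useful in chemical interpretations like group theory and factor analysis. Formulae for the transformations (counterclockwise rotation through θ) are given in Chart 3.2.

Chart 3.2

Point rotation (axes fixed), XNEW with respect to the standard frame:
$$X_{new} = T * X, \qquad T = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

Frame (axis) rotation, X with respect to the rotated frame:
$$X_{new} = T * X, \qquad T = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$$

For X = (1, 2)^T and θ = 30°, point rotation gives X_new ≈ (-0.13, 2.23)^T and axis rotation gives X_new ≈ (1.87, 1.23)^T.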
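The two transformations of Chart 3.2 can be checked numerically with a sketch (hypothetical names, angle in radians, counterclockwise taken as positive). For X = (1, 2) and θ = 30°, rotating the point with (cos θ, -sin θ; sin θ, cos θ) gives (-0.134, 2.232), and rotating the axes with (cos θ, sin θ; -sin θ, cos θ) gives the coordinates (1.866, 1.232) in the new frame.

```fortran
module rotdemo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! axes = .false.: the point is rotated counterclockwise by theta
  !                 (fixed axes), T = (cos, -sin; sin, cos).
  ! axes = .true. : the axes are rotated counterclockwise by theta and
  !                 the result is the coordinates of the fixed point in
  !                 the new frame, T = (cos, sin; -sin, cos).
  function rot2d(x, theta, axes) result(xnew)
    real(dp), intent(in) :: x(2), theta
    logical, intent(in)  :: axes
    real(dp) :: xnew(2), t(2, 2), s
    s = sin(theta)
    if (axes) s = -s
    ! reshape fills column by column: t(1,1)=cos, t(2,1)=s, t(1,2)=-s, t(2,2)=cos
    t = reshape([cos(theta), s, -s, cos(theta)], [2, 2])
    xnew = matmul(t, x)
  end function rot2d
end module rotdemo
```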

Rotation of the coordinate system through u radians about the z axis can be achieved (XYZROT.FOR) by multiplication of U with the data matrix X:

$$U = \begin{bmatrix} \cos u & \sin u & 0 \\ -\sin u & \cos u & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad X_{new} = U * X$$

*
*     XYZROT.FOR
*
      INTEGER ROWA,COLA
      DIMENSION U(20,20),V(20,20),W(20,20)
      DIMENSION A(20,20),ARZ(20,20),ARY(20,20),ARX(20,20)
      WRITE(*,*) 'M,N'
      READ(*,*)M,N
      ROWA = M
      COLA = N
      READ(*,*)((A(I,J),J=1,N),I=1,M)
      CALL MPRIN(A,M,N)
      WRITE(*,*) 'ZR,YR,XR'
      READ(*,*)ZR,YR,XR
      WRITE(*,*)ZR,YR,XR
*     Start each rotation matrix from the 3 x 3 identity
      CALL EYE(U,3,3)
      CALL EYE(V,3,3)
      CALL EYE(W,3,3)
*     Rotation about the z axis through ZR radians
      U(1,1) = COS(ZR)
      U(1,2) = SIN(ZR)
      U(2,1) = -U(1,2)
      U(2,2) = U(1,1)
      CALL MPRIN(U,3,3)
*     Rotation about the y axis through YR radians
      V(1,1) = COS(YR)
      V(1,3) = SIN(YR)
      V(3,1) = -V(1,3)
      V(3,3) = V(1,1)
      CALL MPRIN(V,3,3)
*     Rotation about the x axis through XR radians
      W(2,2) = COS(XR)
      W(2,3) = SIN(XR)
      W(3,2) = -W(2,3)
      W(3,3) = W(2,2)
      CALL MPRIN(W,3,3)
*     The data matrix A must have 3 rows (x, y, z coordinates)
      CALL MMUL1(U,A,ARZ,3,3,ROWA,COLA)
      CALL MMUL1(V,A,ARY,3,3,ROWA,COLA)
      CALL MMUL1(W,A,ARX,3,3,ROWA,COLA)
      CALL MPRIN(ARZ,3,COLA)
      CALL MPRIN(ARY,3,COLA)
      CALL MPRIN(ARX,3,COLA)
      END
$INCLUDE : 'MPRIN.FOR'
$INCLUDE : 'MMUL1.FOR'
$INCLUDE : 'EYE.FOR'


Similarly, rotation about the Y axis (through v radians) and the X axis (through w radians) can be performed making use of the V and W matrices; for example,

$$V = \begin{bmatrix} \cos v & 0 & \sin v \\ 0 & 1 & 0 \\ -\sin v & 0 & \cos v \end{bmatrix}$$

A knowledge-based program MATKB.FOR diagnoses the type of matrix.

*
*     MATKB.FOR
*
      SUBROUTINE MATKB(A,ROWA,COLA)
      CHARACTER*40 INF1,INF2,INF3,INF4,FMT
      INTEGER ROWA,COLA
      DIMENSION A(20,20)
      INF1 = 'Scalar'
      INF2 = 'Column Vector'
      INF3 = 'Row Vector'
      INF4 = 'Matrix'
      FMT = '(12X,A40)'
      IF (ROWA .EQ. 1 .AND. COLA .EQ. 1) THEN
         WRITE(*,FMT)INF1
      ENDIF
*     Several rows, one column: column vector
      IF (ROWA .GT. 1 .AND. COLA .EQ. 1) THEN
         WRITE(*,FMT)INF2
      ENDIF
*     One row, several columns: row vector
      IF (ROWA .EQ. 1 .AND. COLA .GT. 1) THEN
         WRITE(*,FMT)INF3
      ENDIF
      IF (ROWA .GT. 1 .AND. COLA .GT. 1) THEN
         WRITE(*,FMT)INF4
      ENDIF
      WRITE(*,901)ROWA,COLA
  901 FORMAT('Since No. of Rows: ',I3,' and No. of Columns: ',I3/)
      RETURN
      END


*
*     MATKB.DEM
*
      DIMENSION A(20,20)
      INTEGER ROWA,COLA
      A(1,1) = 1
      ROWA = 1
      COLA = 1
      CALL MATKB(A,ROWA,COLA)
      ROWA = 4
      COLA = 1
      CALL MATKB(A,ROWA,COLA)
      ROWA = 1
      COLA = 4
      CALL MATKB(A,ROWA,COLA)
      ROWA = 5
      COLA = 6
      CALL MATKB(A,ROWA,COLA)
      END
$INCLUDE : 'MATKB.FOR'

Sorting of One- and Two-Dimensional Arrays
Estimation of robust statistics like the median or the Inter Quartile Range (IQR) requires sorting of univariate data. Further, finding the minimum and maximum of the data is a prerequisite in the detection of outliers and the preparation of control charts. Printing of X, Y graphs demands sorting of Y in descending order followed by obtaining the X values corresponding to the Y values.

Minimum Value of Vector
Assume the first element X(1) of the array to be the minimum. It is compared with the second element X(2). If the second element is less than the first, then the second one is the minimum; otherwise the first one is the minimum. The comparison is continued for all the data points. The flow chart (Fig. 3.2), the program (MIN1.FOR) and the results for an exhaustive test data set (Table 3.1) follow.

*
*     MIN1.FOR
*     MINIMUM OF TWO NUMBERS
*
      SUBROUTINE MIN1(X,N,IMIN,XMIN)
      PARAMETER (MAX = 50)
      DIMENSION X(MAX)
*     X(1) is assumed to be the minimum
      IMIN = 1
      XMIN = X(1)
      IF (X(2) .LT. XMIN) THEN
         XMIN = X(2)
         IMIN = 2
      ENDIF
      END

Fig. 3.2 : Minimum of Two Numbers (flow chart of MIN1: set XMIN = X(1); if X(2) < XMIN, set XMIN = X(2) and IMIN = 2; RETURN)

*
*     MIN1.DEM
*
      PARAMETER (MAX = 50)
      DIMENSION X(MAX)
      WRITE(*,901)
      READ(*,*)X(1),X(2)
      IMIN = 1
      CALL MIN1(X,2,IMIN,XMIN)
      WRITE(*,951)IMIN,XMIN
  901 FORMAT(/' GIVE X(1) AND X(2) : '\)
  951 FORMAT(/' ELEMENT (',I1,') : ',G10.2,' IS MINIMUM')
      END
$INCLUDE : 'MIN1.FOR'

Table 3.1: Output of MIN1.FOR for Some Typical Data

 GIVE X(1) AND X(2) : 6.0,8.0
 ELEMENT (1) :  6.00    IS MINIMUM
 GIVE X(1) AND X(2) : 8.0 6.0
 ELEMENT (2) :  6.00    IS MINIMUM
 GIVE X(1) AND X(2) : 6.0 6.0
 ELEMENT (1) :  6.00    IS MINIMUM
 GIVE X(1) AND X(2) : 0.0 0.0
 ELEMENT (1) :  .00     IS MINIMUM
 GIVE X(1) AND X(2) : -1E-30 -2E-30
 ELEMENT (2) : -.20E-29 IS MINIMUM
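Extending MIN1.FOR from two numbers to an array of NP elements needs only a loop over the remaining elements. A sketch with the hypothetical name `minarr`; like MIN1.FOR it returns both the minimum and its position, and on ties it keeps the first occurrence because only a strictly smaller element replaces the current minimum.

```fortran
module mindemo
  implicit none
contains
  ! Minimum of x(1:np) and the position imin at which it occurs.
  ! x(1) is assumed to be the minimum; each later element replaces the
  ! current minimum only if it is strictly smaller.
  subroutine minarr(x, np, imin, xmin)
    integer, intent(in)  :: np
    real, intent(in)     :: x(np)
    integer, intent(out) :: imin
    real, intent(out)    :: xmin
    integer :: i
    imin = 1
    xmin = x(1)
    do i = 2, np
      if (x(i) < xmin) then
        xmin = x(i)
        imin = i
      end if
    end do
  end subroutine minarr
end module mindemo
```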

Consider a vector of size NP x 1. The minimum of this array is obtained by the above algorithm, with the comparison repeated for each of the remaining elements. The element number of the minimum in the vector is also recorded. A FORTRAN subprogram (MIN2.FOR) implements the algorithm (Table 3.2).

Table 3.2: Algorithm for Finding the Minimum of an Array

I = 1

X(I)

X1, X2, ..., XN are distinct values of X in the interval XL to XU, and Y1, Y2, ..., YN are the corresponding values of a continuous function of X. If the interpolated argument XIE satisfies the condition XL < XIE < XU, calculation of YIE is termed direct interpolation. In inverse interpolation, the value of XIE is computed for a desired value of YIE. The key to the success of interpolation is invoking an appropriate polynomial.

6.1 LINEAR AND QUADRATIC METHODS

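6.1.1 Linear Interpolation

Linear interpolation considers a polynomial of first degree, a straight line. Two points (X1, Y1) and (X2, Y2) are chosen such that X1 < XIE < X2. Since the three points are collinear (Fig. 6.1), the slopes calculated from any two of them (X1 and X2, or X1 and XIE) are equal (Deriv. 6.1):

$$\frac{Y_{IE} - Y_1}{X_{IE} - X_1} = \frac{Y_2 - Y_1}{X_2 - X_1}$$

Solving for YIE gives Eq. 6.1, written here in the equivalent two-point Lagrange form:

$$Y_{IE} = \sum_{i=1}^{2} Y_i \prod_{\substack{j=1 \\ j \neq i}}^{2} \frac{X_{IE} - X_j}{X_i - X_j} = Y_1 + (Y_2 - Y_1)\,\frac{X_{IE} - X_1}{X_2 - X_1} \tag{6.1}$$

When the range of X is large, linear interpolation is less reliable. Suppose the data follow a quadratic function $Y = a_2 X^2 + a_1 X + a_0$ and are approximated by a straight line through the points $X_k$ and $X_{k+1}$ chosen from the data. The truncation error is then

$$E_{trunc} = Y(X_{IE}) - Y_{IE} = a_2\,(X_{IE} - X_k)(X_{IE} - X_{k+1})$$

It vanishes at the two data points and grows as they are spaced further apart.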
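Eq. 6.1 and the truncation error can be checked numerically with a short Python sketch. It is not one of the book's FORTRAN programs, and the function names are hypothetical. The test function Y = X**2 (a2 = 1) is the one used later in Example 6.1.

```python
def linear_interp(x1, y1, x2, y2, xie):
    """Eq. 6.1: straight line through (x1, y1) and (x2, y2), evaluated at xie."""
    if x1 == x2:
        raise ValueError("x1 and x2 must be distinct")
    return y1 + (y2 - y1) * (xie - x1) / (x2 - x1)


def quadratic_trunc_error(a2, xk, xk1, xie):
    """Exact error Y(xie) - YIE when a quadratic with leading coefficient a2 is interpolated linearly."""
    return a2 * (xie - xk) * (xie - xk1)


# Y = X**2 sampled at X = 1 and X = 2
yie = linear_interp(1.0, 1.0, 2.0, 4.0, 1.5)
print(yie, 1.5 ** 2 - yie, quadratic_trunc_error(1.0, 1.0, 2.0, 1.5))
```

The straight line gives 2.5 at XIE = 1.5 instead of 2.25. The difference, -0.25, is exactly a2*(1.5 - 1)*(1.5 - 2).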


6.1 Linear and Quadratic Methods

Fig. 6.1: Linear Interpolation (Y versus X)

6.1.2 Quadratic Interpolation

The inaccuracy of linear interpolation is overcome by adopting the polynomial of second degree

$$P_2(X) = a_2 X^2 + a_1 X + a_0$$

The coefficients a0, a1 and a2 are calculated such that the polynomial passes through the experimental points. They are then used to compute the function at XIE (Deriv. 6.2). Since P2(X) has three coefficients, at least three points are needed in quadratic interpolation. The choice of the three points is arbitrary, but one of them should be smaller than XIE and another larger than XIE (Fig. 6.2).
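A Python sketch of the coefficient calculation follows. It assumes that the coefficients are obtained from divided differences, which is one convenient route. Deriv. 6.2 itself is not reproduced in this section, so this is not claimed to be the book's derivation, and the function names are hypothetical.

```python
def quadratic_coefficients(xs, ys):
    """Return (a0, a1, a2) of P2(X) = a2*X**2 + a1*X + a0 through three points."""
    (x0, x1, x2), (y0, y1, y2) = xs, ys
    if len({x0, x1, x2}) != 3:
        raise ValueError("the three X values must be distinct")
    c1 = (y1 - y0) / (x1 - x0)                       # first divided difference
    c2 = ((y2 - y1) / (x2 - x1) - c1) / (x2 - x0)    # second divided difference
    # expand y0 + c1*(X - x0) + c2*(X - x0)*(X - x1) into powers of X
    a2 = c2
    a1 = c1 - c2 * (x0 + x1)
    a0 = y0 - c1 * x0 + c2 * x0 * x1
    return a0, a1, a2


def p2(coeffs, x):
    """Evaluate P2 at x in Horner form."""
    a0, a1, a2 = coeffs
    return (a2 * x + a1) * x + a0


c = quadratic_coefficients((1.0, 2.0, 3.0), (1.0, 4.0, 9.0))   # Y = X**2
print(c, p2(c, 1.5), p2(c, 4.5))
```

For points taken from Y = X**2 the sketch recovers a2 = 1 and a1 = a0 = 0. The values at 1.5 and 4.5 are therefore exact, which agrees with the three-point rows of Chart 6.2.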

A smooth function is more accurately interpolated by a higher order polynomial, whereas a piece-wise lower order polynomial (straight line, Fig. 6.3) gives inaccurate values. A function with sharp corners or rapidly changing higher order derivatives is stiff. A higher order polynomial approximates a stiff function less accurately, and sometimes lower order polynomials work better. Some exponential or rational functions are smooth, but are badly approximated by higher order polynomials. There are some pathological functions for which any interpolation scheme fails. For example,

$$f(X) = 3X^2 + \frac{1}{\pi^4}\ln\left[(\pi - X)^2\right] + 1$$

is singular at X = π. Any interpolation based on the values X = 3.13, 3.14, 3.15 and 3.16 gives a wrong value for X = 3.1416, although a graph of these five points looks very smooth (Fig. 6.4).
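The failure can be reproduced with a short Python sketch. It evaluates the pathological function at the four tabulated points and then interpolates a cubic through them at X = 3.1416. The cubic is the four-point Lagrange polynomial described in Section 6.2. The helper names `f` and `lagrange` are hypothetical.

```python
import math


def f(x):
    """Pathological test function; singular at x = pi."""
    return 3.0 * x * x + math.log((math.pi - x) ** 2) / math.pi ** 4 + 1.0


def lagrange(xs, ys, x):
    """Value at x of the polynomial through all points (xs[j], ys[j])."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        basis = 1.0
        for i, xi in enumerate(xs):
            if i != j:
                basis *= (x - xi) / (xj - xi)
        total += yj * basis
    return total


xs = [3.13, 3.14, 3.15, 3.16]
ys = [f(x) for x in xs]
estimate = lagrange(xs, ys, 3.1416)
print(f"interpolated {estimate:.4f}   true {f(3.1416):.4f}")
```

The tabulated values rise smoothly, so the cubic places f(3.1416) between the values at 3.14 and 3.15. The true value lies below both, because the logarithmic term dips sharply near π. No choice of polynomial order built from these points can see the dip.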


NUMERICAL INTERPOLATION

Fig. 6.2: Choice of Data Points for Quadratic Interpolation (Y versus X)

Fig. 6.3: Errors Due to Piecewise Linear Interpolation of a Non-linear Function (Y versus X)


6.2 Lagrange and Modified Procedures

Fig. 6.4: Zooming Effect on the Path of Pathological Function in Different X Ranges: (a) Smooth [3.12 to 3.16]; (b) Valley [3.135 to 3.145]; (c) Breakdown [3.141 to 3.142] (Y versus X)

6.2 LAGRANGE AND MODIFIED PROCEDURES

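6.2.1 Lagrange Interpolation

Quadratic interpolation is also inadequate for many real data sets, whose trends are more complex than a parabola. A cubic polynomial using four points is a better choice. Lagrange proposed the general formula for interpolation with an nth order polynomial passing through all n + 1 experimental points. The polynomial given in Eq. 6.5, with NPOL = n, is called the Lagrange polynomial (Deriv. 6.3).

$$Y_{IE} = \sum_{j=1}^{N_{POL}+1} Y_j \prod_{\substack{i=1 \\ i \neq j}}^{N_{POL}+1} \frac{X_{IE} - X_i}{X_j - X_i} \tag{6.5}$$

Taking the common product $\prod_k (X_{IE} - X_k)$ outside the sum gives the equivalent form below. It is valid when XIE does not coincide with a data point.

$$Y_{IE} = \prod_{k=1}^{N_{POL}+1}(X_{IE} - X_k) \sum_{j=1}^{N_{POL}+1} \frac{Y_j}{(X_{IE} - X_j)\prod_{\substack{i=1 \\ i \neq j}}^{N_{POL}+1}(X_j - X_i)} \tag{6.6}$$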
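Eq. 6.6 can be sketched in Python as follows. The function name is hypothetical, and this is not the book's Appendix 6.2 program. The early return covers the case XIE = X_j, where Eq. 6.6 would divide by zero and Eq. 6.5 simply gives Y_j.

```python
def lagrange_eq66(xs, ys, xie):
    """Eq. 6.6: one common product times a sum of weighted ordinates."""
    for xj, yj in zip(xs, ys):
        if xie == xj:                    # Eq. 6.6 would divide by zero; Eq. 6.5 gives Y_j
            return yj
    common = 1.0
    for xk in xs:                        # prod_k (XIE - X_k)
        common *= xie - xk
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        denom = xie - xj                 # (XIE - X_j) * prod_{i != j} (X_j - X_i)
        for i, xi in enumerate(xs):
            if i != j:
                denom *= xj - xi
        total += yj / denom
    return common * total


# NPOL = 3: a cubic through four points of Y = X**3 is reproduced exactly
xs = [1.0, 2.0, 3.0, 4.0]
ys = [x ** 3 for x in xs]
print(lagrange_eq66(xs, ys, 2.5), 2.5 ** 3)
```

The common product is computed once per XIE. Each term then needs only one extra division, which is why Eq. 6.6 is convenient in a program.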


The advantage of the method is that the functional relationship between X and Y need not be known. However, the order of the polynomial must be chosen beforehand. A limitation of the method is that it gives no error estimate. Further, different sets of data points produce different interpolated values. For example, the interpolated values obtained with the first four and with the last four points are substantially different (Fig. 6.5).

Fig. 6.5: Effect of Data Range on the Interpolated Value

6.2.2 Modified Lagrange Formula

The difficulties in selecting the experimental points are obviated in the modified Lagrange method:

$$Y_{IE} = \prod_{k=M1}^{M2}(X_{IE} - X_k) \sum_{i=M1}^{M2} \frac{Y_i}{(X_{IE} - X_i)\prod_{\substack{j=M1 \\ j \neq i}}^{M2}(X_i - X_j)}$$

M1 and M2 are the initial and final points of the chosen segment of the data set, with the interpolated argument as centroid. The interpolation error contains the factor $\prod_{k=M1}^{M2}(X_{IE} - X_k)$. Centring the segment on XIE keeps this product, and hence the error, small.

The algorithm and Fortran program for any desired order of the polynomial are given in Chart 6.1 and Appendix 6.2.


Chart 6.1: Algorithm of Lagrange Interpolation

Step 0  X, Y are vectors of size NP x 1
        NPOL : order of polynomial
        M1, M2 : first and last points of the segment
        XIE : interpolated argument
        YIE <- 0.0;  P <- 1.0
Step 1  P <- product of (XIE - X(k)) for k = M1 to M2
Step 2  For i = M1 to M2:
          TERM1 <- Y(i) / (XIE - X(i))
          TERM  <- TERM1 / product of (X(i) - X(j)) for j = M1 to M2, j ≠ i
          YIE   <- YIE + TERM
Step 3  YIE <- YIE * P
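Chart 6.1 can be sketched in Python. The sketch is a transcription under stated assumptions, not the LANIN program itself. `lagrange_segment` follows Steps 0 to 3 with 1-based M1 and M2. `centred_segment` is a hypothetical helper showing one way to place NPOL + 1 consecutive points around XIE, as the modified method suggests. Applied to the segments "1 TO K + 1" of the example that follows, the sketch reproduces the YIE column of Chart 6.2.

```python
def lagrange_segment(x, y, m1, m2, xie):
    """Chart 6.1: interpolate at xie using points m1..m2 (1-based, inclusive)."""
    idx = range(m1 - 1, m2)                 # convert to 0-based indices
    for i in idx:
        if xie == x[i]:                     # Step 2 would divide by zero
            return y[i]
    p = 1.0                                 # Step 1: P = product of (XIE - X(k))
    for k in idx:
        p *= xie - x[k]
    yie = 0.0                               # Step 2: accumulate TERM for each i
    for i in idx:
        term1 = y[i] / (xie - x[i])
        prod = 1.0
        for j in idx:
            if j != i:
                prod *= x[i] - x[j]
        yie += term1 / prod
    return yie * p                          # Step 3


def centred_segment(x, xie, npol):
    """Hypothetical helper: 1-based (M1, M2) for npol + 1 consecutive points around xie."""
    n = len(x)
    nearest = min(range(n), key=lambda i: abs(x[i] - xie))
    m1 = max(0, min(nearest - npol // 2, n - (npol + 1)))
    return m1 + 1, m1 + npol + 1


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [v * v for v in x]                      # Y = X**2, as in Example 6.1
for xie in (1.5, 4.5):
    for k in range(1, 6):                   # data used: 1 TO k + 1
        print(xie, k, round(lagrange_segment(x, y, 1, k + 1, xie), 3))
```

With two points the result is linear, giving 2.5 and 11.5. With three or more points the quadratic is reproduced exactly, giving 2.25 and 20.25.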

Example 6.1
Data for a quadratic function, Y = X^2 (Chart 6.2), are simulated in the X range 1 to 6. Running LANIN < L1.DAT at the DOS prompt gave interpolated values without error whenever at least three points were considered.

Chart 6.2: Interpolation Results for a Quadratic Function

Data (L1.DAT), NP = 6:
X    1    2    3    4    5    6
Y    1    4    9   16   25   36
XIE = 1.5 and 4.5 (exact Y = 2.25 and 20.25)

XIE = .1500E+01
K    DATA USED IN INTERPOLATION    YIE
1    1 TO 2                        2.500
2    1 TO 3                        2.250
3    1 TO 4                        2.250
4    1 TO 5                        2.250
5    1 TO 6                        2.250

XIE = .4500E+01
K    DATA USED IN INTERPOLATION    YIE
1    1 TO 2                        11.50
2    1 TO 3                        20.25
3    1 TO 4                        20.25
4    1 TO 5                        20.25
5    1 TO 6                        20.25

Example 6.2
Protonation constants of ligands can be calculated by graphical interpolation of pH on the formation curve (pH versus nbarH) at nbarH = 0.5, 1.5, etc. Since nbarH is a function of pH, the problem turns out to be an inverse interpolation. The interpolation of experimental formation function data (Chart 6.3) indicates that a minimum of four data points is required. The stepwise stability constants of metal-ligand complexes can also be calculated with this program by considering nbar versus pL.
C:\CAC> Lanin-
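Inverse interpolation only swaps the roles of the two variables. nbarH is treated as the argument and pH as the function. The Python sketch below is hypothetical and does not reproduce the data of Chart 6.3. It simulates a monoprotic ligand with an assumed log K = 4.0, using nbarH = 1/(1 + 10**(pH - log K)), and recovers pH at nbarH = 0.5 from four points with a cubic Lagrange polynomial. For this ligand that pH equals log K.

```python
def lagrange(xs, ys, x):
    """Eq. 6.5: value at x of the polynomial through the points (xs[j], ys[j])."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        basis = 1.0
        for i, xi in enumerate(xs):
            if i != j:
                basis *= (x - xi) / (xj - xi)
        total += yj * basis
    return total


def nbar_h(ph, log_k):
    """Hypothetical monoprotic ligand: nbarH = 1 / (1 + 10**(pH - log K))."""
    return 1.0 / (1.0 + 10.0 ** (ph - log_k))


LOG_K = 4.0                          # assumed value, used only to simulate data
ph_data = [3.6, 3.8, 4.2, 4.4]       # four points, as Example 6.2 recommends
nbar_data = [nbar_h(ph, LOG_K) for ph in ph_data]

# inverse interpolation: nbarH is the argument, pH is the function
ph_half = lagrange(nbar_data, ph_data, 0.5)
print(f"pH at nbarH = 0.5: {ph_half:.4f}")   # estimate of log K
```

The points here are placed symmetrically about log K, so the estimate is exact up to rounding. With real, unevenly spaced titration data the estimate will carry an interpolation error.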


7.4 Integration o/Ordinary....

Chart 7.8

If   Integrand is sharply concentrated in one or more peaks
Then Transform the function into differential equations.

If   Location of singularity is known
Then Divide the interval into two parts at the point of singularity.

If   There are breaks, bumps or singularities at unspecified points
Then Integration is difficult.

7.4 INTEGRATION OF ORDINARY DIFFERENTIAL EQUATIONS

The change in concentration with time for a first order reaction,

$$\frac{dx}{dt} = k\,(a - x)$$

is a first order differential equation because it contains a first order derivative. Here x is the concentration of the product P at time t, a is the initial concentration of the reactant, and k is the specific rate constant. The thermal decomposition of phenyl n-butyl diazirine (A) in DMSO is a two-step consecutive unimolecular reaction. It can be represented as a set of first order linear differential equations (Chart 7.9). The solution is readily obtained because the two rate constants are of similar magnitude. On the other hand, the reduction of Tl(III) to Tl(I) by Fe(II) (Chart 7.10) is modeled by a set of first order coupled differential equations.

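Chart 7.9
A → B → P   (rate constants k1, k2)

$$\frac{d[A]}{dt} = -k_1[A], \qquad \frac{d[B]}{dt} = k_1[A] - k_2[B], \qquad \frac{d[P]}{dt} = k_2[B]$$

B: 1-phenyl-1-diazopentane; P: carbene, N2
k1 = 6.8 × 10^-4 s^-1; k2 = 2.2 × 10^-4 s^-1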


NUMERICAL INTEGRATION


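Chart 7.10
Fe(II) + Tl(III) ⇌ Fe(III) + Tl(II)   (k1, k2)
Tl(II) + Fe(II) → Fe(III) + Tl(I)     (k3)

The kinetic pattern can be modeled by a set of first order coupled differential equations:

$$\frac{d[\mathrm{Fe(III)}]}{dt} = k_1[\mathrm{Fe(II)}][\mathrm{Tl(III)}] - k_2[\mathrm{Fe(III)}][\mathrm{Tl(II)}] + k_3[\mathrm{Fe(II)}][\mathrm{Tl(II)}]$$

$$\frac{d[\mathrm{Tl(I)}]}{dt} = k_3[\mathrm{Tl(II)}][\mathrm{Fe(II)}]$$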

The initial concentrations of the reactants, products and intermediate, and the rate constants, are known at t = 0. Hence the solution of the ordinary differential equations (ODEs) gives the concentrations of the species at any desired time. The Schroedinger wave equation is a second order differential equation relating the potential energy and the total energy to distance. A vibrating diatomic molecule can be modeled using the harmonic oscillator approximation to the potential energy. Diffusion of a solute (concentration u) in a solvent and the movement of blood/nutrients in the human body are represented by partial differential equations. Their solution, from a knowledge of u and its derivatives, describes the propagation of u with time:

$$\frac{\partial^2 u}{\partial t^2} = v^2\,\frac{\partial^2 u}{\partial x^2} \qquad \text{or} \qquad \frac{\partial u}{\partial t} = \frac{\partial}{\partial x}\left(D\,\frac{\partial u}{\partial x}\right)$$

The classification of differential equations and the methods in vogue are described in Chart 7.11. The solution of a set of ODEs by numerical techniques follows.


Chart 7.11
Differential Equations
  Ordinary Differential Equations
    Initial Value
      Euler
      Runge-Kutta (RK): 4th Order, 7th Order, Modified Gill, RKF
      Bulirsch-Stoer
      Taylor's Series: 15th Order, 20th Order
      Predictor-Corrector: ABM, Hamming, Milne
    Two Point Boundary
  Partial Differential Equations
    Elliptic
    Hyperbolic
    Parabolic

7.4.1 EULER METHOD

A simple first order differential equation (Eq. 7.3) can be approximated by the forward difference (Eq. 7.4).

$$\frac{dy}{dx} = F(x, y) \tag{7.3}$$

$$\frac{dy}{dx} \approx \frac{\Delta y}{\Delta x} = \frac{y_{i+1} - y_i}{x_{i+1} - x_i} \tag{7.4}$$

Here y_i and y_{i+1} are the values of the function at x_i and x_{i+1} = x_i + h, respectively. When the finite difference x_{i+1} - x_i (= Δx) approaches zero, (y_{i+1} - y_i)/Δx becomes equal to the derivative. The finite difference is denoted by the step size (h). Then Eq. (7.4) becomes Eq. (7.5).

$$y_{i+1} = y_i + h\,F(x_i, y_i) \tag{7.5}$$


In this case the initial value y0 at x = x0 is known. The features and algorithm are given in Chart 7.12. The solutions of typical ODEs are listed in Table 7.7. The results for the ODE dy/dx = x + y in the x range 0.0 to 1.0 using the Euler method are given in Table 7.8. The values of y by the Euler method (ycal) and those by the analytical solution (yexact) agree to within about 0.1% (relative) for h = 9.7656 × 10^-5. This confirms the applicability of the numerical procedure.

Chart 7.12a
Assumption: The first order differential equation is approximated by the forward difference formula.
Result: It estimates the value of the function (y) at any desired value of x.
Applications: It is the basis of other methods like RK, BS, etc. It demonstrates the principle by a simple procedure and thus has mainly pedagogical value.
Limitations: It is the least accurate method and can become unstable, for example when the step size is too large. Accumulated round-off errors result in inaccurate end results.

Chart 7.12b

Step 0  i = 0; step_ = 0
Step 1  The derivative is extrapolated to find the next value y1:
        y1 = y0 + h*F(x0, y0)
Step 2  i = i + 1;  x_i = x0 + i*h;  h = 2*h
        y_i = y_(i-1) + h*F(x_(i-1), y_(i-1))
Step 3  If |(y_i - y_(i-1))/y_(i-1)| > TOL, Then go to Step 2
        If |(y_i - y_(i-1))/y_(i-1)| < TOL, Then go to Step 4
Step 4  step_ = step_ + 1; If step_ > 20, Then Stop, Else go to Step 1
Step 5  it = it + 1; If it > 20, Then Stop, Else go to Step 0


Table 7.7: Solution of Typical ODEs (y = 0 at x = 0)

ODE                Solution
dy/dx = x + y      y = exp(x) - x - 1
dy/dx = x          y = x^2/2
dy/dx = x^2        y = x^3/3
dy/dx = x^3        y = x^4/4
dy/dx = x^4        y = x^5/5

Table 7.8: Solution of dy/dx = x + y by Euler Method (RE: relative error, %)

x      ycal        yexact      RE (%)      h
0.1    0.0051655   0.0051709   0.10435     9.7656e-005
0.2    0.021391    0.021403    0.055726    9.7656e-005
0.3    0.049839    0.049859    0.039656    9.7656e-005
0.4    0.091796    0.091825    0.031729    9.7656e-005
0.5    0.14868     0.14872     0.027063    9.7656e-005
0.6    0.22207     0.22212     0.024031    9.7656e-005
0.7    0.31368     0.31375     0.021936    9.7656e-005
0.8    0.42545     0.42554     0.020428    9.7656e-005
0.9    0.5595      0.5596      0.019313    9.7656e-005
1.0    0.71815     0.71828     0.018477    9.7656e-005
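The last row of Table 7.8 can be checked with a fixed-step Python sketch of Eq. 7.5. This is not the book's program: it uses a constant h = 1/10240 (= 9.7656 × 10^-5) instead of the step-halving loop of Chart 7.12b, and the function names are hypothetical. RE is taken as 100*(yexact - ycal)/yexact, which is consistent with the tabulated values.

```python
import math


def euler(f, x0, y0, x_end, n_steps):
    """Eq. 7.5 applied n_steps times: y_{i+1} = y_i + h*F(x_i, y_i)."""
    h = (x_end - x0) / n_steps
    y = y0
    for i in range(n_steps):
        x = x0 + i * h                  # x_i computed directly, not by repeated addition
        y = y + h * f(x, y)
    return y


def rhs(x, y):
    return x + y                        # dy/dx = x + y, with y(0) = 0


y_cal = euler(rhs, 0.0, 0.0, 1.0, 10240)  # h = 1/10240 = 9.7656e-5
y_exact = math.exp(1.0) - 1.0 - 1.0       # y = exp(x) - x - 1 at x = 1
re_percent = 100.0 * (y_exact - y_cal) / y_exact
print(f"ycal = {y_cal:.5f}  yexact = {y_exact:.5f}  RE = {re_percent:.6f} %")
```

Even with more than ten thousand steps the error is still about 1.3 × 10^-4, because Euler's global error shrinks only in proportion to h.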

Taylor's Series
It offers a solution to any differential equation, provided the derivatives of the function can be calculated. The value of y(x) at x = x0 is known, and the value at x = x0 + h is obtained by expanding it in Taylor's infinite series:

$$y(x_0 + h) = y(x_0) + h\,y'(x_0) + \frac{h^2}{2!}\,y''(x_0) + \frac{h^3}{3!}\,y'''(x_0) + \cdots + TE \tag{7.6}$$


Here TE is the truncation error. If only the first derivative is considered, Eq. 7.6 reduces to Eq. 7.7. If the first and second derivatives are considered, it reduces to Eq. 7.8.

$$y(x_0 + h) \approx y(x_0) + h\,y'(x_0) \tag{7.7}$$

$$y(x_0 + h) \approx y(x_0) + h\,y'(x_0) + \frac{h^2}{2}\,y''(x_0) \tag{7.8}$$

Once the derivatives are calculated, y(x0 + h) can be obtained. Chart 7.13 lists the applications and limitations of the method. Comparison of Eqs. 7.7 and 7.5, with y'(x0) = F(x0, y0), reveals that the Euler method can be considered as Taylor's series truncated after the first order term. The only difference is that Euler's method approximates the derivative by a finite difference.
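For dy/dx = x + y the second derivative is easy to obtain, since y'' = 1 + y' = 1 + x + y. Eq. 7.8 can therefore be applied directly. The Python sketch below compares the first order (Eq. 7.7) and second order (Eq. 7.8) Taylor steps at the same step size h = 0.01. The function names are hypothetical.

```python
import math


def taylor2(x0, y0, x_end, n_steps):
    """Eq. 7.8 for dy/dx = x + y; y'' follows by differentiating x + y once more."""
    h = (x_end - x0) / n_steps
    y = y0
    for i in range(n_steps):
        x = x0 + i * h
        d1 = x + y                      # y'(x)
        d2 = 1.0 + d1                   # y''(x) = 1 + y'(x)
        y = y + h * d1 + 0.5 * h * h * d2
    return y


def taylor1(x0, y0, x_end, n_steps):
    """Eq. 7.7 (first order only), which is the Euler step of Eq. 7.5."""
    h = (x_end - x0) / n_steps
    y = y0
    for i in range(n_steps):
        y = y + h * ((x0 + i * h) + y)
    return y


exact = math.exp(1.0) - 2.0               # y(1) for y = exp(x) - x - 1
err2 = exact - taylor2(0.0, 0.0, 1.0, 100)
err1 = exact - taylor1(0.0, 0.0, 1.0, 100)
print(f"h = 0.01: first order error {err1:.2e}, second order error {err2:.2e}")
```

Keeping the h**2 term lowers the error at x = 1 by about three hundred times for the same number of steps. The same gain is not available for functions whose higher derivatives are hard to write down (Chart 7.13).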

Chart 7.13
• Assumption: The differential equation has derivatives and the function is continuous.
• Result: The solution is accurate only when the truncation error is small.
• Application: Using the finite difference method, Taylor's series of 15th or 20th order is employed to solve kinetic equations. It is of pedagogical value.
• Limitation: Higher order derivatives of many non-linear functions are difficult to obtain.
• Remedy: Runge-Kutta and Bulirsch-Stoer methods.

7.4.2 RUNGE-KUTTA METHOD

The fourth or seventh order Runge-Kutta (RK) method is one of the best choices for solving a system of linear differential equations. In adaptive step size mode it succeeds even for rough and discontinuous profiles. A further modification is rapid and crosses singular points within the interval of integration.

Algorithm
In the Euler method (Eq. 7.4) the relation between the ordinates y0 and y1 is linear, so F(x, y) is the slope. The RK method employs the principles of the modified or improved (polygon) Euler method (MEUM). The modified method uses the average of the slopes at x0 and (x0 + h). The polygon method uses the slope at the midpoint of x0 and x0 + h, assuming a linear relation between the two points. The formula for the second order RK method,

$$y_{i+1} = y_i + \frac{1}{2}(k_1 + k_2) \tag{7.10}$$

can be derived (Chart 7.14) from the modified Euler formula (Eq. 7.9). The corresponding equations for the fourth order method (RK4) are given in Chart 7.15. RK4 has lower truncation errors than the second order method.


Chart 7.14: Derivation of RK Method

The modified Euler formula is

$$y_{i+1} = y_i + \frac{h}{2}\left[F(x_i, y_i) + F(x_{i+1}, y_{i+1})\right] \tag{7.9}$$

Substituting Eq. 7.5 for y_{i+1} on the right-hand side of Eq. 7.9 gives

$$y_{i+1} = y_i + \frac{1}{2}\left[h\,F(x_i, y_i) + h\,F\big(x_i + h,\; y_i + h\,F(x_i, y_i)\big)\right]$$

Let $k_1 = h\,F(x_i, y_i)$ and $k_2 = h\,F(x_i + h,\; y_i + k_1)$. Then

$$y_{i+1} = y_i + \frac{1}{2}(k_1 + k_2) \tag{7.10}$$

Chart 7.15: RK4 Method

Basis
• It employs modified Euler steps between two successive points (x_i, x_(i+1)). Derivatives are obtained at the initial point (x_i), at the final point (x_(i+1)), and twice at the midpoint (x_i + h/2).

Formulae
l1 = F(x, y);                 t1 = h*l1
l2 = F(x + h/2, y + t1/2);    t2 = h*l2
l3 = F(x + h/2, y + t2/2);    t3 = h*l3
l4 = F(x + h, y + t3);        t4 = h*l4
y_(i+1) = y_i + (t1 + 2*t2 + 2*t3 + t4)/6

Result
• Accuracy depends upon the step size. The solution is fairly reliable.

Features
• RK4 agrees with the Taylor series up to the fourth order (h^4) term.
• Evaluation of derivatives of the function is not required. It uses only the information from the immediately preceding point.
• It succeeds even for a rough (non-smooth) problem, for which the BS method fails.

Advice
• The BS method can be used when a highly accurate solution is needed.


The results (yRK4) of RK4 for dy/dx = x + y given in Table 7.9 are close to the analytical solution (yexact) with h values of the order of 10^-2. RK4 reaches a smaller error with far fewer steps than the Euler procedure. Its step size (0.025 to 0.05) is several hundred times larger than that used with Euler (9.8 × 10^-5).

Table 7.9: Solution of dy/dx = x + y by RK4 Method (RE: relative error, %)

x      yRK4        yexact      RE (%)        h
0.1    0.0051709   0.0051709   6.8139e-006   0.025
0.2    0.021403    0.021403    3.6388e-006   0.025
0.3    0.049859    0.049859    2.5894e-006   0.025
0.4    0.091825    0.091825    2.0718e-006   0.025
0.5    0.14872     0.14872     6.9522e-006   0.05
0.6    0.22212     0.22212     9.2428e-006   0.05
0.7    0.31375     0.31375     1.0438e-005   0.05
0.8    0.42554     0.42554     1.1118e-005   0.05
0.9    0.5596      0.5596      1.154e-005    0.05
1.0    0.71828     0.71828     1.1827e-005   0.05
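The Chart 7.15 formulae can be sketched in Python with a fixed step. The function name `rk4` is hypothetical. Table 7.9 switches from h = 0.025 to h = 0.05 partway through the range, whereas this sketch keeps h = 0.05 throughout. Its error at x = 1 is therefore of the same order as the table but not identical.

```python
import math


def rk4(f, x0, y0, x_end, n_steps):
    """Chart 7.15 formulae applied n_steps times with a fixed step h."""
    h = (x_end - x0) / n_steps
    y = y0
    for i in range(n_steps):
        x = x0 + i * h
        t1 = h * f(x, y)
        t2 = h * f(x + h / 2, y + t1 / 2)
        t3 = h * f(x + h / 2, y + t2 / 2)
        t4 = h * f(x + h, y + t3)
        y = y + (t1 + 2 * t2 + 2 * t3 + t4) / 6.0
    return y


def rhs(x, y):
    return x + y                        # dy/dx = x + y, y(0) = 0


y_exact = math.exp(1.0) - 2.0
y_rk4 = rk4(rhs, 0.0, 0.0, 1.0, 20)     # h = 0.05
err = y_exact - y_rk4
print(f"yRK4 = {y_rk4:.5f}  error = {err:.2e}  RE = {100 * err / y_exact:.2e} %")
```

Twenty RK4 steps leave an error roughly three orders of magnitude smaller than the 10240 Euler steps behind Table 7.8. This is the practical meaning of the lower truncation error.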

The Bulirsch-Stoer (BS) procedure is a hybrid method that exploits the basic philosophies of the midpoint procedure, Richardson's deferred approach to the limit, and rational function extrapolation. It is reliable and yields accurate solutions of ODEs with minimum computational effort. The heuristics for selecting this procedure are given in Chart 7.16. The Monte Carlo (MC) procedure finds application in integrating functions and in solving ODEs. Production of high quality pseudo-random numbers is the key to the success of the MC method. The differential equation of a first order process can also be viewed from a probability point of view, from which the concentrations of the species at a specified time are computed.

Chart 7.16

If   No singularity in the interval (a, b) & profile is not rough & there are no discontinuities & high accuracy is needed
Then Use BS method.

If   Extrapolated function has a pole at any of the evaluation points
Then Rational function extrapolation fails.

If   Rational function extrapolation failed & profile is not rough
Then Use polynomial extrapolation for two steps & resort to rational function extrapolation.

8.1 EIGEN VALUES

Eigen values, also called hidden roots, find extensive applications in quantum chemistry, multivariate quantitative analysis and solution processes. Many physico-chemical phenomena in multi-dimensional space can be expressed as eigen problems. For example, methyl orange exists in two forms (HL and L), both of which absorb in the visible region. Absorbances in the wavelength range 350 to 600 nm (NWL) at different pH values (NPH) form a matrix ABSORB of size NPH*NWL. Two significant eigen values of this data matrix indicate the presence of both species. Similarly, the number of independent reaction paths contributing to the rate law is equal to the number of significant eigen values of the kinetic data matrix.

Geometric Representation of Matrix
A matrix can be represented as the coordinates of points in N-dimensional space. For a 2 x 2 matrix the space is a flat plane, and the two vectors originate from the origin of the coordinate system. The absorbance values of solutions at two different pH values and at two distinct wavelengths are represented by a 2 x 2 matrix (A). The rows correspond to the spectra at a given pH, and the columns correspond to absorbance data at a specific wavelength. The relationships connecting the extinction coefficients (ε1 and ε2) and the concentrations (c1 and c2) are given in Eq. 8.1, which is Beer's law for a mixture of the two species with unit path length:

$$A_{ij} = \varepsilon_{1j}\,c_{1i} + \varepsilon_{2j}\,c_{2i}, \qquad i = 1, 2 \ (\text{pH}); \quad j = 1, 2 \ (\text{wavelength}) \tag{8.1}$$

The size of the data matrix grows when solutions at more pH values are considered or when absorbances are measured at several wavelengths. However, it contains the same intrinsic information, namely, the presence of two species. Thus, irrespective of the size of the data matrix, only two significant eigen values are obtained, and all other eigen values are negligible.


8.1 Eigen Value.s

Eigen Values of a 2 x 2 Matrix
Differentiation of the function exp(a*x) gives

$$\frac{d}{dx}\left[\exp(a x)\right] = a\,\exp(a x) \tag{8.2}$$

Here d/dx is an operator that returns the same function multiplied by the constant a. The constant is called the eigen value (λ) and the function the eigen function, or eigen vector (E). In general

$$A\,E = \lambda\,E \tag{8.3}$$

A is an N x N real symmetric matrix (A - A^T = 0), the real case of a Hermitian matrix. It has N eigen values (λ1, λ2, ..., λN) and eigen vectors [E1, E2, ..., EN].

$$A\,E - \lambda\,I\,E = 0 \quad\Longrightarrow\quad (A - \lambda I)\,E = 0 \tag{8.4}$$

One trivial solution is E = 0, which would mean that the basis vectors reduce to a point, which is absurd. A non-trivial solution (E ≠ 0) exists only when the matrix A - λI is singular, that is, |A - λI| = 0. The eigen values of A are determined by expanding this determinant in terms of λ and finding the roots of the characteristic equation. For a 2 x 2 matrix,

$$\left|A - \lambda I\right| = \begin{vmatrix} a_{11} - \lambda & a_{12} \\ a_{21} & a_{22} - \lambda \end{vmatrix} = \lambda^2 - (a_{11} + a_{22})\,\lambda + (a_{11}a_{22} - a_{12}a_{21}) = 0$$

The eigen values indicate the magnitude of a stretch, or of a stretch followed by reflection. The eigen vectors are then calculated by substituting the values of λ one by one in Eq. 8.4. They are the new (basis) directions along which the stretch or reflection takes place. In practice the problem is generally not solved by root finding but by advanced algorithms such as Givens' method or Singular Value Decomposition (SVD). The roots may not all be distinct, and some may be nearly zero or relatively small. For a 3 x 3 matrix the characteristic equation is cubic, and for an Nth order matrix it is a polynomial of degree N.
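For a 2 x 2 real symmetric matrix the characteristic equation is small enough to solve directly. The Python sketch below does so and then normalises each eigen vector to unit length. The function name `eig2_symmetric` is hypothetical. As the text notes, larger matrices call for Givens rotations or SVD rather than this route.

```python
import math


def eig2_symmetric(a11, a12, a22):
    """Eigen values (largest first) and unit eigen vectors of [[a11, a12], [a12, a22]]."""
    if abs(a12) < 1e-15:                         # already diagonal
        if a11 >= a22:
            return (a11, a22), [(1.0, 0.0), (0.0, 1.0)]
        return (a22, a11), [(0.0, 1.0), (1.0, 0.0)]
    half_tr = 0.5 * (a11 + a22)
    det = a11 * a22 - a12 * a12
    disc = math.sqrt(max(half_tr * half_tr - det, 0.0))   # real roots for symmetric input
    lams = (half_tr + disc, half_tr - disc)               # roots of the characteristic equation
    vecs = []
    for lam in lams:
        # first row of (A - lambda*I)E = 0 gives E proportional to (a12, lambda - a11)
        v = (a12, lam - a11)
        norm = math.hypot(*v)
        vecs.append((v[0] / norm, v[1] / norm))          # normalise to unit length
    return lams, vecs


lams, vecs = eig2_symmetric(2.0, 1.0, 2.0)
print(lams, vecs)
```

For [[2, 1], [1, 2]] the roots of λ² - 4λ + 3 = 0 are 3 and 1. The corresponding eigen vectors (1, 1)/√2 and (1, -1)/√2 are orthogonal, as expected for a symmetric matrix.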


8.2 Eigen Vectors


8.2 EIGEN VECTORS

Normalization of Eigen Vectors
The eigen vectors are normalized by dividing each element by the length of the corresponding column:

$$\hat{E}_j = \frac{E_j}{\lVert E_j \rVert}, \qquad \lVert E_j \rVert = \sqrt{\sum_i E_{ij}^2}$$

For a 2D data matrix X, the square roots of the eigen values of X^T X (the singular values of X) give the major and minor semi-axes of an ellipse, and the data points lie on this ellipse. The eigen vectors represent the directions of the new orthogonal basis vectors. The products of the eigen values and eigen vectors can also be represented within the same ellipse (Fig. 8.1).

Fig. 8.1: Collinear Vectors

Example 8.1
The angle between the column vectors of the data set

$$\begin{bmatrix} 1.0000 & -0.1340 \\ 2.0000 & 2.2321 \end{bmatrix}$$

is 30°. Eigen value analysis shows that the percentage explainabilities (PEs) in the two column spaces are 78.8 and 21.1. A large amount of the variance in the data is explained by the first eigen vector, that is, by the first column of the eigen vector matrix

$$\begin{bmatrix} 0.7071 & -0.4472 \\ 0.7071 & -0.8944 \end{bmatrix}$$

The residuals with one component are high, and they are completely accounted for by the second one. The eigen vectors and eigen values, along with the data, are represented pictorially in Fig. 8.2.

Fig. 8.2: Eigen Ellipse for Non-orthogonal Vectors
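The angle and the PE values of Example 8.1 can be checked with a short Python sketch. It rests on one assumption about how PE is defined: PE_k = 100*sqrt(λ_k)/Σ sqrt(λ), with λ the eigen values of X^T X. These are the shares of the singular values. With that assumption the sketch returns about 78.87 and 21.13 for Example 8.1. It also returns 50/50 for the identity matrix of Example 8.2 and 100/0 for the collinear data of Example 8.3, in line with the quoted values. The function names are hypothetical.

```python
import math


def column_angle_deg(c1, c2):
    """Angle (degrees) between two column vectors."""
    dot = sum(a * b for a, b in zip(c1, c2))
    norm1 = math.sqrt(sum(a * a for a in c1))
    norm2 = math.sqrt(sum(b * b for b in c2))
    cosine = max(-1.0, min(1.0, dot / (norm1 * norm2)))   # guard acos against round-off
    return math.degrees(math.acos(cosine))


def percent_explainability_2col(c1, c2):
    """PE of X = [c1 c2], assuming PE_k = 100*sqrt(lambda_k)/sum(sqrt(lambda)) for X^T X."""
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    half_trace = 0.5 * (s11 + s22)
    disc = math.sqrt(max(half_trace ** 2 - (s11 * s22 - s12 ** 2), 0.0))
    sv = [math.sqrt(half_trace + disc), math.sqrt(max(half_trace - disc, 0.0))]
    return [100.0 * s / sum(sv) for s in sv]


col1, col2 = [1.0, 2.0], [-0.1340, 2.2321]      # columns of the Example 8.1 matrix
print(f"angle = {column_angle_deg(col1, col2):.2f} deg")
print("PE =", [round(p, 2) for p in percent_explainability_2col(col1, col2)])
```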

Example 8.2
The vectors of the data set

$$\begin{bmatrix} 1.0 & 0.0 \\ 0.0 & 1.0 \end{bmatrix}$$

are orthonormal, that is, the angle between them is 90°. The eigen value and eigen vector matrices are the same and numerically equal to the identity matrix. The PE of each eigen value is 50. Hence, in 2D space the eigen ellipse reduces to a circle (Fig. 8.3).

Example 8.3
The angle between the column vectors of

$$\begin{bmatrix} 1.0 & 1.0 \\ 2.0 & 2.0 \end{bmatrix}$$

is zero, which obviously represents a straight line. The first eigen value explains 100% of the variance in the data, which means that one dimension is sufficient. Fig. 8.1 depicts the E1, E2 axes and the data points in the x-y plane. As the minor axis is zero (λ2 = 0), the ellipse collapses to a straight line and both points lie on the E1 axis. Thus, transformation of the coordinate axes to eigen vector space results in dimension reduction.


Fig. 8.3: Eigen Vectors for Orthogonal Vectors

Example 8.4: Methyl Red

The spectra of methyl red at different pH values (2.0 to 7.0) have one maximum (Fig. 8.4), and eigen vector analysis indicates the presence of two species. Several columns of the absorbance matrix (Table 8.1a) are highly correlated (Table 8.1b), and the column vectors are all non-orthogonal (Table 8.1c). The first two eigen values explain 73.77 and 25.77% of the variance in the data (Table 8.1d).

Table 8.1a: Spectra of Methyl Red at Different pH Values

        Absorbances at the Wavelengths (nm)
pH      575       525       475       425
3.4     0.4600    0.3420    0.1080    0.0180
4.6     0.9930    0.7420    0.2820    0.0770
5.4     0.4000    0.3650    0.3100    0.2880
6.2     0.0600    0.1520    0.3200    0.4000

Fig. 8.4: (a) Absorption Spectra (Absorbance versus Wavelength, nm); (b) Variation of Absorbance with pH for Methyl Red

Table 8.1b: Correlation Matrix

       X1       X2       X3       X4
X1    1.00
X2    0.99     1.00
X3   -0.13    -0.00     1.00
X4   -0.73    -0.64     0.77     1.00

Table 8.1c: Angles between Column Vectors (degrees)

       C1       C2       C3
C1    0.00
C2    7.78     0.00
C3   41.14    33.37     0.00


Table 8.1d: Eigen Values and Cumulative Per cent Explainability (CPE)

No    λ            PE       CPE
1     2.428        73.77    73.77
2     0.2963       25.77    99.54
3     0.0000582    0.361    99.9
4     4.pSe-006    0.097    100
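Table 8.1d can be approximately reproduced from Table 8.1a with a Python sketch. The sketch forms the cross-product matrix X^T X of the absorbances and finds its eigen values with cyclic Jacobi rotations. Jacobi is swapped in here as a simple, robust method for small symmetric matrices; the book's own routine may differ. PE is assumed to be the share of the square roots of the eigen values. With this assumption the sum of the eigen values equals the sum of squared absorbances (about 2.724), and the first two PEs come out close to the tabulated 73.77 and 25.77. The function names are hypothetical.

```python
import math

# Table 8.1a: rows are pH 3.4, 4.6, 5.4, 6.2; columns are 575, 525, 475, 425 nm
ABSORB = [
    [0.4600, 0.3420, 0.1080, 0.0180],
    [0.9930, 0.7420, 0.2820, 0.0770],
    [0.4000, 0.3650, 0.3100, 0.2880],
    [0.0600, 0.1520, 0.3200, 0.4000],
]


def cross_product(x):
    """X^T X for a matrix stored as a list of rows."""
    ncol = len(x[0])
    return [[sum(row[i] * row[j] for row in x) for j in range(ncol)] for i in range(ncol)]


def jacobi_eigenvalues(a, tol=1e-20, max_sweeps=50):
    """Eigen values (largest first) of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):        # update columns p and q:  A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):        # update rows p and q:     A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


lam = jacobi_eigenvalues(cross_product(ABSORB))
sv = [math.sqrt(max(l, 0.0)) for l in lam]          # clip any tiny negative round-off
pe = [100.0 * s / sum(sv) for s in sv]
cpe = [sum(pe[: k + 1]) for k in range(len(pe))]
for k, (l, p, c) in enumerate(zip(lam, pe, cpe), start=1):
    print(f"{k}  {l:.4g}  {p:6.2f}  {c:6.2f}")
```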

Example 8.5: Acido-Basic Equilibria of Chloranilic Acid
The spectra of chloranilic acid at different acid concentrations are presented in Fig. 8.5, together with the 3D surface and the 2D contour plot. A contour plot is a 2D representation of a 3D surface. The two variables form the axes, and each contour line joins points of equal response (iso-response values). The plot gives quantitative information on how the response varies when the two variables change simultaneously.

Fig. 8.5: (a) 3D Surface of Chloranilic Acid, (b) Contour Plot, (c) Spectra at Different Concentrations, (d) Spectra at Different Wavelengths


The correlation between column vectors is 1.0 and angle is

Standard Normal Variate
The standard normal variate (z_j) has zero mean and unit SD. It is applicable to data sets with sample size greater than 30. When NP < 30, no inference regarding outliers should be drawn from z_j, whatever its value. It is calculated as

$$z_j = \frac{X_j - \text{MEANX}}{SD}$$

If   |z_j| > 2.5 & NP > 30
Then X_j is an outlier
If   |z_j| < 2.5 & NP > 30
Then X_j is one among the set

Example 9.5
For the data set X = [-0.90, 0.67, 0.0, 1125.17, -0.67] the statistics are Mean = 224.85, Median = 0.0, SD = 503.29 and z_j = [-0.44855, -0.44543, -0.44677, 1.7889, -0.44810]. The outlier (1125.17) inflates both the mean and the standard deviation, so its own z value stays well below 2.5. In fact, with the sample SD no |z_j| can exceed (NP - 1)/√NP, which is 1.789 for NP = 5. Hence no conclusion is possible regarding outliers.
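The z values of Example 9.5 and the (NP - 1)/√NP ceiling can be verified with Python's standard `statistics` module. `statistics.stdev` uses the NP - 1 divisor, which matches SD = 503.29 in the example. The function name `z_scores` is hypothetical.

```python
import math
import statistics

X = [-0.90, 0.67, 0.0, 1125.17, -0.67]       # data of Example 9.5


def z_scores(x):
    """Standard normal variates using the sample SD (divisor NP - 1)."""
    mean = statistics.mean(x)
    sd = statistics.stdev(x)
    return [(xi - mean) / sd for xi in x]


z = z_scores(X)
bound = (len(X) - 1) / math.sqrt(len(X))     # largest |z| attainable with NP points
print([round(v, 5) for v in z], round(bound, 4))
```

The outlier's z value sits almost exactly at the ceiling of 1.789. The ceiling is reached when all the other points are equal, so the 2.5 rule cannot flag anything in a set of five.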

Mean Deviation
Mean deviation is the average of the absolute values of the deviations from the mean. Multiplying the mean deviation by √(π/2) makes the value approach the SD for a normal distribution.

$$\text{MEDEV} = \frac{1}{NP}\sum_{i=1}^{NP} \lvert DEV_i \rvert, \qquad DEV_i = X_i - \text{MEANX}$$
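A Python sketch of MEDEV and of the √(π/2) scaling follows. The normal sample is drawn with a fixed seed only to illustrate the limit. The seed and the sample size are arbitrary choices, not from the text, and the function name is hypothetical.

```python
import math
import random
import statistics


def mean_deviation(x):
    """MEDEV = sum(|X_i - MEANX|) / NP."""
    m = sum(x) / len(x)
    return sum(abs(xi - m) for xi in x) / len(x)


data = [2.0, 3.0, 7.0, 8.0, 10.0]
print(mean_deviation(data))                   # deviations 4, 3, 1, 2, 4 -> 14/5

rng = random.Random(12345)                    # fixed seed for a reproducible illustration
sample = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
scaled = mean_deviation(sample) * math.sqrt(math.pi / 2)
print(f"scaled MEDEV {scaled:.3f}  SD {statistics.pstdev(sample):.3f}")
```

For normally distributed data the expected absolute deviation is SD*√(2/π). This is why the √(π/2) factor brings the two measures together.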

Skewness
It is the third moment about the mean. It is a dimensionless quantity indicating the shape of the distribution. A skewness of zero indicates that the underlying distribution is symmetrical. However, for a finite set of measured values the skewness is rarely exactly zero.

If   Skewness > 0
Then Peak of the distribution is to the left of the mean & the scatter of lower values is smaller than that of larger ones (Fig. 9.4a)

If   Skewness < 0
Then Peak is to the right of the mean & the scatter of lower values is larger than that of larger ones (Fig. 9.4b)

If   Skewness = 0
Then Distribution is symmetrical (Fig. 9.4c)
Else Distribution is asymmetrical


9.3 Moments of Data

Fig. 9.4: Skewness is (a) Greater than Zero (b) Less than Zero (c) Zero

* * *

SKEW.FOR

      FUNCTION SKEW(X,N,MAX)
      DIMENSION X(MAX)
      REAL MEAN1
C     X must be in ascending order for the median (ZMED, MED.FOR)
      AVE = MEAN1(X,N,MAX)
      VAR = 0.
      DO 10 I = 1,N
         VAR = VAR + (X(I) - AVE) ** 2
   10 CONTINUE
      SD = SQRT(VAR/(N-1))
      XMED = ZMED(X,N,MAX)
C     Pearson's (median) coefficient of skewness
      SKEW = 3.0 * (AVE - XMED)/SD
      WRITE(*,951) AVE, XMED, SD
      WRITE(*,952) SKEW
  951 FORMAT(' Mean ',F10.4,' Median ',F10.4,' S D ',F10.4)
  952 FORMAT(' Skewness ',F10.4)
      RETURN
      END
$INCLUDE: 'SUM1.FOR'
$INCLUDE: 'MEAN1.FOR'

UNIVARIATE ANALYSIS


* * *


SKEW.DEM

$DEBUG
      DIMENSION X(5)
      DATA X/2.0,3.0,7.0,8.0,10.0/
      N = 5
      CALL WX1(N,X,N)
      Z = SKEW(X,N,5)
      END
$INCLUDE: 'SKEW.FOR'
$INCLUDE: 'MED.FOR'
$INCLUDE: 'WX1.FOR'

X(1)= 2.000   X(2)= 3.000   X(3)= 7.000   X(4)= 8.000   X(5)= 10.00
Mean     6.0000 Median     7.0000 S D     3.3912
Skewness    -0.8847
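SKEW.FOR prints Pearson's median coefficient, 3*(mean - median)/SD. This is a convenient proxy for skewness, not the third moment itself. The Python sketch below computes both for the SKEW.DEM data so the two can be compared. The function names are hypothetical.

```python
import statistics


def pearson_skew(x):
    """3*(mean - median)/SD, the quantity printed by SKEW.FOR."""
    return 3.0 * (statistics.mean(x) - statistics.median(x)) / statistics.stdev(x)


def moment_skew(x):
    """Third central moment divided by the 1.5 power of the second (g1)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((xi - m) ** 2 for xi in x) / n
    m3 = sum((xi - m) ** 3 for xi in x) / n
    return m3 / m2 ** 1.5


data = [2.0, 3.0, 7.0, 8.0, 10.0]
print(f"Pearson {pearson_skew(data):.4f}   moment {moment_skew(data):.4f}")
```

Both coefficients are negative for these data, but their magnitudes differ considerably. Values from the two definitions should not be compared directly.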

Kurtosis
It is the fourth moment about the mean and indicates the sharpness of the peak, that is, the peakedness of the distribution near a modal value. It is also a dimensionless quantity (Fig. 9.5). KURT.FOR calculates the kurtosis and indicates the type of distortion of the distribution. Its demonstration routine is KURT.DEM.

* * *

KURT.DEM

$DEBUG
      PARAMETER (MAX = 500)
      REAL KURT
      DIMENSION X(MAX)
      CALL IX1(N,X,MAX)
      CALL WX1(N,X,MAX)
      Z = KURT(X,N,MAX)
      END
$INCLUDE: 'KURT.FOR'
$INCLUDE: 'WX1.FOR'
$INCLUDE: 'IX1.FOR'


Fig. 9.5: Kurtosis (a) Leptokurtic (b) Mesokurtic (c) Platykurtic

* * *

KURT.FOR

      FUNCTION KURT(X,N,MAX)
      REAL*4 KURT
      DIMENSION X(N)
C     NOTE: SX2 and SX4 are sums about zero, not about the mean.
C     For kurtosis about the mean, subtract the mean from X(I) first.
      SX2 = 0.
      SX4 = 0.
      DO 10 I = 1,N
         SX2 = SX2 + X(I) ** 2
         SX4 = SX4 + X(I) ** 4
   10 CONTINUE
      KURT = N * SX4/SX2/SX2
C     KNOWLEDGE BASE
      IF (KURT .LT. 3.) THEN
         WRITE(*,951) KURT
      ENDIF
      IF (KURT .EQ. 3.) THEN
         WRITE(*,952) KURT
      ENDIF
      IF (KURT .GT. 3.) THEN
         WRITE(*,953) KURT
      ENDIF
      RETURN
  951 FORMAT(' Kurtosis =',F8.2,' Platykurtic')
  952 FORMAT(' Kurtosis =',F8.2,' Meso or normal kurtic')
  953 FORMAT(' Kurtosis =',F8.2,' Lepto kurtic')
      END

X(1)= 1.000   X(2)= 1.000   X(3)= 1.000   X(4)= 1.000
Kurtosis =    1.00 Platykurtic
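A Python sketch of kurtosis taken about the mean, as the text defines it, follows: b2 = NP*Σd⁴/(Σd²)² with d = X - mean. It uses the same three-way knowledge base as KURT.FOR, except that the test for exactly 3 is given a small tolerance (an assumption, since floating-point values rarely equal 3 exactly). Unlike the listing, it reports "undefined" for data with no spread, such as the four equal values in the output above. The function names are hypothetical.

```python
def kurtosis_b2(x):
    """NP * sum(d**4) / (sum(d**2))**2 with d = x - mean; 3 for a normal distribution."""
    n = len(x)
    m = sum(x) / n
    s2 = sum((xi - m) ** 2 for xi in x)
    s4 = sum((xi - m) ** 4 for xi in x)
    if s2 == 0.0:
        return None                      # all values equal: kurtosis undefined
    return n * s4 / (s2 * s2)


def classify(b2, tol=1e-9):
    """Knowledge base of KURT.FOR, with a tolerance on the equality test."""
    if b2 is None:
        return "undefined"
    if b2 < 3.0 - tol:
        return "Platykurtic"
    if b2 > 3.0 + tol:
        return "Leptokurtic"
    return "Mesokurtic"


for data in ([2.0, 3.0, 7.0, 8.0, 10.0], [0.0] * 9 + [10.0], [1.0, 1.0, 1.0, 1.0]):
    b2 = kurtosis_b2(data)
    print(data, b2, classify(b2))
```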

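Limitations of Skewness and Kurtosis
Skewness and kurtosis are less robust than the lower moments (mean, SD). They raise the deviations to the third and fourth powers, so a single outlier can dominate them. They should therefore be interpreted with caution.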

Confidence Interval
It is the distance on either side of Xmean (Fig. 9.6a) within which one expects to find the true central value with a specified probability. The range increases as the confidence level increases (Fig. 9.6b). For a significance level α, the confidence interval is -Z to +Z, assuming a two-tailed distribution (Fig. 9.7). At α = 0.05 (95% confidence) these limits are -1.96 and +1.96. Z values (Appendix 3) at different confidence levels are given in Table 9.2.

Fig. 9.6: Confidence Interval for Univariate Data: (a) Mean - 3SD to Mean + 3SD; (b) Intervals at Increasing Confidence Levels (%)

Table 9.2: Z Values at Different Confidence Levels (two-tailed)

Alpha (α)       % Confidence level    Z
0.0             100                   ∞
0.1 × 10^-7     99.999999             5.73
0.01            99.0                  2.58
0.05            95.0                  1.96
0.10            90.0                  1.64
0.30            70.0                  1.04
0.50            50.0                  0.67

If the probability of obtaining a z value at least as extreme as the observed one is less than 5%, the datum is a suspected outlier. It may then be deleted or retained.
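The two-tailed Z values of Table 9.2 can be regenerated with `statistics.NormalDist().inv_cdf` from the Python standard library (Python 3.8 or later). For a confidence level of 100*(1 - α)%, the critical value is the (1 - α/2) quantile of the standard normal distribution. The function name `two_tailed_z` is hypothetical.

```python
from statistics import NormalDist


def two_tailed_z(confidence_percent):
    """Critical Z for a two-tailed interval at the given % confidence level."""
    alpha = 1.0 - confidence_percent / 100.0
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (99.999999, 99.0, 95.0, 90.0, 70.0, 50.0):
    print(f"{level:>10}  {two_tailed_z(level):.4f}")
```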

Fig. 9.7: Rejection and Acceptance Regions for the Standard Normal (z) Variable (one-tailed and two-tailed tests at the 0.05 (95%) and 0.01 (99%) levels; limits of ±2.33 marked)

9.4 Robust Methods


Student t-test
If the samples are from a normal distribution and the sample size is small (NP < 30), the Student t value is used to compute the confidence limits of the mean by the formula

$$LU = \bar{X} \pm t\,\frac{SD}{\sqrt{NP}}$$

The value of t is taken from standard t-tables (Appendix 4). It depends upon the level of probability (α) and the degrees of freedom (DF = NP - 1). For a two-sided interval at 100(1 - α)% confidence, t is read at α/2 in each tail. If α = 0.05 is used, there is a 95% probability that the limits LU enclose the true mean. The distance on either side of Xmean increases as the confidence level increases (Table 9.3).

Table 9.3: t-Values at Different Confidence Levels (NP = 4, DF = 3, one-tailed)

t-value(DF, α)    α        % Confidence
5.841             0.005    99.5
4.541             0.01     99.0
2.353             0.05     95.0
1.638             0.10     90.0

Example 9.6 The mean and SD for ten replicate titrations of anhydrous sodium carbonate with 0.1 mol dm⁻³ HCl are 0.1008 and 3.38 × 10⁻⁴. The 95% confidence interval for the mean is 0.1005 to 0.1010.
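The confidence limits above can be computed directly from the formula for LU. A minimal sketch in Python (an illustration, not the book's FORTRAN); the function name is hypothetical, and the critical value t = 2.262 for DF = 9 is passed in from the t-tables because the Python standard library has no t quantile function.

```python
import math
from typing import Tuple


def t_confidence_interval(mean: float, sd: float, n: int, t_crit: float) -> Tuple[float, float]:
    """Confidence limits LU = mean -/+ t * SD / sqrt(n) for the population mean.

    t_crit must be looked up for DF = n - 1 at the chosen level (e.g. from
    Appendix 4); it is passed in because the standard library has no t quantiles.
    """
    if n < 2:
        raise ValueError("need at least two observations")
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    # Example 9.6: NP = 10, DF = 9; two-tailed t(0.05, 9) = 2.262
    lower, upper = t_confidence_interval(0.1008, 3.38e-4, 10, 2.262)
    print(f"95% confidence limits: {lower:.5f} to {upper:.5f}")
```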

9.4 ROBUST METHODS

The statistics discussed so far (mean, SD, skewness and kurtosis) for univariate data assume normally distributed noise and the absence of systematic errors and outliers. Methods that are resistant to the distribution of errors and to outliers are referred to as robust methods; they are classified as distribution-free methods (Gnostic) and non-parametric ones (median, MAD, box-whisker plot, etc.).

Percentiles The 25th and 75th percentiles are called the first (lower) and third (upper) quartiles. The difference between the third and first quartiles is the interquartile range (IQR), which is a robust estimator of dispersion.

Median The 50th percentile is the second quartile, popularly known as the median. For a data set in ascending order, the median is the middle value when the number of points is odd, and the mean of the two middle values when it is even. Half of the observations lie below the median and half above it. Some of the characteristics of the median are given in Chart 9.7.
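Quartiles, IQR and median can be sketched with Python's standard library. This is an illustration under one assumption: quartiles are taken by linear interpolation between order statistics (`method="inclusive"` in `statistics.quantiles`). Other quartile conventions give slightly different Q1 and Q3 for small samples. The function name `quartile_summary` is our own; the data set is the one used with UNIVAR.FOR later in this section.

```python
from statistics import median, quantiles


def quartile_summary(data):
    """Return (Q1, median, Q3, IQR) of a univariate data set.

    Quartiles use linear interpolation between order statistics
    (method="inclusive"); other conventions differ slightly for small n.
    """
    if len(data) < 2:
        raise ValueError("need at least two observations")
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1


if __name__ == "__main__":
    q1, med, q3, iqr = quartile_summary([2.0, 3.0, 7.0, 8.0, 10.0])
    print(f"Q1 = {q1}, median = {med}, Q3 = {q3}, IQR = {iqr}")
    # even number of points: the median is the mean of the two middle values
    print(median([2.0, 3.0, 7.0, 8.0]))
```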


Chart 9.7
• The median detects outliers and is robust to them.
• It is relatively insensitive to the skewness (obliqueness) of the distribution.
• It is the maximum likelihood estimator of central tendency for the Laplace distribution. For the normal distribution it is less efficient than the mean.
• If 50% or more of the observations are outliers (or the area of the tail is large), then the median fails.

* * *

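MED.FOR
*     Median of X(1..N); X must already be in ascending order
      FUNCTION ZMED(X,N,MAX)
      DIMENSION X(N)
      N2 = N/2
      IF (2*N2 .EQ. N) THEN
*        even N: mean of the two middle values
         ZMED = 0.5*(X(N2) + X(N2+1))
      ELSE
*        odd N: the middle value
         ZMED = X(N2+1)
      ENDIF
      RETURN
      END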

Median Absolute Deviation The median of the absolute deviations from the sample median is called the median absolute deviation (MAD). MAD is useful for detecting outliers, and its breakdown point is 50%.

$MAD = \frac{1}{0.6745}\,MED(|Res|), \qquad Res_i = X_i - MED(X)$

IF |Res_i| / MED(|Res|) > 5.0 THEN X_i is an outlier

Here (1/0.6745) is a correction factor that makes MAD consistent with the usual scale (SD) of the Gaussian distribution. MAD.FOR is the FORTRAN function subprogram for the calculation of MAD.




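* * *
MAD.FOR
$DEBUG
*     MAD of X(1..N) and outlier screen; X must be sorted ascending
*     because ZMED (MED.FOR) assumes sorted input
      REAL FUNCTION MAD(X,N,MAX)
      DIMENSION X(N), RES(200)
      ZM = ZMED(X,N,MAX)
*     absolute residuals from the median
      DO 10 I = 1,N
         RES(I) = ABS(X(I) - ZM)
         WRITE(*,*) I, X(I), X(I) - ZM
   10 CONTINUE
*     insertion sort of RES, since ZMED needs sorted input
      DO 15 I = 2,N
         T = RES(I)
         J = I - 1
   12    IF (J .GE. 1) THEN
            IF (RES(J) .GT. T) THEN
               RES(J+1) = RES(J)
               J = J - 1
               GOTO 12
            ENDIF
         ENDIF
         RES(J+1) = T
   15 CONTINUE
      RESMED = ZMED(RES,N,MAX)
      MAD = RESMED/0.6745
      WRITE(*,*) 'MED(RES) =', RESMED, '   MAD =', MAD
*     ratio |RES|/MED(|RES|) > 5 flags an outlier; RESMED = 0
*     gives NaN under IEEE arithmetic, as in Example 9.7
      DO 20 I = 1,N
         R = ABS(X(I) - ZM)/RESMED
         IF (R .GT. 5.0) THEN
            WRITE(*,951) I, X(I), R
         ELSE
            WRITE(*,952) I, X(I), R
         ENDIF
   20 CONTINUE
  951 FORMAT(I3,G15.4,F8.2,'  OUTLIER')
  952 FORMAT(I3,G15.4,F8.2)
      RETURN
      END
$INCLUDE: 'MED.FOR'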
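For readers without a FORTRAN compiler, the same median/MAD screen can be sketched in Python. This is a hedged re-expression of the logic described above, not a translation endorsed by the book: the ratio |x - MED| / MED(|x - MED|) is compared with the cut-off 5, and MAD is MED(|x - MED|)/0.6745. The function name `mad_screen` is hypothetical. With the data of Example 9.9 it flags point 5 with a ratio of about 7.08.

```python
from statistics import median


def mad_screen(data, cutoff=5.0):
    """Median/MAD outlier screen patterned on MAD.FOR.

    ratio_i = |x_i - MED(x)| / MED(|x - MED(x)|); a point is flagged when
    ratio_i > cutoff. MAD itself is MED(|x - MED(x)|) / 0.6745.
    Returns (mad, ratios, flagged_indices), indices counted from 0.
    """
    med = median(data)
    abs_res = [abs(x - med) for x in data]
    res_med = median(abs_res)
    if res_med == 0.0:
        # MED(|res|) = 0 (e.g. identical replicates, Example 9.7):
        # the ratios cannot be formed and are reported as NaN.
        return 0.0, [float("nan")] * len(data), []
    ratios = [r / res_med for r in abs_res]
    flagged = [i for i, r in enumerate(ratios) if r > cutoff]
    return res_med / 0.6745, ratios, flagged


if __name__ == "__main__":
    ld50 = [4.62, 4.94, 5.01, 5.27, 6.85]  # Example 9.9, in ascending order
    mad, ratios, flagged = mad_screen(ld50)
    print("MAD =", round(mad, 4))
    for i, (x, r) in enumerate(zip(ld50, ratios)):
        print(i + 1, x, round(r, 4), "OUTLIER" if i in flagged else "")
```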

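* * *
UNIVAR.FOR
*     Exploratory statistics of a univariate data set (Fig. 9.8)
*     IX1: input of X; ASOR02: ascending sort; WX1: output of X
*     SKEW and KURT are the function subprograms discussed earlier
      PARAMETER (MAX=200)
      DIMENSION X(MAX)
      CALL IX1(N,X,MAX)
      CALL ASOR02(N,X,MAX)
      CALL WX1(N,X,MAX)
      ZSKEW = SKEW(X,N,MAX)
      ZKURT = KURT(X,N,MAX)
      END
$INCLUDE: 'WX1.FOR'
$INCLUDE: 'IX1.FOR'
$INCLUDE: 'SUM1.FOR'
$INCLUDE: 'SD.FOR'
$INCLUDE: 'MEAN1.FOR'
$INCLUDE: 'MED.FOR'
$INCLUDE: 'ASOR02.FOR'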


c:\>univar univ1.res
X(1)= 2.000 ; X(2)= 3.000 ; X(3)= 7.000 ; X(4)= 8.000 ; X(5)= 10.00
Mean       6.0000
Median     7.0000
S D        3.3912
Skewness   -.8847
Kurtosis   1.62 Platykurtic

UNIVAR.FOR is useful for exploratory statistical analysis of univariate data. It calls many of the function subprograms discussed earlier; the calling sequence is given in Fig. 9.8.
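For this data set, the skewness printed by UNIVAR.FOR (-0.8847) coincides numerically with Pearson's median skewness coefficient, 3(mean - median)/SD. That is our observation from the numbers only; the definition actually coded in SKEW.FOR is not shown in this section. A minimal Python sketch reproducing the mean, median, SD and this coefficient (the kurtosis value is not reproduced here):

```python
from statistics import mean, median, stdev


def univariate_summary(data):
    """Mean, median, sample SD (N - 1 divisor) and Pearson's median
    skewness coefficient 3 * (mean - median) / SD."""
    m, med, sd = mean(data), median(data), stdev(data)
    return m, med, sd, 3.0 * (m - med) / sd


if __name__ == "__main__":
    m, med, sd, sk = univariate_summary([2.0, 3.0, 7.0, 8.0, 10.0])
    print(f"Mean {m:.4f}  Median {med:.4f}  SD {sd:.4f}  Skewness {sk:.4f}")
```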

Hybrid Techniques The median and IQR are robust, but for normally distributed data they are less efficient estimators of central tendency and dispersion than the mean and SD. Therefore these robust point estimators, together with MAD, are used to detect chemical outliers as well as data outliers. The outliers are filtered from the data set, and the classical estimates (mean and SD) are then calculated from the remaining data. This combination of robust estimators to detect outliers and classical ones to estimate the univariate statistics has become popular as a hybrid technique.

Fig. 9.8: Calling Sequence of Subprograms in UNIVAR.FOR [subprograms shown: IX1, ASOR02, MEAN1, SUM1, SD, SKEW, KURT, KURTKD]


Example 9.7: Simulated Data In a utopian situation with no error in the replicate measurements, the mean and median are exactly the same and the SD is zero. The residuals with respect to the mean and median are also exactly zero. The MAD ratio |X_i - MED| / MED(|Res|) is therefore Not a Number (NaN), since its denominator is zero.

MEAN = 1.000000   MEDIAN = 1.000000   SD = .000000

                   Residual wrt
NO    X          Mean      Median    MAD
1     1.00000    .00000    .00000    NaN
2     1.00000    .00000    .00000    NaN
3     1.00000    .00000    .00000    NaN
4     1.00000    .00000    .00000    NaN

Example 9.8: Without Outlier The mean and standard deviation for the replicate determination of potassium acid phthalate with sodium hydroxide show that the SD is far smaller than the mean, which implies that the variation in the replicate measurements is due only to tolerable random errors. There are no outliers, since the mean and median are very close and all MAD ratios are well below 5.

MEAN = .033450   MEDIAN = .033425   SD = .000156

                   Residual wrt
NO    X         Mean      Median    MAD
1     .03322    -.00023   -.00020   1.7826
2     .03338    -.00007   -.00004   0.3913
3     .03340    -.00005   -.00003   0.2174
4     .03345    .00000    .00003    0.2174
5     .03361    .00016    .00019    1.6087
6     .03364    .00019    .00022    1.8696

Example 9.9: Single Outlier The data are LD-50 activity estimates in log(1/molar concentration). They are assumed to be independently and normally distributed with a variance of 0.7. Point 5 is an outlier (MAD ratio 7.08 > 5), and the mean and median differ by about 0.3 units.


data = [5.27, 6.85, 4.94, 5.01, 4.62]

MEAN = 5.338000   MEDIAN = 5.010000   SD = .876396

                   Residual wrt
NO    X          Mean      Median    MAD
1     4.62000    -.71800   -.39000   1.5000
2     4.94000    -.39800   -.07000   0.2692
3     5.01000    -.32800   .00000    0.0
4     5.27000    -.06800   .26000    1.0000
5*    6.85000    1.51200   1.84000   7.0769

Example 9.10: Two Outliers Points 6 and 7 of this data set are outliers. After their removal, in the next phase, the mean and median of the remaining data agree to the first decimal place.

MEAN = 1.311428   MEDIAN = 1.1500   SD = .342171

                   Residual wrt
NO    X          Mean      Median    MAD
1     1.05000    -.26143   -.10000   1.4286
2     1.08000    -.23143   -.07000   1.0000
3     1.10000    -.21143   -.05000   0.7143
4     1.15000    -.16143   .00000    0.
5     1.20000    -.11143   .05000    0.7143
6*    1.70000    .38857    .55000    7.8571
7*    1.90000    .58857    .75000    10.7143


After Elimination of Outliers X = [1.05, 1.08, 1.10, 1.15, 1.20]

MEAN = 1.116000   MEDIAN = 1.100000   SD = .059414

                   Residual wrt
NO    X          Mean      Median    MAD
1     1.05000    -.06600   -.05000   1.0000
2     1.08000    -.03600   -.02000   0.4000
3     1.10000    -.01600   .00000    0
4     1.15000    .03400    .05000    1.0000
5     1.20000    .08400    .10000    2.0000
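The two phases of Example 9.10 illustrate the hybrid technique. A minimal single-pass sketch in Python, assuming the same cut-off of 5 on the median-based ratio: robust screening first, then the classical mean and SD of what remains. The function name `hybrid_estimates` is hypothetical, and the book may iterate the screen more than once.

```python
from statistics import mean, median, stdev


def hybrid_estimates(data, cutoff=5.0):
    """Single-pass hybrid technique: flag outliers with the robust ratio
    |x - MED| / MED(|x - MED|) > cutoff, then compute the classical mean
    and SD of the retained points.

    Returns (mean, sd, retained, rejected).
    """
    med = median(data)
    abs_res = [abs(x - med) for x in data]
    scale = median(abs_res)
    if scale == 0.0:
        retained, rejected = list(data), []   # nothing can be flagged
    else:
        retained = [x for x, r in zip(data, abs_res) if r / scale <= cutoff]
        rejected = [x for x, r in zip(data, abs_res) if r / scale > cutoff]
    if len(retained) < 2:
        raise ValueError("fewer than two points retained")
    return mean(retained), stdev(retained), retained, rejected


if __name__ == "__main__":
    m, s, kept, out = hybrid_estimates([1.05, 1.08, 1.10, 1.15, 1.20, 1.70, 1.90])
    print(f"rejected {out}; mean = {m:.6f}, SD = {s:.6f}")
```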

Example 9.11: Three Outliers Three of the four data points are outliers (more than 50 per cent). The breakdown point of even the median has therefore been reached: none of the MAD ratios exceeds 5, so the outliers are not detected, and no inferences can be drawn.

x = [4.62, 5.01, 6.89, 7.02]

MEAN = 5.885000   MEDIAN = 5.950000   SD = 1.246876

                   Residual wrt
NO    X          Mean       Median     MAD
1     4.62000    -1.26500   -1.33000   1.3234
2*    5.01000    -0.87500   -0.94000   0.9353
3*    6.89000    1.00500    0.94000    0.9353
4*    7.02000    1.13500    1.07000    1.0647

Box-Whisker Plots Data pertaining to industrial production, environmental samples or biosamples are often analyzed by classical statistics, although the errors do not follow the Gaussian model and outliers occur frequently. The mean and SD are then greatly influenced, which calls for a method that is distribution-free and insensitive to outliers.


The box-whisker plot displays location (Chart 9.8), spread, skewness (Chart 9.9), tail length and outliers, even for a small univariate data set. It is a non-parametric method and is therefore resistant to a wrong assumption about the distribution of the data.

Chart 9.8: Algorithm of Box-Whisker Plot
Step 0: X: a vector of univariate data
Step 1: Sort X into ascending order
Step 2: Find the minimum [X(1)], maximum [X(n)] and median
Step 3: Find the middle values of the data to the left and right of the median (HL: lower hinge; HU: upper hinge)
        RF = HU - HL
        Lower fence (LF) = HL - 1.5 * RF
        Upper fence (UF) = HU + 1.5 * RF
        Median variability or robust confidence interval:
        RL = median - 1.57 * RF/SQRT(n)
        RU = median + 1.57 * RF/SQRT(n)
Step 4: Draw a rectangular box between the hinges (HL, HU)
        Mark the median by a cross bar
        Draw a horizontal line (whisker) from HU to UF
        Draw another horizontal line from HL to LF
        Mark each value falling outside the fences (LF to UF) with a star
        Draw vertical lines corresponding to the notches (RL, RU)

Chart 9.9: Inferences
• Detection of outliers at the lower and upper ends of the data set
• Skewness of the distribution
• Kurtosis from the tail
If   the left-hand section of the box is longer than the right
Then the distribution is negatively skewed and has a tail towards the left.
If   the right-hand section of the box is longer than the left
Then the distribution is positively skewed and has a long tail towards the right.
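The quantities of Chart 9.8 can be computed in a few lines. A hedged Python sketch, assuming that the hinges are Tukey's hinges (the medians of the lower and upper halves of the sorted data, each half including the median when n is odd). The function name and the returned dictionary keys are our own. Applied to the data of Example 9.8 with and without the added point 0.035 (Example 9.13), only the added point lies beyond the fences.

```python
import math
from statistics import median


def box_whisker(data):
    """Return the quantities of Chart 9.8 for a univariate data set.

    Hinges are taken as Tukey's hinges: the medians of the lower and upper
    halves of the sorted data, each half including the median when n is odd.
    """
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least four observations")
    half = (n + 1) // 2
    hl = median(x[:half])          # lower hinge HL
    hu = median(x[n - half:])      # upper hinge HU
    rf = hu - hl                   # hinge spread RF
    lf, uf = hl - 1.5 * rf, hu + 1.5 * rf
    med = median(x)
    notch = 1.57 * rf / math.sqrt(n)
    return {
        "min": x[0], "max": x[-1], "median": med,
        "HL": hl, "HU": hu, "LF": lf, "UF": uf,
        "RL": med - notch, "RU": med + notch,
        # values beyond the fences are the starred outliers
        "outliers": [v for v in x if v < lf or v > uf],
    }


if __name__ == "__main__":
    kph = [0.03322, 0.03338, 0.03340, 0.03345, 0.03361, 0.03364]  # Example 9.8
    print(box_whisker(kph)["outliers"])
    print(box_whisker(kph + [0.035])["outliers"])                  # Example 9.13
```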

Example 9.12 The scatter plot for 41 samples of arsenic indicates no outliers, with a confidence interval of 2 to 35 ppm. However, the box-whisker plot prescribes tolerance bounds of 12 and 26, so that more than 20% of the samples become low-lying outliers. This discrepancy results from the wrong assumption of a Gaussian distribution, as indicated by the histogram.

Example 9.13 An outlier (0.035) is incorporated in the dataset given in Example 9.8 and the box-whisker plots are drawn for the data with and without outlier (Fig. 9.9).

Fig. 9.9: Box Plots (a) With and (b) Without an Outlier (0.035) [vertical axis: data values, 0.0332 to 0.035; horizontal axis: Column Number]

9.5 COMPARISON OF UNIVARIATE DATA SETS

With Standard Reference Value Standard reference materials are now available from international agencies for almost all types of materials, such as alloys, timber, bio-fluids, drugs, fertilizers, pesticides and food materials. A standard material is analysed by a prescribed procedure to check the functioning of the instrument, the skill of the analyst and the suitability of the method. An accredited laboratory analyses field samples to ascertain whether the value differs significantly from the prescribed one. These tasks involve comparing the sample characteristics with those of the population. Chart 9.10 describes the comparison of the mean of a data set with a standard value.


Chart 9.10
Input: X = [X1, X2, X3 ... XNP]; Xstandard
Object of test: H0: Xmean = Xstandard; HA: Xmean ≠ Xstandard
Assumption: No outliers
Test selection KB: If NP > 30, then use the Z test, else perform the t test
Formulae: $Z\ (\text{or}\ t) = \frac{X_{mean} - X_{standard}}{SD/\sqrt{NP}}$; DF = NP - 1
Test interpretation KB:
If   |Z| (or |t|) < Z_table (or t_table)
Then accept H0 [i.e., data is not different from standard]
Else reject H0 [i.e., accept HA, i.e., data is different from standard]

Example 9.14 The mean concentration of 10 samples of anhydrous Na2CO3 (standard value: 0.1000) titrated against 0.10 N HCl is 0.1008, with a variance of 1.14 × 10⁻⁷. The t-test is applied (NP = 10):

$t = \frac{0.1008 - 0.1000}{\sqrt{1.14\times10^{-7}/10}} = \frac{0.0008}{1.068\times10^{-4}} = 7.49$

df = 10 - 1 = 9; α = 0.05; t_table = 2.262

Since t > t_table, H0 is rejected: the mean differs significantly from the standard value. This is consistent with Example 9.6, where the 95% confidence interval of the same mean (0.1005 to 0.1010) does not include 0.1000.
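The one-sample statistic of Chart 9.10 divides by the standard error SD/√NP, not by the SD itself. A minimal Python sketch under that formula; the function name is hypothetical, and the table value 2.262 (α = 0.05, DF = 9) is supplied by hand.

```python
import math


def one_sample_t(mean, variance, n, reference):
    """t = (Xmean - Xstandard) / (SD / sqrt(n)) with DF = n - 1 (Chart 9.10)."""
    standard_error = math.sqrt(variance / n)
    return (mean - reference) / standard_error, n - 1


if __name__ == "__main__":
    t, df = one_sample_t(0.1008, 1.14e-7, 10, 0.1000)   # Example 9.14
    t_table = 2.262                                      # alpha = 0.05, DF = 9
    verdict = "reject H0" if abs(t) > t_table else "accept H0"
    print(f"t = {t:.2f}, DF = {df}: {verdict}")
```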

With Another Data Set More often, a judicious choice between two analytical procedures, instruments or laboratories is crucial. The two data sets in this case can be considered as two different populations with means μ1 and μ2 and variances σ1² and σ2², and the samples of sizes n1 and n2 are considered as representatives of these populations. Assuming each population to be homogeneous, the chemical task is to conclude whether or not the precision and/or accuracy of the two sets are the same. A few typical chemical tasks are illustrated in Chart 9.11.

Chart 9.11
• Calibration of a pipette from two manufacturers
• Nicotine in blood samples (comparison of precision of GC at two different concentrations)
• P2O5 in fertilizer (citrate and sulphuric acid methods)
• Dinitrocresol in herbicides (polarographic and titrimetric methods)
• Iodine in soyabean oil (flame ionization and atomic absorption spectroscopy)
• Folic acid in two drugs
• Analysis of europium in radioactive material by two different laboratories
• Efficiency of two catalysts in hydrogenation of oils
• Comparison of a newly developed method of analysis with a standard one.

This comparison requires a priori knowledge of the type of distribution, the sample size and the independence of the data sets. The F test for equality of the variances of the two samples is given in Chart 9.12.

Chart 9.12
H0: Var1 = Var2; HA: Var1 ≠ Var2
Input: X1 = [X11 X21 X31 ...]^T; X2 = [X12 X22 X32 ...]^T
Step 1: Calculate Mean1, Mean2, SD1, SD2
Step 2: IF Var1 > Var2 THEN F = Var1/Var2; DF1 = N1 - 1; DF2 = N2 - 1
        ELSE F = Var2/Var1; DF1 = N2 - 1; DF2 = N1 - 1
        END
Step 3: If F < F_table then accept H0, else accept HA
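Steps 2 and 3 of Chart 9.12 amount to placing the larger variance in the numerator and carrying the matching degrees of freedom. A minimal Python sketch, with a hypothetical function name. The example inputs are the summary statistics of Example 9.15 below, whose F value (1.5882) and table value (1.84) the sketch reuses.

```python
def f_test(var1, n1, var2, n2):
    """Variance-ratio F of Chart 9.12: larger variance over smaller,
    returned with the matching (numerator, denominator) degrees of freedom."""
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1


if __name__ == "__main__":
    f, df1, df2 = f_test(6.75, 36, 4.25, 31)   # summary statistics of Example 9.15
    f_table = 1.84                              # table value quoted in Example 9.15
    print(f"F = {f:.4f} (DF {df1}, {df2}):", "accept H0" if f < f_table else "accept HA")
```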

If the distribution is normal and outliers are absent, the classical Z and t tests are employed to check the equality of the means of the two samples. For data of unknown distribution, the Wilcoxon, Cochran and Mann-Whitney tests (Chart 9.13) can be used.

Chart 9.13
If   small sample, non-normal and independent
Then use the U-test of Mann-Whitney
If   small sample, heteroscedastic and non-normal
Then use the Cochran test
If   samples are related, small and non-normal
Then use the Wilcoxon T-test.


In recent practice, however, both types of tests (parametric and non-parametric) are performed, and any contradictory conclusions are reconciled by carefully inspecting the data. The implementation of the formulae with the test conclusions is given in Chart 9.14.

Chart 9.14a: Z Test for Comparison of Means of Two Samples

Input: X1 = [X11 X21 X31 ...]^T; X2 = [X12 X22 X32 ...]^T
Objective: H0: Xmean1 = Xmean2; HA: Xmean1 ≠ Xmean2
Assumption: If large sample, independent & normal, then use the Z statistic
Options: One-tailed and two-tailed test
Formula: $Z = \frac{X_{mean1} - X_{mean2}}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
If |Z| < Z_table, then accept H0, else reject H0
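The Z statistic of Chart 9.14a needs only the two means, variances and sample sizes. A minimal Python sketch with a hypothetical function name; with the summary statistics of Example 9.15 below it gives Z of about 4.388.

```python
import math


def two_sample_z(mean1, var1, n1, mean2, var2, n2):
    """Z = (Xmean1 - Xmean2) / sqrt(s1^2/n1 + s2^2/n2) for large, independent samples."""
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)


if __name__ == "__main__":
    z = two_sample_z(42.8, 6.75, 36, 40.3, 4.25, 31)   # Example 9.15
    print(f"Z = {z:.3f}")
```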

Chart 9.14b: t-test for Comparison of Two Samples
Input: X1 and X2 vectors
Objective: H0: Xmean1 = Xmean2; HA: Xmean1 ≠ Xmean2 (Fig. 9.3h)
If   small sample & homogeneous (s1² ≈ s2²) & normal
Then use t1 test
If   σ1², σ2² are unknown, n1 > 8 & n2 > 8
Then use t1 test
If   small sample & heterogeneous (s1² ≠ s2², Fig. 9.3i) & normal
Then use t2 test
If   n1 ≠ n2 & s1² ≠ s2²
Then use t2 test
If |t| < t_table then accept H0, else accept HA


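Chart 9.14c: Formula base

$t_1 = \frac{X_{mean1} - X_{mean2}}{\sqrt{(n_1-1)s_1^2 + (n_2-1)s_2^2}}\sqrt{\frac{n_1 n_2 (n_1+n_2-2)}{n_1+n_2}}, \qquad df_1 = n_1 + n_2 - 2$

$t_2 = \frac{X_{mean1} - X_{mean2}}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$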

Fig. 9.10: Comparison of Means of Two Samples [branches: normal and non-normal distributions; small and large samples]

Example 9.15

The means and variances of two samples of size 36 and 31 are 42.8 and 40.3, and 6.75 and 4.25, respectively. It is a case of large samples (NP > 30); hence the Z-test is performed for comparison of the means. The null hypothesis (H0) is rejected, since Z_cal (4.388) > Z_table (1.645, one-tailed, α = 0.05). The variances of the two samples are the same (homoscedastic), as F_cal (1.5882) is less than the table value (1.84; DF1 = 35, DF2 = 30).

Example 9.16

In an international comparative study, duplicate samples of whole meal flour were analyzed for per cent nitrogen (Table 9.4) by the Kjeldahl method. The means and variances are statistically indistinguishable according to the t- and F-tests.


Example 9.18

The estimation of the concentration of a trace element (Table 9.6) by two different methods shows that the content is the same, although one method is more precise (smaller variance) than the other.

Table 9.6: Concentrations by Different Methods

Method 1    Method 2
2.0374      2.3441
1.8043      2.8163
1.9276      1.5985
2.0174      2.4005
2.0227      2.0297
2.0476      2.7964
1.9871      1.8745
1.9475      2.1487
1.8495      2.0679
1.9395      1.9750
2.0941      1.6584
1.9796      2.5841

meanx       1.9712      2.1912
var         0.0070      0.1640
np          12          12

tvalue = -1.8431; Fvalue = 23.5591
ttable = 2.0740; alpha = 0.05; df = 22

Since |tvalue| < ttable, xmean1 = xmean2 is acceptable; since Fvalue > Ftable, the variances differ, i.e., method 1 is more precise.
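The pooled t1 statistic of Chart 9.14c and the variance ratio can be checked against Example 9.18. A hedged Python sketch: the two columns are our reading of the flattened Table 9.6 (read row-wise as method 1, method 2 pairs). This reading reproduces the printed means (1.9712, 2.1912) and variances (0.0070, 0.1640), giving t of about -1.843 with DF = 22 and F of about 23.56. The function names are hypothetical.

```python
import math
from statistics import mean, variance

# Table 9.6, read row-wise as (method 1, method 2) pairs
method1 = [2.0374, 1.8043, 1.9276, 2.0174, 2.0227, 2.0476,
           1.9871, 1.9475, 1.8495, 1.9395, 2.0941, 1.9796]
method2 = [2.3441, 2.8163, 1.5985, 2.4005, 2.0297, 2.7964,
           1.8745, 2.1487, 2.0679, 1.9750, 1.6584, 2.5841]


def pooled_t(x1, x2):
    """t1 of Chart 9.14c (pooled variance) and its DF = n1 + n2 - 2."""
    n1, n2 = len(x1), len(x2)
    pooled = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    t = (mean(x1) - mean(x2)) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return t, n1 + n2 - 2


def variance_ratio(x1, x2):
    """F = larger sample variance / smaller sample variance."""
    v1, v2 = variance(x1), variance(x2)
    return max(v1, v2) / min(v1, v2)


if __name__ == "__main__":
    t, df = pooled_t(method1, method2)
    print(f"t = {t:.4f} (DF = {df}), F = {variance_ratio(method1, method2):.4f}")
```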

Comparison of Means of More than Two Samples When more than two methods or instruments are compared, a simple F-test (Chart 9.15) shows whether or not the means are equal. However, it does not indicate which sample differs from the rest.

Chart 9.15: Comparison of Means of More than Two Samples

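H0: μ1 = μ2 = μ3 = ... = μk; HA: μ1 ≠ μ2 ≠ μ3 ≠ ... ≠ μk

$\bar{X} = \frac{\sum_{i=1}^{k} n_i \bar{X}_i}{\sum_{i=1}^{k} n_i}, \qquad DF = k - 1$

$F = \frac{\sum_{i=1}^{k} n_i (XMEAN_i - \bar{X})^2}{\sum_{i=1}^{k} \left(1 - \frac{n_i}{N}\right) s_i^2}, \qquad N = \sum_{i=1}^{k} n_i$

If F < F_table Then accept H0 (means are equal)
Else reject H0 (means are not equal)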

Example 9.19 The means and variances of three data sets with different sample sizes (np = [10, 12, 14]) are [2 3 4] and [0.6 0.8 1.2]. As F_cal (15.3191) is less than the table value (19.0) at α = 0.05 (DF1 = 2; DF2 = 2), the means are statistically indistinguishable.

Comparison of Variances of More than Two Samples A procedure or laboratory with lower precision is discarded in intercomparison studies. Further, the validity of the pooled t test (and of ANOVA) rests on the assumption that the variances of the data sets are equal. Bartlett proposed a parameter that follows a χ² distribution (with k - 1 degrees of freedom) for testing the equality of variances (Chart 9.16).


Chart 9.16: Bartlett Test
Data:     Mean = [Xmean1 Xmean2 ... Xmeank]
          Variance = [Var1 Var2 ... Vark]
          NP = [NP1 NP2 ... NPk]
Hypothesis: H0: Var1 = Var2 = ... = Vark
            HA: Var1 ≠ Var2 ≠ ... ≠ Vark (one or more of the equalities fail)

$BARTLETT = \sum_{i=1}^{k}(n_i - 1)\ln \bar{s}^2 - \sum_{i=1}^{k}(n_i - 1)\ln s_i^2, \qquad \bar{s}^2 = \frac{\sum_{i=1}^{k}(n_i - 1)s_i^2}{\sum_{i=1}^{k}(n_i - 1)}$

s_i²: variance of the i-th procedure with n_i determinations; n_i - 1: its DF
The BARTLETT parameter follows a χ² distribution. If BARTLETT < χ²_table, then accept H0, else accept HA.

Example 9.20

The SDs of three simulated data sets are tested for statistical equality. Since the χ² value is less than the table value, the Bartlett test indicates that all the variances are homogeneous.

SD        Variance
0.1000    0.0100
0.1050    0.0110
0.1020    0.0104

χ² = 0.0096615; χ²_table = 5.991 (α = 0.05, df = 2)

Since χ² < χ²_table, var1 = var2 = var3 is acceptable.
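The Bartlett parameter in the form of Chart 9.16 can be sketched as below. Two assumptions are made. First, Bartlett's usual correction divisor C is not applied, matching the chart as written. Second, for Example 9.20 we take n = 5 observations per set, since the text does not state n. With those sizes the uncorrected statistic comes out at about 0.0097, close to the printed 0.0096615. The function name is hypothetical.

```python
import math


def bartlett_statistic(variances, sizes):
    """Bartlett parameter in the form of Chart 9.16:
    sum(n_i - 1) * ln(pooled variance) - sum((n_i - 1) * ln(s_i^2)).
    Bartlett's usual correction divisor C is not applied here.
    """
    dfs = [n - 1 for n in sizes]
    total_df = sum(dfs)
    pooled = sum(d * v for d, v in zip(dfs, variances)) / total_df
    return total_df * math.log(pooled) - sum(d * math.log(v) for d, v in zip(dfs, variances))


if __name__ == "__main__":
    sds = [0.1000, 0.1050, 0.1020]          # Example 9.20
    sizes = [5, 5, 5]                        # assumption: n is not given in the text
    stat = bartlett_statistic([s * s for s in sds], sizes)
    print(f"Bartlett parameter = {stat:.7f}; compare with chi-square(2) = 5.991")
```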

Example 9.22 In the determination of molybdenum in molybdite ore by AAS, the percentage SDs of five groups covering different concentration ranges are 6.82, 8.44, 4.44, 7.13 and 5.80. The Bartlett parameter (5.03) is lower than the table value at the 95% confidence level. Thus there is no significant difference between the variances of the groups covering different concentration ranges.

9.6 ANALYSIS OF VARIANCE

Analysis of a standard material by a number of analysts in different laboratories does not produce identical numerical values, even if the same standard analytical procedure is adopted, because of systematic and random factors (Chart 9.17).

Chart 9.17

* Sample: sampling · sample preparation
* Pretreatment: separation of some compounds · conversion of a compound into a derivative
* Calibration: concentration range · instrument · calibration standards
* Calculation: algorithm · software
* Report: traditions/conventions · human factor

The variation can sometimes be ignored after inspecting the numbers, but a foolproof and unbiased approach for accreditation purposes is a sound statistical procedure such as analysis of variance (ANOVA). In chemical analysis, ANOVA is popular under different heads, viz., one way (ANOVA I), two way (ANOVA II) and multiway ANOVA, depending upon whether the number of influential factors considered is one, two or many, respectively. A few recent applications of ANOVA I in food science, environmental pollution, natural raw materials, chemical industry, etc., are given in Chart 9.18.


Chart 9.18
* Standardization: titration of NaOH with oxalic acid
* Quality of a chemical compound: AgNO3 from different manufacturers
* Efficiency of catalysts: yield of an organic compound using catalysts
* Analysis of a trace metal/drug/food product: replicate samples by different laboratories with the same analytical method
* Food science: aflatoxin M1 in milk · lead in milk powder · fat in dried meat product · nitrogen in cereal product · commercial agar powder · compounded animal feedstuff · soya protein · whole meal flour · skimmed milk powder · fish meal
* Trace metals: soil samples (137Cs, 134Cs, 40K) · treated wood (Cu, Cr, As) · radioactive reference material (Eu)

Features of ANOVA ANOVA separates the variation in the response into explainable factors (μ) and random effects (e) using a linear additive model

$Y = \sum_{i=1}^{NF} \mu_i + e$

It indicates the influential factors operating on the system.

The mean of means (grand mean) of the data matrix and the total sum of squares (TSS) of the deviations of the data points from the grand mean are calculated. TSS is the sum of the sum of squares due to factors (SSFACT), attributable to identifiable sources of variation, and the sum of squares of residuals (SSR), due to random factors. SSFACT represents the variability between the different levels of the given factors, while SSR is the variability within the factors. The repeatability variance (SR², unexplained) and the variance between the explainable factors (SL²) can be calculated; their sum is termed the reproducibility of the total system. The formulae for the sums of squares (Chart 9.19) and a FORTRAN program, ANOVA.FOR, are given.


Chart 9.19a: Data structure for one way ANOVA

Y11      Y12      Y13      ...   Y1,k
Y21      Y22      Y23      ...   Y2,k
...
YNP,1    YNP,2    YNP,3    ...   YNP,k

NP: number of replicates; k: number of factors
Factors (treatments): laboratories, methods, analysts
Model: $Y_{ij} = \mu + \mu_j + e_{ij}$
μ: overall mean; μj: effect of the jth factor; eij: random error
Yij: normally distributed with mean (μ + μj) and variance σ²
Result: μ + μj is estimated by Xmean for the jth factor

Chart 9.19b: Formulae of one way ANOVA

Algebraic notation
$SUMC_j = \sum_{i=1}^{NP} Y_{ij}, \qquad TSUM = \sum_{j=1}^{k} SUMC_j, \qquad TOTOBS = \sum_{j=1}^{k} n_j$
GMEAN = TSUM/TOTOBS
$TSS = \sum_{i}\sum_{j} Y_{ij}^2 - \frac{TSUM^2}{TOTOBS}$
$SSFACT = \sum_{j=1}^{k} \frac{SUMC_j^2}{n_j} - \frac{TSUM^2}{TOTOBS}$
$ERSS = \sum_{i}\sum_{j} Y_{ij}^2 - \sum_{j=1}^{k} \frac{SUMC_j^2}{n_j}$
MSSFACT = SSFACT/(K - 1); MERSS = ERSS/(TOTOBS - K); F = MSSFACT/MERSS

Matrix notation (r replicates in each of K = c columns)
[r,c] = size(Y); SUMC = sum(Y); TSUM = sum(SUMC); TOTOBS = r*c; GMEAN = TSUM/TOTOBS
TSS = sum(sum(Y.^2)) - TSUM^2/TOTOBS
SSFACT = sum(SUMC.^2)/r - TSUM^2/TOTOBS
ERSS = sum(sum(Y.^2)) - sum(SUMC.^2)/r
MSSFACT = SSFACT/(K - 1); MERSS = ERSS/(TOTOBS - K); F = MSSFACT/MERSS


* * *

ANOVA.DEM
$DEBUG
*     Demonstration driver: ANOVA table for four (X, Y) points
      PARAMETER (MAX=4)
      DIMENSION X(MAX), Y(MAX)
      DATA X,Y /1.,2.,3.,4., 1.,2.,3.,4.02/
      NP = 4
      CALL WXY1(NP,X,Y,MAX)
      CALL ANOVA(X,Y,NP,MAX)
      END
$INCLUDE: 'ANOVA.FOR'
$INCLUDE: 'WXY1.FOR'

* * *
ANOVA.FOR
*     ANOVA table for the least-squares line Y = A0 + A1*X
*     (N must not exceed 20, the size of YCAL and R)
      SUBROUTINE ANOVA(X,Y,N,MAX)
      REAL X(MAX),Y(MAX),YCAL(20),R(20),MRSS,MTSS
      REAL T1(10),T2(10),CHIL1(10),CHIL2(10)
      REAL SUMX,SUMY,SUMXX,SUMYY,SUMXY,XX,XY,YY,XMEAN,YMEAN
      REAL A0,A1,CC,THETA,TSS,RSS,SREG,SX2,C1,SD
      REAL F,T,CCC,SI,FEHR
      COMMON /Q/A1,SD,CC,CCC,THETA,SI,F,FEHR
*     t and chi-square table values (not used in this routine)
      DATA T1/63.66,9.92,5.84,4.60,4.03,3.71,3.50,3.36,3.25,3.17/
      DATA T2/31.82,6.96,4.54,3.75,3.36,3.14,3.00,2.90,2.82,2.76/
      DATA CHIL1/7.88,10.6,12.8,14.9,16.7,18.5,20.3,22.0,23.6,25.2/
      DATA CHIL2/6.63,9.21,11.3,13.3,15.1,16.8,18.5,20.1,21.7,23.2/
*     raw sums
      SUMX = 0.
      SUMY = 0.
      SUMXY = 0.
      SUMXX = 0.
      SUMYY = 0.
      DO 10 J = 1,N
         SUMX = SUMX + X(J)
         SUMY = SUMY + Y(J)
         SUMXY = SUMXY + X(J)*Y(J)
         SUMXX = SUMXX + X(J)*X(J)
         SUMYY = SUMYY + Y(J)*Y(J)
   10 CONTINUE
*     corrected sums of squares and cross products
      XX = SUMXX - SUMX**2/N
      XY = SUMXY - SUMX*SUMY/N
      YY = SUMYY - SUMY**2/N
      YMEAN = SUMY/N
      XMEAN = SUMX/N
*     least-squares slope and intercept
      A1 = XY/XX
      A0 = YMEAN - A1*XMEAN
C.... ANOVA sums of squares
      TSS = 0.0
      RSS = 0.0
      SREG = 0.0
      SX2 = 0.0
      SD = 0.0
      DO 20 I = 1,N
         YCAL(I) = A0 + A1*X(I)
         R(I) = YCAL(I) - Y(I)
         TSS = TSS + (Y(I)-YMEAN)**2
         RSS = RSS + (Y(I)-YCAL(I))**2
         SREG = SREG + (YCAL(I)-YMEAN)**2
         SX2 = SX2 + (X(I)-XMEAN)**2
         SD = SD + R(I)*R(I)
   20 CONTINUE
      NM1 = N-1
      NM2 = N-2
      C1 = RSS/TSS
      SD = SQRT(SD/(N-1))
      MTSS = TSS/(N-1)
      MRSS = RSS/(N-2)
*     regression F = (SREG/1)/MRSS (undefined for an exact fit)
      F = SREG/MRSS
      T = SQRT(F)
      WRITE(*,951)
      WRITE(*,952) NM1,MTSS,F,T,SREG,NM2,MRSS
      RETURN
  951 FORMAT(' ANOVA I'/)
  952 FORMAT(75(1H.)/' SUM OF SQUARES',5X,'DEGREES OF FREEDOM',
     * 5X,'MEAN SUM OF SQUARES',5X,1HF,9X,1HT,4X/78(1H.)/
     * ' CORRECTED FOR',10X,'N-1=',I3,T40,E15.4,T62,2E8.3/
     * 3X,'MEAN(MSS)'//' DUE TO FACTOR',10X,'P-1 = 1',T40,E15.4/
     * 5X,'(SREG)'//' RESIDUAL (RSS)',9X,'N-P =',I3,T40,E15.4/
     * 78(1H.)//)
      END


Example 9.23 The output of the ANOVA program for a simulated data set is given in Table 9.7.

Table 9.7: One Way ANOVA for Simulated Data

X(1)= 1.000 ; Y(1)= 1.000 ; X(2)= 2.000 ; Y(2)= 2.000
X(3)= 3.000 ; Y(3)= 3.000 ; X(4)= 4.000 ; Y(4)= 4.020

SUM OF SQUARES              DEGREES OF FREEDOM   MEAN SUM OF SQUARES   F          T
CORRECTED FOR MEAN (MSS)    N-1 = 3              .1687E+01             .894E+01   .299E+01
DUE TO FACTOR (SREG)        P-1 = 1              .2510E+02
RESIDUAL (RSS)              N-P = 2              .1508E+02

These printed entries correspond to regression coefficients A0 = A1 = 0: RSS = ΣY² = 30.16 (MRSS = 15.08) and SREG = 4 × Ymean² = 25.10. With A1 = XY/XX = 1.006 and A0 = -0.01, the residual sum of squares is only about 1.2 × 10⁻⁴.

Example 9.24 The data for the estimation of the concentration of sodium hydroxide by titrating standard oxalic acid with phenolphthalein indicator are analyzed by one-way ANOVA. The data (Table 9.8a) consist of three samples, each in duplicate, and the F-test shows that the concentration is reliable, since the variation between the samples is insignificant compared with the random fluctuations.

Table 9.8a: Titration Data for Standardization of Sodium Hydroxide (TITRANT) with Oxalic Acid (TITRAND)

K    L    Conc. of      Conc. of    Volume in ml
          std. soln.    NaOH        Titrant   Titrand   Initial volume   Final volume
1    1    .101594       .390746     1.820     7.000     .000             1.820
1    2    .101594       .386499     1.840     7.000     .000             1.840
2    1    .101594       .388986     2.220     8.500     .000             2.220
2    2    .101594       .388986     2.220     8.500     .000             2.220
3    1    .101594       .390746     2.600     10.000    .000             2.600
3    2    .101594       .390746     2.600     10.000    .000             2.600

Conc.: Concentration; std. soln.: standard solution; K: sample; L: replicate. No. of samples (k): 3; total number of observations: 6 (2 per sample).


Table 9.8b: One Way ANOVA for Standardization of Alkali

Source of Variation                  Sum of Squares   DF   Mean Sum of Squares   F
Between means (Treatment), SST       .5186e-05        2    .2593e-05             .8586e+00
Within samples (ERROR), SSE          .9060e-05        3    .3020e-05
Total, TSS                           .1425e-04        5

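F is less than F_table(2, 3, 0.05) = 9.55. Therefore the variation between the samples is not significant compared with the random error within the samples, and the three samples give statistically the same NaOH concentration.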
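The algebraic formulae of Chart 9.19b translate directly into a short routine. A hedged Python sketch: the function name and the returned dictionary keys are our own, and the groups are the duplicate NaOH concentrations of Table 9.8a. The results (SSFACT of about 5.16 × 10⁻⁶, ERSS of about 9.02 × 10⁻⁶, F of about 0.858 with DF 2 and 3) are close to Table 9.8b. The small differences presumably come from the six-decimal rounding of the printed concentrations.

```python
def one_way_anova(groups):
    """One-way ANOVA from the algebraic formulae of Chart 9.19b.

    groups: one list of replicate results per factor level (sample).
    Returns a dict with SSFACT, ERSS, TSS, degrees of freedom and F.
    """
    k = len(groups)
    totobs = sum(len(g) for g in groups)
    if k < 2 or totobs <= k:
        raise ValueError("need at least two groups and some replication")
    tsum = sum(sum(g) for g in groups)
    correction = tsum ** 2 / totobs                      # TSUM^2 / TOTOBS
    sum_sq = sum(y * y for g in groups for y in g)       # sum of all Y_ij^2
    tss = sum_sq - correction
    ssfact = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    erss = tss - ssfact
    df_fact, df_err = k - 1, totobs - k
    f = (ssfact / df_fact) / (erss / df_err)             # MSSFACT / MERSS
    return {"SSFACT": ssfact, "ERSS": erss, "TSS": tss,
            "DF": (df_fact, df_err), "F": f}


if __name__ == "__main__":
    naoh = [[0.390746, 0.386499],   # sample 1, duplicate results (Table 9.8a)
            [0.388986, 0.388986],   # sample 2
            [0.390746, 0.390746]]   # sample 3
    print(one_way_anova(naoh))
```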

The heuristics for the magnitude of SR² and SL² are given in Chart 9.20. ANOVA is insensitive to moderate deviations from the model assumptions; however, the results are in error if the samples are interdependent. ANOVA should therefore not be applied indiscriminately to every type of problem.

Chart 9.20: Heuristics for SR², SL²
If   the method of analysis is not specified and interlaboratory comparison is attempted
Then SR² and SL² are of high magnitude.
If   the method of analysis is specified and interlaboratory comparison is adopted
Then SR² and SL² are of low magnitude.

Fixed and Random Effect Models Two ANOVA models are in vogue in interlaboratory comparisons, viz., the fixed and the random effect models. When a few accredited laboratories are chosen, their mean values are almost the same; the grand mean is then considered fixed, and the corresponding ANOVA is called fixed effect ANOVA. The effects of two or more factors (treatments) can be compared by one-way ANOVA if the factor has a fixed effect. On the other hand, when a large number of laboratories, procedures or analysts is considered for accreditation purposes, the mean of each set is a random variable, and the model is referred to as the random effect ANOVA model. The assumptions of these models and the KB for validation of results are given in Charts 9.21 and 9.22.

Chart 9.21
* Influence of a single factor on a single response for a sample
* Each column (k) of data is from a normal distribution
* All observations are uncorrelated
* Random errors are mutually independent and normally distributed
* Random samples are selected from k factors.


Chart 9.22: KB for ANOVA
If   the ratio of the mean sum of squares of the factor to the residual mean sum of squares (F) > table value
Then reject H0 [effect of the factor is significant]
Else the effect of the factor is insignificant.
If   variances are heterogeneous & ANOVA is performed
Then results are perturbed.
If   NSAMP > 20 & fixed one-way ANOVA performed & normality of data is not satisfied
Then results are not perturbed.
If   fixed effect model & SSFACT/SST is low
Then model is invalid & at least one influential uncontrollable factor is not randomized during the course of experiments.
If   NSAMP > 20 & random one-way ANOVA performed & normality of data is not satisfied
Then results are in large error.
If   random effect model of one-way ANOVA performed & assumptions are not satisfied
Then results are in error [Remedy: transform data to adhere to the assumptions].


10.1 COVARIANCE, CORRELATION COEFFICIENT

The effect of the concentration of the analyte, or of pH, on the absorbance of a coloured compound is monitored at the wavelength of maximum absorbance in the visible spectrum. The concentration and the corresponding absorbance vectors form a bivariate data set. The change in absorbance is explained by the variation of concentration through Beer's law. The concentration vector (CON) is thus an explanatory factor, and the absorbance corresponds to the response (RESP); they are also called the independent and dependent variables, respectively.

The variations in response and concentration are expressed as variances in the statistical literature. The joint variation of one variable with the other is called covariance: it is the expected value of the product of the deviations of the paired data (CON, RESP) from their respective means. The covariance thus indicates the relation between the two variables, in particular their linear dependence. A few bivariate variables in different fields of chemical investigation are given in Chart 10.1. The equations and source code in FORTRAN are given.

Chart 10.1

Explainable Factor (X)    Response (Y)    Comment
Concentration             Absorbance      Both are primary data
Time                      log(a - x)      Y is log transformed
σ                         log k           log k is estimated from primary data
1/T                       log k           The range of T is generally small



The range of covariance is -∞ to +∞, and its value depends on the measurement scale. For example, the covariances are very different when the concentration of the analyte is expressed in μg or in mg. The electronic (σ) and steric (E) factors for the compounds given in Table 10.1 are independent, and their covariance has a very low magnitude. Thus, if the sources of the two variables are different (independent), the covariance between them is expected to be zero. However, a zero covariance does not imply that the two variables are independent.

* * *


BIVARIATE ANALYSIS

10.1 Covariance, Correlation Coefficient

COV.FOR

      FUNCTION COV(X,Y,N,MAX)
C     SAMPLE COVARIANCE OF X AND Y (DIVISOR N-1)
      REAL MEAN1
      DIMENSION X(MAX),Y(MAX)
      XMEAN = MEAN1(X,N,MAX)
      YMEAN = MEAN1(Y,N,MAX)
      COV = 0.
      DO 10 I = 1,N
         COV = COV + (X(I) - XMEAN) * (Y(I) - YMEAN)
   10 CONTINUE
      COV = COV/REAL(N-1)
      WRITE(*,951) COV
  951 FORMAT(5X,'COVARIANCE: ',G10.4)
      RETURN
      END

Table 10.1: Covariance between Electronic and Steric Factors

Covariance matrix
          σ          E
σ       0.1878    -0.0387
E      -0.0387     0.7029

Correlation matrix
          σ          E
σ       1.0000    -0.1065
E      -0.1065     1.0000

Data
σ: 1.1, 1.05, 1, 0.85, 0.6, 0.52, 0.49, 0.41, 0.405, 0.385, 0.36, 0.215, 0.11, 0.08, 0, -0.1, -0.115, -0.125, -0.13, -0.165, -0.19, -0.2, -0.3
E: 0.24, 0.24, 0.27, 0.37, 2.55, 0.19, -1.24, 1.89, 1.76, 0.9, 1.63, 0.38, 1.19, 0, 0, 0.07, 0.36, 0.93, 0.39, 1.74, 0.47, 0.51, 1.54
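The matrices of Table 10.1 can be recomputed from the listed data with a short sketch (standard library only). One entry of the E column was garbled in the source and is read here as 1.76; this is an assumption, chosen because it reproduces the printed E variance of 0.7029.

```python
# sigma (electronic) and E (steric) values as listed in Table 10.1.
sigma = [1.1, 1.05, 1, 0.85, 0.6, 0.52, 0.49, 0.41, 0.405, 0.385, 0.36, 0.215,
         0.11, 0.08, 0, -0.1, -0.115, -0.125, -0.13, -0.165, -0.19, -0.2, -0.3]
E = [0.24, 0.24, 0.27, 0.37, 2.55, 0.19, -1.24, 1.89, 1.76, 0.9, 1.63, 0.38,
     1.19, 0, 0, 0.07, 0.36, 0.93, 0.39, 1.74, 0.47, 0.51, 1.54]


def cov(x, y):
    """Sample covariance (divisor N - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def matrices(cols):
    """Covariance matrix and correlation matrix of a list of data columns."""
    c = [[cov(u, v) for v in cols] for u in cols]
    k = range(len(cols))
    r = [[c[i][j] / (c[i][i] * c[j][j]) ** 0.5 for j in k] for i in k]
    return c, r


C, R = matrices([sigma, E])
for row in C:
    print(["%.4f" % v for v in row])
for row in R:
    print(["%.4f" % v for v in row])
```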

COMPUTER APPLICATIONS IN CHEMISTRY

Correlation Coefficient

The correlation coefficient (CC) was introduced to overcome the scale dependency of covariance. It is the ratio of the covariance to the square root of the product of the variances of the two variables. Geometrically, the correlation coefficient between X and Y is the cosine of the angle between the mean-centred X and Y vectors. The computation of covariance and correlation coefficient is described in Chart 10.2.

Chart 10.2

$$\mathrm{cov}(X,Y) = \frac{\sum_{i=1}^{N}(x_i - x_{mean})(y_i - y_{mean})}{N-1}$$

$$r = \frac{\mathrm{cov}(X,Y)}{\sqrt{\dfrac{\sum_{i=1}^{N}(x_i - x_{mean})^2}{N-1}\cdot\dfrac{\sum_{i=1}^{N}(y_i - y_{mean})^2}{N-1}}}$$

Sample output (two data sets):

X(1)= 1.000  Y(1)= 2.000  X(2)= 2.000  Y(2)= 4.000  X(3)= 3.000  Y(3)= 6.000
     COVARIANCE: 2.000
     CORRELATION COEFFICIENT (r): 1.000

X(1)= -1.000  Y(1)= -1.000  X(2)= .0000  Y(2)= .0000  X(3)= 1.000  Y(3)= 1.000
     COVARIANCE: 1.000
     CORRELATION COEFFICIENT (r): 1.000

CC.FOR

      FUNCTION CC(X,Y,NP,MAX)
C     CORRELATION COEFFICIENT AND ITS TEST STATISTIC
      DIMENSION X(MAX),Y(MAX)
      SDX = SD(X,NP,MAX)
      SDY = SD(Y,NP,MAX)
      COVXY = COV(X,Y,NP,MAX)
      CC = COVXY/(SDX*SDY)
      IF (NP .EQ. 2) THEN
         WRITE(*,*) 'NP:2'
         WRITE(*,951) CC
      ENDIF
      IF (NP .LT. 10 .AND. NP .NE. 2) THEN
C        EXNER'S CORRECTED CC (REAL DIVISION AVOIDS INTEGER TRUNCATION)
         CCC = 1. - REAL(NP-1)/REAL(NP-2)*CC
         WRITE(*,951) CC
      ENDIF
      IF (ABS(CC) .NE. 1.) THEN
         IF (NP .LE. 30) THEN
            T = CC * SQRT(REAL(NP-2)/(1.-CC*CC))
            WRITE(*,952) T
         ENDIF
         IF (NP .GT. 30) THEN
            Z = 0.5 * ALOG((1.+CC)/(1.-CC))
            WRITE(*,953) Z
         ENDIF
      ENDIF
  951 FORMAT(5X,'CORRELATION COEFFICIENT (r): ',G10.4)
  952 FORMAT(5X,'t value: ',G12.4)
  953 FORMAT(5X,'z value: ',G12.4)
      RETURN
      END

CC.DEM

$DEBUG
      PARAMETER (MAX=200)
      DIMENSION X(MAX),Y(MAX)
      CALL IXY1(N,X,Y,MAX)
      CALL WXY1(N,X,Y,MAX)
      R = CC(X,Y,N,MAX)
      END
$INCLUDE 'IXY1.FOR'
$INCLUDE 'WXY1.FOR'
$INCLUDE 'COV.FOR'
$INCLUDE 'CC.FOR'
$INCLUDE 'SD.FOR'
$INCLUDE 'MEAN1.FOR'
$INCLUDE 'SUM1.FOR'
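For readers without a FORTRAN compiler, a minimal Python sketch of COV.FOR and CC.FOR follows; it is an illustration, not the book's program. It mirrors the choice of statistic in CC.FOR (t with NP - 2 degrees of freedom for NP ≤ 30, Fisher's z otherwise) but returns z multiplied by √(NP - 3), so that it can be compared directly with standard normal critical values; CC.FOR prints the transform itself. The two data sets of the sample output give covariances of 2.0 and 1.0 with r = 1.

```python
import math


def mean(v):
    return sum(v) / len(v)


def cov(x, y):
    """Sample covariance (divisor N - 1), as in COV.FOR."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def cc(x, y):
    """Correlation coefficient and a test statistic for H0: CC = 0.

    Returns (r, None) when |r| = 1 or NP = 2, (r, ('t', t)) for NP <= 30 and
    (r, ('z', z)) otherwise, where z is Fisher's transform times sqrt(NP - 3).
    """
    n = len(x)
    r = cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))
    if n == 2 or abs(r) >= 1.0:
        return r, None
    if n <= 30:
        return r, ("t", r * math.sqrt((n - 2) / (1.0 - r * r)))
    z = 0.5 * math.log((1.0 + r) / (1.0 - r))
    return r, ("z", z * math.sqrt(n - 3))


print(cov([1, 2, 3], [2, 4, 6]), cc([1, 2, 3], [2, 4, 6]))        # 2.0 (1.0, None)
print(cov([-1, 0, 1], [-1, 0, 1]), cc([-1, 0, 1], [-1, 0, 1]))    # 1.0 (1.0, None)
```

When |r| = 1 the t statistic is undefined (1 - r² = 0), which is why both CC.FOR and the sketch skip it.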

Properties of Correlation Coefficient

The correlation coefficient is a scale-independent scalar quantity. It is calculated on the assumption that the two vectors x and y are random variables. Its range is from -1 to +1 through zero, as expressed by the Cauchy-Schwarz inequality, -1 ≤ CC ≤ +1. It indicates the linear dependency between the two data sets.


Hypothesis Testing of Correlation Coefficient

Statistical hypothesis testing establishes whether CC is significantly different from zero. The null and alternate hypotheses in this connection are H0: CC = 0 and HA: CC ≠ 0, respectively. When the number of data points is greater than 30, Fisher's transform 0.5·ln[(1 + r)/(1 - r)] is approximately normally distributed with standard error 1/√(NP - 3), so a Z test is used. For a small sample set (NP ≤ 30), r·√[(NP - 2)/(1 - r²)] follows a t-distribution with NP - 2 degrees of freedom. In kinetic and equilibrium studies the number of data points is often less than 10, and the t-statistic was found to be inadequate. Exner proposed a corrected correlation coefficient (CCC), given by

$$CCC = 1 - \frac{NP-1}{NP-2}\cdot CC$$

The correlation coefficients for data sets 1 and 2 (Fig. 10.1a and 10.1b) are +1 and -1, respectively. When the absolute value of the correlation coefficient is one, all the points lie on a straight line, indicating that the data set is error free. In real experimental data, the magnitude of CC decreases as the random error increases in either or both of the variables. In Fig. 10.1c, CC is 0.97 because a few points lie slightly off the line. In Fig. 10.1d a linear relationship exists, but the low value (0.71) is due to significant scatter around the line. A value of 0.26 for CC (Fig. 10.1e) indicates that prediction of Y from knowledge of X with a linear model is impossible. The correlation coefficient is zero for a data set following the functional relationship y = ±√(9 - x²) (Fig. 10.2a). CC changes with the range of X for data following the non-linear relation y = cos(tan(x)) (Fig. 10.2b).
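The zero correlation for the circle data can be checked numerically. The sketch assumes x values from -3 to 3 in steps of 0.5 (the spacing used for Fig. 10.2a is not stated) and takes both branches, y = +√(9 - x²) and y = -√(9 - x²). x and y are linked by an exact non-linear relation, yet r is zero within rounding error, because CC measures only linear dependence.

```python
import math

# Both branches of the circle x^2 + y^2 = 9 on an assumed grid of x values.
xs = [-3 + 0.5 * i for i in range(13)]          # -3.0, -2.5, ..., 3.0
x = xs + xs
y = [math.sqrt(9 - v * v) for v in xs] + [-math.sqrt(9 - v * v) for v in xs]


def corr(a, b):
    """Correlation coefficient from centred sums of squares and products."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


print(corr(x, y))   # zero apart from floating-point rounding
```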

Example 10.1 The physicochemical parameters of substituted pyridine carboxylic acids (Table 10.2) show that π is uncorrelated with σ and R, as the CC is less than 0.1. However, σ is highly correlated with R (CC = 0.78).

Table 10.2: Physicochemical Parameters

Columns: hydrophobic parameter (π), molar refractivity (MR), substituent constant (σ), electronic parameter (F), electronic parameter (R).

Values: 0.42, 1.081, -0.34, 0.22, -0.64, 1.17, 1.688, 0.04, 0.03, 0.01, -1.48, -0.211, 0.73, 0.65, 0.15, 0.617, -0.27, 0.26, -0.51, 0.14, 0.016, 0.06, 0.43, 0.34, -0.26, 0.726, 0.78, 0.67, 0.16, -1.23, 0.369, -0.66, 0.02, -0.68, -0.98, 1.332, 0.0, 0.28, -0.26, 0.2, 1.296, -0.83, 0.1, -0.92, 1.09, 2.224, -0.9, 0.01, -0.91, 0.5, 0.464, -0.07, -0.04, -0.13, -0.08

[Figure panels: Data Form, Matrix, Scatter diagram, Linear Correlation, Least Squares fit; CC = 1.00, 0.97, 0.71, 0.26, 0.00]

Fig. 10.4: Least Squares Fit and Residuals of a Simulated Data Set

Example 10.3 The data set has two points at the same x value (x = 0), and the points are distributed on either side of the least squares line (Fig. 10.5). The absolute values of the residuals are around 0.5. If the experimental uncertainty (reproducibility) of the data exceeds 0.5, the data adhere to the model.

X(1)= .0000; Y(1)= 1.000; X(2)= .0000; Y(2)= 2.000
X(3)= 1.000; Y(3)= .0000; X(4)= 2.000; Y(4)= .0000

NP      4
SLOPE   -0.8182
CEPT    1.364
CC      -0.8182
THETA   .0000

y = 1.364 - 0.8182 * x
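The printed values of Example 10.3 can be reproduced with a few lines of Python. `lls` is a hypothetical stand-in for LLS1.FOR (whose listing is not part of this section); it returns the intercept, slope and correlation coefficient from the centred sums of squares and products.

```python
def lls(x, y):
    """Straight-line least squares: intercept a0, slope a1 and correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    a1 = sxy / sxx
    a0 = my - a1 * mx
    return a0, a1, sxy / (sxx * syy) ** 0.5


x = [0.0, 0.0, 1.0, 2.0]
y = [1.0, 2.0, 0.0, 0.0]
a0, a1, r = lls(x, y)
residuals = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]
print(round(a0, 3), round(a1, 4), round(r, 4))   # 1.364 -0.8182 -0.8182
print([round(e, 3) for e in residuals])
```

For these data the exact values are a0 = 15/11 and a1 = r = -9/11.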


Fig. 10.5 : Least Squares Fit and Residuals of a Data Set with Noise

Residuals in Response

The calculated response (y_cal,i) at each experimental point is computed from the estimated least squares parameters (a0, a1) as

$$y_{cal,i} = a_0 + a_1 x_i$$

The residuals, RES_i, are the differences between the measured responses (y_i) and those calculated (y_cal,i) from the model. They indicate the variation in the response that is not explained by the regression equation. The residuals are of paramount importance in deciding whether the data fit the proposed model. The statistical tests for the analysis of these residuals to validate the model are discussed under residual analysis (Section 10.4).

* * *

RESID.FOR

      SUBROUTINE RESID(X,Y,NP,A0,A1,MAX)
C     CALCULATED RESPONSES, RESIDUALS AND SD OF RESIDUALS
      REAL X(MAX),Y(MAX),YCAL(500),RES(500)
      RSS = 0.
      DO 20 I = 1,NP
         YCAL(I) = A0 + A1*X(I)
         RES(I) = YCAL(I) - Y(I)
         RSS = RSS + RES(I)**2
         WRITE(*,951) I,X(I),Y(I),YCAL(I),RES(I)
   20 CONTINUE
C     NP-2 DEGREES OF FREEDOM FOR A TWO-PARAMETER LINE
      SD = SQRT(RSS/(NP-2))
      WRITE(*,*) SD
  951 FORMAT(1X,I3,2F8.2,F10.4,E15.4)
      RETURN
      END

RESID.DEM

      PARAMETER (MAX=4)
      DIMENSION X(MAX),Y(MAX)
      DATA X,Y /1.,2.,3.,4.,1.,2.,3.,4.004/
      NP = 4
      CALL LLS1(X,Y,NP,A1,A0,CC,MAX)
      WRITE(*,*) A0,A1,CC
      CALL RESID(X,Y,NP,A0,A1,MAX)
      END
$INCLUDE: '\F77\FOR\LLS1.FOR'
$INCLUDE: 'RESID.FOR'
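A Python sketch of RESID.FOR applied to the RESID.DEM data follows; `lls` again stands in for LLS1.FOR (an assumption, since that listing is not shown here). Residuals are computed as YCAL - Y, the sign convention of RESID.FOR, and the residual SD uses NP - 2 degrees of freedom.

```python
import math


def lls(x, y):
    """Hypothetical stand-in for LLS1.FOR: returns intercept a0 and slope a1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a1 = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - a1 * mx, a1


def resid(x, y, a0, a1):
    """Calculated responses, residuals (YCAL - Y) and the SD of the residuals."""
    ycal = [a0 + a1 * v for v in x]
    res = [c - o for c, o in zip(ycal, y)]
    rss = sum(e * e for e in res)
    return ycal, res, math.sqrt(rss / (len(x) - 2))


x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.0, 3.0, 4.004]            # data of RESID.DEM
a0, a1 = lls(x, y)
ycal, res, sd = resid(x, y, a0, a1)
for i, (xi, yi, yc, e) in enumerate(zip(x, y, ycal, res), start=1):
    print(f"{i:3d}{xi:8.2f}{yi:8.2f}{yc:10.4f}{e:15.4e}")   # layout of FORMAT 951
print(sd)
```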

Standard Deviation of Slope and Intercept

In the case of univariate data, the SD throws light on the spread of replicate measurements and is used to calculate the confidence interval of the central tendency parameter, namely the mean. Similarly, for bivariate data following a linear relationship, the SDs of the regression parameters are given in Chart 10.5. They are used to validate the model and to calculate the correlation between the parameters and their confidence intervals.

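Chart 10.5

$$s_y^2 = \frac{\sum_{i=1}^{N}(y_i - y_{cal,i})^2}{N - NPAR}$$

$$SD_{a_1} = \sqrt{\frac{s_y^2}{\sum_{i=1}^{N}(x_i - x_{mean})^2}}, \qquad SD_{a_0} = \sqrt{s_y^2\left(\frac{1}{N} + \frac{x_{mean}^2}{\sum_{i=1}^{N}(x_i - x_{mean})^2}\right)}$$

NPAR corresponds to the number of estimated parameters, a0 and a1 (NPAR = 2).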

* * *

SDA.FOR

      SUBROUTINE SDA(X,Y,N,MAX)
C     SD, CONFIDENCE LIMITS AND T TESTS OF INTERCEPT A0 AND SLOPE A1
      REAL X(MAX),Y(MAX),MEAN1
      REAL YCAL(500)
      REAL*4 T1(10),T2(10),CHIL1(10),CHIL2(10)
      CHARACTER*1 IA0,IA1
      DATA T1/63.66,9.92,5.84,4.60,4.03,3.71,3.50,3.36,3.25,3.17/
      DATA T2/31.82,6.96,4.54,3.75,3.36,3.14,3.00,2.90,2.82,2.76/
      DATA CHIL1/7.88,10.6,12.8,14.9,16.7,18.5,20.3,22.0,23.6,25.2/
      DATA CHIL2/6.63,9.21,11.3,13.3,15.1,16.8,18.5,20.1,21.7,23.2/
C     EXPECTED VALUES OF A0, A1 (0.0 TESTS SIGNIFICANCE FROM ZERO)
      A0E = 0.0
      A1E = 0.0
      CALL LLS1(X,Y,N,A1,A0,CC,MAX)
      YMEAN = MEAN1(Y,N,MAX)
      XMEAN = MEAN1(X,N,MAX)
      SX2 = 0.0
      SY2 = 0.0
      DO 20 I = 1,N
         YCAL(I) = A0 + A1*X(I)
         SX2 = SX2 + (X(I)-XMEAN)**2
         SY2 = SY2 + (Y(I)-YCAL(I))**2
   20 CONTINUE
      SY2 = SY2/(N-2)
      SD = SQRT(SY2)
C...  STANDARD DEVIATION IN A0, A1
      SDA1 = SQRT(SY2/SX2)
      SDA0 = SQRT(SY2*(1.0/N + XMEAN**2/SX2))
      WRITE(*,*) 'SDA0,SDA1'
      WRITE(*,*) SDA0,SDA1
C...  RANGE OF A0, A1 (T2 HOLDS T VALUES FOR 1 TO 10 DEGREES OF FREEDOM)
      A0L = A0 - T2(N-2)*SDA0
      A0U = A0 + T2(N-2)*SDA0
      A1L = A1 - T2(N-2)*SDA1
      A1U = A1 + T2(N-2)*SDA1
C...  ARE A0, A1 SIGNIFICANTLY DIFFERENT FROM A0E, A1E?
      TA0 = (A0-A0E)/SDA0
      TA1 = (A1-A1E)/SDA1
      IA0 = 'N'
      IA1 = 'N'
      IF (ABS(TA0) .LT. T2(N-2)) IA0 = 'Y'
      IF (ABS(TA1) .LT. T2(N-2)) IA1 = 'Y'
      WRITE(*,957)
      WRITE(*,958) A0,SDA0,A0L,A0U,TA0,IA0,A1,SDA1,A1L,A1U,TA1,IA1,SD
  957 FORMAT(65(1H-)/T8,'VALUE',T19,2HSD,T30,1HL,T40,1HU,T50,
     *       1HT,T59,2HA0/65(1H-))
  958 FORMAT(2X,2HA0,1X,5F10.4,T60,A1/2X,2HA1,1X,5F10.4,T60,A1/
     *       2X,2HSD,1X,F10.4/65(1H-)///)
      RETURN
      END
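The core of SDA.FOR can be sketched in Python as follows. The t value is passed in by the caller instead of being read from a built-in table (an assumption of this sketch); the example uses the Example 10.3 data with 4.30, the two-sided 95% t value for 2 degrees of freedom. For these data SD(a0) works out to exactly 5/11.

```python
import math


def sda(x, y, t_crit):
    """SDs of intercept a0 and slope a1 and their limits a +/- t_crit * SD.

    t_crit is a tabulated t value for NP - 2 degrees of freedom, supplied by the caller.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx2 = sum((v - mx) ** 2 for v in x)
    a1 = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sx2
    a0 = my - a1 * mx
    sy2 = sum((v - (a0 + a1 * u)) ** 2 for u, v in zip(x, y)) / (n - 2)
    sd_a1 = math.sqrt(sy2 / sx2)
    sd_a0 = math.sqrt(sy2 * (1.0 / n + mx ** 2 / sx2))
    limits = ((a0 - t_crit * sd_a0, a0 + t_crit * sd_a0),
              (a1 - t_crit * sd_a1, a1 + t_crit * sd_a1))
    return (a0, sd_a0), (a1, sd_a1), limits


(a0, sd_a0), (a1, sd_a1), lim = sda([0.0, 0.0, 1.0, 2.0], [1.0, 2.0, 0.0, 0.0], 4.30)
print(a0, sd_a0, lim[0])
print(a1, sd_a1, lim[1])
```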


Statistical Significance of Slope and Intercept

In the calibration of a coloured compound by Beer's law, the straight line passes through the origin when the blank solution has no absorbance or when the absorbances are measured against the blank. However, the estimated intercept will not be exactly 0.0 but of small magnitude. Point hypothesis testing is used to establish statistically that the intercept is not different from zero. Similarly, the expected slope of the Hammett equation for log k versus substituent constant is one, and a large deviation is explained in terms of other effects. In such cases a regression parameter is tested against a fixed value. Further, the regression model is valid only when the parameters are different from zero. The null hypotheses for the significance of the slope and intercept of the straight line are tested parameter-wise (Chart 10.6).

Chart 10.6

H0,a0: a0 = a0,expect      HA,a0: a0 ≠ a0,expect
H0,a1: a1 = a1,expect      HA,a1: a1 ≠ a1,expect

$$t = \frac{a_0 - a_{0,expect}}{SD_{a_0}} \quad \text{or} \quad t = \frac{a_1 - a_{1,expect}}{SD_{a_1}}$$

follows a t-distribution with (NP - 2) degrees of freedom.

Example 10.4 The slope, intercept and their standard deviations for a data set with 18 points are:

Parameter   Value    SD       t      t-table (α, DF)   H0 valid
a0          1.415    0.218    6.47   1.75              False
a1          0.6987   0.0089   7.78   1.75              False

DF = NP - 2 = 16, α = 0.05

The t-values calculated for the data set are greater than the table value at the 95% confidence level. The last column gives the statistical validity of the null hypothesis: the regression coefficients are statistically different from zero.
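The parameter-wise test of Chart 10.6 reduces to a single comparison. The sketch below (function name hypothetical) retains H0 when |t| does not exceed the tabulated value. With the rounded values printed in Example 10.4, the intercept gives t = 1.415/0.218 ≈ 6.49. To test a Hammett slope against 1, pass `expected=1.0`.

```python
def parameter_t_test(value, sd, expected, t_table):
    """Point hypothesis test of Chart 10.6, H0: parameter = expected.

    Returns the t value and True when H0 is retained (|t| <= t_table).
    """
    t = (value - expected) / sd
    return t, abs(t) <= t_table


# Example 10.4, both parameters tested against zero; t-table = 1.75 (DF = 16, alpha = 0.05).
t0, h0_a0 = parameter_t_test(1.415, 0.218, 0.0, 1.75)
t1, h0_a1 = parameter_t_test(0.6987, 0.0089, 0.0, 1.75)
print(round(t0, 2), h0_a0, h0_a1)    # 6.49 False False
```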

Standardized Regression Coefficient

The magnitudes of regression parameters are scale dependent, so their numerical magnitudes do not reflect the relative importance of the explainable factors. The standardized regression coefficients are scale independent and are useful to interpret the relative importance of the slope and intercept in explaining the total variation in y:

$$a_0^{std} = a_0 \cdot sd_{a_0} / sd_y, \qquad a_1^{std} = a_1 \cdot sd_{a_1} / sd_y$$

Correlation Coefficient Between Slope and Intercept

In linear regression, the slope and intercept are estimated simultaneously, so several combinations of their numerical values may nearly satisfy the least squares criterion. The correlation coefficient between slope and intercept is obtained from (X^T X)^-1 as

$$H = (X^T X)^{-1} X^T$$

$$h_{NPAR \times 1} = \mathrm{diag}\{(X^T X)^{-1}\}$$

$$r_{par,jk} = \frac{[(X^T X)^{-1}]_{jk}}{\sqrt{h_j\, h_k}}$$
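For a straight line, X = [1 x] and (X^T X)^-1 is a 2 × 2 matrix, so the correlation between intercept and slope depends only on the x values; the residual variance that scales the covariance matrix cancels in the ratio. A minimal sketch (function name hypothetical):

```python
import math


def parameter_correlation(x):
    """Correlation between intercept and slope from C = inv(X^T X), X = [1, x].

    r = C[0][1] / sqrt(C[0][0] * C[1][1]); only the x values (the design) matter.
    """
    n, sx, sxx = len(x), sum(x), sum(v * v for v in x)
    det = n * sxx - sx * sx                          # determinant of [[n, sx], [sx, sxx]]
    c00, c01, c11 = sxx / det, -sx / det, n / det    # elements of the 2 x 2 inverse
    return c01 / math.sqrt(c00 * c11)


print(parameter_correlation([1.0, 2.0, 3.0, 4.0]))      # about -0.913
print(parameter_correlation([-1.5, -0.5, 0.5, 1.5]))    # zero (printed as -0.0) for centred x
```

Centring x removes the correlation, which is one practical way to decouple the two estimates.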


Confidence Intervals of Slope and Intercept

The confidence interval (LU) for the intercept and slope can be calculated separately only when the CC between a0 and a1 is negligible. The Z or t statistic is used depending upon the number of points (Chart 10.7). However, Joint Parameter Uncertainty Intervals (JPUI) are calculated when the absolute CC(a0, a1) is significant.

Chart 10.7: Confidence Interval of Slope and Intercept

If NP > 30 then
    LU_a0 = a0 ± Z(CL) * SD_a0
    LU_a1 = a1 ± Z(CL) * SD_a1
If NP ≤ 30 then
    LU_a0 = a0 ± t(CL, NP - 2) * SD_a0
    LU_a1 = a1 ± t(CL, NP - 2) * SD_a1

For a first-order reaction the rate constant is

$$k = \frac{1}{t}\log\frac{a_0}{a_0 - x}$$

where a0 is the concentration of reactant at t = 0 and x is the concentration of product at time t.

From a paired data set of x versus t, the rate constant is calculated from the slope of the linear least squares analysis of the equation

$$\log(a_0 - x) = \log a_0 - k\,t$$

where the response y is log(a0 - x) and the explanatory variable is t. The objective is to obtain the Best Linear Unbiased Estimator (BLUE) of the rate constant (KINET2.FOR), which has chemical significance. Here the SDs of the regression parameters should be as small as possible. In the kinetic study considered here, the concentration of the reactant is followed up to about a hundred minutes.

* * *

KINET2.FOR

      DIMENSION T(100),X(100)
C     FIRST-ORDER KINETICS: LOG10(REACTANT CONC) VERSUS TIME
      MAX = 100
      WRITE(*,901)
      READ(*,*) NP
      WRITE(*,902)
      READ(*,*) A0
      DO 10 I = 1,NP
         WRITE(*,903) I,I
C        CONC IS THE REMAINING REACTANT CONCENTRATION (A0 - X)
         READ(*,*) TIME,CONC
         T(I) = TIME
         X(I) = ALOG10(CONC)
   10 CONTINUE
      CALL LLS1(T,X,NP,SLOPE,CEPT,CC,MAX)
      WRITE(*,*) SLOPE,CEPT,CC
  901 FORMAT(' GIVE NO OF POINTS: ',\)
  902 FORMAT(' GIVE INITIAL CONC OF REACTANT: ',\)
  903 FORMAT(' TIME(',I2,'), X(',I2,'): ',\)
      END
$INCLUDE: 'LLS1.FOR'

Example 10.5 The primary data, log(a0 - x) and its residuals are given in Table 10.4. For the least squares line, the residuals (Fig. 10.6) are less than 1%, indicating a good fit of the data to a first-order kinetic model. The SD in k is also less than 1%, indicating that it is reliable. This model is not used to predict the concentration of the product or reactant at a specified time, so the aim is not to fit all the data points onto the curve. The correlation coefficient and the angle between the time and y vectors are -0.99998 and 0.0, indicating a good linear trend.

Table 10.4: Kinetic Data and Residuals

Time       Response (a0 - x)   y = log(a0 - x)   Residual in y
0.6000     1.6540              0.2185            -0.0023
2.6400     1.5240              0.1830            -0.0017
6.1800     1.3200              0.1206            -0.0013
13.9200    0.9660              -0.0150           0.0002
19.3800    0.7740              -0.1113           0.0007
26.9400    0.5700              -0.2441           0.0018
37.6800    0.3680              -0.4342           0.0021
84.1200    0.0560              -1.2518           0.0074
107.3400   0.0210              -1.6778           -0.0071

y = 0.2314 - 0.0177 * x   (8.0243e-006) (1.6472e-007)   sdy = 0.0041972

Fig. 10.6: Least Squares Fit of Kinetic Data Set and its Residuals
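The fit of Table 10.4 can be checked with the sketch below. Because KINET2.FOR takes base-10 logarithms, the slope of log(a0 - x) versus t equals -k/2.303 for a first-order reaction (2.303 ≈ ln 10), so the sketch converts the slope to k accordingly. Residuals are computed as y - y_cal, the sign convention of Table 10.4.

```python
import math

# Table 10.4: time and log10 of the remaining reactant concentration (a0 - x).
t = [0.60, 2.64, 6.18, 13.92, 19.38, 26.94, 37.68, 84.12, 107.34]
y = [0.2185, 0.1830, 0.1206, -0.0150, -0.1113, -0.2441, -0.4342, -1.2518, -1.6778]

n = len(t)
mt, my = sum(t) / n, sum(y) / n
stt = sum((v - mt) ** 2 for v in t)
slope = sum((u - mt) * (v - my) for u, v in zip(t, y)) / stt
cept = my - slope * mt
res = [v - (cept + slope * u) for u, v in zip(t, y)]     # y - y_cal
sdy = math.sqrt(sum(e * e for e in res) / (n - 2))
k = -math.log(10) * slope        # base-10 logs: slope = -k / 2.303
print(f"y = {cept:.4f} {slope:+.4f} * t, sdy = {sdy:.4f}, k = {k:.4f} per unit time")
```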

Calibration Model using Beer's Law

The change in absorbance of a coloured species (exhibiting a maximum in the visible spectrum) with the concentration of the analyte adheres to Beer's law:

ABSORB = blank + (ε * path length) * CONC

Calibration involves measuring the absorbance of a series of solutions of different analyte concentrations at the wavelength corresponding to the maximum in the spectrum. When the concentration is expressed on the molar scale, the slope is equal to ε × path length, i.e. the molar extinction coefficient of the analyte at λmax for a 1 cm cell. In practice, calibration equations are developed on mg, µg or ng scales, depending upon the extinction coefficient of the coloured species. The intercept, corresponding to the blank, should be zero; even when it is of small magnitude, the intercept is of no significance. The confidence interval over the range of concentrations used for calibration is of importance.

* * *

BEER.FOR

      DIMENSION CONC(100),ABSORB(100)
C     CALIBRATION LINE ABSORB = CEPT + SLOPE*CONC
      MAX = 100
      WRITE(*,901)
      READ(*,*) NP
      DO 10 I = 1,NP
         WRITE(*,902) I,I
         READ(*,*) CONC(I),ABSORB(I)
   10 CONTINUE
      CALL LLS1(CONC,ABSORB,NP,SLOPE,CEPT,CC,MAX)
      WRITE(*,*) SLOPE,CEPT,CC
  901 FORMAT(' GIVE NO OF POINTS: ',\)
  902 FORMAT(' CONC(',I2,'), ABSORB(',I2,'): ',\)
      END
$INCLUDE: 'LLS1.FOR'

Example 10.6

The absorbances at λmax of the spectrum of a coloured species and the corresponding concentration data are tested for adherence to Beer's law. The residuals are very low, confirming the model (Fig. 10.7).

Table 10.5: Data for Beer's Law and Residuals

Concentration   Absorbance   LLS Residual
2.8000          0.1530       -0.0139
4.0000          0.2160       0.0070
4.8000          0.2310       -0.0060
5.6000          0.2970       0.0320
7.6000          0.3120       -0.0231
14.8000         0.6110       0.0236
16.8000         0.6370       -0.0205
17.6000         0.6690       -0.0165
18.4000         0.7310       0.0175

y = 0.068836 + 0.035038 * x   (0.00031523) (2.637e-005)   sdy = 0.022004   cc = 0.99593, angx(x, y) = 0

Fig. 10.7: Least Squares Fit of Spectrophotometric Data
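A minimal calibration sketch with the Table 10.5 data: the two-parameter fit gives the blank (intercept), the slope and r, and the fitted line is then inverted to estimate the concentration of an unknown from its absorbance. The absorbance 0.45 used below is a hypothetical unknown, not a value from the text.

```python
import math

# Table 10.5: concentration and absorbance at lambda max.
conc = [2.8, 4.0, 4.8, 5.6, 7.6, 14.8, 16.8, 17.6, 18.4]
absorb = [0.153, 0.216, 0.231, 0.297, 0.312, 0.611, 0.637, 0.669, 0.731]


def calibrate(x, y):
    """Two-parameter least squares line: intercept, slope and correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / math.sqrt(sxx * syy)


blank, slope, r = calibrate(conc, absorb)


def concentration(a_unknown):
    """Invert the calibration line: C = (A - intercept) / slope."""
    return (a_unknown - blank) / slope


print(round(blank, 4), round(slope, 5), round(r, 4))
print(round(concentration(0.45), 2))   # hypothetical unknown with A = 0.45
```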

Hammett Equation

The variation of log k or log K with substituent constant for a series of homologous compounds follows the linear model

$$\log k = \log k_0 + \rho\,\sigma$$

Example 10.7 The variation of the logarithm of the rate constant with the substituent parameter is fitted to the linear model (Table 10.6). The graphical representation of the least squares analysis is depicted in Fig. 10.8.

Table 10.6: Hammett Equation Data

σ (x)     log k (y)   Residual
-0.27     2.241       -0.00069203
-0.17     2.398       -0.0030394
0.23      3.045       0.006571
0.78      3.912       -0.0028396

y = 2.6719 + 1.5935 * x   (3.6911e-005) (1.6122e-005)   sdy = 0.0055208   cc(x, y) = 0.99998   angx(x, y) = 0

Fig. 10.8: Hammett Model for Variation of Rate Data.
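The Hammett parameters of Example 10.7 follow from an ordinary straight-line fit of log k on σ; the intercept is log k0 and the slope is the reaction constant ρ. A minimal sketch with the Table 10.6 data:

```python
sigma = [-0.27, -0.17, 0.23, 0.78]      # Table 10.6
logk = [2.241, 2.398, 3.045, 3.912]


def fit_line(x, y):
    """Least squares intercept and slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    return my - b * mx, b


logk0, rho = fit_line(sigma, logk)
residuals = [v - (logk0 + rho * s) for s, v in zip(sigma, logk)]
print(f"log k = {logk0:.4f} + {rho:.4f} * sigma")   # log k = 2.6719 + 1.5935 * sigma
```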

Regression Through Origin

In Beer's law the absorbance versus concentration plot passes through the origin. The statistical model is

ABSORB = a1 * CONC + ε

and conforms to the chemical law. It is essential to establish that the intercept is statistically insignificant before estimating the slope of this model (1); otherwise, forcing the line through the origin results in biased values of the slope and standard errors and in erroneous confidence intervals.


Consider the case where the true model is y_i = a1 * x_i. If the data are fitted to a two-parameter linear model, y_i = a0 + a1 * x_i, part of the random errors is also fitted. Then y_i = a0 + a1 * x_i is referred to as an over-ambitious model, although the SD in y decreases. If the true model is y_i = a0 + a1 * x_i and that used for fitting the data is y_i = a1 * x_i, it is called a constrained model. In other words, one of the regression coefficients (a0) is assumed to be zero, or the straight line is forced to pass through the origin.
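The bias mentioned above can be seen by fitting the Beer's law data of Table 10.5 both ways. For the constrained model y = a1·x, least squares gives a1 = Σxy/Σx². Because the intercept of the two-parameter fit (about 0.069) is not negligible for these data, forcing the line through the origin raises the slope by roughly 14%. Function names are illustrative.

```python
conc = [2.8, 4.0, 4.8, 5.6, 7.6, 14.8, 16.8, 17.6, 18.4]                 # Table 10.5
absorb = [0.153, 0.216, 0.231, 0.297, 0.312, 0.611, 0.637, 0.669, 0.731]


def slope_through_origin(x, y):
    """Constrained model y = a1 * x: least squares gives a1 = sum(x*y) / sum(x*x)."""
    return sum(u * v for u, v in zip(x, y)) / sum(u * u for u in x)


def line(x, y):
    """Two-parameter model y = a0 + a1 * x; returns (a0, a1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a1 = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - a1 * mx, a1


print(slope_through_origin(conc, absorb))   # about 0.0400
print(line(conc, absorb))                   # about (0.0688, 0.0350)
```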

Least Squares Solution of Slope and Intercept in Matrix Notation The estimation of slope and intercept (par) in matrix notation is described in Chart 10.8.

Chart 10.8

Since X is a rectangular matrix, it is pre-multiplied by X^T, rendering the product (X^T X) a square matrix:

$$X^T y = (X^T X)\,par$$

Multiplying both sides by (X^T X)^-1:

$$(X^T X)^{-1}(X^T y) = (X^T X)^{-1}(X^T X)\,par = I\,par = par$$

$$par = (X^T X)^{-1}(X^T y)$$

It is the least squares solution of the model y = X * par + Ey, where Ey is the vector of random errors in y. This is, however, not a derivation of least squares in matrix notation.
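Chart 10.8 can be followed in a few lines of Python (standard library only; function names hypothetical). Rather than inverting X^T X explicitly, the sketch solves the normal equations (X^T X) par = X^T y by Gaussian elimination with partial pivoting, which gives the same par with less numerical error. Applied to the Example 10.3 data it returns a0 = 15/11 ≈ 1.364 and a1 = -9/11 ≈ -0.818.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for the square system a * p = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]      # augmented matrix
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[piv] = m[piv], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    p = [0.0] * n
    for i in range(n - 1, -1, -1):                        # back substitution
        p[i] = (m[i][n] - sum(m[i][j] * p[j] for j in range(i + 1, n))) / m[i][i]
    return p


def least_squares(X, y):
    """par = inv(X^T X) * (X^T y), computed by solving (X^T X) par = X^T y."""
    cols = range(len(X[0]))
    xtx = [[sum(row[i] * row[j] for row in X) for j in cols] for i in cols]
    xty = [sum(row[i] * v for row, v in zip(X, y)) for i in cols]
    return solve(xtx, xty)


x = [0.0, 0.0, 1.0, 2.0]
y = [1.0, 2.0, 0.0, 0.0]
X = [[1.0, v] for v in x]          # first column multiplies a0, second multiplies a1
par = least_squares(X, y)
print(par)                          # close to [1.3636, -0.8182]
```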

10.2 POLYNOMIAL REGRESSION

Polynomial Models in One Explanatory Variable

A quadratic model is invoked when the residuals in y for a linear model show a trend and their magnitudes are far greater than the accuracy of the measurement and the reproducibility of the data. If the residuals are still not acceptable, third and higher order polynomials are used until the residuals are randomly distributed and of a magnitude comparable with the accuracy and precision of the data. Such non-linear trends are common in calibration and in the variation of chemical parameters with dielectric constant, ionic strength or temperature. However, the least squares solution becomes unstable as the order of the polynomial increases, because the values of x, x², x³, etc. are interdependent.

Example 10.8 For the weakly quadratic model y = 0 + x + 0.2x², if the x range is 0 to 0.9, x and x² are highly correlated (0.97) and the angle between the vectors is very low (14.4°). A linear fit gives very low residuals, of the order of ±0.025, but a perusal of the residual plot shows a quadratic trend. When a quadratic model is tried, the residuals are of the order of 10⁻¹⁵, so it is an adequate model (Fig. 10.9a-d). When the x range is from -3 to +3, the correlation between x and x² is 0.0 and the angle between the column vectors is 90°. Thus the two variables are uncorrelated (orthogonal) and appropriate for polynomial regression. A quadratic fit is proposed, since a linear fit is inadequate (Fig. 10.9e-h).

Table 10.7: y = 0 + x + 0.2x²

Par (SD Par)   Range [0 to 0.9]                   Range [-3 to 3]
               Linear         Quadratic           Linear       Quadratic
a0             -0.0255        2.7756e-017         0.7          1.1102e-015
               (8.80e-005)    (3.81e-031)         (0.12)       (3.29e-031)
a1             1.18           1                   1            1
               (0.00016)      (1.96e-030)         (0.067)      (1.16e-031)
a2             -              0.2                 -            0.2
                              (2.10e-030)                      (7.05e-032)
Sdy            0.014124       7.8328e-016         0.67454      8.88e-016
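Example 10.8 can be reproduced with a small normal-equations solver. The x grid 0, 0.05, ..., 0.90 (19 points) is an assumption: it is not stated in the text, but it reproduces the printed linear coefficients (-0.0255, 1.18) and the x to x² correlation of about 0.97. Gaussian elimination with partial pivoting is used instead of forming the inverse of X^T X.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for the square system a * p = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[piv] = m[piv], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    p = [0.0] * n
    for i in range(n - 1, -1, -1):
        p[i] = (m[i][n] - sum(m[i][j] * p[j] for j in range(i + 1, n))) / m[i][i]
    return p


def polyfit(x, y, degree):
    """Least squares coefficients [a0, a1, ...] from (X^T X) p = X^T y."""
    X = [[v ** k for k in range(degree + 1)] for v in x]
    cols = range(degree + 1)
    xtx = [[sum(r[i] * r[j] for r in X) for j in cols] for i in cols]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in cols]
    return solve(xtx, xty)


def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return sab / (sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b)) ** 0.5


x = [0.05 * i for i in range(19)]        # assumed grid 0.00, 0.05, ..., 0.90
y = [v + 0.2 * v * v for v in x]          # y = 0 + x + 0.2 x^2
print(polyfit(x, y, 1))                   # about [-0.0255, 1.18]
print(polyfit(x, y, 2))                   # about [0, 1, 0.2]
print(corr(x, [v * v for v in x]))        # about 0.965
```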

Example 10.9 The results for the strongly quadratic model y = 0 + 0.1x + 0.9x² are given in Table 10.8. Here y is poorly correlated with x (0.07) but highly correlated with x² (1.00), which is reflected in the angles between the vectors.

Table 10.8: y = 0 + 0.1x + 0.9x²

Par (SD Par)   Range [0 to 0.9] (Fig. 10.10a-d)        Range [-3 to 3] (Fig. 10.10e-h)
               Model 1 (linear)  Model 2 (quadratic)   Model 1 (linear)  Model 2 (quadratic)
a0             -0.11475          1.6653e-016           3.15              1.1102e-016
               (0.0017)          (2.58e-032)           (2.55)            (7.28e-031)
a1             0.91              0.1                   0.1               0.1
               (0.0033)          (1.32e-031)           (1.36)            (2.58e-031)
a2             -                 0.9                   -                 0.9
                                 (1.5584e-031)                           (1.42e-031)
Sdy            0.06356           2.03e-016             3.0354            1.32e-015

Fig. 10.9: Linear and Quadratic Fit of Weakly Quadratic Data

Fig. 10.10: Linear and Quadratic Fit of Strongly Quadratic Data


Example 10.10 The per cent control of grasses after the use of herbicides such as substituted styrene derivatives (Table 10.9) is analyzed by polynomial models. From the magnitudes and trends of the residuals (Fig. 10.11), a quadratic model was found to be adequate. Models with only a linear, quadratic, cubic or quartic term are rejected because their SDy values are higher than that for the quadratic model.

Table 10.9

R1           R2            π      Log (Activity)
CHCl2        -             1.2    2
CCl3         -             1.8    1.9777
CCl3         4-CH3         2.53   1.8865
CHCl2        4-isopropyl   3.1    1.8633
CHClCOCH3    -             0.48   1.716
CCl2         -             3      1.6902
CH2OH        -             0.5    1.9031

Model Par Linear

Quadratic Cubic

Quartic

ao

1.8938

1.6543

1.4356

1.6322

al

-0.01744

0.39151

0.9953

0.30851

-0.11536

-0.52324

a2

0.077047

a3

=ao+az*x z 1.8997

y =aO+a3*x3 Y =aO+a4*x4 1.9015

0.12906

0.099668

0.10489

-0.2143

0.12748

1.9012

-0.0086488 -0.0033887 -0.0011959

0.040035

a4 sdy

0.197

y

0.12514

0.12187

0.11995
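The linear and quadratic fits of Example 10.10 can be checked with the π and log(activity) pairs of Table 10.9, using the same normal-equations approach. SDy is computed with N minus the number of parameters as degrees of freedom, an assumption that reproduces the printed linear value 0.12906. The quadratic fit gives a lower SDy (close to 0.0997), in line with the conclusion of the example.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for the square system a * p = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[piv] = m[piv], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    p = [0.0] * n
    for i in range(n - 1, -1, -1):
        p[i] = (m[i][n] - sum(m[i][j] * p[j] for j in range(i + 1, n))) / m[i][i]
    return p


def polyfit(x, y, degree):
    """Least squares coefficients [a0, a1, ...] from (X^T X) p = X^T y."""
    X = [[v ** k for k in range(degree + 1)] for v in x]
    cols = range(degree + 1)
    xtx = [[sum(r[i] * r[j] for r in X) for j in cols] for i in cols]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in cols]
    return solve(xtx, xty)


def sdy(x, y, coef):
    """SD of residuals with N - (number of coefficients) degrees of freedom."""
    res = [t - sum(c * v ** k for k, c in enumerate(coef)) for v, t in zip(x, y)]
    return (sum(e * e for e in res) / (len(x) - len(coef))) ** 0.5


pi = [1.2, 1.8, 2.53, 3.1, 0.48, 3.0, 0.5]                      # Table 10.9
logact = [2.0, 1.9777, 1.8865, 1.8633, 1.716, 1.6902, 1.9031]

lin = polyfit(pi, logact, 1)
quad = polyfit(pi, logact, 2)
for name, coef in (("linear", lin), ("quadratic", quad)):
    print(name, [round(c, 4) for c in coef], round(sdy(pi, logact, coef), 4))
```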

Fig. 10.11: Polynomial Fit of log Activity with π

Example 10.11

The variation of the dielectric constant of aquo-DMSO mixtures in the range 10-65% (w/w) is non-linear. It is fitted to a quadratic model, and the residuals are of the order of 0.02 to 0.10, which is lower than the measurement accuracy. Thus, any value interpolated within the composition range is valid.

x (% w/w)   y       Residual
10          78.2    -0.054
18.58       77.9    0.215
20          77.5    -0.109
32.56       76.9    -0.062
40          76.4    -0.022
52.01       74.9    0.0016
60          73.3    0.099
65.01       71.7    -0.067

Par

Linear

Quadratic

Cubic

Quartic

ao

-8.1036

-105.6

-364.27

20850

al

828.49

15179

72348

--6.20e+6

y

=ao+ a2*x 2 Y =aO+a3*x3 Y =aO+a4*x -2.439

-0.554

4

0.385 1

,

a2

-5.26e+5

a3

-4.73e+6

6.92e+8

1.03e+8

-3.42e+1O

a4 sdy

30209 1.46e+6 7.98e+OO7

6.33e+11 0.18894

0.05567

0.05990

0.06138

0.19904

0.209

0.2189

,

The cases discussed in this section are examples of curve fitting, viz., fitting the data to a mathematical model. In such cases the residuals should be low; this differs from parameterization, where the statistical significance of the model parameters is essential.


10.3 ROBUST REGRESSION

The objective of regression is to obtain the trend of the majority of points, not to explain an outlier by fitting it into the model. Least squares (LS) is applicable if the errors in y are normal and homoscedastic. An outlier in y attracts the least squares line, resulting in an incorrect slope and intercept. Hence, methods robust to outliers are desirable. The median is a successful statistical parameter for estimating central tendency in the presence of outliers, and it is useful for fitting a linear model as well.

Single Median Method

Usually, a large number of experimental points are obtained to minimize the effect of measurement errors on the model parameters, but a pair of points is sufficient to estimate the slope and intercept of a linear model. For three data points, 3² = 9 pairs of points (Chart 10.9) are possible. Regression is not possible for the diagonal pairs, as they contain the same point twice. That leaves 3² - 3 = 6 pairs in the upper and lower triangles. Since the slope and intercept for the pairs of points (1, 2) and (2, 1) are the same, they can be estimated from either the upper or the lower triangle. Thus one is left with three, (3² - 3)/2, pairs. The single median algorithm of Theil is given in Chart 10.10.

Chart 10.9

Chart 10.10: Single Median Algorithm

Object function: median of parameters
Step 1: Calculate the slope for each pair of points (i, j) in the upper triangular matrix:
        slope_ij = (y_j - y_i)/(x_j - x_i)
Step 2: Sort the vector of slopes and calculate their median:
        slope_sma = median(slopes)
Step 3: Calculate the intercept for all points using slope_sma, then their median:
        intercept_i = y_i - slope_sma * x_i
        intercept_sma = median(intercepts)
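A minimal Python sketch of Chart 10.10, applied to the data of Table 10.11a (given later in this section). Pairs with equal x are skipped because their slope is undefined (an addition of this sketch). `statistics.median` averages the two middle values when the count is even.

```python
from statistics import median


def single_median(x, y):
    """Theil's single median method: median pairwise slope, then median intercept."""
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n) if x[j] != x[i]]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 1.1, 2.0, 3.1, 3.8, 10.0]      # Table 10.11a, outlier at x = 5
print(single_median(x, y))              # intercept about 0, slope about 1.033
```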

Repeated Median Estimator

For six points, the unique set of pairs of points ((6² - 6)/2 = 15) can be represented as an upper triangular matrix (Table 10.10). The calculation of the slope and the intercept is given in Chart 10.11.


Table 10.10: Sets of Points and the Slopes

Row (i)   Point numbers              Slopes                               Median of slopes of ith row
1         1,2  1,3  1,4  1,5  1,6    slope(1,2) ... slope(1,6)            slope(1,4)
2         2,3  2,4  2,5  2,6         slope(2,3) ... slope(2,6)            [slope(2,4) + slope(2,5)]/2
3         3,4  3,5  3,6              slope(3,4), slope(3,5), slope(3,6)   slope(3,5)
4         4,5  4,6                   slope(4,5), slope(4,6)               [slope(4,5) + slope(4,6)]/2
5         5,6                        slope(5,6)                           slope(5,6)
Median of medians                                                         slope(3,5)

Chart 10.11: RME (Siegel) Algorithm

Object function: median(median of slopes in each row of the upper triangular matrix)
Step 1: For every row in the upper triangular matrix
            For each pair of points
                Calculate slope
            End
            Calculate median of slopes
        End
Step 2: slope_Siegel = median(median(slopes of all rows))
Step 3: intercept_Siegel (same as Step 3 of the Single Median Algorithm)
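Chart 10.11 forms row medians over the upper triangular matrix only. Siegel's repeated median, as usually defined, takes for each point i the median of the slopes to all other points j ≠ i, and then the median of these medians. The sketch below implements that definition; with the Table 10.11a data it gives a slope of 1.0167 and an intercept of 0.025, the values listed for the repeated median in Table 10.11b.

```python
from statistics import median


def repeated_median(x, y):
    """Siegel's repeated median: for each point, median slope to all other points."""
    n = len(x)
    point_medians = []
    for i in range(n):
        s = [(y[j] - y[i]) / (x[j] - x[i]) for j in range(n) if j != i and x[j] != x[i]]
        point_medians.append(median(s))
    slope = median(point_medians)                 # median of medians
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 1.1, 2.0, 3.1, 3.8, 10.0]      # Table 10.11a
print(repeated_median(x, y))            # about (0.025, 1.0167)
```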


LEAST MEDIAN SQUARES

Least squares gives the best linear unbiased estimator (BLUE) of the intercept and slope of a linear model. In least median squares (LMS), the line through each pair of points is considered, and the median of the squares of the residuals of all points from that line is calculated. The slope and intercept corresponding to the minimum of these medians are the LMS estimates of the model. They are robust to outliers but are biased. The algorithm of the LMS procedure is given in Chart 10.12.

Chart 10.12: Algorithm of Least Median Squares

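Object function: minimum of the median of squares of residuals in y
Step 1: For every pair of points (i, j) in the upper triangular matrix, calculate

$$\begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \begin{bmatrix} 1 & x_i \\ 1 & x_j \end{bmatrix}^{-1} \begin{bmatrix} y_i \\ y_j \end{bmatrix}$$

        RES = y - X * a
        RES2 = RES * RES (element by element)
        Med_RES2 = median(RES2)
        End
Step 2: Find the minimum of Med_RES2; the LMS estimates are the corresponding slope and intercept.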
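An exhaustive-pairs sketch of Chart 10.12 in Python (names hypothetical). Every line through two points is tried, and the one with the smallest median squared residual is kept; for n points this costs n(n - 1)/2 fits of O(n) each, which is acceptable for small chemical data sets. With the Table 10.11a data the line through (0, 0) and (3, 3.1) wins, giving a0 = 0 and a1 = 1.033.

```python
from itertools import combinations
from statistics import median


def lms(x, y):
    """Least median of squares line; returns (a0, a1, minimum median squared residual)."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue                                  # slope undefined for this pair
        a1 = (y[j] - y[i]) / (x[j] - x[i])
        a0 = y[i] - a1 * x[i]
        med = median((yk - a0 - a1 * xk) ** 2 for xk, yk in zip(x, y))
        if best is None or med < best[2]:
            best = (a0, a1, med)
    return best


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 1.1, 2.0, 3.1, 3.8, 10.0]      # Table 10.11a
print(lms(x, y))
```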

In the presence of an outlier, the residuals from LLS are high but within the 3SD limits, while those from LMS are very high at the outlying points only. When the outliers are deleted, the parameters from LMS and LLS are nearly equal. This combination of LMS to detect outliers and LLS to calculate the BLUE of the slope and intercept is a popular hybrid method.

Example 10.12 The performance of the methods discussed above is illustrated with a data set from an industrial process (Table 10.11a). Single Median, Repeated Median and LMS behaved similarly to one another and differently from LLS (Table 10.11b).

Table 10.11a: A Data Set with One Outlier

x   0     1.0   2.0   3.0   4.0   5.0
y   0     1.1   2.0   3.1   3.8   10.0

Table 10.11b: Comparison of Regression Parameters from Different Methods

Method                     a0           a1
LLS                        -0.895       1.6914
Single Median (Theil)      -2.220e-16   1.033
Repeated Median (Siegel)   0.0250       1.016
LMS                        0.000        1.033
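The hybrid procedure described above (LMS to flag outliers, then LLS on the remaining points) can be sketched as follows. The flagging rule is an assumption of this sketch, not taken from the book: it uses the preliminary robust scale s0 = 1.4826·(1 + 5/(n - p))·√(median squared residual) proposed by Rousseeuw and Leroy and flags points whose LMS residual exceeds 2.5·s0. For Table 10.11a only the last point is flagged, and LLS on the other five points gives a1 = 0.96 and a0 = 0.08.

```python
from itertools import combinations
from statistics import median


def lms(x, y):
    """Least median of squares line through the best pair: (a0, a1, median squared residual)."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] != x[j]:
            a1 = (y[j] - y[i]) / (x[j] - x[i])
            a0 = y[i] - a1 * x[i]
            med = median((yk - a0 - a1 * xk) ** 2 for xk, yk in zip(x, y))
            if best is None or med < best[2]:
                best = (a0, a1, med)
    return best


def lls(x, y):
    """Ordinary least squares intercept and slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a1 = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - a1 * mx, a1


def hybrid(x, y, cutoff=2.5):
    """Flag outliers from the LMS fit, then refit the remaining points by LLS."""
    a0, a1, med = lms(x, y)
    n, p = len(x), 2
    s0 = 1.4826 * (1 + 5 / (n - p)) * med ** 0.5       # assumed robust scale rule
    flagged = [k for k in range(n) if abs(y[k] - a0 - a1 * x[k]) > cutoff * s0]
    keep = [k for k in range(n) if k not in flagged]
    return flagged, lls([x[k] for k in keep], [y[k] for k in keep])


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 1.1, 2.0, 3.1, 3.8, 10.0]        # Table 10.11a
print(lls(x, y))                            # all points: about (-0.895, 1.691)
print(hybrid(x, y))                         # point index 5 flagged, then about (0.08, 0.96)
```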


Example 10.13 Data sets with one outlier in y at different positions are simulated. The data points with the regression lines from LLS and LMS, along with the residual plots, are given in Fig. 10.12. The position of the outlier does not affect the regression parameters in LMS, but its influence on LLS is drastic (Table 10.12).

Table 10.12: Regression Parameters of LLS with Outliers at Different Positions

              Position of outlier
Parameter     1         2         3         4         5         6
a0            2.0000    1.4000    0.8000    0.2000    -0.4000   -1.0000
a1            0.5714    0.7429    0.9143    1.0857    1.2571    1.4286
SDy           1.0351    1.2593    1.3575    1.3575    1.2593    1.0351
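The simulation behind Table 10.12 can be sketched as follows. The data are an assumption inferred from Table 10.13: y = x for x = 1 to 6, with 3 added to the y value at the outlier position; this construction reproduces the LLS columns of Table 10.12. The helper names are hypothetical, and the LMS search is the pair-wise procedure of Chart 10.12.

```python
from itertools import combinations
from statistics import median


def lls_line(x, y):
    """Ordinary least squares: intercept, slope and SDy (n - 2 degrees of freedom)."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = ym - a1 * xm
    rss = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))
    return a0, a1, (rss / (n - 2)) ** 0.5


def lms_line(x, y):
    """LMS line by exhaustive pair search (Chart 10.12)."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        a1 = (y[j] - y[i]) / (x[j] - x[i])
        a0 = y[i] - a1 * x[i]
        med = median([(yk - a0 - a1 * xk) ** 2 for xk, yk in zip(x, y)])
        if best is None or med < best[0]:
            best = (med, a0, a1)
    return best[1], best[2]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rows = []
for pos in range(len(x)):
    y = list(x)      # ideal data: y = x (assumed)
    y[pos] += 3.0    # one outlier, shifted by +3 (assumed)
    a0, a1, sd = lls_line(x, y)
    b0, b1 = lms_line(x, y)
    rows.append((a0, a1, sd, b0, b1))
    print(f"outlier at {pos + 1}: LLS a0={a0:7.4f} a1={a1:.4f} SDy={sd:.4f}"
          f" | LMS a0={b0:.4f} a1={b1:.4f}")
```

Under this construction the LMS line stays at a0 = 0 and a1 = 1 for every position of the outlier, because five of the six points lie exactly on y = x and the median squared residual of that line is zero.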


BIVARIATE ANALYSIS


[Fig. 10.12, panels (e) to (n): experimental points (o) with fitted lines, and the corresponding residual plots, for outliers at different positions]

Fig. 10.12: Effect of Position of Outlier on LLS and LMS Regression Lines and the Residuals in LMS Fit


Example 10.14 A simulated data set with one outlier in y at the sixth point is given in Table 10.13.

Table 10.13: Comparison of LLS with LMS

x         y         Residual (LMS)    Residual (LLS)
1.0000    1.0000     0                 0.5714
2.0000    2.0000     0                 0.1429
3.0000    3.0000     0                -0.2857
4.0000    4.0000     0                -0.7143
5.0000    5.0000     0                -1.1429
6.0000    9.0000    -3.0000            1.4286

Regression Parameters    LMS       LLS
Intercept                0         -1.0000
Slope                    1.0000    1.4286
SDy                      2.2500    1.0351

The residuals give the misleading impression that the least squares estimates are reliable, since SDy by LLS is far less than that by LMS. However, the plot of the LLS residuals versus y shows a trend, indicating that the linear model is insufficient. The LMS residual is very high (3.0) for the outlier, while those for all other points are zero (Fig. 10.12m, n). This demonstrates the robustness of LMS to the outlier. The results of the analysis after removal of the outlier (Table 10.14) show that the parameters, residuals and SDy are identical (within round-off) for both methods.

Table 10.14: Comparison of LLS with LMS after Removal of Outlier

x       y       Residual (LMS)    Residual (LLS)
1.00    1.00    0                 0
2.00    2.00    0                 0
3.00    3.00    0                 0
4.00    4.00    0                 0
5.00    5.00    0                 0

Regression Parameters    Intercept    Slope    SDy
LMS                      0            1.00     1.0e-14
LLS                      0            1.00     0.11e-14
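The hybrid strategy described earlier in this subsection (LMS to detect outliers, then LLS on the retained points) can be sketched as below. The outlier rule is an assumption: the book does not state its cutoff, so this sketch borrows the preliminary LMS scale of Rousseeuw and Leroy, s0 = 1.4826 (1 + 5/(n - 2)) sqrt(med r²), and flags points with |r| > 2.5 s0. When the LMS line fits the majority exactly (s0 = 0), any non-zero residual is flagged. The function names are hypothetical.

```python
from itertools import combinations
from statistics import median


def lms_line(x, y):
    """LMS line by exhaustive pair search (Chart 10.12)."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        a1 = (y[j] - y[i]) / (x[j] - x[i])
        a0 = y[i] - a1 * x[i]
        med = median([(yk - a0 - a1 * xk) ** 2 for xk, yk in zip(x, y)])
        if best is None or med < best[0]:
            best = (med, a0, a1)
    return best[1], best[2]


def lls_line(x, y):
    """Ordinary least squares intercept and slope."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    return ym - a1 * xm, a1


def hybrid_fit(x, y, cutoff=2.5, tol=1e-12):
    """Flag outliers from LMS residuals, then refit the retained points by LLS.

    Returns (flagged 0-based indices, a0, a1 of the LLS refit).
    """
    n = len(x)
    b0, b1 = lms_line(x, y)
    res = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Preliminary robust scale (Rousseeuw-Leroy form; an assumption here)
    s0 = 1.4826 * (1 + 5 / (n - 2)) * median([r * r for r in res]) ** 0.5
    if s0 > tol:
        flagged = [i for i, r in enumerate(res) if abs(r) > cutoff * s0]
    else:
        # LMS line passes exactly through the majority of points
        flagged = [i for i, r in enumerate(res) if abs(r) > tol]
    keep = [i for i in range(n) if i not in flagged]
    a0, a1 = lls_line([x[i] for i in keep], [y[i] for i in keep])
    return flagged, a0, a1


# Table 10.13 data and Table 10.15 data
result_13 = hybrid_fit([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1.0, 2.0, 3.0, 4.0, 5.0, 9.0])
result_15 = hybrid_fit([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [0.0, 1.1, 2.0, 3.1, 3.8, 10.0])
print(result_13)
print(result_15)
```

For the data of Table 10.13 only the sixth point is flagged and the LLS refit gives intercept 0 and slope 1, as in Table 10.14. For the data of Table 10.15 the sixth point is again the only one flagged.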

Example 10.15 The results for a data set with normally distributed noise in y and an outlier (Table 10.15) show that the LLS residuals are larger than the LMS residuals at every point except the outlier. The large LLS residuals arise because the regression line is pulled towards the outlier (sixth point).


Table 10.15: Comparison of LLS with LMS

x         y          Residual (LLS)    Residual (LMS)
0         0           0.8952            0.0000
1.0000    1.1000      0.3038           -0.0667
2.0000    2.0000     -0.4876            0.0667
3.0000    3.1000     -1.0790            0.0000
4.0000    3.8000     -2.0705            0.3333
5.0000    10.0000     2.4381           -4.8333

Regression Parameters    LLS        LMS
Intercept                -0.8952    0.0000
Slope                    1.6914     1.0333
SDy                      1.7697     5.8703

Chemical Tasks

From a statistical point of view, outliers arise in data with asymmetric distributions and/or at extreme values of cumulative probability, far from the central values. Chemical reasons for their occurrence include insufficient reagent concentration in calibration, the presence of ortho-substituted compounds in Hammett relationships, solute-solvent interactions in the Born dielectric model, and the onset of a different mechanism in kinetic-order studies.

Example 10.16 The data set consists of instrument signals for different concentrations of a pollutant (Table 10.16a). The extinction coefficient (slope) and blank (intercept) are distorted in LLS but not in LMS (Table 10.16b).

Table 10.16a: Residuals by LLS and LMS

Concentration    Signal    Residual (LLS)    Residual (LMS)
1                1.1        0.32             -0.01
2                2.0       -0.04             -0.03
3                3.1       -0.20              0.15
4                3.8       -0.76             -0.07
5                6.5        0.68              1.71

Table 10.16b: Regression Parameters

Method    Intercept    Slope
LLS       -0.48        1.26
LMS       0.19         0.92

Example 10.17 In the determination of the order of a reaction, a plot of log k versus log concentration is linear, with a slope equal to the order of the reaction. Hence, the order can be estimated by linear least squares. For a typical kinetic data set without outliers (Table 10.17), the SD in log k is small (0.0080 by LLS and 0.0097 by LMS), and the residuals from the LLS and LMS methods are comparable. The slopes and intercepts are also practically the same.


Table 10.17: LLS and LMS Results for the Kinetic Data

log (Concentration)    log k      Residual (LMS)    Residual (LLS)
-0.9590                -3.1146     8.8818e-16        1.0623e-03
-0.8218                -2.8416     3.3194e-03       -6.5402e-04
-0.7095                -2.6253     1.3379e-02       -9.3999e-03
-0.6307                -2.4486    -4.6506e-03        9.5503e-03
-0.5605                -2.3116     8.8818e-16        5.7213e-03
-0.4845                -2.1713     1.2890e-02       -6.2799e-03

                 LMS         LLS
Intercept        -1.18       -1.19
Slope            2.01        2.00
SD in log k      0.009718    0.007958
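The order estimate of Example 10.17 can be reproduced with ordinary least squares, as sketched below. The function name is hypothetical, and rounding the slope to the nearest integer is an assumption made only for illustration, since fractional orders are possible in practice.

```python
def lls_line(x, y):
    """Ordinary least squares: intercept, slope and SD of y (n - 2 degrees of freedom)."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = ym - a1 * xm
    rss = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))
    return a0, a1, (rss / (n - 2)) ** 0.5


# Data of Table 10.17
log_c = [-0.9590, -0.8218, -0.7095, -0.6307, -0.5605, -0.4845]
log_k = [-3.1146, -2.8416, -2.6253, -2.4486, -2.3116, -2.1713]

a0, a1, sd = lls_line(log_c, log_k)
order = round(a1)  # nearest integer order (illustrative assumption)
print(f"intercept = {a0:.2f}, slope = {a1:.2f}, SD in log k = {sd:.6f}, order ~ {order}")
```

The slope of about 2.00 identifies a second-order dependence on concentration, in agreement with the LLS column of Table 10.17.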

Example 10.18 Solute-solvent and solvent-solvent interactions have a pronounced effect on log k, and the trend may become non-linear when these interactions predominate. A data set of log k versus 1/D (D = dielectric constant) with points 1, 9 and 10 as outliers (marked with an asterisk in Table 10.18) gives slopes of -4.64 by LLS and -2.04 by LMS. When point 1 is eliminated, the LLS slope changes to -3.29 and the SD decreases by about 50% (from 0.47 to 0.25). Subsequent deletion of the ninth and tenth points changes the slope further to -2.35, with an SD of 0.11.

Table 10.18: Elimination of Outliers through LMS Analysis

Residuals (* = outlier flagged by LMS)

             All points         Point 1 eliminated    Points 1, 9, 10 eliminated
Point No.    LMS      LLS       LMS       LLS         LMS       LLS             1/D      log (k)
1            1.74*    -0.91     Eliminated            Eliminated                0.526    -5.301
2            0.26      0.28     0.26      -0.10       0.00      -0.1            0.416    -3.596
3            0.00      0.62     0.00       0.19       0.28       0.0            0.448    -3.400
4           -0.04      0.41    -0.04       0.11       0.25       0.1            0.349    -3.154
5            0.05     -0.06     0.05      -0.16       0.03      -0.0            0.204    -2.954
6            0.07     -0.13     0.07      -0.21       0.00      -0.0            0.185    -2.935
7           -0.08     -0.18     0.08      -0.15       0.09       0.0            0.101    -2.602
8            0.00     -0.41     0.00      -0.30       0.03      -0.0            0.046    -2.576
9           -0.61*     0.18    -0.61*      0.29       Eliminated                0.039    -1.950
10          -0.67*     0.21    -0.67*      0.34       Eliminated                0.027    -1.860

Intercept   -2.48     -1.94    -2.48      -2.11      -2.40      -2.44
Slope       -2.04     -4.64    -2.05      -3.29      -2.85      -2.35
SD in y      0.70      0.47     0.36       0.25       0.17       0.11
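The LLS part of the elimination sequence in Table 10.18 can be sketched as below: log k is refitted against 1/D after removing the points flagged by LMS (point numbers taken from the table; the LMS step itself is not repeated). The helper names are hypothetical. The printed slopes and SDs can be compared with the LLS columns of the table; small differences would reflect the three-decimal rounding of the tabulated data.

```python
def lls_line(x, y):
    """Ordinary least squares: intercept, slope and SD of y (n - 2 degrees of freedom)."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = ym - a1 * xm
    rss = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))
    return a0, a1, (rss / (n - 2)) ** 0.5


# Data of Table 10.18 (points 1 to 10)
inv_d = [0.526, 0.416, 0.448, 0.349, 0.204, 0.185, 0.101, 0.046, 0.039, 0.027]
log_k = [-5.301, -3.596, -3.400, -3.154, -2.954, -2.935, -2.602, -2.576, -1.950, -1.860]


def fit_without(drop):
    """LLS fit after eliminating the given 1-based point numbers."""
    keep = [i for i in range(len(inv_d)) if i + 1 not in drop]
    return lls_line([inv_d[i] for i in keep], [log_k[i] for i in keep])


fits = []
for drop in ([], [1], [1, 9, 10]):
    a0, a1, sd = fit_without(drop)
    fits.append((a0, a1, sd))
    print(f"eliminated {drop}: intercept {a0:.2f}, slope {a1:.2f}, SD {sd:.2f}")
```

Each elimination moves the LLS slope towards the LMS value and lowers the SD, which is the pattern the example describes.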


10.4 RESIDUAL ANALYSIS

Least squares is a common data-modelling technique. The distribution of the experimental error is unknown, but it is assumed to be normal. The residuals obtained are analyzed to judge whether the proposed model is adequate. The purposes of residual analysis are:
• to understand the distribution of errors in the response
• to detect outliers in the data
• to rule out inadequate models
• to avoid over-fitting, and
• to deal with model errors.
The relationship between the error in the response and the residual is given in Chart 10.13.

Chart 10.13

Model: $y_i = a_0 + a_1 x_i + e_i$, with $e_i \sim N(0, \sigma^2)$

$y^{cal}_i = a_0 + a_1 x_i$

$Res_i = y_i - y^{cal}_i$

The residuals are tested for normality. If the statistical measures of the residuals do not differ significantly from those of the errors assumed in the model, the model is adequate, because the necessary conditions of least squares (LS) are satisfied. One can then calculate the confidence contours of the regression coefficients, ycal, etc., and the regression coefficients are BLUE. A model is considered adequate only if the residuals show no trend. If the data fitted the model exactly, the residuals would ideally be zero; in practice they are small compared with the response. The absence of trend and autocorrelation shows only that the residuals are random numbers; they may still follow any distribution. Since least squares analysis rests on the hypothesis that the errors are random and normally distributed, the residual vectors are tested for normality. The half-normal plot and normal plot are popular under 'VEDA', and the χ², skewness and kurtosis tests belong to 'SEDA'. A discussion of some popular methods of residual analysis used in the chemical sciences, viz., χ², R-factor, Exner ψ and Ehrenson F-tests, follows.
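The skewness and kurtosis checks mentioned above can be sketched as follows. This is an illustration, not the book's procedure: it uses the simple moment estimators (divisor n) and returns excess kurtosis, so a normal distribution gives values near 0 for both. A formal test would compare these values with their standard errors, which is not done here. The function name is hypothetical.

```python
def residual_moments(res):
    """Moment estimates of skewness and excess kurtosis of a residual vector."""
    n = len(res)
    mean = sum(res) / n
    m2 = sum((r - mean) ** 2 for r in res) / n  # second central moment
    m3 = sum((r - mean) ** 3 for r in res) / n  # third central moment
    m4 = sum((r - mean) ** 4 for r in res) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


# LLS residuals of Table 10.13
res = [0.5714, 0.1429, -0.2857, -0.7143, -1.1429, 1.4286]
skew, kurt = residual_moments(res)
print(f"skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")
```

With only six residuals these moments are very imprecise, which is one reason the graphical (VEDA) checks are often used alongside them.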

10.4.1 χ² Test for Analysis

χ² is a special case of the gamma distribution, whose probability density function (PDF) is asymmetric. When the errors in the response follow a normal distribution, the sum of the squared weighted residuals follows a χ² distribution (Chart 10.14) with (np - npar) degrees of freedom (df). The statistic measures the probability that the weighted residuals belong to a standard normal distribution, which has zero mean and unit standard deviation.

Chart 10.14: χ² test

If    χ²calculated < χ²table
Then  H0: model is acceptable (valid)
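Chart 10.14 can be sketched as below. This is an illustration under stated assumptions: the function name is hypothetical, the error standard deviation sigma must be supplied by the user (here an assumed 0.01 in log k for the LLS residuals of Table 10.17), and the tabulated χ² value is passed in rather than computed. The value 9.488 is the standard 5% critical value of χ² for 4 degrees of freedom.

```python
def chi_square_test(residuals, sigma, n_par, chi2_table):
    """Chart 10.14: accept H0 (model adequate) if chi2_calc < chi2_table.

    residuals  : y_i - ycal_i
    sigma      : SD of the error in y (one value, or one value per point)
    n_par      : number of fitted parameters (npar)
    chi2_table : tabulated chi-square for (np - npar) df, supplied by the user
    """
    if isinstance(sigma, (int, float)):
        sigma = [float(sigma)] * len(residuals)
    chi2 = sum((r / s) ** 2 for r, s in zip(residuals, sigma))  # weighted residuals
    df = len(residuals) - n_par
    return chi2, df, chi2 < chi2_table


# LLS residuals of Table 10.17 with an assumed sigma of 0.01 in log k
res = [1.0623e-3, -6.5402e-4, -9.3999e-3, 9.5503e-3, 5.7213e-3, -6.2799e-3]
chi2, df, accepted = chi_square_test(res, 0.01, 2, 9.488)
print(f"chi2 = {chi2:.3f} with {df} df; model accepted: {accepted}")
```

Here χ² is about 2.53, well below 9.488, so the linear model is accepted. The outcome depends directly on the assumed sigma, which is why the limitation discussed below matters.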


A value of χ² higher than the table value arises from
• an inadequate model for good data, or
• an adequate model for imprecise data.

The χ² statistic is used to
• compare experimental results with expected data belonging to a statistical distribution,
• check the adequacy of a model whose parameters are estimated by linear or non-linear least squares,
• assess the association between two variables, and
• estimate the parameters of a model by invoking it as an object function for minimization.

Limitation

The χ² statistic is also a point estimate. Even when the χ² test indicates that the observed values are not significantly different from the expected values, data accuracy remains an important consideration in planned experiments. For example, the χ² test may be passed even with a difference of 0.1 pH unit between the observed and calculated values in alkalimetric titrations. However, when an instrument with 0.01 pH readability and 0.03 pH precision is used, such residuals are on the high side, and remedial measures to improve the experimental conditions should be implemented.

10.4.2 Exner and Ehrenson Parameters

The number of compounds studied in an LFER is generally in the range of 5 to 10. The variation of log k or log K with the substituent in a basic moiety, with co-solvent composition, or even with non-aqueous organic solvents belongs to this category. The correlation coefficient, standard deviation and t-test have been recognized as inadequate for such small data sets. Exner and Ehrenson proposed new statistics (ψ_EXN and F_EHR) that are applicable when the number of compounds is as small as five. In addition, the correlation coefficient is modified into a corrected correlation coefficient (CCC). These parameters again depend on the residuals and on the deviations from the mean. An empirical rule based on the ranges of these parameters is invoked to arrive at the best model. The formulae and heuristics are given in Chart 10.15.

Chart 10.15

Corrected correlation coefficient: $\mathrm{CCC} = 1.0 - (NP - 1)(\mathrm{RSS}/\mathrm{TSS})/(NP - 2)$

Exner parameter: $\psi_{EXN} = \left[\mathrm{RSS}/\left((NP - 2)\,\mathrm{TSS}\right)\right]^{1/2}$

Ehrenson parameter: $F_{EHR} = (\mathrm{RSS}/\mathrm{TSS})^{1/2}$

If 0.0
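The three formulas of Chart 10.15 can be evaluated as sketched below, with RSS the residual sum of squares, TSS the total sum of squares about the mean of y, and NP the number of points. The Exner formula is read from a garbled chart as ψ = [RSS/((NP - 2)·TSS)]^{1/2}; treat that reading, and the function name, as assumptions. The heuristic ranges used to pick the best model are not implemented.

```python
def lfer_statistics(y, y_cal):
    """CCC, Exner psi and Ehrenson F as written in Chart 10.15 (as read here)."""
    n = len(y)
    ym = sum(y) / n
    rss = sum((yi - yc) ** 2 for yi, yc in zip(y, y_cal))  # residual sum of squares
    tss = sum((yi - ym) ** 2 for yi in y)                    # total sum of squares
    ccc = 1.0 - (n - 1) * (rss / tss) / (n - 2)
    psi_exn = (rss / ((n - 2) * tss)) ** 0.5
    f_ehr = (rss / tss) ** 0.5
    return ccc, psi_exn, f_ehr


# LLS fit of Table 10.13: y_cal = -1.0 + 1.4286 x (slope 10/7)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 9.0]
y_cal = [-1.0 + (10.0 / 7.0) * xi for xi in x]
ccc, psi, f = lfer_statistics(y, y_cal)
print(f"CCC = {ccc:.4f}, psi_EXN = {psi:.4f}, F_EHR = {f:.4f}")
```

All three statistics are functions of RSS/TSS, so a better fit (smaller RSS/TSS) raises CCC towards 1 and lowers ψ_EXN and F_EHR towards 0.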

Value  ASCII Character

 63 ?     64 @     65 A     66 B     67 C     68 D     69 E     70 F
 71 G     72 H     73 I     74 J     75 K     76 L     77 M     78 N
 79 O     80 P     81 Q     82 R     83 S     84 T     85 U     86 V
 87 W     88 X     89 Y     90 Z     91 [     92 \     93 ]     94 ^
 95 _     96 `     97 a     98 b     99 c    100 d    101 e    102 f
103 g    104 h    105 i    106 j    107 k    108 l    109 m    110 n
111 o    112 p    113 q    114 r    115 s    116 t    117 u    118 v
119 w    120 x    121 y    122 z    123 {    124 |    125 }    126 ~
127 DEL
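The value and character pairs above follow directly from the ASCII code, so they can be regenerated with Python's built-in `chr` (and inverted with `ord`). Code 127 is the DEL control character, which has no printable glyph, so the sketch labels it by name. The function name is hypothetical.

```python
def ascii_table(start=63, stop=127):
    """Return (value, character) pairs; code 127 is the DEL control character."""
    return [(v, "DEL" if v == 127 else chr(v)) for v in range(start, stop + 1)]


for value, char in ascii_table():
    print(value, char)

print(ord("A"), ord("a"))  # upper and lower case letters differ by 32
```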


Appendix 2 Object Oriented Representation of Hardware

Computer Classification

Computer System

[Analog, digital, hybrid] [Micro, work-station, mainframe, supercomputer] [I, II, III, IV, V generation] [Offline, online] [Hardware, Software, Firmware]

Hardware

[Micro Processor, Memory, I/O device]

Microprocessors

[Intel, Zilog, Motorola, NS]

Intel

[80X86, Pentium]

80X86

[8086, 8088, 80286, 80386, 80486]

Pentium

[Pentium, Pentium MMX, Pentium II, Pentium III, Pentium III Xeon, Pentium IV]

Memory

[Primary, Secondary, Cache, Virtual]

Primary

[ROM, RAM, Virtual, RAM DISK]

ROM

[EEROM, UVEROM, CDROM]

RAM

[Expanded, Extended]

I/O device

[Input, Output]

Input

[Keyboard, Mouse, Magnetic media, OCR, Digitizer, Voice]

Output

[VDU, Printer, Magnetic media, CD-ROM-W, Multimedia]

VDU

[MDA, CGA, ECGA, VGA, MCGA, SVGA]

Printer

[Impact, Non impact]

Impact

[Dot matrix, Line printer]

Non impact

[Jet, Thermal]

Jet

[Laser, Ink]

Magnetic Media

[Disk, Tape]


Disk

[FDD, HD, Zip Drive]

FDD

[8", 3½", 5¼"]

Hard disk

[10 MB, 40 MB, 1GB, 4 GB, 8 GB, 80 GB]

Tape

[Paper, Magnetic]

Optical

[CD, DVD]

CD

[CD ROM, CD-RW]

Software

[System, OS, Application]

Application Software

[Compiler, Interpreter]

Compiler

[C, Fortran 77, C++]

Interpreter

[BASIC, VB]

OS

[MS DOS, UNIX, WINDOWS]

WINDOWS

[WINDOWS 3.1, WINDOWS 95, WINDOWS 98, WINDOWS NT, WINDOWS 2000]

Applications

[Languages, Packages, User programs]

Languages

[Low level, High level]

Low level

[Binary, Octal, Hexadecimal, Assembly]

High level

[Basic, FORTRAN, C, C++]

Package

[DBASE, GRAPHER, SPSS]

User Programs

[Source, Object, Executable]

Appendix 3 Z-table

Two-tailed critical values z of the standard normal distribution, P(|Z| > z) = α, where α = row heading + column heading.

α      0.00      0.01      0.02      0.03      0.04      0.05      0.06      0.07      0.08      0.09
0.0    ∞         2.575829  2.326348  2.17009   2.053749  1.959964  1.880794  1.811911  1.750686  1.695398
0.1    1.644854  1.598193  1.554774  1.514102  1.475791  1.439531  1.405072  1.372204  1.340755  1.310579
0.2    1.281552  1.253565  1.226528  1.200359  1.174987  1.150349  1.126391  1.103063  1.080319  1.058122
0.3    1.036433  1.015222  0.994458  0.974114  0.954165  0.934589  0.915365  0.896473  0.877896  0.859617
0.4    0.841621  0.823894  0.806421  0.789192  0.772193  0.755415  0.738847  0.722479  0.706303  0.690309
0.5    0.67449   0.658838  0.643345  0.628006  0.612813  0.59776   0.582842  0.568051  0.553385  0.538836
0.6    0.524401  0.510073  0.49585   0.481727  0.467699  0.453762  0.439913  0.426148  0.412463  0.398855
0.7    0.38532   0.371856  0.358459  0.345126  0.331853  0.318639  0.305481  0.292375  0.279319  0.266311
0.8    0.253347  0.240426  0.227545  0.214702  0.201893  0.189118  0.176374  0.163658  0.150969  0.138304
0.9    0.125661  0.113039  0.100434  0.087845  0.07527   0.062707  0.050154  0.037608  0.025069  0.012533

α      0.002     0.001     0.0001    0.00001   0.000001  0.0000001  0.00000001  0.000000001
z      3.090232  3.29053   3.89059   4.41717   4.89164   5.32672    5.73073     6.10941
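The entries of the Z-table can be regenerated with the standard library's `statistics.NormalDist` (Python 3.8 or later). The sketch assumes, as the tabulated values indicate (for example 1.959964 at α = 0.05), that the table lists two-tailed critical values z with P(|Z| > z) = α. The function name is hypothetical.

```python
from statistics import NormalDist


def z_two_tailed(alpha):
    """Critical z with P(|Z| > z) = alpha for the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


# Rebuild the main body of the table: alpha = row value + column value
for row in range(10):
    alphas = [round(row / 10 + col / 100, 2) for col in range(10)]
    # alpha = 0 corresponds to an infinite critical value
    cells = ["inf" if a == 0 else f"{z_two_tailed(a):.6f}" for a in alphas]
    print(f"{row / 10:.1f}", " ".join(cells))
```

The same function gives the small-α row, for example z_two_tailed(0.001) ≈ 3.290527.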