English Pages 405 Year 2009
COMPUTER APPLICATIONS IN CHEMISTRY
Himalaya Publishing House
MUMBAI • DELHI • NAGPUR • BANGALORE • HYDERABAD

© Authors
No part of this book shall be reproduced, reprinted or translated for any purpose whatsoever without prior permission of the Publisher in writing.

ISBN : 978-93-5024-311-4
REVISED EDITION: 2010
Published by
Mrs. Meena Pandey for HIMALAYA PUBLISHING HOUSE, "Ramdoot", Dr. Bhalerao Marg, Girgaon, Mumbai - 400 004. Phones: 2386 01 70 / 2386 38 63, Fax: 022-2387 71 78. Email: [email protected] Website: www.himpub.com

Branch Offices
Delhi: "Pooja Apartments", 4-B, Murari Lal Street, Ansari Road, Darya Ganj, New Delhi - 110 002. Phone: 2327 03 92, Fax: 011-23256286
Nagpur: Kundanlal Chandak Industrial Estate, Ghat Road, Nagpur - 440 018. Phone: 272 12 16, Telefax: 0712-272 12 15
Bangalore: No. 16/1 (Old 1211), 1st Floor, Next to Hotel Highlands, Madhava Nagar, Race Course Road, Bangalore - 560 001. Phones: 2281541, 2385461, Telefax: 080-2286611
Hyderabad: No. 2-2-1167/2H, 1st Floor, Near Railway Bridge, Tilak Nagar, Main Road, Hyderabad - 500 044. Phone: 55501745, Fax: 040-27560041

Printed by
Globe Offset, New Delhi
LEGENDS FOR FIGURES
Fig. 1.1 Integrated Circuit
Fig. 1.2 Micro Processor
Fig. 1.3 3½" Floppy Disk
Fig. 1.4 Hard Disk
Fig. 1.5 CD-ROM
Fig. 1.6 Scanner
Fig. 1.7 Visual Display Unit (a) Cathode Ray Tube (b) Flat Panel (c) Laptop with LCD Monitor
Fig. 1.8 Representation of Pixels (640 x 480) on the VDU in Graphics Mode
Fig. 1.9 Mouse
Fig. 1.10 Laser Printer (a) Mono (b) Colour
Fig. 1.11 LCD Projector
Fig. 1.12 Modem
Fig. 1.13 Mother Board
Fig. 1.14 Desktop Icons in Windows OS
Fig. 1.15 Object in Microsoft Power Point
Fig. 1.16 Microsoft Excel
Fig. 1.17 SPSS
Fig. 1.18 Neural Networks
Fig. 1.19 Data Acquisition with UNICAM Spectrophotometer
Fig. 1.20 MATLAB
Fig. 1.21 Microsoft Word
Fig. 2.1.1 Memory Allocations for Variables
Fig. 2.2.1 Flow Chart Symbol of IF
Fig. 2.2.2 Absolute Value of a Variable
Fig. 2.2.3 IF Statement with Null Process
Fig. 2.2.4 Range of a Variable
Fig. 2.2.5 Comparison of Two Numbers
Fig. 2.2.6 Nested DO Loop
Fig. 2.2.7 Multilevel Nesting
Fig. 2.2.8 Expert System Approach for Comparison of Numbers
Fig. 2.2.9 IF Statements using Logical Variables
Fig. 2.3.1 DO Statement for Summation of Three Numbers
Fig. 2.6.1 Execution Profile in a Function Subroutine
Fig. 2.6.2 Correspondence of Variables of Mainline and Function Subroutine
Fig. 2.6.3 Program Execution with Subroutines
Fig. 3.1 Geometric Representation of Two Vectors
Fig. 3.2 Minimum of Two Numbers
Fig. 4.1 Roots of a Quadratic Equation
Fig. 4.2 Calculation of pH in a Titration of Strong Acid vs. Strong Base
Fig. 4.3 Real Roots of Quadratic Equation
Fig. 4.4 Flow Chart for QUAD4.FOR
Fig. 4.5 Exhaustive Testing of Coefficients of Quadratic Equation
Fig. 5.1 Profiles of Functions and Derivatives
Fig. 5.2 Traces of Functions with the Independent Variable x
Fig. 6.1 Linear Interpolation
Fig. 6.2 Choice of Data Points for Quadratic Interpolation
Fig. 6.3 Errors Due to Piecewise Linear Interpolation of a Non-linear Function
Fig. 6.4 Zooming Effect on the Path of Pathological Function in Different X Ranges (a) Smooth [3.12 to 3.16] (b) Valley [3.135 to 3.145] (c) Breakdown [3.141 to 3.142]
Fig. 6.5 Effect of Data Range on the Interpolated Value
Fig. 7.1 Chromatographic Elution Profile
Fig. 7.2 Area under the Curve of y = x^2
Fig. 7.3 Integration by Trapezoidal Rule of a Function (a) Non-linear (b) Linear (c) Constant
Fig. 7.4 Effect of Step Size on the Error in Area by Trapezoidal Rule
Fig. 7.5 Profiles of Test Functions and Areas in the Range 0 and 1
Fig. 8.1 Collinear Vectors
Fig. 8.2 Eigen Ellipse for Non-orthogonal Vectors
Fig. 8.3 Eigen Vectors for Orthogonal Vectors
Fig. 8.4 (a) Absorption Spectra (b) Variation of Absorbance with pH for Methyl Red
Fig. 8.5 Chloranilic Acid (a) 3D Surface of Chloranilic Acid (b) Contour Plot (c) Spectra at Different Concentrations (d) Spectra at Different Wavelengths
Fig. 8.6 Hf-Chloranilic Acid (a) 3D Surface of Hf-chloranilic Complex (b) Contour Plot (c) Spectra at Different Concentrations (d) Spectra at Different Wavelengths
Fig. 9.1 Demonstration of Precision and Accuracy
Fig. 9.2 (a) Replicate Measurements of Weight of a Substance (b) Potentiometric Titration (c) Conductometric Titration (d) Absorption Spectrum
Fig. 9.3 Statistical Distributions (a) uniform distribution (b) normal distribution (c) log normal distribution (d) exponential distribution (e) t-distributions (f) F-distribution (g) χ²-distribution (h) normal distribution with different means (i) normal distribution with different variances
Fig. 9.4 Skewness is (a) Greater than zero (b) Less than zero (c) Zero
Fig. 9.5 Kurtosis (a) Leptokurtic (b) Mesokurtic (c) Platykurtic
Fig. 9.6 Confidence Interval for Univariate Data
Fig. 9.7 Rejection and Acceptance Regions for Standard Normal (z) Variable
Fig. 9.8 Calling Sequence of Subprograms in UNIVAR.FOR
Fig. 9.9 Box Plots (a) With and (b) Without an Outlier
Fig. 9.10 Comparison of Means of Two Samples
Fig. 10.1 Pictorial Representation of Correlation Coefficient with Typical Data Sets
Fig. 10.2 Effect of Data Range on Correlation Coefficient (a) Smooth Function (b) Transcendental Functions
Fig. 10.3 Modular Development of Least Squares Algorithm
Fig. 10.4 Least Squares Fit and Residuals of a Simulated Data Set
Fig. 10.5 Least Squares Fit and Residuals of a Data Set with Noise
Fig. 10.6 Least Squares Fit of Kinetic Data Set and its Residuals
Fig. 10.7 Least Squares Fit of Spectrophotometric Data
Fig. 10.8 Hammett Model for Variation of Rate Data
Fig. 10.9 Linear and Quadratic Fit of Weakly Quadratic Data
Fig. 10.10 Linear and Quadratic Fit of Strongly Quadratic Data
Fig. 10.11 Polynomial Fit of Log Activity with π
Fig. 10.12 Effect of Position of Outlier on Ligand LMS Regression Lines and the Residuals in LMS Fit
Fig. 11.1 Optimization of Numerical Factor Space in Experimental Design
Fig. 11.2 Two Factor-FuFD with (a) 2-level, (b) 3-level, (c) 4-level, (d) 5-level
Fig. 11.3 Response of 2 factor-2 level Simplex (a) Factor space (b) Response with Simultaneous Variation of F1 and F2 (c) 3-D Response Surface (d) 2D-Contour of (c)
Fig. 11.4 Central Composite Designs
Fig. 11.5 Non-Central Composite Design
Fig. 11.6 Coordinates and Geometries of a 3-factor-3-level Design
Fig. 11.7 Factor Space (a) 1-factor (b) 2-factor and (c) 3-factor Simplices; Response profiles of (d) 1-factor and (e) 2-factor designs
Fig. 11.8 Progress of (a) Factor Value and (b) Response in One-factor Simplex
Fig. 11.9 Progress of 2-factor Simplex
Fig. 11.10 Oscillation of 2-factor Simplex
CONTENTS

Chapter 1 Hardware and Software
1.1 Hardware

Chapter 2 FORTRAN Statements
2.1 Sequence
    ➢ Assignment & Replacement
    ➢ Precision in Arithmetic Operations
2.2 Transfer Control Statements
    ➢ Conditional IF THEN ELSE, Logical IF
    ➢ Unconditional GOTO
2.3 DO Statement
2.4 Input/Output
    ➢ READ/WRITE Statement
    ➢ List Directed I/O
    ➢ Formatted I/O
    Input/Output through external files
    ➢ Files
        • Sequential & Random Access
        • Formatted & Unformatted
    ➢ Statements
        • OPEN, CLOSE
        • BACKSPACE, REWIND
        • INQUIRE
2.5 Dimension
    ➢ Type Statement
        • REAL, INTEGER
2.6 Subprogram
    ➢ Intrinsic/Library function
    ➢ SUBROUTINE
2.7 DATA
2.8 STOP, PARAMETER

Chapter 3 Software Method Base
3.1 Matrix Operations
3.2 Sorting
3.3 Permutations and Combinations

Chapter 4 Roots of an Equation
4.1 Quadratic Equation
    ➢ Hydrogen Ion Concentration
    ➢ Strong Acid-base Titration
4.2 Roots of Cubic Equation
    ➢ Weak Mono-protic Acid
    ➢ Van der Waals Equation

Chapter 5 Optimization
5.1 Minimization Algorithm
    ➢ Taylor's Infinite Series
    ➢ Gauss Newton Method
    ➢ Newton Raphson Method
5.2 Gauss-Newton Method for Two Variables Functions
    ➢ Simulation of Alkalimetric Titrations
    ➢ Distribution of Iodine between Water and Organic Solvent

Chapter 6 Numerical Interpolation
6.1 Linear and Quadratic Methods
6.2 Lagrange and Modified Procedures

Chapter 7 Numerical Integration
7.1 Trapezoidal Rule
7.2 Simpson's 1/3 Rule
7.3 Gauss Legendre Quadrature
7.4 Integration of Ordinary Differential Equations
    ➢ Euler Method
    ➢ Runge Kutta Method

Chapter 8 Eigen Analysis
8.1 Eigen Values
8.2 Eigen Vectors

Chapter 9 Univariate Analysis
9.1 Errors
9.2 Statistical Distributions
9.3 Moments of Data
9.4 Robust Methods
9.5 Comparison of Univariate Data Sets
9.6 Analysis of Variance

Chapter 10 Bivariate Analysis
10.1 Covariance, Correlation Coefficient
    ➢ Linear Least Squares
        • First Order Rate Equation
        • Beer's Law
        • Hammett Equation
10.2 Polynomial Regression
10.3 Robust Regression
    ➢ Least Median Squares
10.4 Residual Analysis
    ➢ χ² test for Analysis
    ➢ Exner and Ehrenson Parameters
    ➢ Crystallographic R-test
10.5 Multiple Linear Regression
    ➢ Taft Equation
    ➢ Hansch Model

Chapter 11 Experimental Design
11.1 One Variable at a Time
11.2 Parallel Designs
    ➢ Factorial Designs
11.3 Sequential Designs
    ➢ Simplex Design

For Further Reading
Advanced Reading
References
Appendices
Appendix 1: 8-bit ASCII Characters and their Decimal Values
Appendix 2: Object Oriented Representation of Hardware
Appendix 3: Z-table
Appendix 4: One-tail t-table
Appendix 5: Two-tail t-table
Appendix 6: F-table
Appendix 7: χ²-table
A computer system consists of hardware and software. The hardware can be felt by touch and consists of physical components like the monitor, keyboard, CD-ROM drive etc. Software, on the other hand, performs several tasks but is invisible. It is like a song recorded on an audio recorder, which can only be heard. Hardware and software technology is at a mature stage, and dealing with it in breadth or in depth is a formidable task. An attempt is made in this book to introduce the vocabulary in a nutshell. This will be useful in choosing a computer configuration and in reducing the communication gap during discussions with information technologists.

Origin of Computer
Charles Babbage and his sponsor Lady Augusta Lovelace showed that a machine could be programmed with a sequence of instructions to produce results. Babbage depicted his analytical engine, with 10-state gear wheels, on paper. This concept of a stored program was realised only after 100 years, but Augusta is considered the world's first programmer. Number representation dates back to the ninth century. Leonardo Fibonacci introduced the first mechanical calculator in 1175. It was succeeded by the analytical engine in the sixteenth century, and the concept of the automatic computer was introduced around 1888. The first electronic digital computer, ENIAC (Electronic Numerical Integrator And Computer), was unveiled in 1946 at the University of Pennsylvania. It had 18000 vacuum tubes and weighed 30 tonnes, occupying the space of two garages. It failed once every seven minutes on average and cost about half a million dollars at 1946 prices.
John Bardeen, Walter Brattain and William Shockley at Bell Laboratories developed the first transistor in 1948. A decade later, Texas Instruments mounted six transistors on the same substrate material (base) and connected them without wires. This was the birth of the integrated circuit (IC), which revolutionised the computer industry. ICs (Fig. 1.1) are classified as medium (MSI) to ultra large (ULSI) scale integrated circuits depending upon the number of transistors (Chart 1.1).

Fig. 1.1: Integrated Circuit

Chart 1.1
Abbreviation   Acronym                                    Number of transistors
MSI            Medium Scale Integrated Circuit            10^2
LSI            Large Scale Integrated Circuit             10^3
VLSI           Very Large Scale Integrated Circuit        10^5
ULSI           Ultra Large Scale Integrated Circuit       10^6
The computer of the 1950s was considered to be an obedient servant and was used to relieve the drudgery of repetitive tasks. This gave rise to the master-slave frame for explaining the function of a computer. But the computer available today is equipped with manifold capabilities and intelligence.

1.1 HARDWARE
Everyone knows that a computer contains a Pentium processor, megabytes of RAM, gigabytes of hard disk and multimedia. Information can be fed in through a keyboard, voice recorder or video recorder. The technical details are necessary in configuring a system, but the manufacturing details are not of interest here. The scientific information that promoted the technology is highly involved and is worth knowing for research scientists only.

Classification of Computers
Computers have been classified under several categories based on mode of processing, size, memory, components employed, architecture and speed, as
• Analog, digital and hybrid computers
• Micro, mini, workstation, mainframe, super computers and super micros
• I, II, III, IV and V generation computers
• Von Neumann machines, pipeline and parallel architecture

Analog Computer
A computer is called an analog computer if it measures physical parameters like pressure and temperature as voltage in analog mode.

Digital Computer
In digital computers the input and processing are in digital mode. They are more accurate than analog computers.
Hybrid Computer
Computers having both analog and digital processes are called hybrid computers and are useful in process technology. They receive analog signals and convert the data into digital form. After numerical calculations, the digital output is converted into an analog signal and a mechanical operation is performed.

Mainframe Computer
The IBM 360 and 370 were sophisticated computing systems of the 1960s and occupied a roomful of humming metal boxes, switches etc. Air conditioning and utmost care were prerequisites. Digital Equipment Corporation (DEC), International Computers Limited (ICL), Control Data Corporation (CDC) and Norsk Data (ND) were other popular mainframe manufacturers. Upper case text with a few special characters was the input through punch cards, magnetic tapes and hard disks. Hard copy, a printout with the same character set on a line printer, was the standard output. It was not possible to get even simple line graphs; special Tektronix terminals were used for graphical display. A monitor program and Job Control Language (JCL), later called the operating system, were used to run the computer and the individual jobs.

Microcomputer
The microcomputer era brought a renaissance in Electronic Data Processing (EDP). In 1971 Intel Corporation produced a microprocessor containing the central processing unit on a chip. It contained the equivalent of 2300 transistors with 640 bytes of memory, which is little more than a single-spaced page. The IBM personal computer (IBM PC), released in 1981, was the result of the concerted efforts of the different manufacturers listed in Table 1.1. The components of a microcomputer are the Central Processing Unit, memory and input/output (I/O) devices.

Table 1.1: Manufacturers of Modules of IBM PC
Component            Manufacturer
Floppy drive         Tandon Magnetics
Dot matrix printer   Epson
Monitor              Tatung
PC DOS               Microsoft
Central Processing Unit (CPU)
The Central Processing Unit (CPU) is the heart of the computer. It performs arithmetic operations (addition, subtraction, multiplication and division) and logical operations (OR, AND, NOT). These operations are essential for system programs to control the computer. The application programs written in high-level languages perform the operations on very large and very small numbers. In microcomputers a microprocessor takes care of these tasks.

Micro Processor
A microprocessor is a large-scale integrated circuit (Fig. 1.2) on a silicon chip consisting of tens of thousands to millions of transistors. It is used as the CPU in PCs and workstations. Micro processors manufactured by different companies are listed in Chart 1.2.

Fig. 1.2: Micro Processor
Chart 1.2
Manufacturer              Processor
Intel                     8086, 8088, 80286, 80386, 80486, Pentium, Cyrix 686
Zilog                     Z80, Z8000
Motorola                  68000, 68020, 68030
National Semiconductors   NS
All these chips have upward compatibility. It means that a program developed for a lower end processor runs on all higher versions, but not the other way round. If a program developed for a higher end processor also runs on all lower versions, it is referred to as downward compatibility (for example, the CYRIX 6X86 chip). The number of transistors in different micro processor chips of Intel (Table 1.2a) and systems using Motorola processors (Table 1.2b) are described below.

Table 1.2a : No. of Transistors in Intel Processors
Micro Processor   Number of Transistors   MIPS*
8080              6,000                   0.64
8086              29,000                  0.33
80286             134,000                 1.2
80386DX           275,000                 6.0
80486DX           1,250,000               20.0
80486SX           1,185,000               16.5
* MIPS: Million Instructions Per Second

Table 1.2b : Computer Systems with Motorola Micro Processors
Micro Processor   Number of Bits             Speed (MHz)   Memory   System
6502              8                          1             64K      Apple II
68000             32 (data), 16 (address)    8             16MB     Macintosh, HP LaserJet printers
68020             32                         16 to 33      4GB      Macintosh II
68030             32                         20 to 50      4GB      Macintosh II
Coprocessor
It is a hardware device dedicated to transcendental/trigonometric calculations, addition, subtraction, multiplication and division. It performs calculations concurrently with the regular CPU operations. The coprocessors 8087 and 80287 perform all calculations (Table 1.3) with 80-bit precision on seven different data types: (a) three integer, (b) one decimal and (c) three floating-point. The 64-bit floating-point range extends to 10^308.
Table 1.3 : Coprocessors for Intel Micro Processors
Processor   Coprocessor
8086        8087
80286       80287
80386       80387
80486       Built in
Pentium     Built in
Today, personal computers with an Intel 486 or lower processor have become obsolete. The Pentium is comparable to the 1988-vintage Cray Y-MP supercomputer. Pentium II, III and IV are destined for high performance servers and multimedia systems.
Array Processor A group of micro processors controlled by another CPU is called array processor. It is used to speed up 3D graphics, video or mathematical calculations.
Parallel Processor
In parallel processor computers, several micro processors run simultaneously and another CPU coordinates the results. Here, Multiple Instruction Multiple Data (MIMD) and Single Instruction Multiple Data (SIMD) types of computation are performed. Example: Cray X-MP/48 (four processors).
Workstation
Workstations are used where high computational power, graphics and animation are required. The speed, memory and throughput are higher compared to PCs. A few typical systems are described in Table 1.4.

Table 1.4 : Features of Few Workstations
Name                     Processor                               Memory
IBM Intelli-station      Pentium III Xeon (866 MHz)              18.2 GB
HP Kayak XM 600          Pentium III 733 MHz (dual processors)   1 GB RAM
Professional 3D AGP 4X   Pentium III                             32 MB, 48X IDE DVD-ROM
On-line System
An instrument connected to the computer, or an instrument with a micro processor inside, is a computerised system. A system with in situ processing and real-time response is called an on-line system.
Special Purpose Computers
Dedicated computer systems developed for a specific application are called special purpose computers. Some special purpose computer systems and the fields of their applications are given in Table 1.5.

Table 1.5 : Typical Special Purpose Computer Systems
System            Application
LISP Machine      Artificial Intelligence
Database System   Electronic Data Processing
CAD/CAM Machine   Engineering Design

LISP: LISt Processing; CAD: Computer Aided Design; CAM: Computer Aided Manufacturing
High Performance Computing
It is required for desktop vi
[Chart: flow chart symbols with their pseudo code. Input (A=1, B=2, C=1); Output (Print; Print to File on magnetic tape or hard disk); Connectors (including off-page); Termination (End).]
Edit-Compile-Run Cycle
Step 1
Text editors are used to key in data or a Fortran program. Edit, Notepad, NE and NE2 are some of the popular editors. The sequence of user actions and displays is as follows. At the DOS prompt, key in

C:\> Edit

followed by pressing the ENTER key. The editor screen appears. Now the program is keyed in. The contents are saved in a file named ASIGN001.FOR as follows.
Click the mouse at File and a pop-down menu appears. Click at Save As, enter the file name ASIGN001.FOR and press OK. On clicking Exit in the File pop-down menu, control returns to the DOS prompt. The contents of the file can be displayed on the monitor with the command

C:\MSF>Type ASIGN001.FOR

*
*     ASIGN001.FOR
*
      ABSORB = 0.356
      CONC = 0.000178
      EPSI = ABSORB/CONC
      WRITE (*,951)ABSORB,CONC,EPSI
951   FORMAT (1X,F25.18)
      STOP
      END
Step 2
Translation of a Fortran source code into an executable form comprises checking the code for syntax errors, conversion to a binary object module and linking with the Fortran library. The command

C:\MSF>FOR1 ASIGN001.FOR

prompts

Object filename [asign001.OBJ]:
Source listing [NUL.LST]:
Object listing [NUL.COD]:

If there are errors, they are displayed as

***** Error 50, line 5 -- invalid symbol in expression
Pass One    1 Errors Detected
           12 Source Lines

The editor is invoked again and the required correction (removing one "=" sign) is made. On giving the FOR1 command again,

Pass One    No Errors Detected
           12 Source Lines

is displayed. The compiler develops the intermediate files PASIBF.SYM and PASIBF.BIN. The second pass is then run:

C:\MSF>FOR2

An object module with the name ASIGN001.OBJ is stored in the current directory and the message

Code Area Size = #00AC (   172)
Cons Area Size = #0014 (    20)
Data Area Size = #002C (    44)
Pass Two    No Errors Detected

is displayed on the monitor.
C:\MSF>LINK8086
Microsoft 8086 Object Linker Version 3.0
(C) Copyright Microsoft Corp 1983, 1984, 1985
Object Modules [.OBJ]: ASIGN001
Run File [ASIGN001.EXE]:
List File [NUL.MAP]:
Libraries [.LIB]:
The file with the extension .EXE is assumed by default; another name can be chosen here. The path of the Fortran library is needed, and pressing the ENTER key assumes that the libraries are in the current directory. Now ASIGN001.EXE is stored on the disk.

Running an .EXE program
ASIGN001.FOR does not require any input, and the result is displayed by either of the following command lines

C:\MSF>ASIGN001
C:\MSF>ASIGN001.EXE

which gives the results

 .356000000000000000
 .000178000000000000
 2000.000000000000000000
Stop - Program terminated.
PHHCL.EXE program requires numerical values of variables. They can be given through keyboard on demand or through a text file containing the data.
Procedure to run the program (PHHCL)
Step 1
At the DOS prompt

C:\>

key in cd MSF-CAC and press the ENTER key.

C:\MSF-CAC>

Step 2
To execute the program PHHCL, key in PHHCL and press ENTER.

C:\MSF-CAC> PHHCL

Step 3
The program prompts

Give conc of HCl
You should enter 10^-2 M as 1.0E-02. Then another prompt is displayed as

Give value pKw:

Enter 14.00 and the pH of the solution is output:

pH = 2.000

The above sequence can be viewed as user action and output on the screen (Table 1.13).

Table 1.13
Step     User Action   Output on Screen
Step 0                 C:\>
Step 1   cd MSF-CAC    C:\MSF-CAC>
Step 2   PHHCL         C:\MSF-CAC> PHHCL
                       Give HCl - conc :
         0.02          Give HCl - conc : 0.02
                       Give pkw :
         14            Give pkw : 14
                       HCl - conc = .2000E-01   pKw = 14.00   pH = 1.699
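The listing of PHHCL is not reproduced in this section. The dialogue of Table 1.13 can, however, be mimicked by a minimal sketch such as the one below. It assumes complete dissociation of HCl and includes the auto-ionization of water through the charge balance [H] = CONC + Kw/[H]; the file name PHSKCH.FOR, the variable names and this formula are illustrative assumptions and need not be those of the actual program.

```fortran
*
*     PHSKCH.FOR  (hypothetical sketch, not the PHHCL listing)
*
      DOUBLE PRECISION CONC, PKW, WKW, H, PH
      WRITE(*,*) 'Give HCl - conc :'
      READ(*,*) CONC
      WRITE(*,*) 'Give pKw :'
      READ(*,*) PKW
*     Ionic product of water from pKw
      WKW = 10.0D0**(-PKW)
*     Charge balance H = CONC + WKW/H gives
*     H*H - CONC*H - WKW = 0; the positive root is taken
      H = (CONC + SQRT(CONC*CONC + 4.0D0*WKW))/2.0D0
      PH = -LOG10(H)
      WRITE(*,951) CONC, PKW, PH
  951 FORMAT(1X,'HCl - conc = ',E10.4,'  pKw = ',F6.2,'  pH = ',F6.3)
      STOP
      END
```

With 0.02 and 14 as the inputs, this sketch should report a pH of 1.699, in line with the last row of Table 1.13. At such concentrations the water term is negligible and the result is practically -log10(CONC).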
Batch File
One or many DOS commands can be stored in a file with the extension .BAT. It is like a shell program in UNIX. For example, the compilation and execution of ASIGN001.FOR can be saved as ASIGN1.BAT.

*
* ASIGN1.BAT
*
FOR1 ASIGN001.FOR;
FOR2
LINK8086 ASIGN001,ASIGN001,,D:\F77-UGC\\
ASIGN001

When ASIGN1 is given at the DOS prompt, the result is obtained.

C:\MSF-CAC> ASIGN1
 .356000000000000000
 .000178000000000000
 2000.000000000000000000
Stop - Program terminated.
2.1 SEQUENCE STATEMENTS
Assignment and replacement statements belong to sequence statements. Their components are constants, variables and operators. Program execution starts from the top and each statement is executed sequentially up to END.

Constants
Numerical constants are an integral part of chemical calculations and are used to estimate several other values of chemical significance. Constants can be of different types based on their accuracy and variation with ambient conditions. The values of the Avogadro number, the Faraday, the ideal gas constant and atomic weights are known accurately. On the other hand, density, dielectric constants and auto-ionization constants are determined precisely, but they change with temperature. Similarly, the precision of equilibrium constants, rate constants and extinction coefficients depends on the equipment, method of calculation and external factors. A constant is invariant in a computer program. It may be literal, numerical or logical. A literal constant is a character or a string of characters used in arithmetic operations, keywords, continuation of a FORTRAN statement over more than one line, carriage control of a printer and input/output statements. The numerical constants are subdivided into integer, real and complex.

Integer Constant
An integer constant is a whole number, with or without sign, including zero. The maximum value of an integer constant is not infinity in computer terminology. For 16- and 32-bit computers, it is 2^15 - 1 (or 32767) and 2^31 - 1 (or 2.147 x 10^9), respectively. The plus sign is optional but the minus sign is mandatory. Some examples are 5, -25675, +30675. Zeros preceding the first significant digit are ignored (for example, -007 is read as -7). The number of ionizable protons in oxalic acid (2), the number of electrons exchanged in the reduction of ceric to cerous ion (1), the charge on sulfate (-2) and the number of neutrons in carbon (6) are some examples of integer constants. The stoichiometric coefficients [-1, -2, 1] of the chemical reaction Cu(II) + 2 en → Cu(en)2 are also integers. The coefficients are positive for products and negative for reactants. Some don'ts are given in Table 2.1.
Table 2.1 : Don'ts of Integer
Invalid   Reason
21,475    Comma
2M        M, units of concentration, is a character
37A       A, a character
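The rules for integer constants can be tried out with a short sketch such as the following. The file name INTLIM.FOR and the variable names are hypothetical, and the last statement assumes a compiler with 32-bit default integers.

```fortran
*
*     INTLIM.FOR  (hypothetical sketch)
*
      INTEGER NPROT, NCHRG, ISTOI(3), I
*     Stoichiometric coefficients: reactants negative, product positive
      DATA ISTOI /-1, -2, 1/
*     Ionizable protons of oxalic acid and charge on sulfate
      NPROT = 2
      NCHRG = -2
*     Leading zeros are ignored: -007 is the same constant as -7
      I = -007
      WRITE(*,*) NPROT, NCHRG, I
      WRITE(*,*) (ISTOI(I), I = 1, 3)
*     Largest 16-bit and 32-bit integers, 2**15 - 1 and 2**31 - 1;
*     the second is built up in parts to stay within range
      WRITE(*,*) 2**15 - 1
      WRITE(*,*) 2**30 + (2**30 - 1)
      END
```

The last two WRITE statements should display 32767 and 2147483647. Writing 2**31 - 1 directly is avoided here because the intermediate value 2**31 is itself outside the 32-bit range.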
Real Constant
A real constant is a signed or unsigned number with a decimal point (Table 2.2). It is convenient to express these numbers in exponent notation. A constant 0.025 is normally expressed as 0.25 x 10^-1. In the exponent notation, E replaces "x 10" found in the normalized form. The mantissa and exponent are the numbers before and after E, respectively (Table 2.3). The value of a constant is calculated by multiplying the number preceding E by 10 raised to the exponent.

Table 2.2 : Real Constants in Chemistry
Real Constant       Parameter
0.025               Absorbance
0.0000472           Rate constant
0.0000 0000 00001   Ionic product of water
47632.365           Epsilon
800000.0            Beta
Table 2.3 : Real Constants in Different Forms
Constant    Normalized Form       Exponent Form
0.025       0.25 x 10^-1          0.25E-01
0.0000472   0.472 x 10^-4         0.472E-04
476.32365   0.47632365 x 10^3     0.47632365E03
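The equivalence of the decimal and exponent forms in Table 2.3 can be checked with a small sketch like the one below. The file name EXPFRM.FOR and the variable names are hypothetical.

```fortran
*
*     EXPFRM.FOR  (hypothetical sketch)
*
      REAL A, B, C, D
*     The same constant in decimal form and in exponent form
      A = 0.025
      B = 0.25E-01
*     Another pair from Table 2.3
      C = 0.0000472
      D = 0.472E-04
      WRITE(*,951) A, B
      WRITE(*,951) C, D
  951 FORMAT(1X,E14.6,E14.6)
      END
```

Each output line should show the same value twice, since the two forms denote one and the same constant; the E14.6 descriptor simply prints it back in exponent notation.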
Computer stores real constant of normalized form up to a fixed number of digits after decimal point. A real constant is stored up to eight and sixteen decimal places in single and double precision modes, respectively (Table 2.4). The rules for invalid constants are depicted in Chart 2.1.
Table 2.4 : Real Constants in Single and Double Precision
Real Constant    Single Precision   Double Precision
0.025            0.2500 0000E-01    0.2500 0000 0000 0000E-01
0.123456789012   0.12345679E00      0.1234 5678 9012 0000E00
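The second row of Table 2.4 can be reproduced approximately with the following sketch. The file name PRECIS.FOR and the variable names are hypothetical, and the exact layout of the printed exponent depends on the compiler.

```fortran
*
*     PRECIS.FOR  (hypothetical sketch)
*
      REAL S
      DOUBLE PRECISION D
*     The same twelve-digit constant stored in both precisions;
*     the D exponent marks a double precision constant
      S = 0.123456789012
      D = 0.123456789012D0
      WRITE(*,951) S
      WRITE(*,952) D
  951 FORMAT(1X,'Single : ',E16.8)
  952 FORMAT(1X,'Double : ',D24.16)
      END
```

The single precision value should appear rounded to about eight digits (0.12345679), while the double precision value retains all twelve digits of the constant followed by zeros.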
Chart 2.1
If   Any character other than E or D present             Then Invalid [0.1 x (10)-23]
If   More than one decimal point                         Then Invalid [6.0.3E+23]
If   More than three digits after E or D                 Then Invalid [1.00E6725]
If   Decimal point occurs after E or D                   Then Invalid [3.0 E 10.0]
If   Any character other than + or - after E or D        Then Invalid [1.4E-1,0]
If   Any character other than + or - before first digit  Then Invalid [$1.2E+10]
Variable
An alphanumeric character is a letter (A-Z) or a numeral (0-9). A variable name is a letter optionally followed by alphanumeric characters, with a maximum length of six. Any name whose length is greater than six characters is truncated. For example, EXTINCTION is truncated to EXTINC and ABSORBANCE to ABSORB. The choice of variable name is arbitrary. However, it is advisable to choose a relevant name so that anybody in a given discipline can easily comprehend its meaning. For a data set of wavelength versus absorbance, the variable names WAVEL and ABSORB are more appropriate than Z1 and Z2. Similarly, for volume versus pH data, PH and VOLUME are preferable to X23 or Z4671.

Do's of VARIABLE
✓ Keywords can be used as variable names

*
*     VAR001.FOR
*
      DO = 0.1234567890123456789
      IF = 987654321
      WRITE(*,*)DO,IF
      END

Output of VAR001.FOR
   1.234568E-001    987654321
*
*     VAR002.FOR
*
      STOP = -123
      END = 0879
      REAL = 12.23
      INTEGER = 2
      DATA = 122
      DOUBLE PRECISION = 1.D-3
      WRITE(*,*)STOP,END,REAL,INTEGER,

Output of VAR002.FOR
   -123.0000000    879.0000000    12.2300000    2
✓ Library functions (SIN, COS, ALOG, EXP, SQRT, ATAN) can be used as variable names (VAR003.FOR)

*
*     VAR003.FOR
*
      DIMENSION SIN(200)
      DATA SIN(180)/180.0/,SIN(90)/34./
      WRITE(*,*)SIN(180),SIN(90)
      END
Don'ts of VARIABLE
⊗ Special characters are not valid in variable names (VAR101.FOR)

*
*     VAR101.FOR
*
      BETA' 1.0e8
      N ? N +1
      END

* Unrecognizable statement (Error 89, Lines 5,6)
* The variable & constant are to be separated by
⊗ Hyphen "-" is an invalid character (VAR102.FOR)

*
*     VAR102.FOR
*
      NO'S = 17
      NMBE-NEMF = 2
      A"2 = 4
      2*A*C = 6
      C**C = 9
      END
⊗ The first character of a variable name should not be a numeral. Some typical invalid variables given in VAR103.FOR are explained in Table 2.5.

*
*     VAR103.FOR
*
      3MKCl = 2.9897
      EXTINCTION-COEFFICIENT = 16900
      p-Cl methane = 345.0
      END
Table 2.5 : Reasons for the Invalidity of Variables
Variable                 Reason
3MKCl                    The first character is a numeral
EXTINCTION-COEFFICIENT   Special character '-'
p-Cl methane             Special character '-'
Integer Variable
An integer variable name starts with any of the letters I, J, K, L, M or N. Examples of valid integer variables are KVOLTS, NEXP, NP, NSAMP, LANDA etc. A variable name starting with A-H or O-Z can be made an integer variable with the statement
      INTEGER WAVEL, OUT, EMF
Real Variable
A real variable name starts with any letter A through H or O to Z. A few valid real variable names are PH, ABSORB, EPSI, RATE, WAVEL, BETAS etc. By default, a variable starting with any letter I to N is an integer. It can be made real, if desired, by declaring it as
      REAL KW, K12, LANDA, MU
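The default typing rule and its override can be checked with a short program. The following is a minimal sketch, not a listing from the text: the program name TYPE01.FOR and the assigned values are assumptions chosen for illustration.

```fortran
*
*     TYPE01.FOR (hypothetical name; illustrative values)
*
      INTEGER WAVEL
      REAL KW
*     NEXP starts with N: integer by default, 7/2 gives 3
      NEXP = 7/2
*     PH starts with P: real by default
      PH = 7.0/2.0
*     WAVEL is declared INTEGER, so the fraction .9 is dropped
      WAVEL = 360.9
*     KW is declared REAL, so it can hold 1.0E-14
      KW = 1.0E-14
      WRITE(*,*) NEXP, PH, WAVEL, KW
      END
```

NEXP and WAVEL hold the integers 3 and 360, while PH and KW keep their fractional values.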
Assignment Statement
ASSIGNMENT is an executable statement and performs three tasks, viz.: (1) calculation of the expression on the RHS of the '=' sign, (2) conversion of the numerical value to the proper type and (3) storing it in the variable on the LHS. It must not appear before type, array specification and DATA statements. Some typical assignment statements are given in ASIGN001.FOR.
*
*     ASIGN001.FOR
*
      ABSORB = 0.356
      CONC = 0.000178
      EPSI = ABSORB/CONC
      WRITE(*,951)ABSORB,CONC,EPSI
951   FORMAT (1X,F25.18)
      STOP
      END

Name    Type  Offset  P  Class
ABSORB  REAL      20
CONC    REAL      24
EPSI    REAL      28

Name    Type  Size  Class
MAIN                PROGRAM

Pass One   No Errors Detected
           11 Source Lines
Functioning of Assignment
In the computer, memory allocations are numbered and the numerical values are stored in them. The language compiler performs a one-to-one correspondence between names (Chart 2.2) and the sequence of memory allocations (Fig. 2.1.1).
[Figure: ABSORB occupies memory allocations 20 to 23, CONC 24 to 27 and EPSI 28 to 31; the stored values are 0.356, 0.000178 and 2000.]
Fig. 2.1.1 : Memory Allocations for Variables
Chart 2.2
Step 01   A numerical value of 0.356 is placed in the memory allocation starting with 20 and up to 23.
Step 02   A value of 0.178000E-3 is placed in the memory at 24 to 27.
Step 03   Now, the values of absorbance and concentration are available to the CPU. The ALU performs division and the value is placed in the memory allocated for EPSI, i.e., 28 to 31.
Replacement Statement
Replacement is an executable statement. The flow chart symbol, position and functions are the same as those of the ASSIGNMENT statement. In a REPLACEMENT statement, the variable on the LHS is the same as one of the variables on the RHS. A listing of REPL001.FOR and its algorithm (Chart 2.3) follow.
*
*     REPL001.FOR
*
      LANDA = 360
      WRITE(*,951)LANDA
      LANDA = LANDA + 10
      WRITE(*,951) LANDA
951   FORMAT(' LANDA = ',I4)
      STOP 'REPL_01'
      END

 LANDA =  360
 LANDA =  370
Chart 2.3
Step 01   Integer constant 360 is placed in the memory locations (16 to 19) corresponding to the variable LANDA. This is an assignment statement.
Step 02   WRITE statement displays the value of LANDA as LANDA = 360 according to the Format 951.
Step 03   Numerical value of LANDA (360) is added to integer constant 10 and the resulting value, 370, is placed in the same memory allocations 16 to 19. Now the old value (in Step 01) is replaced by the new value (Step 03). At this stage the value 360 is not available. Thus LANDA = LANDA + 10 is a REPLACEMENT statement.
Step 04   WRITE statement displays the current value of LANDA. LANDA = 370
Step 05   STOP statement terminates the execution of the program with the message. STOP - Program terminated.
The advantages of the REPLACEMENT statement are a decrease in the number of variables and a saving of program memory.
Exchange of Values Stored in Two Variables
In the program EXC001.FOR, numerical values 370.0 and 0.35 are assigned wrongly to absorbance and wavelength, respectively.
*
*     EXC001.FOR
*
      ABSORB = 370.
      WAVEL = 0.35
      WRITE(*,951)WAVEL,ABSORB
951   FORMAT(/' WAVE LENGTH = ',F10.3,5X,'ABSORBANCE = ',F10.3)
      STOP
      END

 WAVE LENGTH =       .350     ABSORBANCE =    370.000
To correctly assign 370 to WAVEL and 0.35 to ABSORB, exchange of the contents of memory allocations (corresponding to ABSORB and WAVEL) is necessary. A set of algorithms (Chart 2.4) and the corresponding FORTRAN program follow.
Chart 2.4
                          ABSORB   WAVEL
Initial state             370.0    0.35
Expected final state      0.35     370.0

Algorithm 1 : (XAL01)     ABSORB   WAVEL
ABSORB = WAVEL            0.35     0.35
WAVEL = ABSORB            0.35     0.35
As the values of both ABSORB and WAVEL are 0.35, it is an incorrect algorithm.

Algorithm 2 : (XAL02)     ABSORB   WAVEL
WAVEL = ABSORB            370.0    370.0
ABSORB = WAVEL            370.0    370.0
The value of WAVEL is correct but ABSORB is wrong. It is also an incorrect algorithm.

Algorithm 3 : (XAL03)     ABSORB   WAVEL    TEMP1    TEMP2
TEMP1 = ABSORB            370.0    0.35     370.0
TEMP2 = WAVEL             370.0    0.35     370.0    0.35
ABSORB = TEMP2            0.35     0.35     370.0    0.35
WAVEL = TEMP1             0.35     370.0    370.0    0.35
The expected result is possible using two temporary memory allocations (TEMP1, TEMP2).

Algorithm 4 : (XAL04)     ABSORB   WAVEL    TEMP1
TEMP1 = ABSORB            370.0    0.35     370.0
ABSORB = WAVEL            0.35     0.35     370.0
WAVEL = TEMP1             0.35     370.0    370.0
*
*     EXCHANGE OF CONTENTS OF MEMORY ALLOCATIONS (VARIABLES)
*
      ABSORB = 370.
      WAVEL = 0.35
C
C     INCORRECT ALGORITHM (XAL01)
C
      ABSORB = WAVEL
      WAVEL = ABSORB
      WRITE(*,901)
      WRITE(*,951) WAVEL,ABSORB
*
*     INCORRECT ALGORITHM (XAL02)
*
      ABSORB = 370.
      WAVEL = 0.35
      WAVEL = ABSORB
      ABSORB = WAVEL
      WRITE(*,901)
      WRITE(*,951) WAVEL,ABSORB
*
*     CORRECT ALGORITHM (XAL03)
*
      ABSORB = 370.
      WAVEL = 0.35
      TEMP1 = ABSORB
      TEMP2 = WAVEL
      ABSORB = TEMP2
      WAVEL = TEMP1
      WRITE(*,902)
      WRITE(*,951) WAVEL,ABSORB
*
*     CORRECT ALGORITHM (XAL04)
*
      ABSORB = 370.
      WAVEL = 0.35
      TEMP1 = ABSORB
      ABSORB = WAVEL
      WAVEL = TEMP1
      WRITE(*,902)
      WRITE(*,951) WAVEL,ABSORB
901   FORMAT(/' INCORRECT ALGORITHM ')
902   FORMAT(/,10X,'CORRECT ALGORITHM ')
951   FORMAT(' WAVE LENGTH = ',F10.3,5X,'ABSORBANCE = ',F10.3)
      STOP
      END

 INCORRECT ALGORITHM
 WAVE LENGTH =       .350     ABSORBANCE =       .350

 INCORRECT ALGORITHM
 WAVE LENGTH =    370.000     ABSORBANCE =    370.000

          CORRECT ALGORITHM
 WAVE LENGTH =    370.000     ABSORBANCE =       .350

          CORRECT ALGORITHM
 WAVE LENGTH =    370.000     ABSORBANCE =       .350
Algorithms 3 and 4 achieve the final target. However, algorithm 4 is better than algorithm 3 from the software engineering point of view, as it uses only one temporary variable (TEMP1). Although algorithms 1 and 2 do not have any syntactic errors, they output undesired results. Such errors are known as logical errors.
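The one-temporary exchange of algorithm 4 can be packaged for reuse. The sketch below assumes a subroutine named SWAP with dummy arguments P and Q; these names are hypothetical and do not come from the text.

```fortran
*
*     Sketch: algorithm 4 (XAL04) as a reusable subroutine
*
      ABSORB = 370.
      WAVEL = 0.35
      CALL SWAP(ABSORB,WAVEL)
      WRITE(*,951) WAVEL,ABSORB
951   FORMAT(' WAVE LENGTH = ',F10.3,5X,'ABSORBANCE = ',F10.3)
      STOP
      END
*
*     SWAP is a hypothetical helper: it exchanges P and Q
*     using one temporary variable
*
      SUBROUTINE SWAP(P,Q)
      TEMP = P
      P = Q
      Q = TEMP
      RETURN
      END
```

After the call, WAVEL holds 370.0 and ABSORB holds 0.35, as in the expected final state of Chart 2.4.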
Expression
An expression is formed by the combination of operands and operators. An operand is a constant, variable, function subprogram or an expression enclosed in a set of parentheses. The general form of an expression is
[Operator] Operand [Operator Operand] [Operator Operand]
An expression has no independent existence except in an IF statement. It occurs on the RHS of the '=' sign in assignment or replacement statements. There are three types of expressions, viz., arithmetic, relational and logical.
Arithmetic Expression
Arithmetic expressions are formed when arithmetic operands and arithmetic operators are written in juxtaposition to each other. The arithmetic operators used in FORTRAN are +, -, /, *, **, where two stars in succession are considered as a single operator representing exponentiation. The simplest expression is a constant, a variable or a function reference [e.g., 3, X, SIN(0.5), ABS(-5)]. A unary or monadic operator has the form OPERATOR OPERAND, the examples being -3, +X. An operator sandwiched between operands [OPERAND1 OPERATOR OPERAND2] is a dyadic operator.
Example: NIP + NAP, SD * SD, B - CF, SD ** 3, MOLWT/NEC
Evaluation of a valid arithmetic expression results in a numerical value. FORTRAN arithmetic operators, their equivalents in the human (algebraic) domain and their functions are given in Chart 2.5.
Chart 2.5
Algebraic Domain        Arithmetic Operation    Computer Domain
A + B                   Addition                A + B
A - B                   Subtraction             A - B
A × B, A.B              Multiplication          A * B
A/B, A over B           Division                A / B
A raised to B           Exponentiation          A ** B
Don'ts of ARITHMETIC EXPRESSION
✗ Two or more operators in succession are invalid
      PH = B + - CF
      VAR = SD *** 2
      B = A ** -1
The correct forms of the above expressions are
      PH = B - CF
      VAR = SD ** 2
      B = A ** (-1)
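A minimal sketch of the corrected forms follows. The numerical values and the name BINV (used here instead of reassigning B) are assumptions for illustration.

```fortran
*
*     Sketch: valid forms of the three expressions
*
      B = 7.25
      CF = 0.25
      SD = 1.5
      A = 4.0
*     one operator between the two operands
      PH = B - CF
*     ** is a single operator
      VAR = SD ** 2
*     a signed exponent is enclosed in parentheses
      BINV = A ** (-1)
      WRITE(*,*) PH, VAR, BINV
      END
```

With these values PH is 7.0, VAR is 2.25 and BINV is 0.25.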
Addition and Subtraction The algebraic sum (addition or subtraction) of two operands involves comparison of exponents, shifting of mantissa, summation of mantissa and normalization to a prefixed number of digits. The algorithm and examples for addition and subtraction of two operands are given in Charts 2.6 and 2.7, respectively. In FORTRAN they are performed as binary operations. Chart 2.6
Step 1   Express the operands in normalized form
Step 2   If     Exponents of the two operands are equal
         Then   Mantissas are added & Exponent of the result is equal to that of augend/addend
Step 3   If     Exponents of the two operands are not equal
         Then   Mantissa of operand with lower exponent is expressed such that its exponent is equal to that of the other operand & Mantissas are added only to the prefixed number of digits & Exponent of the result is equal to that of operand with larger exponent
Chart 2.7 : Illustration of Addition/Subtraction
AUGEND=  .123456800E-06   ADDEND=      .876543200E-06   RESULT=      .100000000E-05
AUGEND=  .123456800E+34   ADDEND=      .876543200E+14   RESULT=      .123456800E+34
AUGEND=  .123456800E+06   ADDEND=      .876543200E+14   RESULT=      .876543200E+14
AUGEND=  .123456800E+06   ADDEND=      .210456800E+00   RESULT=      .123457000E+06
AUGEND=  .123456800E+06   ADDEND=      .543210800E+03   RESULT=      .124000000E+06

MINUND=  .100000000E+36   SUBTRAHEND=  .888888900E+05   DIFFERENCE=  .100000000E+36
MINUND=  .100000000E+36   SUBTRAHEND=  .888888900E+34   DIFFERENCE=  .911111200E+35
MINUND=  .100000000E+36   SUBTRAHEND=  .888888900E+30   DIFFERENCE=  .999991100E+35
MINUND=  .100000000E+36   SUBTRAHEND=  .888888900E+25   DIFFERENCE=  .100000000E+36
MINUND=  .100000000E+36   SUBTRAHEND=  .888888900E-06   DIFFERENCE=  .100000000E+36
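The loss of a small operand seen in Chart 2.7 can be reproduced as below. This is a sketch assuming that the default REAL type is a 32-bit single precision number; the variable names BIG, SMALL and DIFF are illustrative.

```fortran
*
*     Sketch: a small operand is absorbed by a large one
*
      AUGEND = 0.1234568E+06
      ADDEND = 0.2104568E+00
      RESULT = AUGEND + ADDEND
      WRITE(*,*) AUGEND, ADDEND, RESULT
*
      BIG = 0.1E+36
      SMALL = 0.8888889E+05
      DIFF = BIG - SMALL
      WRITE(*,*) BIG, SMALL, DIFF
*     the test is true when SMALL left BIG unchanged
      IF (DIFF .EQ. BIG) WRITE(*,*) 'SMALL IS ABSORBED'
      END
```

If the message is printed, the subtrahend lies entirely below the last digit retained for the minuend, so the difference equals the minuend.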
Multiplication and Division
These operations are common in all chemical computations. For example, normality of a solution (NOR) is the product of molarity (MOLAR) and number of ionizable protons (NIP).
      NOR = MOLAR * NIP
Similarly, extinction coefficient (EPSI) is the ratio of absorbance (ABSORB) of a coloured compound to its molar concentration (CONC) (see ASIGN001.FOR). Division by zero results in infinity, and an overflow error is displayed in FORTRAN.
Exponentiation
The variance (VAR) of ungrouped data can be calculated from the standard deviation (SD), and vice versa, as shown in the following box.
101   VAR = SD ** 2
102   VAR = SD ** 2.0
106   SD = VAR ** 0.5
The statements 101 and 102 appear to be the same. But they are calculated by separate algorithms, as given in Chart 2.8.
Chart 2.8
If     Exponentiation symbol is succeeded by an integer constant/variable
Then   Expression is calculated as successive multiplication SD * SD
If     Exponentiation symbol is succeeded by a real constant/variable
Then   Expression is calculated in logarithmic mode EXP(2.0 * ALOG(SD))
Obviously, the statement 106 is evaluated only as EXP(0.5 * ALOG(VAR)). ALOG10(SD) is the FORTRAN equivalent of log10 SD.
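The two evaluation routes of Chart 2.8 can be compared numerically. The sketch below is an illustration, not a program from the text; the value of SD and the names V1 to V4, S1 and S2 are assumptions. A modern compiler may evaluate SD ** 2.0 by other means, so the last digits of the results may or may not differ.

```fortran
*
*     Sketch: integer versus real exponent
*
      SD = 1.4142135
*     integer exponent: successive multiplication
      V1 = SD ** 2
      V2 = SD * SD
*     real exponent: logarithmic mode
      V3 = SD ** 2.0
      V4 = EXP(2.0 * ALOG(SD))
*     statement 106 and its logarithmic equivalent
      S1 = V1 ** 0.5
      S2 = EXP(0.5 * ALOG(V1))
      WRITE(*,*) V1, V2, V3, V4
      WRITE(*,*) S1, S2
      END
```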
Arithmetic Operations on Heterogeneous Constants/Variables When the two operands of the expression are not of the same type, i.e., one is an integer and the other real, evaluation is not trivial. Integer is considered as weaker or simpler object and the real as the stronger (Chart 2.9). The heuristic is to convert the weaker into the stronger. Thus, the result of the arithmetic operation is of the strong type. The result of arithmetic operations on integer (I), real (R) and complex (C) variables is given in Table 2.6. Chart 2.9
If     Both operands are of the same type
Then   Arithmetic operation is performed directly
If     Operands are of different type
Then   Convert weaker operand into the stronger & Perform arithmetic operation
If     Variable on LHS and the constant on RHS are of the same type
Then   Constant is assigned to the variable
If     Variable on LHS and the constant on RHS are of different type
Then   Transform the constant to the variable type & Assign the constant to the variable
Let us see the calculation of the value of π (pi) in real and integer mode. In real mode (π = 22.0/7.0), the arithmetic operation results in 3.142857. On the other hand, execution of the statement π = 22/7 results in 3.0000, since the division is performed in the integer mode as 22 and 7 are integer constants. So, the fractional part of the result is truncated. The integer value 3 is to be assigned to a real variable pi. Hence integer 3 (fixed value) is floated (converted into a real number), resulting in 3.0000. The other two possible expressions are π = 22.0/7 and π = 22/7.0. One of the operands (constant) is real and the other is an integer. Real being the stronger, the integer constant is floated. 7 and 22 become 0.7000E01 and 0.2200E02, respectively. Now the arithmetic operation (division) is performed in real mode, resulting in the correct value 3.142857. In another example, x = 1/3, both the operands are integers and the integer division results in integer zero. The variable on the LHS of "=" is a real one. The integer constant zero is transformed into a real one, 0.0000 0000E0, and is assigned to x. Thus, the value of x is 0.0 but not 0.3333333333.
Table 2.6
                Operand II
Operand I       I     R     C
I               I     R     C
R               R     R     C
C               C     C     C
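The mixed-mode rules of Chart 2.9 and Table 2.6 can be verified with the π and x examples. This is a minimal sketch; the names PI1 to PI4 are assumptions.

```fortran
*
*     Sketch: integer, real and mixed-mode division
*
*     integer/integer: truncated to 3, then floated to 3.0
      PI1 = 22/7
*     real/real
      PI2 = 22.0/7.0
*     mixed mode: the integer operand is floated first
      PI3 = 22.0/7
      PI4 = 22/7.0
*     integer division gives 0, stored as 0.0
      X = 1/3
      WRITE(*,*) PI1, PI2, PI3, PI4
      WRITE(*,*) X
      END
```

PI1 is 3.0 and X is 0.0, while PI2, PI3 and PI4 all hold the real quotient.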
Relational and Logical Expressions
A relational or logical expression, or a combination of these, results in either true or false. The symbols of operation in the human domain and those used in FORTRAN are given in Table 2.7. The hierarchy of relational operators is in the order .EQ., .NE., .GT., .GE., .LT., .LE. and that for logical operators is .NOT., .AND. and .OR.
Table 2.7a : Relational and Logical Operators
Human Domain (English Form)    Symbol    FORTRAN Domain
Is equal to                    =         .EQ.
Greater than                   >         .GT.
Less than                      <         .LT.
Not equal to                   ≠         .NE.
Greater than or equal to       ≥         .GE.
Less than or equal to          ≤         .LE.
Table 2.7b : Logical Operators
Human Domain (English Form)    Symbol    FORTRAN Domain
And                            &         .AND.
Or                             |         .OR.
Not                            -         .NOT.
Arithmetic Operations with more than Two Operands
When two or more operators are present in a valid expression, heuristics to perform the operations are mandatory. Otherwise, different end results are obtained depending upon the order of evaluating the expression. In algebraic expressions, parentheses ( ), braces { } and square brackets [ ] are used to pinpoint the hierarchy of operations. However, in FORTRAN only sets of parentheses are used, to any level of nesting. Consider the calculation of a root of a quadratic equation
      ROOT1 = (-B + SQRT(B**2-4.*A*C))/(2.*A)
The expression on the RHS consists of three sets of parentheses, a square root function, exponentiation, multiplication, addition and subtraction. The heuristics or rules to perform the hierarchy of operations are given in Chart 2.10.
Chart 2.10
* When more than one set of parentheses occurs, the expression in the innermost parentheses is evaluated first, then the next innermost, and so on. ( ( ( ( ) ) ) )
* Library functions/function subroutines are evaluated from left to right.
* Exponentiation is performed. If there is more than one exponentiation, the order of calculation is from right to left. For example, A**B**(C**D) is calculated as A**B**E and then as A**F, where E = C**D and F = B**E.
* Multiplication and division have the same hierarchy and are evaluated from left to right.
* Addition and subtraction are performed from left to right.
Considering the expression in the innermost parentheses of the above statement, B**2 -4.0*A*C, the constant (4.0) and the variables A, B, C are all in the real mode. The arithmetic operations performed are in the order exponentiation, multiplication and subtraction. Since, there are two multiplication operations, the expression is scanned from left to right and thus 4.0*A is followed by the multiplication of the result with C. The order of performing arithmetic operations, intermediate results and corresponding one in the human domain are given in Table 2.8. The calculation of standard deviation (SD) from variance (VAR) using apparently similar formulae is implemented in VAR2SD.FOR.
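The order of evaluation described above can be traced by computing each intermediate result in a separate statement. The sketch below borrows the intermediate names C11 to C41 from Table 2.8; the coefficients A = 1, B = -3 and C = 2 are assumed values, for which the roots are 1 and 2.

```fortran
*
*     Sketch: step-by-step evaluation of ROOT1
*
      A = 1.0
      B = -3.0
      C = 2.0
*     innermost parentheses: exponentiation first
      C11 = B**2
*     multiplications from left to right
      C12 = 4.*A
      C13 = C12*C
*     subtraction
      C14 = C11 - C13
*     library function, then the numerator
      C21 = SQRT(C14)
      C22 = -B + C21
*     denominator and final division
      C31 = 2.*A
      C41 = C22/C31
*     the same root in a single statement
      ROOT1 = (-B + SQRT(B**2-4.*A*C))/(2.*A)
      WRITE(*,*) C41, ROOT1
      END
```

Both C41 and ROOT1 evaluate to 2.0 for these coefficients.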
Table 2.8 : Hierarchy of Operations
Variable    FORTRAN        Algebra
C11         B**2           b²
C12         4.*A           4a
C13         C12*C          4ac
C14         C11-C13        b² - 4ac
C21         SQRT(C14)      √(b² - 4ac)
C22         -B+C21         -b + √(b² - 4ac)
C31         2.*A           2a
C41         C22/C31        (-b + √(b² - 4ac))/2a

*
*     VAR2SD.FOR
*
      DIMENSION SD(6)
      DO 10 I=1,3
      VAR = I
501   SD(1) = VAR ** (1./2)
502   SD(2) = VAR ** (1/2.)
503   SD(3) = VAR ** 0.5
504   SD(4) = SQRT(VAR)
505   SD(5) = VAR ** (1/2)
506   SD(6) = VAR ** 1/2
      WRITE(*,901)VAR
901   FORMAT (/6X,'VAR =',F8.3)
      WRITE(*,902) (SD(J), J =1,6)
902   FORMAT(2X,3F10.4/1X,3F10.3)
10    CONTINUE
      END

      VAR =   1.000
      1.0000    1.0000    1.0000
      1.000     1.000      .500

      VAR =   2.000
      1.4142    1.4142    1.4142
      1.414     1.000     1.000

      VAR =   3.000
      1.7321    1.7321    1.7321
      1.732     1.000     1.500
The statements 501 to 504 give the same and correct result. However, statements 505 and 506 produce incorrect values. Statement 505 gives a value of 1.0 irrespective of the value of VAR, because the integer division 1/2 yields 0 and VAR ** 0 is 1. Statement 506 always gives SD as half the magnitude of the variance, because the exponentiation VAR ** 1 is performed before the division by 2. This is due to the hierarchy of the operations. The program SD2VAR.FOR converts standard deviation into variance, again by different formulae, but the end result is the same.
*
*     SD2VAR.FOR
*
      DIMENSION VAR(4)
      DO 40 I = 1,3
      SD = SQRT(FLOAT(I))
      VAR(1) = SD * SD
      VAR(2) = SD**2
      AL10 = ALOG(10.)
      VAR(3) = SD**2.
      VAR(4) = 10.**(2*ALOG(SD)/AL10)
      WRITE(*,901)SD
901   FORMAT (/5X,'SD =',F8.3)
      WRITE(*,902) (VAR(K),K=1,4)
902   FORMAT(2X,3F10.4/1X,3F10.3)
40    CONTINUE
      END

     SD =   1.000
      1.0000    1.0000    1.0000
      1.000

     SD =   1.414
      2.0000    2.0000    2.0000
      2.000

     SD =   1.732
      3.0000    3.0000    3.0000
      3.000
Precision in Arithmetic Operations
The word precision in arithmetic operations denotes the number of digits considered for a real constant expressed in the normalized form. In FORTRAN, a real/integer variable/constant is in single precision by default. The maximum valid integer is 32767. A real constant occupies two 16-bit words (four 8-bit bytes). In ASIGN002.FOR, ABSORB and CONC are stored as 0.3560 0000E 00 and 0.1780 0000E-03 and EPSI as 0.2000 0000E 04 in memory allocations 20-23, 24-27 and 28-31, respectively.
*
*     ASIGN002.FOR
*
      REAL*4 ABSORB,CONC,EPSI
      ABSORB = 0.356D0
      CONC = 0.000178D0
      EPSI = ABSORB/CONC
      WRITE(*,951) ABSORB,CONC,EPSI
951   FORMAT (1X,F25.18)
      STOP
      END

ABSORB  REAL  20
CONC    REAL  24
EPSI    REAL  28

       .356000000000000000
       .000178000000000000
   2000.000000000000000000
Explicitly, the variables can be defined as single precision using the statement
      REAL*4 ABSORB,CONC,EPSI
These real variables are declared as double precision before the first executable statement as
      DOUBLE PRECISION ABSORB,CONC,EPSI
or
      REAL*8 ABSORB,CONC,EPSI
The program ASIGN003.FOR illustrates implementing ASIGN002.FOR in double precision. The variables ABSORB, CONC and EPSI are each allotted eight 8-bit bytes (20-27, 28-35 and 36-43) of memory allocations.
*
*     ASIGN003.FOR
*
      DOUBLE PRECISION ABSORB,CONC,EPSI
      ABSORB = 0.356D0
      CONC = 0.000178D0
      EPSI = ABSORB/CONC
      WRITE(*,951) ABSORB,CONC,EPSI
951   FORMAT (1X,F25.18)
      STOP
      END

ABSORB  REAL*8  20
CONC    REAL*8  28
EPSI    REAL*8  36

       .356000000000000000
       .000178000000000000
   2000.000000000000000000
A real constant in D notation (e.g., 0.356D0 or 0.000178D0) specifies that it is in double precision. Looking at the results of the two programs, one finds that there is no difference between the values of EPSI. At this stage, one should not hastily conclude that double precision calculations yield the same results as those in single precision. A notion prevalent among some chemists is that double precision is not necessary, as the readability of a burette is 0.1 or 0.01 and the chemicals are weighed with a balance accurate to four or six decimal places. But an insight into the effect of cumulative and truncation errors during a series of complicated calculations established the need for double precision. The compiler stores the real constants to eight digits only, irrespective of the number of digits given in the program. It is a usual practice to give some of the constants to the program by calculating them manually. The number of digits given will be mostly based on prejudice. For example, the value of π can be computed in the program itself rather than recalling or manually calculating it. The program ASIGN004.FOR gives the values of π in single and double precision. The digits after the eighth and sixteenth places in single and double precision, respectively, are insignificant; they are shaded and have no meaning.
*
*     ASIGN004.FOR
*
      DOUBLE PRECISION PIDP
      PIDP = 22.D0/7.D0
      PISP = 22.0/7.0
      WRITE(*,951) PISP,PIDP
951   FORMAT (1X,F25.18)
      STOP
      END

PIDP  REAL*8  20
PISP  REAL    28

 3.14285700 [shaded insignificant digits]
 3.1428571428571430 [shaded insignificant digits]
Even if the values of X and PI are given up to 30 places in the assignment statements (ASIGN005.FOR), they are stored up to sixteen places in double precision.
*
*     ASIGN005.FOR
*
      REAL*8 X,PI
      X = 0.6666 6666 6666 6666 6666 6666 6666 6666 6666D0
      PI = 3.1428 5714 2857 1430D0
      WRITE(*,951)X,PI
951   FORMAT (1X,'X =',F40.30/ 'PI =',F40.30)
      END

 X  =  .666666666666666600000000000000
 PI = 3.142857142857143000000000000000
In numerical analysis, it is a good practice to use double precision to avoid round-off errors. The library function SNGL transforms a double precision variable or constant into a single precision one, while DBLE converts a single precision entity into a double precision one.
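The effect of the two conversion functions can be examined as follows. This is a sketch assuming the standard FORTRAN 77 intrinsic functions SNGL and DBLE; the variable name BACK is illustrative.

```fortran
*
*     Sketch: conversion between single and double precision
*
      DOUBLE PRECISION PIDP, BACK
      PIDP = 22.D0/7.D0
*     double to single: digits beyond single precision are lost
      PISP = SNGL(PIDP)
*     single to double: the lost digits are not recovered
      BACK = DBLE(PISP)
      WRITE(*,951) PIDP
      WRITE(*,951) PISP
      WRITE(*,951) BACK
951   FORMAT (1X,F25.18)
      END
```

Converting a single precision value back to double precision widens the storage but does not restore the digits dropped by SNGL.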
2.2 TRANSFER CONTROL STATEMENTS
The normal sequential execution of statements in a program is altered through transfer control statements. The control is transferred to a user chosen statement by this category, viz., IF, GOTO and Computed GOTO.
If Statement
It is an executable statement and occurs anywhere after specification statements except as the last statement of the DO domain. The flow chart symbol (Fig. 2.2.1) and pseudo code are given below.
Pseudo code
IF      Predicate
Then    Process a
Else    Process b
Endif
Fig. 2.2.1 : Flow Chart Symbol of IF
Selection Between Two Alternate Processes
Many of the mathematical, statistical and chemical tasks require selection of alternate decisions, calculations or a set of procedures. Some of the widely employed tasks include obtaining the absolute value of a variable, calculation of the sum of the elements of a vector, comparison of two real numbers, sorting positive and negative numbers, and calculation of the pH of a strong acid on successive dilution.
Example 2.2.1
The algorithm to take the absolute value of a variable (X) involves a test to decide whether it is negative or not. The relational expression (X < 0) is called the predicate and it results in either YES or NO (TRUE or FALSE; ON or OFF; 1 or 0). If X is negative, then it is to be multiplied by -1 to get the absolute value (Fig. 2.2.2). On the other hand, when X is zero or positive, nothing is to be done. Depending upon the answer, one of the two processes is performed. The pseudo code and the FORTRAN program (IF001.FOR) to implement the absolute value of a variable are given below.
Pseudo code (Case 1) X=-5
Predicate (XB Then A is greater than B Else A is less than or equal to B
Step 3
STOP
Steps 1 and 3 are sequential while Step 2 is a selection statement. The flow chart (Fig. 2.2.5) and program are as follows:
Fig. 2.2.5 : Comparison of Two Numbers
*
*     IF005.FOR
*
      WRITE(*,951)
951   FORMAT (5X,' GIVE A AND B : ',\)
      READ(*,*)A,B
C
      IF (A .GT. B) THEN
         WRITE(*,901)A,B
      ELSE
         WRITE(*,903)A,B
      ENDIF
C
901   FORMAT (/' A = ',F5.2,' IS GREATER THAN B = ',F5.2)
903   FORMAT (/' A = ',F5.2,' IS LESS THAN OR EQUAL TO B ',F5.2)
      END

GIVE A AND B :
A =  2.00 IS LESS THAN OR EQUAL TO B  4.00
GIVE A AND B :
A =  6.00 IS GREATER THAN B =  4.00
GIVE A AND B :
A =  4.00 IS LESS THAN OR EQUAL TO B  4.00
In order to distinguish between the two cases A = B and A < B, another selection is needed for the false condition, i.e., the LHS of the A > B decision box. When A > B is false, the control is transferred to the ELSE condition (LHS of the A > B decision box). Before it enters the decision box, A is less than or equal to B is true. The predicate used in this decision box is A = B. When it is true, a message A = B is displayed. On the other hand, for the false condition, A < B is outputted. This results in another IF THEN ELSE structure within the first IF THEN ELSE statement (Fig. 2.2.6). One or more IF THEN ELSE structures within an IF THEN ELSE statement is called nesting of IF statement (IF006.FOR).
Fig. 2.2.6 : Nested DO Loop
*
*     IF006.FOR
*
      WRITE(*,951)
951   FORMAT (5X,' GIVE A AND B : ',\)
      READ(*,*)A,B
C
      IF (A .GT. B) THEN
         WRITE(*,901)A,B
      ELSE
         IF (A .EQ. B) THEN
            WRITE(*,902)A,B
         ELSE
            WRITE(*,903)A,B
         ENDIF
      ENDIF
C
901   FORMAT(/' A = ',F5.2,' IS GREATER THAN B = ',F5.2)
902   FORMAT(/' A = ',F5.2,' IS EQUAL TO B = ',F5.2)
903   FORMAT(/' A = ',F5.2,' IS LESS THAN B = ',F5.2)
      END

GIVE A AND B :
A =  4.00 IS EQUAL TO B =  4.00
GIVE A AND B :
A =  2.00 IS LESS THAN B =  4.00
GIVE A AND B :
A =  6.00 IS GREATER THAN B =  4.00
There is no limit for the extent of nesting (Fig. 2.2.7). But the readability and understanding of the program rapidly diminish with increased nesting. Expert systems, computer programs mimicking experts' knowledge in decision making, contain hundreds to thousands of rules and they demand readability and understanding rather than condensed source code. Therefore, nesting is avoided by using a series of IF-THEN statements, as shown in IF007.FOR (Fig. 2.2.8), which implements the algorithm of the IF006.FOR program.
Fig. 2.2.7 : Multilevel Nesting
... 3 is false. So, control is transferred again to the beginning of the DO loop. SUM = SUM + X(3), i.e., SUM = 6. I = I + INC = 4.
Since I > 3 is true, the control exits the domain of the DO loop.
Generalization of Summation of Linear Array Whenever there is a change in the number of data points to be summed, editing of the DIMENSION and ASSIGNMENT statements is necessary. Instead, the value of the integer variable (NP) can be given by the user and thus the program can be generalized (D0003.FOR). Further, inputting the values of X through a DO loop eliminates NP ASSIGNMENT statements. The two DO loops can be combined in this case as in D0004.FOR.
*
*     D0003.FOR
*
      PARAMETER (MAX = 1000)
      DIMENSION X(1000)
      CALL IXO(NP)
*
      DO 10 I = 1,NP
         READ(*,*)X(I)
10    CONTINUE
*
      SUM = 0.
      DO 20 J = 1,NP
         SUM = SUM + X(J)
20    CONTINUE
*
      WRITE(*,*)SUM
      END

*
*     D0004.FOR
*
      DIMENSION X(1000)
      READ (*,*)NP
      SUM = 0
      DO 10 I = 1,NP
         READ(*,*)X(I)
         SUM = SUM + X(I)
10    CONTINUE
      WRITE(*,*)SUM
      END
2.3 Do Statement
Summation is used in calculating standard deviation, linear least squares, product of matrices etc. Similarly, the product of elements in the linear array ($\prod_{i=1}^{N} X_i$) can be calculated as in D0005.FOR.
*
*     D0005.FOR
*
      PARAMETER (MAX = 50)
      DIMENSION X(MAX)
      READ(*,*)N
      PROD = 1.
      DO 10 I = 1,N
         READ(*,*)X(I)
         PROD = PROD * X(I)
10    CONTINUE
      WRITE(*,*)PROD
      STOP
      END
A special case is the product of first N natural numbers, where PROD(N) is the factorial (!) of N. This algorithm (D0006.FOR) is utilized in NCR and NPR programs.
*
*     D0006.FOR
*
      CALL IXO(N)
      PROD = 1.
      DO 10 I = 1,N
         PROD = PROD * I
10    CONTINUE
      WRITE(*,*)PROD
      END
$INCLUDE : 'IXO.FOR'
Implied DO Loop
It is an abridged form of the DO loop and its utility is restricted to READ and WRITE statements. Its general format is
      READ(*,*) (Var(Scintvar), Scintvar = init, ifinal [,inc])
      WRITE(*,*) (Var(Scintvar), Scintvar = init, ifinal [,inc])
The input part of D0003.FOR to obtain the SUM of elements of a linear array is given in D0007.FOR. The remaining part of the program remains the same. The X array can be displayed as WRITE(*,*) (X(I), I = 1, NP). Then the programs D0007.FOR and D0003.FOR give the same results.
*
*     D0007.FOR
*
      DIMENSION X(1000)
      CALL IXO(NP)
*
      READ(*,*) (X(I), I = 1,NP)
      SUM = 0
      DO 20 J = 1,NP
         SUM = SUM + X(J)
20    CONTINUE
*
      WRITE(*,*) (X(I),I = 1,NP)
      WRITE(*,*)SUM
      END
$INCLUDE '\F77\FOR\IXO.FOR'
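The optional increment of an implied DO loop works as in an ordinary DO loop. The following sketch uses an assumed five-element array to show forward, strided and reverse output.

```fortran
*
*     Sketch: implied DO loops in WRITE statements
*
      DIMENSION X(5)
      DATA X/1.0, 2.0, 3.0, 4.0, 5.0/
*     all elements on one line
      WRITE(*,*) (X(I), I = 1,5)
*     every second element: X(1), X(3), X(5)
      WRITE(*,*) (X(I), I = 1,5,2)
*     reverse order using a negative increment
      WRITE(*,*) (X(I), I = 5,1,-1)
      END
```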
Infinite Repetition Structure
Repetition structure is a boon. But infinite repetition (loop) is a curse when conditional termination of the loop is not planned (D0008.FOR), an unconditional transfer control statement (GOTO) is used (D0009.FOR) or the variable in the predicate/the index variable of the DO loop is not changed (D0010.FOR). The only way to interrupt the process is to reboot the computer or to kill the job in multiprocessing systems.
*
*     D0008.FOR
*
      N = 0
501   N = N + 1
      WRITE(*,*)N
      IF (N .LE. 10) GOTO 501
      END

*
*     D0009.FOR
*
      N = 0
501   N = N + 1
      WRITE(*,*)N
      GO TO 501
502   CONTINUE
      END

Output of D0009.FOR
 1 2 3
 25 26 2490

*
*     D0010.FOR
*
501   N = 0
      IF (N .LE. 10) THEN
         N = N + 1
         WRITE(*,*)N
         GOTO 501
      ELSE
         GO TO 502
      ENDIF
502   CONTINUE
      END

Output of D0010.FOR
 1 1 1 1
Do's of DO
✓ The last statement of the DO loop can be any executable statement including CONTINUE.
✓ A negative increment (decrement) for the index variable is valid (D0011.FOR).
*
*     D0011.FOR
*
      DO 691 L = 21,18,-1
         LL = L*L
         WRITE(*,901)L,LL
901      FORMAT (' L = ',I3,3X,'L SQUARE = ',I4)
691   CONTINUE
      END

 L =  21   L SQUARE =  441
 L =  20   L SQUARE =  400
 L =  19   L SQUARE =  361
 L =  18   L SQUARE =  324
✓ A dummy DO loop, one with no executable statements in its domain, does nothing except while away the computer time (D0012.FOR). It is useful to develop time delay loops.
*
*     D0012.FOR
*     DUMMY DO LOOP
*
      DO 100 I = 1,100
100   CONTINUE
      END
✓ An integer expression is valid for the initial, final and increment values of the index variable (D0013.FOR).
*
*     D0013.FOR
*
      INIT = 1
      N = 3
      PROD = 1
      DO 675 K = INIT*2,N**2,2*INIT
         PROD = PROD*K
675   WRITE(*,*)K,PROD
      END

 2    2.0000000
 4    8.0000000
 6   48.0000000
 8  384.0000000
✓ Premature exit from a DO loop is valid. It is useful to count the number of negative numbers in an array sorted in ascending order (D0014.FOR).
*
*     D0014.FOR
*
      DIMENSION X(121)
      DATA X(1),X(2),X(3),X(4),X(5),X(6)/-2,-1,0,1,2,3/
      NN = 0
      DO 4 L = 1,6
         IF (X(L) .LT. 0) NN = NN+1
         IF (X(L) .GE. 0) GOTO 5
4     CONTINUE
5     WRITE(*,*)NN
      END
✓ The two logical IF statements (D0014.FOR) and the block IF (D0015.FOR) perform the same function.
*
*     D0015.FOR
*
      DIMENSION X(121)
      DATA X(1),X(2),X(3),X(4),X(5),X(6)/-2,-1,0,1,2,3/
      NN = 0
      L = 1
501   IF (X(L) .LT. 0) THEN
         NN = NN+1
      ELSE
         GOTO 5
      ENDIF
      L = L+1
      GO TO 501
5     WRITE(*,*)NN
      END
As long as the value of the current element of X is < 0, NN, the number of negative values, is increased by one. When X(L) ≥ 0 the counting is stopped. The testing of the other numbers is not necessary as X is an ordered array. So, using the transfer control statement (GOTO), execution of the DO loop is terminated. The premature exit saves execution time while working with very large arrays or databases. Program D0016.FOR also gives the same result, except that all six X values are tested. Of course, this is useful even when the array is not sorted.
*
*     D0016.FOR
*
      DIMENSION X(121)
      DATA X(1),X(2),X(3),X(4),X(5),X(6)/-2,-1,0,1,2,3/
      NN = 0
      DO 4 L = 1,6
         IF (X(L) .LT. 0) THEN
            NN = NN+1
         ENDIF
4     CONTINUE
5     WRITE(*,*)NN
      END
Don'ts of DO
✗ The DO label must not precede the DO statement, because DO works in the forward but not in the reverse direction (D0101.FOR).
*
*     D0101.FOR
*
10    WRITE(*,*)IZ
      DO 10 IZ = 1,10
      END

Compiler messages (D0101.FOR)
Line 5: ***** Error 107 - DO label must follow DO statement
Line 6: ***** Error 132 - DO or IF block not terminated
✗ The index variable of the DO loop should not be redefined in the DO domain (D0102.FOR).
*
*     D0102.FOR
*
      N = 6
      DO 100 I = 1,10
         I = N*1
         WRITE(*,*) I
100   CONTINUE
      END

Compiler messages (D0102.FOR)
Line 6: ***** Error 811 - assignment to DO index variable
Comment: The index variable is manipulated in the I = N*1 statement.
✗ The last statement of a DO loop should not be a transfer control statement (D0103.FOR).
*
*     D0103.FOR
*
      DIMENSION R1(20),R2(20),R3(20)
      CALL RREAD(R1,R2)
20    DO 10 I = 1,20
         R3(I) = R1(I)+R2(I)
10    GO TO 20
      END

Compiler messages (D0103.FOR)
Line 8: ***** Error 120 - GOTO not allowed here
Comment: GOTO 20, an unconditional transfer control statement, is used where CONTINUE is expected.

✗ A FORMAT statement cannot be used as the last statement of a DO loop (DO104.FOR).

*
*     DO104.FOR
*
      DO 6 M = 1,4
      WRITE(*,6)M
    6 FORMAT(I5)
      END

Compiler messages:
***** Error 165 - label already used as FORMAT (at the WRITE statement)
***** Error 134 - FORMAT label already referenced (at the FORMAT statement)
***** Error 132 - DO or IF block not terminated (at END)
✗ The statement number in the DO statement should not be omitted (DO105.FOR).

*
*     DO105.FOR
*
      N = 0
      DO I = 1,10
      N = N+1
   10 ENDDO
      END
✗ Control should not be transferred directly into the domain of the DO loop (DO106.FOR).

*
*     DO106.FOR
*
      NP = 4
      IF (NP .GT. 2) GOTO 41
      DO 10 I = 1,6
   41 CONTINUE
      WRITE(*,*)N
   10 CONTINUE
      END

Compiler message:
***** Error 102 - jump into block not allowed (at the statement labelled 41)
✗ The subscript value of an array should not be outside the range of the array dimension (DO201.FOR).

*
*     DO201.FOR
*
      DIMENSION X(2)
      DATA X/1.,2./
      DO 10 I = 1,6
      WRITE(*,*)X(I)
   10 CONTINUE
      END

Output:
   1.0000000
   2.0000000
   4.203895E-045
   6.678588E-042
    .0000000
   9.612743E-026

Only the first two values belong to X; the remaining four are whatever happens to lie in memory beyond the array.
✗ The scope of a block IF that starts inside a DO loop should not extend outside the DO domain (DO107.FOR).

*
*     DO107.FOR
*
      DO 10 I = 1,4
      IF (I .GT. 2) THEN
      WRITE(*,*)I
   10 CONTINUE
      ENDIF
      END

Compiler message:
***** Error 113 - improperly nested DO or ELSE block (at the statement labelled 10)
✗ The index variable should not be a real one (DO108.FOR).

*
*     DO108.FOR
*
      SUM = 0
      DO 1 X = 1,3.0,0.5
      SUM = SUM + X
      WRITE(*,*)X,SUM
    1 CONTINUE
      END
Nested DO Loops
The presence of a DO loop within the domain of the first DO loop results in a first level nested DO loop, the general form being

      DO Stno1 IndVar1 = Init, Final [,inc]
         Process 1
         DO Stno2 IndVar2 = Init, Final [,inc]
            Process 2
Stno2    CONTINUE
         Process 3
Stno1 CONTINUE

Stno1, Stno2     : Two different or same statement numbers
IndVar1, IndVar2 : Two different scalar integer variables
A first level nested DO loop is used for input/output, arithmetic operations (except multiplication), and logarithmic and trigonometric operations on matrices. However, multiplication of two matrices or input/output of a third order tensor requires a two level nested DO loop. In view of the mathematical and statistical applications of multi-way data structures, the upper limit of nesting of DO loops is restricted to 25. However, many real life problems do not require nesting beyond the fourth level.
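The two level nesting needed for matrix multiplication can be sketched as follows. This is an illustrative program and not one of the book's listings: the program name MMUL, the 2 x 2 size and the data values are assumptions chosen to keep the example short.

```fortran
*
*     MMUL (hypothetical example): C = A*B with a two level nest
*
      PROGRAM MMUL
      PARAMETER (N = 2)
      DIMENSION A(N,N), B(N,N), C(N,N)
*     DATA fills the arrays column by column:
*     A = (1 2 / 3 4),  B = (5 6 / 7 8)
      DATA A /1., 3., 2., 4./
      DATA B /5., 7., 6., 8./
      DO 30 I = 1,N
      DO 20 J = 1,N
         C(I,J) = 0.0
*        innermost loop accumulates the sum of products
         DO 10 K = 1,N
            C(I,J) = C(I,J) + A(I,K)*B(K,J)
   10    CONTINUE
   20 CONTINUE
   30 CONTINUE
      DO 40 I = 1,N
         WRITE(*,901) (C(I,J), J = 1,N)
   40 CONTINUE
  901 FORMAT(1X,2F8.2)
      END
```

For these data the program should print the rows 19.00 22.00 and 43.00 50.00. The innermost loop over K forms each element as a sum of products, which is why a first level nest alone is not sufficient.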
Do's of NESTED DO Loops
✓ The same variable for either the initial or the final values of the index variables is valid.
✓ The last executable statement for all DO loops in the nested one can be the same (DO017.FOR).
*
*     DO017.FOR
*
      PARAMETER (MAX=5)
      DIMENSION A(MAX,MAX,MAX,MAX)
      ZERO = 0.0
      DO 10 I1 = 1,MAX
      DO 10 I2 = 1,MAX
      DO 10 I3 = 1,MAX
      DO 20 I4 = 1,MAX
      A(I1,I2,I3,I4) = ZERO
   20 CONTINUE
      WRITE(*,951) (A(I1,I2,I3,I4),I4 = 1,MAX)
  951 FORMAT(1X,5F8.2)
   10 CONTINUE
      END

Output (ten lines shown):
      .00     .00     .00     .00     .00
      .00     .00     .00     .00     .00
      .00     .00     .00     .00     .00
      .00     .00     .00     .00     .00
      .00     .00     .00     .00     .00
      .00     .00     .00     .00     .00
      .00     .00     .00     .00     .00
      .00     .00     .00     .00     .00
      .00     .00     .00     .00     .00
      .00     .00     .00     .00     .00
✓ The same statement number can be used for all DO loops in the nested one. However, there should be a single executable statement as the last one (DO018.FOR).

*
*     DO018.FOR
*
      PARAMETER (MAX=5)
      DIMENSION ZERO(MAX,MAX,MAX,MAX)
      DO 10 I1 = 1,MAX
      DO 10 I2 = 1,MAX
      DO 10 I3 = 1,MAX
      DO 10 I4 = 1,MAX
   10 ZERO(I1,I2,I3,I4) = 0.
      END
Don'ts of NESTED DO Loop
✗ A DO block should not be left unterminated (DO109.FOR). The statement number of the last executable statement should be that of the DO label.

*
*     DO109.FOR
*
      DO 316 I1 = 1,7,3
      DO 316 I2 = 1,4
      WRITE(*,*)I1,I2
      END
✗ Nesting should not be improper (DO110.FOR). The domain of the inner DO loop should end before that of the outer.

*
*     DO110.FOR
*
      DIMENSION A(20,20),B(20,20),C(20,20)
      CALL RREAD(A,B)
      DO 10 I = 1,20
      DO 20 J = 1,20
      C(I,J) = A(I,J)+B(I,J)
   10 CONTINUE
   20 CONTINUE
      END

Comment: 20 CONTINUE should be before 10 CONTINUE.
✗ The DO label and the statement number of the last statement of the DO domain should not be different (DO111.FOR).

*
*     DO111.FOR
*
      DIMENSION UMAT(10,10)
      DO 10 I = 1,10,1
      DO 10 J = 1,10
      UMAT(I,J) = 1.
  100 CONTINUE
      END
✗ The same index variable should not be used in a nested DO loop (DO112.FOR).

*
*     DO112.FOR
*
      DO 10 I = 1,4
      DO 20 I = 1,2
      WRITE(*,*)I
   20 CONTINUE
   10 CONTINUE
      END

Compiler message:
***** Error 811 - assignment to DO index variable (at the inner DO statement)
Applications of NESTED DO Loops
• Printing of multiplication tables (DO019.FOR).

*
*     DO019.FOR
*
      DO 10 I = 11,13
      DO 20 J = 1,10
      IJ = I*J
      WRITE(*,*)I,J,IJ
   20 CONTINUE
   10 CONTINUE
      END

Output (lines for I = 11):
   11    1    11
   11    2    22
   11    3    33
   11    4    44
   11    5    55
   11    6    66
   11    7    77
   11    8    88
   11    9    99
   11   10   110
• Input of matrices of size m x n.

Reading a matrix row wise (DO020.FOR) and reading a matrix column wise (DO021.FOR)

*
*     DO020.FOR
*
      DIMENSION A(20,20)
      READ(*,*)M,N
      DO 50 K = 1,M
      DO 40 L = 1,N
      READ(*,*)A(K,L)
   40 CONTINUE
   50 CONTINUE
      END

*
*     DO021.FOR
*
      DIMENSION A(10,10)
      READ(*,*)M,N
      DO 15 I = 1,N
      DO 15 J = 1,M
      READ(*,*)A(J,I)
   15 CONTINUE
      END
• Generation of unit (ONES.FOR), zero (ZEROS.FOR) and identity (EYE.FOR) matrices

*
*     ONES.FOR
*
      SUBROUTINE ONES(A,ROWA,COLA)
      INTEGER ROWA,COLA
      DIMENSION A(ROWA,COLA)
      ONE = 1.0
      DO 20 I = 1,ROWA
      DO 20 J = 1,COLA
      A(I,J) = ONE
   20 CONTINUE
      END

*
*     ZEROS.FOR
*
      SUBROUTINE ZEROS(A,M,N)
      PARAMETER (MAX = 20)
      DIMENSION A(MAX,MAX)
      ZERO = 0.0
      DO 10 I = 1,M
      DO 20 J = 1,N
      A(I,J) = ZERO
   20 CONTINUE
   10 CONTINUE
      END

*
*     EYE.FOR
*
      SUBROUTINE EYE(A,ROWA,COLA)
      INTEGER ROWA,COLA
      DIMENSION A(ROWA,COLA)
      ZERO = 0.0
      ONE = 1.0
      DO 10 I = 1,ROWA
      DO 50 J = 1,COLA
      A(I,J) = ZERO
   50 CONTINUE
      A(I,I) = ONE
   10 CONTINUE
      RETURN
      END
    1.00     .00     .00
     .00    1.00     .00
     .00     .00    1.00
Another way of producing an identity matrix is depicted in EYE02.FOR.

*
*     EYE02.FOR
*
      PARAMETER (MAX = 20)
      DIMENSION EYE(MAX,MAX)
      CALL IXO(N)
*
      DO 10 I = 1,N
      DO 20 J = 1,N
      IF (I .EQ. J) THEN
         EYE(I,J) = 1
      ELSE
         EYE(I,J) = 0
      ENDIF
   20 CONTINUE
   10 CONTINUE
*
      CALL WX2(N,N,EYE,MAX)
      END
$INCLUDE: '\MSF-CAC\STATEMENTS\IXO.FOR'
$INCLUDE: '\MSF-CAC\STATEMENTS\WX2.FOR'
2.4 INPUT/OUTPUT
Read Statement
It is an executable statement. Its syntax is

READ (unit-number, S [,ERR = stno] [,END = stno]) list

unit-number : Logical device/unit number (* indicates input from a standard input device like the keyboard)
S           : Format specification (* directs list directed input from the keyboard)
list        : Set of variables, array elements or array(s)
ERR = stno  : Label of the statement to which control is transferred in case of an error condition in the READ operation.
END = stno  : Label of the statement to which control is transferred at the end of the file. When data are read from a keyboard 'END' is never executed. However, it does not create any error message during execution.
READ(*,*)PH causes a group of characters representing the value of pH to be read from the keyboard and assigns the value to PH. If the number keyed in is 1.690, the statement is equivalent to the assignment statement PH = 1.690. However, READ is used instead of an assignment when the value changes from run to run. The program (READ001.FOR) is thus independent of the numerical values.
*
*     READ001.FOR
*
      CHARACTER*10 NAME
      WRITE(*,901)
  901 FORMAT(' Give name of compound,NEC,WBT')
      READ(*,*)NAME,NEC,WBT
      WRITE(*,*)NAME,NEC,WBT
      END
The compound name (NAME), number of electron change (NEC) and molecular weight of the substance (WBT) are read from the keyboard. The values can be keyed in as
'Oxalic acid', 2, 126.03
or
'Oxalic acid' 2 126.03
The first value, 'Oxalic acid', is assigned to NAME, 2 to NEC and 126.03 to WBT. This is equivalent to three assignment statements. Quotes are mandatory to enclose character/string data. A prompt on the screen, as given in READ001.FOR, avoids the need to remember the sequence and type of variables (READ002.FOR). An error condition occurs in a READ operation (READ003.FOR) when fewer values are available than variables, or due to syntactic errors in the constants. To avoid an abrupt halt, the ERR option is used in the READ statement (READ004.FOR).
*
*     READ002.FOR
*
      CHARACTER*1 CHEMIS(10)
      READ(*,*) CHEMIS
      WRITE(*,*)CHEMIS
      END

*
*     READ003.FOR
*
      READ(*,*)EMF
      WRITE(*,*)EMF
      END
*
*     READ004.FOR
*
      READ(*,*,ERR=100)WT
      STOP
  100 WRITE(*,*) 'ERROR IN READING WT'
      END

C:\>READ004
2.123.
ERROR IN READING WT
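The ERR and END options can be combined in one READ statement. The following is a minimal sketch, assuming a scratch file on unit 11 so that the example is self-contained; the program name RDEND, the unit number and the three data values are illustrative and do not come from the book.

```fortran
*
*     RDEND (hypothetical example): READ with ERR= and END=
*
      PROGRAM RDEND
*     write three values to a scratch file, then read them back
      OPEN(11,STATUS='SCRATCH')
      WRITE(11,*) 1.5
      WRITE(11,*) 2.5
      WRITE(11,*) 3.0
      REWIND 11
      SUM = 0.0
      N = 0
*     control goes to 200 on a read error, to 300 at end of file
  100 READ(11,*,ERR=200,END=300) X
      SUM = SUM + X
      N = N + 1
      GOTO 100
  200 WRITE(*,*) 'ERROR IN READING RECORD', N+1
      STOP
  300 WRITE(*,*) N, SUM
      END
```

With these data the loop ends through label 300 after three records, with N = 3 and SUM = 7.0. The STOP after the error message keeps the error branch from falling through into the normal branch.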
An implied DO loop can be used in a READ statement as given in READ005.FOR. The NP data points of a UV-visible spectrum (wavelength and absorbance) are read in pairs; with FORMAT(6F10.3), each data line holds up to three pairs.
*
*     READ005.FOR
*
      DIMENSION WAVEL(100),ABSORB(100)
      READ(*,*)NP
      READ(*,801) (WAVEL(I),ABSORB(I),I = 1,NP)
  801 FORMAT(6F10.3)
      WRITE(*,901) (WAVEL(I),ABSORB(I),I = 1,NP)
  901 FORMAT(1X,6F10.3)
      END
The number of data points on each line can be varied (READ006.FOR) to avoid taking zero values for some of the variables in case of excess format specification.
*
*     READ006.FOR
*
      DIMENSION WAVEL(100),ABSORB(100)
      READ(*,*)NP
      NOVAR = 2
      N2 = 0
  501 N1 = N2+1
      N2 = N1 + NOVAR
      IF (N2 .GT. NP) N2 = NP
      READ(*,801) (WAVEL(I),ABSORB(I),I = N1,N2)
  801 FORMAT(6F10.3)
      WRITE(*,901) (WAVEL(I),ABSORB(I),I = N1,N2)
      IF (N2 .LT. NP) GOTO 501
  901 FORMAT(1X,6F10.3)
      END
Do's of READ/WRITE Statement
✓ The list of arguments may be null in a WRITE statement (WRITE001.FOR).

*
*     WRITE001.FOR
*
      WRITE(*,901)
  901 FORMAT(' LIST OF ARGUMENTS IS NULL')
      END
✓ The list of arguments can be an array element, a set of array elements or the entire array (READ007.FOR).

*
*     READ007.FOR
*
      DIMENSION VOL(4)
      READ(*,*)VOL(1),VOL(2)
      READ(*,*)VOL(3),VOL(4)
      READ(*,*)PH
      END

✓ Variables of different types can be read with a single statement. In READ008.FOR, NAME is a character string, NEC is an integer and WBT is a real variable. A blinking cursor is an indication that data are to be given from the keyboard. The READ statement accepts three values from the keyboard. The first one is a character string, while the second and third are numerical values.
*
*     READ008.FOR
*
      CHARACTER*10 NAME
      READ(*,901)NAME,NEC,WBT
  901 FORMAT(A10,I2,F8.3)
      END
Don'ts of READ/WRITE Statement
✗ There should be no comma after the last variable or before the first variable (READ101.FOR).

*
*     READ101.FOR
*
      READ(*,*)VOL,PH
      WRITE(*,*),VOL,PH
      END
List directed Input/Output (I/O)
List directed I/O statements transfer data between an I/O device and the CPU. In this mode an implicit format is used. A comma, a blank or a group of blanks separates two data items; additional blank spaces are ignored. List directed I/O is used for debugging and quick results. When the algorithm has been completely tested for different data sets, the output is converted into formatted mode.
Formatted I/O
Although reading values of variables without a format (free format) is preferred, sometimes data recorded by instruments or compiled by organizations are to be used. It was the earlier practice not to leave a space or insert a comma between successive data items, because the computer can separate the data items using a format.
In formatted I/O the variables are specified in READ/WRITE. The spacing between values is explicitly spelled out in the FORMAT statement. Formatted I/O in application programs gives a professional appearance and makes the data visually attractive and easy to read (READ009.FOR).
*
*     READ009.FOR
*
      CHARACTER*1 HEAD1(3),HEAD2(3)
      READ(*,851)HEAD1,HEAD2,VOL,PH
  851 FORMAT(3A1,3A1,F5.2,F7.2)
      WRITE(*,851)HEAD1,HEAD2,VOL,PH
      END
To optimize the use of punch cards data input to the computer was planned using odd format specification. In many cases respecification of format was considered mandatory for running different programs. The READ and FORMAT statements together tell the computer in what columns the data is present. A conflict between the type of variable and the data results in I/O execution error.
FORMAT Statement
FORMAT is a non-executable statement. It consists of a statement label in columns 1 to 5 followed by the keyword FORMAT and a specification (Table 2.4.1), which is a list of edit descriptors enclosed in a set of parentheses.

Table 2.4.1 : Edit descriptors in format statement

Variable Type            Format             Symbol   Edit Descriptor
Integer                  Fixed              I        aIw
Real                     Floating           F        aFw.d
Real                     Exponential        E        aEw.d
Real                     Double Precision   D        aDw.d
Numerical                General            G        aGw.d
Character (or) String    String             A        aAw
Blanks/Spaces            Hollerith          X        aX (or) aH
Logical                  Logical            L        aLw
The syntax is
Stno FORMAT([S1][,S2][,S3])
Sometimes the list is null. The format can also be specified as given in READ008.FOR or, through a character variable, as in READ010.FOR.
*
*     READ010.FOR
*
      INTEGER ATNO
      CHARACTER*20 FMT1,NAME
      FMT1 = '(1X,A10,I2,F8.3)'
      WRITE(*,951)
  951 FORMAT(1X,'GIVE NAME NEC WBT ')
      READ(*,*)NAME,NEC,WBT
      WRITE(*,FMT1)NAME,NEC,WBT
      END
Do's of FORMAT Statement
✓ A FORMAT statement can be used by any number of READ or WRITE statements.
✓ FORMAT can follow or precede the I/O statements.
✓ All or some of the FORMAT statements can be grouped and placed either before END or after the first executable statement.
Integer Input
The general form of the data specification for integers is aIw, where a is the repetition factor and w is the field width. It indicates that the next w input columns contain a right justified integer. The variable in the READ statement and the data should be in integer mode. Leading blanks are ignored, but trailing blanks are interpreted as zeros. For all numerical fields (integer, real, complex) of any precision, a is omitted if the repetition factor is one. A negative sign occupies one space and the decimal point another. In the output, the plus sign of positive values is left blank. In a READ statement the user can omit the positive sign. Different formats are used depending upon the magnitude of the numerical values. For example, the number of ionisable protons (NIP) may not exceed nine, while the number of experimental points (NP) may be in hundreds. The format specifications I1 and I3 are appropriate for them (READ011.FOR).
*
*     READ011.FOR
*
      READ(*,180)NIP,NP
  180 FORMAT(I1,I3)
      WRITE(*,*)NIP,NP
      END
If the data are given as 2036, FORTRAN interprets NIP as 2 and NP as 036, i.e. 36. On the other hand, if the data are given as 236, NIP is taken as 2 while NP is taken as 360: the trailing blank in the three-column field of NP is interpreted as zero, and thus 36 becomes 360.
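The effect of a trailing blank can be tried without typing data, by reading from a character variable (an internal file). The sketch below is illustrative and not one of the book's listings; the program name IFMT and the variable REC are assumptions. It uses the standard BZ and BN edit descriptors to state explicitly whether blanks are read as zeros or ignored, since the default differs between compilers.

```fortran
*
*     IFMT (hypothetical example): blanks in an I field
*
      PROGRAM IFMT
      CHARACTER*4 REC
      INTEGER NIP, NP
*     four digits: NIP takes column 1, NP takes columns 2 to 4
      REC = '2036'
      READ(REC,'(I1,I3)') NIP, NP
      WRITE(*,*) NIP, NP
*     three digits followed by a blank
      REC = '236 '
*     BZ: the trailing blank is read as zero, so NP = 360
      READ(REC,'(BZ,I1,I3)') NIP, NP
      WRITE(*,*) NIP, NP
*     BN: the trailing blank is ignored, so NP = 36
      READ(REC,'(BN,I1,I3)') NIP, NP
      WRITE(*,*) NIP, NP
      END
```

The three WRITE statements should show NP as 36, 360 and 36 respectively, with NIP equal to 2 each time. Right justifying the data in its field avoids the ambiguity altogether.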
F Format
F format is used for I/O operations on floating point constants. The general form of the F specification is aFw.d, where d is the number of digits after the decimal point. The number of digits after the decimal point in the output is chosen depending upon the accuracy of the input data. For example, the visible spectrum of KMnO4 was recorded using a total concentration of 0.000486. It can be displayed as 0.000486 using F8.6 format, which indicates that there are six digits after the decimal point and the total width is 8. The EMF of electrochemical cells is measured to an accuracy of 0.1 mV and its range is -1000 to +2000. Therefore, the minimum width is 7 places: one for the sign, four for the integer part, one for the decimal point and one for the fractional part. The READ statement in READ012.FOR inputs the atomic weights of six elements to the computer. The WRITE statement outputs the values with four spaces between each value. The order in which the I/O operation occurs is ATWT(1), ATWT(2), ... ATWT(6).
*
*     READ012.FOR
*
      DIMENSION ATWT(6)
      READ(*,*)ATWT
      WRITE(*,*)ATWT
      END

As the number of points in a titration is not known a priori, the maximum possible size, say 10, is to be declared in the DIMENSION statement (READ013.FOR). Although 10 memory locations are reserved, all of them need not be used in the program. The number of data points read is counted as soon as VOLUME and COND are read. If the number exceeds that mentioned in the DIMENSION statement, the error message 'Number of data points exceed dimension' is printed and the job is terminated.
*
*     READ013.FOR
*
      DIMENSION VOL(10),COND(10),CCOND(10)
      MAX = 10
      NP = 0
      VO = 50.
  501 NP = NP + 1
      READ(*,*)A,B
      IF (A .LE. 0. .AND. B .LE. 0.) GOTO 502
      IF (NP .LE. MAX) THEN
         VOL(NP) = A
         COND(NP) = B
         CCOND(NP) = COND(NP)*(VO + A)/VO
         WRITE(*,*)VOL(NP),COND(NP),CCOND(NP)
         GOTO 501
      ELSE
         WRITE(*,951)
  951    FORMAT('NUMBER of data points exceed dimension')
      ENDIF
  502 STOP
      END
Since the number of points changes from titration to titration, the number of points can also be given to the computer through another READ statement. Then the data array can be input using a DO loop (READ014.FOR) or an implied DO loop (READ015.FOR).
*
*     READ014.FOR
*
      DIMENSION X(20)
      READ(*,*)NP
      DO 10 I = 1,NP
      READ(*,*)X(I)
      WRITE(*,*)I,X(I)
   10 CONTINUE
      AVE2 = AVE(NP,X,20)
      WRITE(*,*)AVE2
      END
$INCLUDE: 'AVE.FOR'
$INCLUDE: 'SUMX1.FOR'

*
*     READ015.FOR
*
      DIMENSION X(20)
      READ(*,*)NP
      READ(*,*) (X(I),I = 1,NP)
      WRITE(*,*)X
      AVE2 = AVE(NP,X,20)
      WRITE(*,*)AVE2
      END
$INCLUDE: 'AVE.FOR'
$INCLUDE: 'SUMX1.FOR'
The READ statement of IFMT001.FOR expects the input record with the structure
Column 1      : first number
Columns 2 - 4 : second number
Columns 5 - 8 : third number
Any input characters beyond the 8th column are ignored.

*
*     IFMT001.FOR
*
      READ(*,81) NEXP,NP,IEMF
   81 FORMAT(I1,I3,I4)
      WRITE(*,*) NEXP,NP,IEMF
      END
Do's of F Format
✓ If the format is Fw.0 and there is no decimal point in the data, it is understood to be an integer.

  901 FORMAT(F5.0)
  Data: 2671

  2671 is interpreted as 2671.0

✓ If the format is Fw.d and there is no decimal point in the data, then a decimal point is inserted d digits from the right end.

      READ(*,17) EMF
   17 FORMAT(F5.1)
  Data: -3762

  A decimal point is inserted one digit from the right and the input is interpreted as -376.2.

✓ An explicit decimal point in the data overrides the format specification for real variables. Then the trailing and leading blanks have no effect.
✓ If the format is Fw.d and there is a decimal point in the input data, then the Fw.d format is overridden.

      READ(*,18) EMF
   18 FORMAT(F6.1)
  Data: -37.25

  As per the FORMAT specification the decimal point should be before the digit 5. But the value of EMF is taken as -37.25 only. The field width must cover all six characters of the datum; with a width of 5 the last digit would not be read.
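Both rules about the decimal point can be checked by reading from character variables instead of the keyboard. The following sketch is illustrative and not one of the book's listings; the names FDEC, REC, EMF1 and EMF2 are assumptions. A six-character field is used for the second value so that the whole datum, including its explicit decimal point, fits in the field.

```fortran
*
*     FDEC (hypothetical example): decimal point in F input
*
      PROGRAM FDEC
      CHARACTER*6 REC
*     no decimal point in the data: F5.1 places it one digit
*     from the right of the five-column field
      REC = '-3762'
      READ(REC,'(F5.1)') EMF1
*     explicit decimal point in the data overrides the d of Fw.d
      REC = '-37.25'
      READ(REC,'(F6.1)') EMF2
      WRITE(*,'(1X,2F10.2)') EMF1, EMF2
      END
```

The program should print -376.20 and -37.25. Reading from a character variable is only a convenience here; the same edit descriptors behave identically on keyboard or file input.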
E Specification
E format is used for data in the exponential form. The rules for the E specification are similar to those for F. W is the number of digits after the decimal point plus eight: one each for the sign, the decimal point and the digit before the decimal point, and five positions for the signed exponent.

If the input has more digits than predefined for the machine, then the computer stores the number as accurately as possible.
If the exponent is equal to or beyond the limit of the software, then exponent overflow or underflow occurs.
Space Control
The horizontal spacing in input/output is specified by X and the general format is aX. The fields of the input specified by X may be blank or may contain extraneous characters. The output layout of tables is affected by the X format. There is no gap between the two data items, namely WAVEL and ABSORB, if the format is as shown in WRITE002.FOR. For legibility, a horizontal space of three columns is given using the X format in WRITE003.FOR by changing FORMAT 901.
*
*     WRITE002.FOR
*
      INTEGER WAVEL(4)
      DIMENSION ABSORB(4)
      DATA WAVEL/360,500,600,700/
      DATA ABSORB/1.052,1.125,1.425,1.156/
      NP = 4
      DO 10 I = 1,NP
      WRITE(*,901)WAVEL(I),ABSORB(I)
  901 FORMAT(I4,F4.2)
   10 CONTINUE
      END
*
*     WRITE003.FOR
*
      INTEGER WAVEL(4)
      DIMENSION ABSORB(4)
      DATA WAVEL/360,500,600,700/
      DATA ABSORB/1.052,1.125,1.425,1.156/
      NP = 4
      DO 10 I = 1,NP
      WRITE(*,901)WAVEL(I),ABSORB(I)
  901 FORMAT(I4,3X,F5.3)
   10 CONTINUE
      END
Column headings will improve the readability of the output and thus another WRITE statement using quote format is used (WRITE004.FOR).
*
*     WRITE004.FOR
*
      INTEGER WAVEL(4)
      DIMENSION ABSORB(4)
      DATA WAVEL/360,500,600,700/
      DATA ABSORB/1.052,1.125,1.425,1.156/
      NP = 4
      WRITE(*,951)
  951 FORMAT(' WAVE LENGTH',3X,'ABSORBANCE')
      DO 10 I = 1,NP
      WRITE(*,901)WAVEL(I),ABSORB(I)
  901 FORMAT(I4,3X,F5.3)
   10 CONTINUE
      END
The numerical values are not centered under the respective column headings. Physically counting the columns or using a T format gives the desired output.

T Format
T format instructs the computer to 'TAB' to a specified column. The general format is Tn, where n is a positive integer specifying the column needed. The T specification is useful to READ the data in a different order from that used to enter the data (READ016.FOR).
*
*     READ016.FOR
*
      READ(*,171)EMF,VOL
  171 FORMAT(T11,F10.3,T1,F10.3)
      END
The number starting from the 11th column is read first and assigned to EMF. Then the number starting from the 1st column is read and assigned to VOL. In fact, in the data file the data are entered in the order VOL and EMF.
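The reordering performed by the T descriptor can be demonstrated on a fixed record held in a character variable. This is a minimal sketch under stated assumptions: the program name TREAD, the variable REC and the two data values are illustrative and do not come from the book.

```fortran
*
*     TREAD (hypothetical example): T format changes read order
*
      PROGRAM TREAD
      CHARACTER*20 REC
*     columns 1-10 hold VOL, columns 11-20 hold EMF
      REC = '    12.500   345.600'
*     T11 jumps to column 11 first, T1 then returns to column 1
      READ(REC,'(T11,F10.3,T1,F10.3)') EMF, VOL
      WRITE(*,'(1X,A,F8.3,A,F8.3)') 'VOL =', VOL, '  EMF =', EMF
      END
```

Here EMF should receive 345.600 and VOL 12.500, although VOL comes first in the record. The order of the variables in the READ list follows the order of the T descriptors, not the order of the columns.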
Printing Bivariate Data Column Wise UV - visible spectrum and electrometric titration data represent paired or bivariate data sets. The modification of WRITE004.FOR for a better output is given in WRITE005.FOR. A formal display of the table is possible (WRITE006.FOR) with a modification of the previous formats.
*
*     WRITE005.FOR
*
      INTEGER WAVEL(4)
      DIMENSION ABSORB(4)
      DATA WAVEL/360,500,600,700/
      DATA ABSORB/1.052,1.125,1.425,1.156/
      NP = 4
      WRITE(*,951)
  951 FORMAT(T1,'WAVE LENGTH',T15,'ABSORBANCE')
      DO 10 I = 1,NP
      WRITE(*,901)WAVEL(I),ABSORB(I)
  901 FORMAT(T5,I3,T17,F5.3)
   10 CONTINUE
      END
*
*     WRITE006.FOR
*
      INTEGER WAVEL(4)
      DIMENSION ABSORB(4)
      DATA WAVEL/360,500,600,700/
      DATA ABSORB/1.052,1.125,1.425,1.156/
      NP = 4
      WRITE(*,952)
  952 FORMAT(T5,31(1H-))
      WRITE(*,951)
  951 FORMAT(T5,'WAVE LENGTH',T22,1H|,T25,'ABSORBANCE',T35,1H|)
      WRITE(*,952)
      DO 10 I = 1,NP
      WRITE(*,901)WAVEL(I),ABSORB(I)
  901 FORMAT(T5,1H|,T10,I3,T22,1H|,T27,F5.3,T35,1H|)
   10 CONTINUE
      WRITE(*,952)
      END
Input/Output through External Files The I/O operations can also be performed through disk files. Some terms relevant to disk files are given below.
Field The datum corresponding to a variable is called a field. It represents a unit of information. MOLWT and CHEMIS are the fields in the following programs.
*
*     FILE001.FOR
*
      CHARACTER*1 CHEMIS(40)
      OPEN(1,FILE='DAT')
      READ(1,801,ERR=2000,END=2000)CHEMIS
      WRITE(*,*)CHEMIS
  801 FORMAT(40A1)
 2000 STOP
      END

*
*     FILE002.FOR
*
      REAL MOLWT
      OPEN(1,FILE='MOLWT')
      READ(1,*)MOLWT
      WRITE(*,*)MOLWT
      END
Record
A record consists of data. Records may be of the same or of different types. Further, the order of variables is immaterial.
*
*     FILE003.FOR
*
$DEBUG
      CHARACTER*10 IFILE,OFILE
      CHARACTER*10 CHEMIS
      OPEN(3,FILE='TEMP1',STATUS='NEW')
      READ(*,*)CHEMIS,NEC,TVOL
      WRITE(3,*,ERR=4000)CHEMIS,NEC,TVOL
  801 FORMAT(1X,10A1,I2,F10.3)
 4000 STOP
      END
File
A program can be executed from the DOS prompt by giving input data through the keyboard, with the output displayed on the terminal. But a better way of inputting data is to create an ASCII file, a collection of records pertaining to an object (DAT). Then the data from this file (DAT) can be input to the program (DO007.FOR) using the following DOS command.

C:\> DO007 <DAT

The command

C:\> DO007 <DAT >RESULT

writes the result in a disk file named RESULT. The files DAT and RESULT can be edited and deleted. FORTRAN has a built in facility for creating, editing and deleting ASCII data files. They may be input, intermediate, scratch or output files. The files are classified as external and internal.
External File It is an ASCII data file present on a floppy, fixed disk or CD-ROM. The external files are sub-classified into sequential and random access files depending upon storage on the auxiliary memory devices.
Sequential File In a sequential file the records are accessed one by one from the first to the last without skipping. It is useful for a large volume of information that rarely changes. The entire data set is read every time the program is executed. Thus it is processed as single unit. Editing record of a sequential file requires accessing all the preceding records. A sequential file is generally updated by creating a new version with another name. The operations involved are copying, inserting new records and deleting undesired ones. The original and new files are called master and working or mother and daughter files. The disadvantage of sequential file lies in updating and retrieving processes. The hardware components keyboard, monitor and printer are sequential devices that are considered as files from the software point of view.
Random Access File In a random access (or direct) file, records are numbered sequentially from one to the maximum number declared by the user. The records are accessed in any order. A list of the physical positions of the records is maintained on the disk by the software. The only restriction for the random access file is that all the records should be of the same size and length.
Do's of Random Access File
✓ The contents of a record can be overwritten.

Don'ts of Random Access File
✗ Deletion of a record is not possible.
✗ Retrieving the contents of a non-available record results in an error.
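Access in any order and overwriting in place can be sketched with a formatted direct access file. The example is illustrative and not one of the book's listings: the program name DIRACC, unit 12 and the record layout (a two-character symbol followed by an atomic weight in F10.3) are assumptions, and a scratch file is used so that nothing is left on disk.

```fortran
*
*     DIRACC (hypothetical example): random access records
*
      PROGRAM DIRACC
      CHARACTER*2 SYM
*     every record has the same length: 2 + 10 = 12 characters
      OPEN(12,STATUS='SCRATCH',ACCESS='DIRECT',
     +     FORM='FORMATTED',RECL=12)
      WRITE(12,901,REC=1) 'H ', 1.008
      WRITE(12,901,REC=2) 'C ', 12.011
      WRITE(12,901,REC=3) 'O ', 15.999
*     records can be fetched in any order
      READ(12,901,REC=3) SYM, ATWT
      WRITE(*,*) SYM, ATWT
*     a record can be overwritten in place
      WRITE(12,901,REC=2) 'N ', 14.007
      READ(12,901,REC=2) SYM, ATWT
      WRITE(*,*) SYM, ATWT
  901 FORMAT(A2,F10.3)
      END
```

The first WRITE to the screen should show the oxygen record and the second the nitrogen record that replaced carbon. The REC specifier selects the record number, so no other record has to be read to reach it.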
Formatted and Unformatted Files
External files are classified as formatted and unformatted files. Formatted files are organized as a stream of characters terminated by an end of line marker. The characters are transmitted between the file and the program during an I/O operation. In unformatted files, data are stored in a system dependent form. Their I/O operations are faster than those of formatted files because no conversion is needed during the I/O operation.

Don'ts of Unformatted File
✗ Formatted and unformatted records cannot be mixed.
OPEN
It is an executable statement. It can be placed anywhere in the program. It establishes a logical connection between the file on the device (magnetic tape, floppy, fixed disk) and the FORTRAN program. The statement has multifold purposes, namely to
• connect an external file to the unit specified
• create a file that is pre-connected
• create a file and connect it to the unit, and
• change certain attributes of the connection

The syntax of the OPEN statement with all the options is

OPEN ([UNIT=]intconst1 [,ERR=intconst2] [,STATUS=stat] [,FILE=fname] [,ACCESS=acc] [,BLANK=bln] [,FORM=fmt] [,IOSTAT=intvar] [,RECL=intconst3])

OPEN    : Keyword. By default the file is opened for sequential access; in standard FORTRAN 77 a sequential file is formatted unless FORM specifies otherwise.
UNIT    : Logical unit number for the file. intconst1 is an integer constant/expression (>0) corresponding to the file number.
ERR     : Label of the statement in the program to which control should be transferred in the event of an error during the open operation.
STATUS  : It indicates whether the opened file is a new or an old one. If the file is old (stat='OLD'), only the logical connection between the file and the program is made. If it is a new file (stat='NEW'), it is created with the unit number and is opened. The options are 'NEW', 'OLD', 'SCRATCH' or 'UNKNOWN'. The default option is UNKNOWN. With the SCRATCH option, the software deletes the file after the execution of the program.
FILE    : Name of the file (fname) to be associated with the unit number.
ACCESS  : The two valid options are 'DIRECT' (random access) and 'SEQUENTIAL'.
RECL    : It indicates the maximum length of a record in the random access file. RECL is used only for a random access file.
FORM    : The options available are 'FORMATTED' and 'UNFORMATTED'. In standard FORTRAN 77 the default is FORMATTED for sequential access and UNFORMATTED for direct access.
IOSTAT  : It indicates successful completion or an error condition during the opening of the file. The integer variable (intvar) is set to zero upon successful opening of the file and is assigned a positive value in case of an error condition.
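The use of IOSTAT can be sketched by trying to open a file that is assumed not to exist. The program name OPCHK, unit 21 and the file name NOSUCH.DAT are hypothetical choices for illustration only.

```fortran
*
*     OPCHK (hypothetical example): testing IOSTAT after OPEN
*
      PROGRAM OPCHK
*     STATUS='OLD' requires the file to exist already
      OPEN(21,FILE='NOSUCH.DAT',STATUS='OLD',IOSTAT=IOS)
      IF (IOS .NE. 0) THEN
*        a nonzero IOS signals that the open failed
         WRITE(*,*) 'OPEN FAILED, IOSTAT =', IOS
      ELSE
         WRITE(*,*) 'FILE OPENED'
         CLOSE(21)
      ENDIF
      END
```

Because IOSTAT is present, the program is not aborted when the file is missing; it continues with IOS set to a nonzero, compiler dependent value and can take corrective action.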
Do's of OPEN
✓ OPEN statements can be collected together at one place like FORMAT statements.
✓ A scratch file is defined in FILE004.FOR.
*
*     FILE004.FOR
*
      N = 6
      OPEN(24,STATUS='SCRATCH',FILE='JUNK')
      WRITE(24,*)N
      CLOSE(24)
      OPEN(24,STATUS='OLD',FILE='JUNK')
      WRITE(24,*)N
      END
Don'ts of OPEN
✗ A specifier for a unit cannot appear more than once (FILE005.FOR).

*
*     FILE005.FOR
*
$DEBUG
      OPEN(15,FILE='TEMP1',STATUS='NEW',ERR=2001)
      OPEN(15,FILE='TEMP2',STATUS='NEW',ERR=2002)
      N = 4
      WRITE(15,*)N
      CLOSE(15)
      OPEN(15,FILE='TEMP2',STATUS='OLD')
      READ(15,*)X
      WRITE(*,*)X
      STOP
 2001 WRITE(*,951)
  951 FORMAT(' Error in opening file TEMP1')
      STOP
 2002 WRITE(*,952)
      STOP
  952 FORMAT(' Error in opening file TEMP2')
      END
The first OPEN statement connects the file TEMP1 on the disk to unit 15. The second OPEN statement attempts to use the same logical unit number while the first file is still open.
CLOSE It is an executable statement and disconnects the file from the FORTRAN unit number. FORTRAN has a built in fail-safe automatic facility of closing all opened files during the normal termination of the execution of the program. However CLOSE statement enables the programmer to explicitly close a specified file. In other words it closes the file that is opened with the unit number. It occurs anywhere in the program after the corresponding OPEN statement. The syntax of CLOSE statement is
CLOSE ([UNIT=]intconst1 [,ERR=intconst2] [,STATUS=stat] [,IOSTAT=intvar3])

CLOSE   : Keyword
STATUS  : The two options are 'KEEP' and 'DELETE'. The default option is 'KEEP'.
Do's of CLOSE Statement
✓ A CLOSE statement for a non-existing file is equivalent to a 'Do nothing' action (FILE006.FOR).

*
*     FILE006.FOR
*
      CLOSE(10)
      CLOSE(30)
      STOP
 2001 FORMAT('ERROR IN CLOSING FILE 30 ')
      END
ENDFILE
It is an executable statement and should follow the OPEN statement of a sequential file. The syntax is

ENDFILE ([UNIT=]intconst1 [,ERR=intconst2] [,IOSTAT=intvar3])

ENDFILE : Keyword

It writes an end of file mark. The statement can be followed by a CLOSE statement. An ENDFILE is automatically written whenever BACKSPACE, REWIND or CLOSE is executed, or during normal termination of the program.
Consider the situation where all WRITE operations of a sequential file are performed in one program and retrieval operations in another program (FILE007.FOR, FILE008.FOR).
*
*     FILE007.FOR
*
      OPEN(20,FILE='DAT',STATUS='NEW')
      N = 20
      WRITE(20,*)N
      END

*
*     FILE008.FOR
*
      OPEN(20,FILE='DAT',STATUS='OLD')
      READ(20,*)N
      WRITE(*,*)N
      END
The END in the first program automatically closes the file after writing the end file mark. The same effect is also achieved by rewinding the sequential file in the same program (FILE009.FOR).

*
*     FILE009.FOR
*
      OPEN(20,FILE='DAT',STATUS='NEW')
      N = 20
      WRITE(20,*)N
      REWIND(20)
      READ(20,*)N
      WRITE(*,*)N
      END

However, the ENDFILE command can also be used explicitly before rewinding the file without closing it (FILE010.FOR).
*
*     FILE010.FOR
*
      OPEN(20,FILE='DAT',STATUS='NEW')
      N = 20
      WRITE(20,*)N
      ENDFILE(20)
      REWIND(20)
      READ(20,*)N
      WRITE(*,*)N
      END
BACKSPACE
It is an executable statement used to reposition the pointer at the beginning of a record. It can occur anywhere after a file is opened. The BACKSPACE command is useful to overwrite a record and to read a record again when an error occurs due to a type or format specification conflict. This command is costly in terms of computer resources and so should be used sparingly.
Do's of BACKSPACE
✓ It can be used even when the file is at the beginning (FILE011.FOR).

*
*     FILE011.FOR
*
$DEBUG
      PH = 7.5
      OPEN(16756,FILE='PH',STATUS='NEW')
      BACKSPACE 16756
      WRITE(16756,*)PH
      CLOSE(16756)
      OPEN(16756,FILE='PH')
      READ(16756,*)PHZ
      WRITE(*,*)PHZ
      END
✓ If the BACKSPACE command is used n times, the pointer is positioned at the (n-1)th preceding record (FILE012.FOR).
✓ If the file is positioned after the end-of-file mark, two BACKSPACE statements position the pointer at the last record (FILE013.FOR).

*
*  FILE012.FOR
*
      DIMENSION VOL(6)
      DATA VOL/1,2,3,4,5,6/
      OPEN(3,FILE='VOL',STATUS='NEW')
      DO 10 I = 1,6
      WRITE(3,*)VOL(I)
10    CONTINUE
      BACKSPACE(3)
      BACKSPACE(3)
      BACKSPACE(3)
      READ(3,*)X
      WRITE(*,*)X
      END

*
*  FILE013.FOR
*
$DEBUG
      DIMENSION VOL(6)
      DATA VOL/1.,2.,3.,4.,5.,6./
      OPEN(3,FILE='VOLUME',STATUS='NEW')
      DO 10 I = 1,6
      WRITE(3,*)VOL(I)
10    CONTINUE
      ENDFILE(3)
      BACKSPACE(3)
      BACKSPACE(3)
      READ(3,*)X
      WRITE(*,*)X
      END
When BACKSPACE occurs within a record the pointer is positioned before the current record.
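The rereading of a record with BACKSPACE can be sketched as follows. This is only an illustration: the unit number 11, the file name BKSP.DAT and the pH values are assumptions, and STATUS='UNKNOWN' is chosen merely so that the sketch can be run more than once.

```fortran
*
*  Sketch: reread the last record of a sequential file
*  (unit 11 and the file name BKSP.DAT are illustrative)
*
      DIMENSION PH(3)
      DATA PH/2.5,4.0,7.5/
      OPEN(11,FILE='BKSP.DAT',STATUS='UNKNOWN')
      DO 10 I = 1,3
      WRITE(11,*)PH(I)
10    CONTINUE
*     The pointer is now after the third record.
*     One BACKSPACE places it before the third record.
      BACKSPACE(11)
      READ(11,*)X
      WRITE(*,*)X
      CLOSE(11)
      END
```

With a single BACKSPACE after the loop, the READ should return the last value written.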
Don'ts of BACKSPACE
⊗ It is not possible to BACKSPACE the record of a file that does not exist (FILE101.FOR).

*
*  FILE101.FOR
*
$DEBUG
      BACKSPACE(4)
      END
REWIND
It is an executable statement used in I/O operations with a file. The command takes its name from computers that used tape (magnetic or paper) files. It indicates that the next record to be read is at the beginning of the file. When a file is opened, the program automatically positions the pointer at the first record. In the case of magnetic tape, the rewind command physically rewinds the tape. The syntax is
REWIND ([UNIT =] intconst1 [,ERR = intconst2] [,IOSTAT = intvar3])

Do's of REWIND
✓ It has no effect if the file is already at the beginning (FILE014.FOR).

*
*  FILE014.FOR
*
      OPEN(25,FILE='SOL_DATA',STATUS='NEW')
      D = 77.
      WRITE(25,*)D
      CLOSE(25)
      OPEN(UNIT = 26,FILE='SOL_DATA')
      REWIND 26
      REWIND 26
      READ(26,*)DI
      WRITE(*,*)DI
      END
Don'ts of REWIND
⊗ It should not be used when a sequential file is first created (FILE015.FOR).

*
*  FILE015.FOR
*
      ATWT = 12.
      ATNO = 6.
      OPEN(161,FILE='ELEMENTS',STATUS='NEW')
      REWIND 161
      WRITE(161,*)ATNO,ATWT
      END
The error arises because the end-of-file mark is not written when the file is opened. The corrected version of the program is FILE016.FOR.

*
*  FILE016.FOR
*
      ATWT = 12.
      ATNO = 6.
      OPEN(161,FILE='ELEMENTS',STATUS='NEW')
      ENDFILE(161)
      REWIND 161
      WRITE(161,*)ATNO,ATWT
      END

⊗ It is not possible to backspace over a record written by list-directed output statements.

*
*  FILE102.FOR
*
$DEBUG
      EMF = 263.5
      IVOL = 10
      OPEN(15,FILE='VOLEMF',STATUS='NEW')
      WRITE(15,*)EMF,IVOL
      BACKSPACE(15)
      READ(15,*)X,IX
      WRITE(*,*)X,IX
      WRITE(15,*)IVOL,EMF
      END
INQUIRE
INQUIRE is an executable statement. It is used to ascertain (1) whether a file exists on the disk, (2) whether a file is opened, and (3) the name of the file opened. The syntax is

INQUIRE (FILE = fname, EXIST = lvar, NEXTREC = ivar, BLANK = cvar, NAMED = lvar, OPENED = lvar, NAME = cvar, ERR = ivar, IOSTAT = ivar, SEQUENTIAL = cvar, DIRECT = cvar, FORMATTED = cvar, UNFORMATTED = cvar, RECL = ivar, NUMBER = ivar, ACCESS = cvar, FORM = cvar)

INQUIRE : Keyword
lvar : Logical variable
cvar : Character variable
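A typical use of INQUIRE is to decide how a file should be opened. The sketch below assumes a hypothetical file name COND.DAT and unit 16; only the FILE, UNIT, EXIST and OPENED specifiers from the list above are used.

```fortran
*
*  Sketch: check a file before opening it
*  (file name COND.DAT and unit 16 are illustrative)
*
      LOGICAL LEXIST,LOPEN
      INQUIRE(FILE='COND.DAT',EXIST=LEXIST,OPENED=LOPEN)
      IF(LEXIST)THEN
        OPEN(16,FILE='COND.DAT',STATUS='OLD')
      ELSE
        OPEN(16,FILE='COND.DAT',STATUS='NEW')
      ENDIF
*     Inquire by unit number once the file is connected
      INQUIRE(UNIT=16,OPENED=LOPEN)
      WRITE(*,*)LEXIST,LOPEN
      CLOSE(16)
      END
```

The first INQUIRE is by file name and the second by unit number, which are the two modes of inquiry described in this section.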
Do's of INQUIRE
✓ A file can be inquired by unit number or file name. Use of both name and logical unit number is only to keep track of the files (FILE017.FOR).

*
*  FILE017.FOR
*
$DEBUG
      LOGICAL LVAR1
      COND = 2312
      OPEN(16,FILE='COND',STATUS='NEW')
      WRITE(16,*)COND
      CLOSE(16)
      INQUIRE(UNIT=16,EXIST=LVAR1)
      WRITE(*,951)LVAR1
951   FORMAT(' FILE EXIST: ',L4)
      INQUIRE(FILE='COND',EXIST=LVAR1)
      WRITE(*,951)LVAR1
      END
Don'ts of INQUIRE
⊗ A variable used as a specifier in INQUIRE should not appear in another specification statement.

*
*  FILE103.FOR
*
$DEBUG
      LOGICAL LVAR1,LVAR2
      K = 1E4
      OPEN(16,FILE='TEST',STATUS='NEW')
      WRITE(16,*)K
      LOGK = LOG(K)
      WRITE(16,*)LOGK
      ENDFILE(16)
      BACKSPACE(16)
      INQUIRE(UNIT=16,NAMED=LVAR1,OPENED=LVAR2,ERR=100)
      WRITE(*,*)LVAR1,LVAR2
      STOP
100   WRITE(*,201)
201   FORMAT('ERROR IN INQUIRE STATEMENT')
      END

LVAR1 and LVAR2 are declared as logical variables. They are used in the INQUIRE statement as NAMED = LVAR1 and OPENED = LVAR2.
2.5 DIMENSION
Type Statement
TYPE is a non-executable statement and precedes the first executable statement. It is used to explicitly declare the variable type, to override the default convention of real and integer variables and to define logical, complex or character variables. The default convention of treating variables as real or integer dates back to the early nineteen sixties, when only real and integer types were defined. Logical and character variables were introduced later. However, explicit declaration makes the reader aware of the type of each variable. The preferable variable name for molecular weight is MOLWT. But it is an integer variable by default, and the statement MOLWT = 126.06 results in the integer value 126. In order to retain the significant digits after the decimal point, MOLWT is to be declared as real (TYPE001.FOR).

*
*  TYPE001.FOR
*
      REAL MOLWT
      MOLWT = 126.06
      WRITE(*,*)MOLWT
      END
The statement REAL MOLWT overrides the default convention that MOLWT is an integer and thus its value is stored as 0.12606000E03. To declare some or all the variables starting with a character, IMPLICIT and EXPLICIT categories are available. The knowledge base for the compiler interpretation of the type of a variable when both IMPLICIT and EXPLICIT statements are present in a program is given below.

(1) If    Variable is absent in IMPLICIT type declaration & Variable is absent in EXPLICIT type declaration
    Then  Default convention prevails

*
*  TYPE002.FOR
*
      MOLWT = 126.06
      WRITE(*,*)MOLWT
      END

*
*  TYPE003.FOR
*
      LANDA = -420
      WRITE(*,*)LANDA
      END

(2) If    Variable is present in EXPLICIT type statement
    Then  Variable belongs to {TYPE}. EXPLICIT type declaration overrides default convention (TYPE004.FOR)

*
*  TYPE004.FOR
*
      REAL LANDA
      LANDA = -420
      WRITE(*,*)LANDA
      END
Comment: LANDA is an integer variable by default. However, the EXPLICIT type statement (REAL LANDA) overrides the default assumption and now LANDA is real.

(3) If    First character (C) of the variable is present in IMPLICIT {TYPE1} & There is no explicit statement
    Then  All the variables starting with character (C) belong to {TYPE1}.
    {TYPE1}: INTEGER, REAL, COMPLEX, CHARACTER, LOGICAL

*
*  TYPE005.FOR
*
      IMPLICIT INTEGER*2 (A-Z)
      LANDA = 370.5
      WRITE(*,*)LANDA
      END

*
*  TYPE006.FOR
*
      IMPLICIT REAL*4 (A-Z)
      MOLWT = 126.06
      WRITE(*,*)MOLWT
      END

(4) If    First character of the variable is present in implicit {TYPE1} & Variable (VAR1) is present in explicit {TYPE2} statement
    Then  (VAR1) belongs to {TYPE2} & All other variables belong to {TYPE1}. Explicit type statement overrides implicit type statement.

IMPLICIT REAL*4 (A-Z) declares that in TYPE006.FOR all variables are real, so the default convention is overridden. However, an EXPLICIT specification that LANDA is INTEGER makes only the variable LANDA an integer; all others, even those starting with L (LOGBETA), are real.
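Rule (4) can be sketched in a few lines. The program below is an illustration only; LOGBET is a hypothetical six-character stand-in for LOGBETA, and the numerical values are arbitrary.

```fortran
*
*  Sketch: an explicit type statement overrides IMPLICIT
*  (LOGBET is a hypothetical name; values are illustrative)
*
      IMPLICIT REAL*4 (A-Z)
      INTEGER LANDA
*     LANDA is integer, so the fraction is dropped
      LANDA = 370.5
*     LOGBET starts with L but is real through IMPLICIT
      LOGBET = 4.25
      WRITE(*,*)LANDA,LOGBET
      END
```

Only LANDA takes the explicitly declared type; LOGBET keeps its fractional part because the IMPLICIT statement still applies to it.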
Do's of TYPE Statement
✓ More than one variable can be declared in a TYPE statement. Each variable is to be separated by a comma.
      REAL KW, MOLWT, N, NAP, NIP
✓ There can be more than one TYPE statement of a category (TYPE007.FOR).

*
*  TYPE007.FOR
*
      REAL KW
      REAL MOLWT,N
      DATA KW,MOLWT,N,NAP,NIP/13.99,126.06,6.03E23,2,3/
      WRITE(*,*)KW,MOLWT,N,NAP,NIP
      END
✓ A variable that is real by default can also be declared as real explicitly and/or implicitly (TYPE008.FOR).

*
*  TYPE008.FOR
*
      REAL ABSORB
      ABSORB = 0.356
      WRITE(*,951)ABSORB
951   FORMAT(1X,F25.18)
      STOP
      END
Don'ts of TYPE Statement
⊗ A variable should not be specified in more than one type statement.

*
*  TYPE101.FOR
*
      REAL KW
      REAL*8 KW, PKW
      PKW = 14.0
      KW = 10. ** (-PKW)
      WRITE(*,*)KW,PKW
      END

    1 *
    2 *  TYPE101.FOR
    3 *
    4       REAL KW
    5       REAL*8 KW, PKW
***** Error 33 - identifier already has type
    6       PKW = 14.0
    7       KW = 10. ** (-PKW)
    8       WRITE(*,*)KW,PKW
    9       END
Dimension Statement
Wave numbers in the fingerprint region of the IR, chemical shifts in an NMR spectrum or m/e values in a mass spectrum are a set of values, called an array or subscripted variable. An array is a collection of variables that have the same name and belong to the same type. Each element of the array is a scalar. The number of elements in an array is called the size of the array. The number of subscripts in an array can be one, two or many, and the arrays are correspondingly called one-, two- or multi-dimensional. If there are four peaks in the IR spectrum of a compound, separate assignment statements for the wave numbers are not preferred, since they require as many lines of code as there are values. The number of variables increases with the number of data items and the calculations become more cumbersome, since the same step has to be repeated with each variable. Without arrays, the numerical values of the wave numbers can be coded as given in ARRAY001.FOR or read as given in ARRAY002.FOR.

*
*  ARRAY001.FOR
*
      DATA NUIR1, NUIR2, NUIR3, NUIR4 / 1650, 3320, 2315, 870/
      WRITE(*,*) NUIR1, NUIR2,
     C NUIR3, NUIR4
      END
*
*  ARRAY002.FOR
*
      READ(*,*) NUIR1, NUIR2, NUIR3, NUIR4
      WRITE(*,*) NUIR1, NUIR2, NUIR3, NUIR4
      END

In mathematical or statistical literature, a one-dimensional variable is written with a subscript indicating the element number (NUIRᵢ, i = 1 to 4). As subscripts and superscripts are not allowed in FORTRAN, a DIMENSION statement is introduced. The two ways of representing the elements of a vector (a set of values) in the human and computer domains are given in Table 2.5.1.

Table 2.5.1 : Representation of Subscripted Variable in Human and Computer Domains

Human Domain                      Computer Domain
NUIR1, NUIR2, NUIR3, NUIR4        NUIR1, NUIR2, NUIR3, NUIR4
NUIR₁, NUIR₂, NUIR₃, NUIR₄        DIMENSION NUIR(4)
NUIRᵢ, i = 1 to 4                 NUIR(1), NUIR(2), NUIR(3), NUIR(4)
DIMENSION is a non-executable statement. It is used to specify the maximum number of elements in an array. It precedes the first executable statement (ARRAY003.FOR) and succeeds the implicit statement (ARRAY004.FOR).

*
*  ARRAY003.FOR
*
      DIMENSION MBYE(100)
      MBYE(1) = 46
      WRITE(*,*)MBYE(1)
      END

*
*  ARRAY004.FOR
*
      IMPLICIT REAL*4(A-Z)
      DIMENSION MBYE(100)
      MBYE(1) = 46
      WRITE(*,*)MBYE(1)
      END
The advantage of the DIMENSION statement is concise syntax. It renders array manipulation obvious and reduces the source code drastically. The two programs ARRAY005.FOR and ARRAY006.FOR perform the same task of reading four values for DELNMR from the keyboard.

*
*  ARRAY005.FOR
*
      DIMENSION DELNMR(4)
      READ(*,*) DELNMR(1)
      READ(*,*) DELNMR(2)
      READ(*,*) DELNMR(3)
      READ(*,*) DELNMR(4)
      WRITE(*,*)DELNMR(1),DELNMR(2),DELNMR(3),DELNMR(4)
      END
*
*  ARRAY006.FOR
*
      DIMENSION DELNMR(4)
      DO 10 I = 1,4
      READ(*,*) DELNMR(I)
10    CONTINUE
      WRITE(*,*)DELNMR
      END

Functioning of Dimension Statement
The DIMENSION statement allocates a group of consecutive memory locations to array elements. In other words, a chunk of memory is allocated under the same variable name but sliced into as many units as the number specified, each sized according to the precision. Each element occupies one slice of memory. An array element is referred to in the program by the name of the variable followed by the number of the element in parentheses. For example, DELNMR(3) is the third element in the array with the name DELNMR. DELNMR in the above programs is a single precision real variable with four elements, DELNMR(1), DELNMR(2), DELNMR(3) and DELNMR(4). They are stored in successive memory locations, namely 20-35, as shown below.

DELNMR(1)    DELNMR(2)    DELNMR(3)    DELNMR(4)
 20 - 23      24 - 27      28 - 31      32 - 35
Do's of DIMENSION
✓ Only some elements of the array may be used while processing (ARRAY008.FOR).

*
*  ARRAY008.FOR
*
      DIMENSION DELNMR(4), TOWNMR(4)
      DELNMR(1) = 3.5
      DELNMR(2) = 6.0
      TOWNMR(1) = 10. * DELNMR(1)
      TOWNMR(2) = 10. * DELNMR(2)
      WRITE(*,*)DELNMR,TOWNMR
      END

✓ More than one DIMENSION statement is valid (ARRAY009.FOR).
✓ More than one array can be defined in a single DIMENSION statement (ARRAY009.FOR).
*
*  ARRAY009.FOR
*
      DIMENSION NUIR(100)
      DIMENSION DELNMR(20), TOWNMR(25)
      DIMENSION TRCON(9), WEIGHT(12)
      END

✓ The numerical value of an element of the array can be passed back and forth to subprograms.
      DIMENSION WAVEL(20), CONC(10)
      CALL SUB1(WAVEL(6), C)
✓ The subscript used to refer to an element of an array is an integer constant, variable or arithmetic expression.
✓ A real constant, variable or expression can also be used to refer to an element of an array. A real value is truncated to its integer part, as is done explicitly with INT in ARRAY0110.FOR.

*
*  ARRAY0110.FOR
*
      DIMENSION VOL(50)
      DO 10 I = 1,10
      VOL(I) = I
10    CONTINUE
      X = 1.6
      V1 = VOL(2*INT(X))
      V2 = VOL(INT(4.6))
      V3 = VOL(INT(X))
      WRITE(*,*)V1,V2,V3
      END

✓ The size of an array can be given by an integer name provided its value is declared in a PARAMETER statement.
      PARAMETER (NP = 10)
      DIMENSION EMF(NP)
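The use of PARAMETER with DIMENSION can be sketched as a complete program. The factor 0.0591 and the loop body are illustrative assumptions; the point is that NP fixes both the array size and the loop limit, so a change of size needs an edit in one place only.

```fortran
*
*  Sketch: array size taken from a PARAMETER statement
*  (the values stored in EMF are illustrative)
*
      PARAMETER (NP = 10)
      DIMENSION EMF(NP)
      DO 10 I = 1,NP
      EMF(I) = 0.0591 * I
10    CONTINUE
      WRITE(*,*)EMF(1),EMF(NP)
      END
```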
Don'ts of DIMENSION
⊗ The size of the dummy argument in a subprogram should not be greater than the size of the argument in the mainline.
      DIMENSION CONC(10), ABSORB(10)
      CALL BEER(CONC, ABSORB)
      END
      SUBROUTINE BEER(X,Y)
      DIMENSION X(20), Y(20)
      RETURN
      END
⊗ An array name should not be declared more than once.
      DIMENSION DELNMR(20), DELNMR(10)
⊗ Type conflict in array declaration is invalid (DIM101.FOR).

*
*  DIM101.FOR
*
      DIMENSION NUIR(20)
      REAL NUIR(20)
      INTEGER WAVEL(36)
      DIMENSION WAVEL(36)
      NUIR(16) = 2000
      WAVEL(2) = 370
      END

    1 *
    2 *  DIM101.FOR
    3 *
    4       DIMENSION NUIR(20)
    5       REAL NUIR(20)
***** Error 30 - array already dimensioned
    6       INTEGER WAVEL(36)
    7       DIMENSION WAVEL(36)
***** Error 30 - array already dimensioned
    8       NUIR(16) = 2000
    9       WAVEL(2) = 370
   10       END
⊗ A subscript value outside the range specified in the DIMENSION statement is invalid.

*
*  DIM102.FOR
*
      DIMENSION VOL(10)
      VOL(12) = 12
      VOL(0) = 0
      VOL(-5) = -5
      WRITE(*,*) VOL(12)
      WRITE(*,*) VOL(0)
      WRITE(*,*) VOL(-5)
      END

Comment: The values 12, 0 and -5 are outside the range 1 to 10.

⊗ A variable array size should not be used without a PARAMETER statement.

*
*  DIM103.FOR
*
      DIMENSION EMF(NP)
      END

    1 *
    2 *  DIM103.FOR
    3 *
    4       DIMENSION EMF(NP)
***** Error 66 - adjustable size declarations only for dummy arrays
***** Error 68 - adjustable bound must be parameter or in COMMON
    5       END

Comment: The value of NP is not known at the time of analyzing the DIMENSION statement.
⊗ The number of subscripts in the array declaration and in the array reference should not be different (ARRAY101.FOR).

*
*  ARRAY101.FOR
*
      DIMENSION pH(10)
      WRITE(*,*) pH(1,4)
      END

    1 *
    2 *  ARRAY101.FOR
    3 *
    4       DIMENSION pH(10)
    5       WRITE(*,*) pH(1,4)
***** Error 56 - too many subscripts
    6       END

Comment: pH is referred to as a two-dimensional array element in the WRITE statement while it is declared as one-dimensional. In ARRAY102.FOR, VOL is not declared but is used as a dimensioned variable.

*
*  ARRAY102.FOR
*
      VO = 50.0
      V = VO+VOL(J)
      WRITE(*,*)V
      END

Name      Type      Size      Class
MAIN                          PROGRAM
VOL       REAL                FUNCTION

Pass One      No Errors Detected
              7 Source Lines

Microsoft 8086 Object Linker Version 3.02
(C) Copyright Microsoft Corp 1983, 1984, 1985
Unresolved externals:
VOL in file(s):

Comment: The computer interprets VOL(J) as a function subprogram (VOL) with the argument J. So there is no compilation error. However, at the linking stage the error message "Unresolved externals" is displayed.
Two Dimensional Arrays
The two-dimensional array, known as a matrix, contains rows and columns. The rows of the methyl orange data matrix (absorbance at different pH values) are the spectra at different pH values. The columns refer to absorbances at different wavelengths. The elution profile obtained from HPLC is a vector of size NTIME × 1. If the UV-vis spectrum is taken at each elution time instead of the absorbance at a single wavelength, it produces a matrix. In chemical kinetics, the absorbance at a single wavelength (absorbance maximum) is monitored as a function of time to estimate the rate constant of a reaction. If the full spectrum is recorded with time, the data are called a kinetic spectrum of size nt × nl. These examples clearly establish the need for a two-dimensional array. Many statistical computations on bivariate (X and Y) data involve arithmetic matrix operations.
Higher Dimensional Arrays
Recent chemical literature shows that three-dimensional data arrays (tensors) are obtained from hyphenated instruments. The fluorescence at different excitation and emission wavelengths as a function of time is a third order tensor of size NEM x NEX x NT. Table 2.5.2 describes the data of different orders obtained from chemical instruments.

Table 2.5.2 : Order of Data Obtained from Different Instruments

Chemical Variable                                     Number of Subscripts    FORTRAN Variable
Absorbance of a number of samples                     1                       ABSORB(NP)
NIR spectrum of a number of samples                   2                       ABNIR(NP,NNU)
Spectro chromatograms                                 2                       RESP(NT,NLANDA)
Excitation emission fluorescence spectrum             3                       FLURES(NEX,NEM,NT)
Stability constants of ligands, metals, solvents
  at different temperatures                           4                       BETA(NL,NM,NSOL,NT)
Kinetic spectrum                                      2                       ABSOR(NTIME,NLANDA)
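A two-dimensional array such as the kinetic spectrum ABSOR(NTIME,NLANDA) can be declared and filled as in the following sketch. The sizes (three times, two wavelengths) and the stored numbers are illustrative assumptions, not measured data.

```fortran
*
*  Sketch: a two-dimensional array ABSOR(NTIME,NLANDA)
*  (sizes and values are illustrative, not measured data)
*
      PARAMETER (NTIME = 3, NLANDA = 2)
      DIMENSION ABSOR(NTIME,NLANDA)
      DO 20 I = 1,NTIME
      DO 10 J = 1,NLANDA
      ABSOR(I,J) = 0.1 * I + 0.01 * J
10    CONTINUE
20    CONTINUE
*     Print one row (the spectrum at one time) per line
      DO 30 I = 1,NTIME
      WRITE(*,*)(ABSOR(I,J),J = 1,NLANDA)
30    CONTINUE
      END
```

The first subscript selects the row (time) and the second the column (wavelength); higher dimensional arrays only add further subscripts and further nested loops.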
2.6 SUBPROGRAM

Intrinsic/Library Functions
Statistical, mathematical and chemical computations repeatedly use several procedures. Tasks ranging from the determination of the absolute value of a scalar to the solution of a set of differential equations are necessary at some stage of implementing an algorithm. Calculation of numerical values of hyperbolic, arithmetic and ordinary trigonometric functions, logarithms and antilogarithms requires numerical methods of analysis. Due to their extensive use, they are included as intrinsic functions in the computer language library (for example, MATH.LIB and ALTMATH.LIB in FORTRAN) as a substitute for lookup tables. The results of chemical experiments are validated through statistical analysis. Algorithms for the statistical parameters and mathematical procedures are available as subprograms in packages like NUMERICAL RECIPES, Numerical Algorithms Group, Subroutine Package for Social Science and Quantum Chemistry Program Exchange. Estimation of rate/equilibrium constants, curve resolution of overlapping spectra/chromatograms or prediction of concentrations of non-interacting multi-component compounds are complex tasks. They demand a set of user-written subprograms.
Function Subprogram
A function subprogram is a separate program unit that is edited and saved like the main program. It has the advantage of avoiding the repetition of source code for similar calculations. The user is assured of the results of the function subprogram and therefore can pay attention to the logic of the main program. A function subprogram is referred to or called in an expression. It appears as one of the operands on the RHS of an assignment or replacement statement, and the resulting value becomes the operand of the expression. The name of the function should be assigned a value at least once in a function subprogram. The statement preceding RETURN must be an assignment statement with the function name as the variable on the LHS. The input/output statements and the call to SUMX1.FOR are collected in a driver program (SUMX1.DEM) known as the mainline.

*
*  SUMX1.FOR
*
      FUNCTION SUMX1(N,X,MAX)
      DIMENSION X(MAX)
      SUMX1 = 0.
      IF(N .GT. MAX)PAUSE 'X DIMENSION EXCEEDS MAXIMUM VALUE'
      DO 10 I = 1,N
      SUMX1 = SUMX1 + X(I)
10    CONTINUE
      RETURN
      END

*
*  WX1.FOR
*
      SUBROUTINE WX1(NP,X,MAX)
      DIMENSION X(MAX)
      WRITE(*,952)(I,X(I),I = 1,NP)
952   FORMAT(100(' X(',I2,')=',G11.4,'; X(',I2,')=',G11.4,
     * '; X(',I2,')=',G11.4,'; X(',I2,')=',G11.4/))
      RETURN
      END

*
*  SUMX1.DEM
*
      DIMENSION Y(3)
      DATA Y/1.5,2.6,3.8/
      NP = 3
      CALL WX1(NP,Y,3)
      SUM = SUMX1(NP,Y,3)
      WRITE(*,*)SUM
      END
$INCLUDE: 'SUMX1.FOR'
$INCLUDE: 'WX1.FOR'
Working Details of FUNCTION Subprogram
Execution of the main program starts at the first executable statement and proceeds in a top-down manner. When a function subprogram is encountered in an assignment or replacement statement:
• The program counter remembers the current statement.
• The control is transferred to the function subprogram.
• The values of input arguments in the mainline now become the values of the dummy arguments of the function subprogram.
• The function subprogram is executed up to RETURN.
• The value of the function name is its current value.
• The control is transferred back to the same statement where the mainline execution was halted.
• The execution of the mainline resumes.
• The value of the function name is now used in the assignment or replacement statement.
• The execution of the main program continues till the STOP statement.
These steps are pictorially shown in Fig. 2.6.1. Thus the function subprogram directly processes the arguments in the mainline through the dummy ones present in the function. The consequences of the function name in function subprograms and the mainline follow in an expert system mode.

Fig. 2.6.1 : Execution Profile in a Function Subroutine (mainline SUMX1.DEM with the statement SUM = SUMX1(NP,Y,3); function SUMX1(N,X,MAX) ending in RETURN and END)

(1) If    There is no TYPE in the function statement
    Then  TYPE of the result is the same as that of the variable used as function name (ISUM1.DEM).
*
*  ISUM1.DEM
*
      DIMENSION X(500)
      MAX = 500
      X(1) = 1.5
      X(2) = 3.8
      X(3) = 2.6
      NP = 3
      CALL WX1(NP,X,3)
      IZ = ISUM1(NP,X)
      WRITE(*,*)IZ
      END
$INCLUDE: 'ISUM1.FOR'
$INCLUDE: 'WX1.FOR'

*
*  ISUM1.FOR
*
      FUNCTION ISUM1(NP,X)
      PARAMETER (MAX = 500)
      DIMENSION X(MAX)
      ISUM1 = 0.
      DO 10 I = 1,NP
      ISUM1 = ISUM1 + X(I)
10    CONTINUE
      RETURN
      END
Comment: ISUM1 is an integer variable by default and there is no TYPE in the function statement. Hence the resulting ISUM1 is integer.

(2) If    TYPE of expression in mainline is different from the TYPE of function name
    Then  Value is transformed according to the rules of assignment statement (ISUM2.FOR).

*
*  ISUM2.DEM
*
      DIMENSION X(500)
      MAX = 500
      X(1) = 1.5
      X(2) = 3.8
      X(3) = 2.6
      NP = 3
      CALL WX1(NP,X,3)
      SUM = ISUM2(NP,X)
      WRITE(*,*)SUM
      END
$INCLUDE: 'ISUM2.FOR'
$INCLUDE: 'WX1.FOR'

*
*  ISUM2.FOR
*
      FUNCTION ISUM2(NP,X)
      PARAMETER (MAX=500)
      DIMENSION X(MAX)
      SUM2 = 0.
      DO 10 I = 1,NP
      SUM2 = SUM2 + X(I)
10    CONTINUE
      ISUM2 = SUM2
      RETURN
      END
Comment: ISUM2 calculated in the function is in the integer mode while SUM in the mainline is a real variable. So the value of ISUM2 is converted to real while assigning the value to SUM.

(3) If    There is TYPE in the function statement
    Then  Default convention of variable name is overridden (ISUM3.DEM)

*
*  ISUM3.DEM
*
      DIMENSION X(500)
      REAL ISUM3
      MAX = 500
      X(1) = 1.5
      X(2) = 3.8
      X(3) = 2.6
      NP = 3
      CALL WX1(NP,X,3)
      SUM = ISUM3(NP,X)
      WRITE(*,*)SUM
      END
$INCLUDE: 'ISUM3.FOR'
$INCLUDE: 'WX1.FOR'

*
*  ISUM3.FOR
*
      REAL FUNCTION ISUM3(NP,X)
      PARAMETER (MAX=500)
      DIMENSION X(MAX)
      ISUM3 = 0.
      DO 10 I = 1,NP
      ISUM3 = ISUM3 + X(I)
10    CONTINUE
      RETURN
      END
Comment: ISUM3 is integer by default, but REAL FUNCTION ISUM3 overrides the default convention and ISUM3 is a real variable.

(4) The implicit statement in the calling program does not alter the type of an intrinsic function.

*
*  SIN103.FOR
*
      IMPLICIT REAL*8 (A-Z)
      PI = 22.D0/7.D0
      DEG = 30.D0
      RAD = PI/180.0D0 * DEG
      X1 = SIN(RAD)
      X2 = DSIN(RAD)
      WRITE(*,951)DEG,X1,X2
951   FORMAT(1X,'DEG =',F8.2,5X,'SIN(DEG) = ',F15.10/5X,
     * 'DSIN(DEG) =',F25.20)
      END

DEG = 30.00     SIN(DEG) = .5001825022
     DSIN(DEG) = .50018250219966980000
Comment: All variables are double precision real ones. SIN(RAD) and DSIN(RAD) are calculated in single and double precision, respectively. The numerical values of X1 and X2 differ after eight digits.

Do's of FUNCTION Subprogram
✓ A variable with the function subprogram name can occur any number of times in assignment/replacement statements in the domain of the function subprogram.
✓ A function subprogram may not have even a single argument (LF103.FOR).

*
*  LF103.FOR
*
      LOGICAL X,ERROR
      X = ERROR()
      WRITE(*,*)X
      END

      LOGICAL FUNCTION ERROR()
      WRITE(*,951)
951   FORMAT(' Error in program')
      ERROR = .TRUE.
      RETURN
      END
✓ An argument of different precision in the mainline and the function subprogram can be used after suitable conversion in the function (ROOT1.FOR).

*
*  ROOT1.FOR
*
      X1 = 1.
      X2 = -1E-9
      X3 = -1.E-14
      X = ROOT1(X1,X2,X3)
      WRITE(*,*)X
      PH = -LOG10(X)
      WRITE(*,*)PH
      END
*
      FUNCTION ROOT1(A1,B1,C1)
      DOUBLE PRECISION X,A,B,C
      A = DBLE(A1)
      B = DBLE(B1)
      C = DBLE(C1)
      X = (-B + DSQRT(B*B - 4.0 * A * C))/(2.0 * A)
      ROOT1 = SNGL(X)
      RETURN
      END

Output of ROOT1.FOR
1.005012E-007
6.9978280

The actual arguments in the mainline are X1, X2, X3 and the corresponding dummy ones in ROOT1 are A1, B1 and C1. Both sets have three elements and are single precision real variables (Fig. 2.6.2). Since the roots of the quadratic equation are to be calculated in double precision, the arguments A1, B1, C1 are converted to double precision through the intrinsic function DBLE. One root of the equation (X) is calculated in double precision. As ROOT1 is in single precision, X is converted to single precision using the intrinsic function SNGL.

✓ More than one RETURN statement is valid.
✓ A function subprogram passes back the values of variables through arguments (SUMX4.FOR).
Fig. 2.6.2 : Correspondence of Variables of Mainline and Function Subroutine (mainline X1, X2, X3 at memory locations 16-19, 20-23, 24-27; function subroutine A1, B1, C1 at locations 0-3, 4-7, 8-11)
*
*  SUMX4.FOR
*
      DIMENSION X(100)
      NP = 4
      ERR = 0
      SUM = SUMX4(NP,X,ERR)
      WRITE(*,951)ERR
951   FORMAT(' ERROR: ',F5.0)
      END
*
      FUNCTION SUMX4(NP,X,ERR)
      DIMENSION X(2)
      IF(NP .GT. 2)THEN
      ERR = 1
      RETURN
      ELSE
      SUMX4 = 0.
      DO 10 I = 1,NP
      SUMX4 = SUMX4 + X(I)
10    CONTINUE
      ENDIF
      RETURN
      END

✓ The variable names and statement numbers in a subprogram may be the same as those used in the mainline.
Don'ts of FUNCTION Subprogram
⊗ The number of arguments in the mainline and subprogram should not be different.

*
*  FSUB301.FOR
*
      DIMENSION X(100)
      X(1) = 5.
      SUM = SUMX1(X)
      END
$INCLUDE: 'SUMX1.FOR'

⊗ There should not be any TYPE or precision conflict between the actual arguments in the mainline and those in the subprogram.
*
*  FSUB302.FOR
*
      DIMENSION X(100)
      X(1) = 5.
      SUM = SUMX3(1,X)
      WRITE(*,*)SUM
      END

*
*  SUMX3.FOR
*
      INTEGER FUNCTION SUMX3(NP,IX)
      PARAMETER (MAX = 100)
      DIMENSION IX(MAX)
      SUM = 0.
      DO 10 I = 1,NP
      SUM = SUM + IX(I)
10    CONTINUE
      SUMX3 = SUM
      RETURN
      END
⊗ The number of dimensions and size of the array in the main program and that in the subprogram should not be different.

*
*  FSUB201.FOR
*
      DIMENSION X(100)
      NP = 2
      X(1) = 1
      X(2) = 3
      SUMY = SUMX(NP,X)
      END
*
      FUNCTION SUMX(NP,X)
      PARAMETER (NP = 20)
      DIMENSION X(NP)
      RETURN
      END

⊗ The program name should not conflict with any local name in the calling program.
*
*  FSUB202.FOR
*
      LOGICAL X,ERROR
      X = ERROR
      WRITE(*,*)X
      END
*
      LOGICAL FUNCTION ERROR
      WRITE(*,951)
951   FORMAT(' Error in program')
      ERROR = .TRUE.
      RETURN
      END

⊗ A function should not call itself.

*
*  FSUB203.FOR
*
      FUNCTION SUMX(NP,X)
      DIMENSION X(NP)
      N1 = 1
      N2 = N1 + 1
      SUM = SUMX(N1,N2,X)
      RETURN
      END
⊗ The dummy argument should not be in the COMMON statement.

*
*  FSUB204.FOR
*
      DIMENSION X(100)
      COMMON NP
      NP = 2
      X(1) = 1
      X(2) = 3
      SUMY = SUMX(NP,X)
      END
*
      FUNCTION SUMX(NP,X)
      COMMON NP
      DIMENSION X(NP)
      RETURN
      END

Comment: NP is both in COMMON and a dummy argument.
Subroutine Subprogram
If the number of output arguments is more than one, a subroutine subprogram is used instead of a function subprogram. For example, the least squares analysis of bilinear data outputs the slope, intercept and correlation coefficient. Nowadays the preference is to develop subprograms of 50 to 100 source lines, both to increase clarity and to make it possible to follow the implementation of an algorithm step by step. In spite of the availability of several subroutine packages, there is always a need to develop subprograms to solve specific tasks. A suite of programs for tasks in chemical kinetics, equilibrium chemistry, quantitation etc. using mathematical and statistical procedures is given from Chapter 3 onwards.
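A subroutine with three output arguments can be sketched as below. LSQFIT is a hypothetical name and the body is a plain textbook least squares fit written for illustration; it is not the LLS1 routine of the later chapters. The four data points are assumed and lie exactly on the line y = 2x + 1.

```fortran
*
*  Sketch: a subroutine with three output arguments
*  (LSQFIT is a hypothetical name; data are illustrative)
*
      DIMENSION X(4),Y(4)
      DATA X/1.,2.,3.,4./
      DATA Y/3.,5.,7.,9./
      CALL LSQFIT(X,Y,4,SLOPE,CEPT,CC)
      WRITE(*,*)SLOPE,CEPT,CC
      END
*
      SUBROUTINE LSQFIT(X,Y,N,SLOPE,CEPT,CC)
*     Input : X, Y, N     Output : SLOPE, CEPT, CC
      DIMENSION X(N),Y(N)
      SX = 0.
      SY = 0.
      SXX = 0.
      SYY = 0.
      SXY = 0.
      DO 10 I = 1,N
      SX = SX + X(I)
      SY = SY + Y(I)
      SXX = SXX + X(I)*X(I)
      SYY = SYY + Y(I)*Y(I)
      SXY = SXY + X(I)*Y(I)
10    CONTINUE
      DX = N*SXX - SX*SX
      DY = N*SYY - SY*SY
      SLOPE = (N*SXY - SX*SY)/DX
      CEPT = (SY - SLOPE*SX)/N
      CC = (N*SXY - SX*SY)/SQRT(DX*DY)
      RETURN
      END
```

For these points the sums are SX = 10, SY = 24, SXX = 30, SYY = 164 and SXY = 70, which give a slope of 2, an intercept of 1 and a correlation coefficient of 1. A function subprogram could return only one of these three results through its name.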
Working Details of Subroutine Subprogram
The mainline program is executed from the top to the statement at which the subroutine is called (Fig. 2.6.3).
• The current statement is remembered by storing the program counter.
• Execution of the mainline is halted and the control is transferred to the subroutine.
• The arguments for which the values are known are the input arguments.
• The subroutine subprogram is executed up to the RETURN statement.
• The values of output arguments are available through the corresponding variables in the mainline.
• The control is transferred to the mainline.
• Execution of the mainline resumes at the statement succeeding the CALL statement.
• The mainline continues till the STOP statement.

Fig. 2.6.3 : Program Execution with Subroutines (the mainline calls LXY1(N,X,Y,MAX) and LLS1(X,Y,N,SLOPE,CEPT,CC,LLS) in turn; each subroutine hands control back at RETURN and the mainline ends at STOP)
Features of Subprogram The actual arguments in the main program are constants, variable, arithmetic expressions, array elements or intrinsic functions. The values of actual arguments are not passed to the dummy arguments in the subroutine. The information about the location of memory allocations of actual arguments is transferred to the subroutine. Thus only addresses of memory locations are available in the subprogram. The values for the corresponding dummy arguments are then fetched. The arguments in the call statements and those in the subprogram should match in number, order, TYPE and precision. The normal mode of exit from the subprogram is executing the return statement. A stop statement in subprogram not only stops the execution of subprogram but it terminates the mainline. If subprograms are developed with adjustable array dimension, they are less prone to run time errors. Then the actual size of the array is an argument.
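The passing of addresses and the use of an adjustable array dimension can be sketched together. SCALEV is a hypothetical subroutine name and the values are illustrative; the subroutine changes its dummy array A, and the change appears in the mainline array VOL because both refer to the same memory locations.

```fortran
*
*  Sketch: arguments are passed by address, so a change made
*  to a dummy argument is seen in the mainline
*  (SCALEV is a hypothetical subroutine name)
*
      DIMENSION VOL(3)
      DATA VOL/1.,2.,3./
      NP = 3
      CALL SCALEV(VOL,NP,10.)
      WRITE(*,*)VOL
      END
*
      SUBROUTINE SCALEV(A,N,FACTOR)
*     Adjustable dimension: the actual size N is an argument
      DIMENSION A(N)
      DO 10 I = 1,N
      A(I) = FACTOR * A(I)
10    CONTINUE
      RETURN
      END
```

The actual arguments match the dummy ones in number, order and type (real array, integer, real), as required above. FACTOR is passed as a constant and is therefore only read, never assigned, inside the subroutine.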
2.7 DATA

DATA Statement
DATA is a non-executable statement that assigns initial values to variables before execution begins. It (DATA001.FOR) is an abridged form of several assignment statements (ASIGN006.FOR).

*
*  DATA001.FOR
*
      DATA SUMX,SUMY,SUMXY,SUMXX/4*0.0/
      WRITE(*,*)SUMX,SUMY,SUMXY,SUMXX
      END
*
*  ASIGN006.FOR
*
      SUMX = 0
      SUMY = 0
      SUMXY = 0
      SUMXX = 0
      WRITE(*,*)SUMX,SUMY,SUMXY,SUMXX
      END

The DATA statement is also used to develop unit, identity and zero (DATA002.FOR) matrices.

*
*  DATA002.FOR
*
      DIMENSION UMAT(5,5)
      DATA UMAT/25*0.0/
      WRITE(*,5)((UMAT(I,J),J=1,5),I=1,5)
5     FORMAT(1H0,15X,20F5.1)
      STOP
      END
1.0000000
.0000000
.0000000
.0000000
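DATA002.FOR initialises every element to zero. An identity matrix can be set up in the same way, as in the following sketch for an assumed 3 x 3 case. FORTRAN stores arrays column by column, so the nine constants in the DATA statement are listed in column order.

```fortran
*
*  Sketch: 3 x 3 identity matrix from a DATA statement
*  (constants are listed column by column)
*
      DIMENSION UMAT(3,3)
      DATA UMAT/1.,3*0.,1.,3*0.,1./
      DO 10 I = 1,3
      WRITE(*,5)(UMAT(I,J),J = 1,3)
10    CONTINUE
5     FORMAT(1X,3F5.1)
      END
```

The repeat count 3*0. supplies the three zeros that separate successive diagonal elements in storage order.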
The data statement is convenient for variables whose values are unchanged throughout the program. Such variables include logical record numbers of sequential/random access files and chemical/physical constants (DATA003.FOR and DATA004.FOR).
*
*  DATA003.FOR
*
      REAL N
      DATA N,C/6.03E23,1.0E10/
      WRITE(*,*)N,C
      END

6.030000E+023    1.000000E+010

*
*  DATA004.FOR
*
      REAL MWOX
      DATA MWOX/126.06/,NIH/2/
      DATA PKW,DH20/13.987,77.2/
      WRITE(*,*)MWOX,NIH,PKW,DH20
      END

126.0600000    2    13.9870000    77.2000000
Do's of DATA
✓ If more than one variable has the same value, they can be declared in a single DATA statement.
✓ If all the variables except one or a few have the same magnitude, the entire set is initialised first. Then those variables which have different values are redefined (DATA005.FOR). Instead of two DATA statements, they can be clubbed together (DATA006.FOR).

*
*  DATA005.FOR
*
      DIMENSION VOL(5)
      DATA VOL/5*3.0/
      DATA VOL(4),VOL(3)/2.9,3.1/
      WRITE(*,*)VOL
      END

*
*  DATA006.FOR
*
      DIMENSION VOL(5)
      DATA VOL/5*3.0/, VOL(4),VOL(3)/2.9,3.1/
      WRITE(*,*)VOL
      END

3.0000000    3.0000000    3.1000000    2.9000000    3.0000000
✓ Variables of different modes can be declared in the same DATA statement.

*
*  DATA007.FOR
*
      DATA SUMX/49.5/,NP/10/
      AVE = SUMX/NP
      WRITE(*,117)SUMX,AVE,NP
117   FORMAT(1H0,10X,F10.5,F10.2,I3)
      END

49.50000      4.95 10

✓ Hexadecimal values can be assigned to integer variables. They are specified by the letter Z followed by one to four hexadecimal digits.

*
*  DATA008.FOR
*
      DATA IA,IN,IC/Z0016,ZABB,Z001A/
      WRITE(*,162)IA,IN,IC
162   FORMAT(//' IA = Z0016 =',I5//' IN = ZABB =',I5//,
     * ' IC = Z001A =',I5)
      END
Numbers
Hexadecimal : 0  1  2  3  4  5  6  7  8  9  A   B   C   D   E   F
Decimal     : 0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
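As a check on the table, the three constants of DATA008.FOR can be converted by hand: Z0016 = 1*16 + 6 = 22, ZABB = 10*256 + 11*16 + 11 = 2747 and Z001A = 1*16 + 10 = 26. The sketch below repeats the initialization with the quoted form Z'...' of Fortran 90 and later; this is an assumption about the reader's compiler, since the unquoted Z0016 form in the text is specific to the compiler used there. The program name HEXDEM is illustrative.

```fortran
      PROGRAM HEXDEM
!     Hexadecimal (BOZ) constants in a DATA statement,
!     written in the quoted form Z'...'.
      IMPLICIT NONE
      INTEGER IA, IN, IC
      DATA IA, IN, IC /Z'0016', Z'0ABB', Z'001A'/
!     Decimal equivalents: 22, 2747 and 26
      WRITE(*,*) IA, IN, IC
      END
```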
Don'ts of DATA Statement
® Dummy arguments are not allowed in a DATA statement.
*
*     DATA101.FOR
*
      DATA TNP,AVE/6.0,0.2/,TOT/43.2/
      SUMX = TNP * AVE
      END

Compiler messages for the DATA statement:
***** Error 77 - constant expected
***** Error 38 - "/" expected
***** Error 38 - "/" expected

The variable TOT is not used anywhere in the program except in the DATA statement, so it is a dummy variable.
® The number of variables and the number of constants must be equal.
*
*     DATA102.FOR
*
      DATA ABSOR,EPSI,CONC/0.567,6376.0/
      END

Compiler message for the DATA statement:
***** Error 79 - number of variables does not match
Comment: There are three variables and two constants.
® The type of each constant must match the type of its variable.
*
*     DATA103.FOR
*
      DATA ABSOR/1/
      DATA NIP/2.0/
      WRITE(*,*)NIP,ABSOR
      END

Compiler message for the DATA statements:
***** Error 833 - cannot convert constant
Comment: ABSOR is a real variable and 1 is an integer constant; NIP is an integer variable and 2.0 is a real constant.
® A DATA statement must not come before the DIMENSION or type specification statement of the variables it initializes.
*
*     DATA104.FOR
*
      DATA VOL/5*50./,PH/3*2.0,3.,2*4.00/
      DIMENSION VOL(5),PH(6)
      WRITE(*,*)VOL(2),PH(2)
      END

Compiler messages:
***** Error 79 - number of variables does not match (twice, for the DATA statement)
***** Error 100 - statement order (for the DIMENSION statement)
® Variables appearing in COMMON cannot be initialized by a DATA statement.
*
*     DATA105.FOR
*
      COMMON VO
      DATA VO/50.0/
      TV = VO
      END
2.8 STOP, PARAMETER
2.8.1 STOP
STOP is an optional executable statement that can appear anywhere in the program. The syntax of the statement is
      STOP
A FORTRAN program runs and exits normally to the operating system even without a STOP statement. STOP is used to terminate execution and thus represents the logical end of the program.
Do's of STOP Statement
✓ STOP can optionally be followed by a number of up to five digits or by a character constant (string).
*
*     STOP001.FOR
*
      STOP 989
101   STOP 'ABC'
203   STOP 12345
      END

✓ If   an argument is present in the STOP statement
  Then it is displayed on the screen when the program terminates
  Else the message 'STOP - Program terminated' is displayed.
STOP002.FOR is a complete FORTRAN program but does nothing.
*
*     STOP002.FOR
*
101   STOP 'ERROR IN INPUT'
      END
✓ More than one STOP statement is permissible in a program (STOP003.FOR).
*
*     STOP003.FOR
*
      STOP 111
      STOP 99999
      STOP 'ABC'
      END

✓ A STOP statement in a subroutine/function subprogram terminates the job. It is generally used to abort the run when a fatal error is encountered.
Don'ts of STOP Statement
® More than five digits in the number following STOP are invalid.
*
*     STOP101.FOR
*
2     STOP 'ABC'
145   STOP 9999999
345   STOP 'DSJSDJFSJFSJ99999'
23    STOP 'JLLHL
56    STOP, 367
      END

Compiler messages:
STOP 9999999   ***** Error 13 - too many digits in constant
STOP 'JLLHL    ***** Error 15 - character constant not closed
STOP, 367      ***** Error 77 - constant expected
® The quotes around the character constant or string are mandatory; a character variable is not accepted in place of the constant.
*
*     STOP102.FOR
*
      CHARACTER*10 CH
      CH = 'ABC'
      STOP CH
101   STOP 'ERROR IN INPUT'
      END

Compiler message for STOP CH:
***** Error 77 - constant expected
Symbol table: Name CH, Type CHAR*10, Offset 16
® STOP should not be used after RETURN in a subprogram.
*
*     STOP103.FOR
*
      RETURN
341   STOP
      END

Compiler message for the RETURN statement:
***** Error 127 - RETURN not allowed here

Explanation: The RETURN statement passes control to the calling program, so the next statement cannot be executed.
RETURN
RETURN is an executable statement used in subroutine/function subprograms. It instructs the computer to go back to the program that invokes or calls the subprogram. Thus the normal way of terminating the processing of a subprogram is through RETURN.

Do's of RETURN Statement
✓ More than one RETURN statement is valid.

Don'ts of RETURN Statement
® RETURN is not permitted in the mainline.
*
*     RET101.FOR
*
      T = 273.16
      WRITE(*,*)T
      RETURN
      END
END
END is a non-executable statement and occurs only once in a mainline, subroutine or function subprogram. It is thus the last physical statement of every program unit, but not necessarily the physical end of the job. The syntax is
      END
It tells the compiler that there are no more FORTRAN statements in the program unit to translate into machine code. No op code is generated for END, as it is a non-executable FORTRAN statement.
Do's of END Statement
✓ The shortest possible program is END001.FOR.
*
*     END001.FOR
*
      END
Don'ts of END Statement
® Continuation lines for the END statement are invalid.
*
*     END101.FOR
*
      VOL = 50
      END
     * 'END OF PROGRAM'

Compiler message for the END statement:
***** Error 23 - extra characters at end of statement

® More than one END statement results in ignoring all succeeding statements after the first END.
*
*     END102.FOR
*
      PH = 5.5
      END
      FH = 10.**(-PH)
      END

Compiler output:
Symbol table after the first END: Name PH, Type REAL, Offset 16
***** Error 34 - identifier already declared
***** Error 70 - more than one main program
® A statement label is not permitted on the END statement.
*
*     END103.FOR
*
101   END

Compiler messages:
***** Error 89 - unrecognizable statement
***** Error 91 - missing END statement
2.8.2 PARAMETER
PARAMETER is a non-executable statement that comes after the IMPLICIT type declaration and before the explicit type declarations. When there is no TYPE/DIMENSION statement in the program, it should come before the first executable statement. It declares the numerical/logical values of named constants frequently used in the program. The general format is
      PARAMETER (NAME1 = CONST1 [,NAME2 = CONST2] [,NAME3 = CONST3])
The advantage of the PARAMETER statement is that whenever the size of an array is changed, only the values in the PARAMETER statement need to be altered, which avoids editing errors. It is used to declare the size of arrays in the main program (PAR001.FOR), to declare physical or chemical constants whose values cannot be calculated by simple number crunching (PAR002.FOR), to declare the polynomial coefficients of the Maclaurin series used to calculate transcendental functions, logarithms etc., and to specify the tolerance, maximum number of iterations etc. in mathematical computations (PAR003.FOR).
*
*     PAR001.FOR
*
      PARAMETER (MAX = 100)
      DIMENSION ABSORB(MAX),CONC(MAX)
      ABSORB(1) = 0.50
      CONC(1) = 0.5E-03
      WRITE(*,*)ABSORB(1),CONC(1)
      END

*
*     PAR002.FOR
*
      PARAMETER (C = 2.99792458)
      PARAMETER (PLANK = 6.63E-34, AVGAD = 6.022E23, FARAD = 96500)
      WRITE(*,*)C,PLANK,AVGAD,FARAD
      END

*
*     PAR003.FOR
*
      PARAMETER (TOL = 1.0E-6,MAXIT = 100)
      WRITE(*,*)TOL,MAXIT
      END

Do's of PARAMETER
✓ More than one constant (PAR002.FOR) can be declared in a single PARAMETER statement.
✓ More than one PARAMETER statement (PAR002.FOR) can be used.
✓ Logical constants can be declared by a PARAMETER statement (PAR005.FOR).
*
*     PAR005.FOR
*
      LOGICAL QUAD,CUBIC
      PARAMETER (QUAD = .TRUE.,CUBIC = .FALSE.)
      WRITE(*,*)QUAD,CUBIC
      END
Don'ts of PARAMETER
® Real constant should not be declared by PARAMETER statement.
*
*     PAR101.FOR
*
      PARAMETER (A1 = -10.0)
      WRITE(*,*)A1
      END

Compiler message for the PARAMETER statement:
***** Error 22 - integer constant expected

*
*     PAR102.FOR
*
      INTEGER*4 NP
      PARAMETER (NP = 1E3)
      DIMENSION X(NP)
      X(1) = 112
      END

Compiler messages:
***** Error 833 - cannot convert constant (PARAMETER statement)
***** Error 22 - integer constant expected (DIMENSION statement)
***** Error 26 - ")" expected (DIMENSION statement)
® Arithmetic expression is not allowed in PARAMETER statement (with the compiler used here).
*
*     PAR103.FOR
*
      PARAMETER (NEXP = 5,NP1 = 50)
      PARAMETER (NP = NEXP*NP1)
      WRITE(*,*)NEXP,NP1,NP
      END

Compiler message for the second PARAMETER statement:
***** Error 89 - unrecognizable statement
3.1 MATRIX OPERATIONS
Matrix algebra provides powerful tools to implement many mathematical and statistical procedures whose results are of chemical significance. The basic subroutines required to implement matrix operations are described here.
Length of a Vector
The Euclidean norm (‖·‖), or length, of a vector is equal to the square root of the sum of squares of the elements of the vector. It is calculated as the square root of the product of the transpose of the vector and the vector itself.
\[ V = [V_1 \;\; V_2 \;\; V_3 \;\; \ldots]^T , \qquad \|V\| = \sqrt{V^T * V} \]
A point in 2D space is represented by [V1, V2] and the length of the vector is given by
\[ \|V\|^2 = [V_1 \;\; V_2] * \begin{bmatrix} V_1 \\ V_2 \end{bmatrix} = V_1^2 + V_2^2 ; \qquad \|V\| = \sqrt{V_1^2 + V_2^2} \]
Calculation of the Euclidean norm is implemented in NORM.FOR.
*
*     NORM.FOR
*
      REAL FUNCTION NORM(V,N)
      DIMENSION V(20)
      ZNORM = 0.
      DO 10 I = 1,N
         ZNORM = ZNORM + V(I) * V(I)
10    CONTINUE
      NORM = SQRT(ZNORM)
      END

*
*     NORM.DEM
*
      REAL NORM
      DIMENSION V(20)
      V(1) = 1.
      V(2) = 2.
      V(3) = 3.
      ZNORM = NORM(V,3)
      WRITE(*,*)ZNORM
      END
$INCLUDE : 'NORM.FOR'
SOFTWARE METHOD BASE
Dot Product of Two Vectors
The dot product of two column vectors V1 and V2 (of size N x 1) is a scalar and is equal to the sum of the products of the corresponding elements of V1 and V2 (DPV1V2.FOR).
\[ V1 = [V_{11} \;\; V_{21} \;\; V_{31} \;\; V_{41} \;\; \ldots \;\; V_{N1}]^T , \qquad V2 = [V_{12} \;\; V_{22} \;\; V_{32} \;\; V_{42} \;\; \ldots \;\; V_{N2}]^T \]
\[ DPV1V2 = V1 \cdot V2 = V1^T * V2 = \sum_{i=1}^{N} V1_i * V2_i \]
*
*     DPV1V2.DEM
*
      DIMENSION V1(20),V2(20)
      V1(1) = 1.
      V1(2) = 2.
      V1(3) = 3.
      V2(1) = 2
      V2(2) = 4
      V2(3) = 6
      CALL DPV1V2(V1,V2,V3,3)
      WRITE(*,*)V3
      END
$INCLUDE : 'DPV1V2.FOR'

*
*     DPV1V2.FOR
*
      SUBROUTINE DPV1V2(V1,V2,V3,N)
      PARAMETER (MAX = 20)
      DIMENSION V1(MAX),V2(MAX)
      V3 = 0.
      DO 10 I = 1,N
         V3 = V3 + V1(I) * V2(I)
10    CONTINUE
      END
Cross Product of Two Vectors
The cross product (V3) of two vectors V1 and V2 is
\[ V3 = \begin{bmatrix} V1(2) * V2(3) - V1(3) * V2(2) \\ V1(3) * V2(1) - V1(1) * V2(3) \\ V1(1) * V2(2) - V1(2) * V2(1) \end{bmatrix} \]
The vector cross product in higher dimensional space is complicated but can be calculated. The geometric interpretation (Fig. 3.1) of the dot and cross products of vectors can be understood from the relationships
\[ V1 \cdot V2 = \|V1\| * \|V2\| * \cos\theta \]
\[ V1 \otimes V2 = \|V1\| * \|V2\| * \sin\theta * \hat{Z} \]
where \(\hat{Z}\) is a unit vector (of length 1) perpendicular to the plane containing the two vectors.
*
*     CPV1V2.DEM
*
      DIMENSION V1(20),V2(20),V3(20)
      V1(1) = 1.
      V1(2) = 2.
      V1(3) = 3.
      V2(1) = 2
      V2(2) = 4
      V2(3) = 6
      CALL CPV1V2(V1,V2,V3)
      WRITE(*,*) (V3(I),I = 1,3)
      END
$INCLUDE : 'CPV1V2.FOR'

*
*     CPV1V2.FOR
*
      SUBROUTINE CPV1V2(V1,V2,V3)
      PARAMETER (MAX = 20)
      DIMENSION V1(MAX),V2(MAX),V3(MAX)
      V3(1) = V1(2)*V2(3) - V1(3)*V2(2)
      V3(2) = V1(3)*V2(1) - V1(1)*V2(3)
      V3(3) = V1(1)*V2(2) - V1(2)*V2(1)
      RETURN
      END
Fig. 3.1 : Geometric Representation of Two Vectors
Angle Between Two Vectors
The angle (θ) between two vectors V1 and V2 is related to the dot product and the lengths of the vectors by the formulae
\[ \cos\theta = \frac{V1 \cdot V2}{\|V1\| * \|V2\|} \]
\[ \cos\theta = \frac{\|V1\|^2 + \|V2\|^2 - \|V1 - V2\|^2}{2 * \|V1\| * \|V2\|} \]
*
*     ANGV1V2.FOR
*
      SUBROUTINE ANGV1V2(V1,V2,N,MAX)
      REAL NORM
      DIMENSION V1(MAX),V2(MAX)
      DIMENSION V1MV2(100)
      V1NORM = NORM(V1,N)
      V2NORM = NORM(V2,N)
      DO 10 I = 1,N
         V1MV2(I) = V1(I) - V2(I)
10    CONTINUE
      V1V2NORM = NORM(V1MV2,N)
*     Angle from the lengths of V1, V2 and V1 - V2
      COSTH1 = (V1NORM**2 + V2NORM**2 - V1V2NORM**2)/
     *         (2*V1NORM*V2NORM)
      THETA1 = ACOS(COSTH1)
*     Angle from the dot product
      CALL DPV1V2(V1,V2,V3,N)
      COSTH2 = V3/V1NORM/V2NORM
      THETA2 = ACOS(COSTH2)
      WRITE(*,*)THETA1,THETA2
      WRITE(*,*)COSTH1,COSTH2
      END

*
*     ANGV1V2.DEM
*
$DEBUG
      DIMENSION V1(20),V2(20),V3(20)
      V1(1) = 1.
      V1(2) = 2.
      V1(3) = 3.
      V2(1) = 2
      V2(2) = 4
      V2(3) = 6
*     The fourth argument is the declared size (MAX) of V1 and V2
      CALL ANGV1V2(V1,V2,3,20)
      END
$INCLUDE : 'ANGV1V2.FOR'
$INCLUDE : 'NORM.FOR'
$INCLUDE : 'DPV1V2.FOR'

    .0000000   3.452670E-004
   1.0000000   9.999999E-001
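The output above shows why the two formulae can disagree slightly: V1 and V2 are parallel, yet single-precision rounding gives one cosine just below 1 and hence a small non-zero angle. Rounding can equally push the cosine slightly above 1, in which case ACOS is undefined. The following sketch, which is not part of ANGV1V2.FOR, clamps the cosine to the interval [-1, 1] before calling ACOS; the program name ANGDEM and its variable names are illustrative.

```fortran
      PROGRAM ANGDEM
!     Angle between V1 = (1,2,3) and V2 = (2,4,6), which are parallel.
      IMPLICIT NONE
      INTEGER I, N
      REAL V1(3), V2(3), DOT, S1, S2, COSTH, THETA
      DATA V1 /1.0, 2.0, 3.0/
      DATA V2 /2.0, 4.0, 6.0/
      N = 3
      DOT = 0.0
      S1 = 0.0
      S2 = 0.0
      DO 10 I = 1, N
         DOT = DOT + V1(I) * V2(I)
         S1 = S1 + V1(I) * V1(I)
         S2 = S2 + V2(I) * V2(I)
10    CONTINUE
      COSTH = DOT / (SQRT(S1) * SQRT(S2))
!     Rounding can push COSTH slightly outside [-1, 1]; clamp it
!     so that ACOS always receives a valid argument.
      COSTH = MAX(-1.0, MIN(1.0, COSTH))
      THETA = ACOS(COSTH)
      WRITE(*,*) COSTH, THETA
      END
```

The printed cosine should be 1 or very close to it, and the angle zero or very small; the exact digits depend on the compiler and machine.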
Transpose of Matrix
The transpose operation makes the rows of A the columns of AT. The transpose is represented as AT, Ã or A'. The element (I,J) of AT is equal to A(J,I) (MTRANS.FOR).
*
*     MTRANS.FOR
*
      SUBROUTINE MTRANS(A,ROWA,COLA,AT)
      INTEGER ROWA,COLA
      PARAMETER (MAX = 20)
      DIMENSION A(20,20),AT(20,20)
      DO 20 I = 1,ROWA
         DO 10 J = 1,COLA
            AT(J,I) = A(I,J)
10       CONTINUE
20    CONTINUE
      END

*
*     MTRANS.DEM
*
      DIMENSION A(20,20),AT(20,20),ATT(20,20)
      A(1,1) = 1.1
      A(2,1) = 2.1
      A(1,2) = 1.2
      A(2,2) = 2.2
      A(3,1) = 3.1
      A(3,2) = 3.2
      M = 3
      N = 2
      CALL MPRIN(A,M,N)
      CALL MTRANS(A,M,N,AT)
      CALL MPRIN(AT,N,M)
      CALL MTRANS(AT,N,M,ATT)
      CALL MPRIN(ATT,M,N)
      END
$INCLUDE : 'MTRANS.FOR'
$INCLUDE : 'MPRIN.FOR'

A
   1.1   1.2
   2.1   2.2
   3.1   3.2
AT
   1.1   2.1   3.1
   1.2   2.2   3.2
(AT)T
   1.1   1.2
   2.1   2.2
   3.1   3.2
*
*     MPRIN.FOR
*
      SUBROUTINE MPRIN(A,ROWA,COLA)
      CHARACTER*20 FMT
      INTEGER ROWA,COLA
      DIMENSION A(20,20)
      FMT = '(1X,8G8.2)'
      WRITE(*,951)
      DO 30 I = 1,ROWA
         WRITE(*,FMT) (A(I,J),J = 1,COLA)
30    CONTINUE
951   FORMAT(/)
      END
The transpose of a transposed matrix, (AT)T, is the original matrix. Thus (AT)T - A is a zero matrix of the same size. The sum of a skew-symmetric matrix and its transpose is a zero matrix. The transpose of the product of two matrices A and B is equal to the transpose of B post-multiplied by the transpose of A.
\[ (A * B)^T = B^T * A^T \]
If X is a rectangular matrix, both XT*X and X*XT are square matrices and are called the information matrix and the dispersion matrix, respectively. They are also called row-wise or column-wise covariance matrices and are useful in factor analysis. Nowadays, to avoid truncation errors, Singular Value Decomposition of X is used in least squares, orthogonal polynomials, eigenvector analysis etc.
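The product rule for transposes can be checked numerically. The sketch below is not one of the book's programs: it assumes a Fortran 90 (or later) compiler, whose intrinsic functions MATMUL and TRANSPOSE stand in for the MMUL1 and MTRANS subroutines, and the matrices A and B are arbitrary illustrative values.

```fortran
      PROGRAM TRPROD
!     Check (A*B)^T = B^T * A^T for a 2x3 matrix A and a 3x2 matrix B.
      IMPLICIT NONE
      REAL A(2,3), B(3,2), LHS(2,2), RHS(2,2)
!     DATA fills arrays column by column:
!     A = [1 2 3; 4 5 6] and B = [1 3; 0 1; 2 0]
      DATA A /1.0, 4.0, 2.0, 5.0, 3.0, 6.0/
      DATA B /1.0, 0.0, 2.0, 3.0, 1.0, 0.0/
      LHS = TRANSPOSE(MATMUL(A, B))
      RHS = MATMUL(TRANSPOSE(B), TRANSPOSE(A))
!     The largest absolute difference between the two sides
      WRITE(*,*) MAXVAL(ABS(LHS - RHS))
      END
```

For these small integer-valued matrices both sides equal [7 16; 5 17], so the printed difference is zero.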
Diagonal of Matrix
The diagonal of a square matrix (A) is a vector of size equal to the dimension of A.
*
*     DIAGX.FOR
*
      SUBROUTINE DIAGX(A,M,N,DIAG)
      PARAMETER (MAX = 20)
      DIMENSION A(MAX,MAX),DIAG(MAX)
      IF (M .NE. N) THEN
         WRITE(*,951)
         STOP
      ENDIF
      DO 10 I = 1,N
         DIAG(I) = A(I,I)
10    CONTINUE
951   FORMAT(' It is not a square matrix'/1X,
     *       ' So, Diagonal is not possible'/)
      END
*
*     DIAGX.DEM
*
      DIMENSION A(20,20),DIAG(20)
      A(1,1) = 1.1
      A(2,1) = 2.1
      A(1,2) = 1.2
      A(2,2) = 2.2
      A(3,1) = 3.1
      A(3,2) = 3.2
      CALL MPRIN(A,2,2)
      CALL DIAGX(A,2,2,DIAG)
      CALL MPRIN(DIAG,2,1)
      CALL MPRIN(A,3,2)
      CALL DIAGX(A,3,2,DIAG)
      END
$INCLUDE : 'MPRIN.FOR'
$INCLUDE : 'DIAGX.FOR'

   1.1   1.2
   2.1   2.2

   1.1
   2.2

   1.1   1.2
   2.1   2.2
   3.1   3.2
 It is not a square matrix
 So, Diagonal is not possible
Stop - Program terminated.
Trace of Matrix
The trace of a matrix is equal to the sum of its diagonal elements.
*
*     TRACE.DEM
*
      DIMENSION A(20,20)
      A(1,1) = 1.1
      A(2,1) = 2.1
      A(1,2) = 1.2
      A(2,2) = 2.2
      A(3,1) = 3.1
      A(3,2) = 3.2
      CALL MPRIN(A,2,2)
      T = TRACE(A,2,2)
      CALL MPRIN(T,1,1)
      CALL MPRIN(A,3,2)
      T = TRACE(A,3,2)
      END
$INCLUDE : 'TRACE.FOR'
$INCLUDE : 'MPRIN.FOR'
$INCLUDE : 'DIAGX.FOR'
$INCLUDE : 'SUM1.FOR'

*
*     TRACE.FOR
*
      FUNCTION TRACE(A,M,N)
      PARAMETER (MAX = 20)
      DIMENSION A(MAX,MAX),DIAG(MAX)
      CALL DIAGX(A,M,N,DIAG)
      TRACE = SUM1(DIAG,N,20)
      RETURN
      END

   1.1   1.2
   2.1   2.2

   3.3

   1.1   1.2
   2.1   2.2
   3.1   3.2
 It is not a square matrix
 So, Diagonal is not possible
Stop - Program terminated.
*
*     SUM1.FOR
*
      FUNCTION SUM1(X,N,MAX)
      DIMENSION X(MAX)
      SUM1 = 0.
      DO 10 I = 1,N
         SUM1 = SUM1 + X(I)
10    CONTINUE
      RETURN
      END
Determinant of Matrix
The determinant of a square matrix is useful to test for linear dependence among its rows or columns. A matrix with zero determinant is called singular and has no inverse. However, in many real data sets the determinant is not zero but a very small quantity, which renders the matrix nearly singular. For such matrices the inverse can be calculated but is unreliable, and the regression coefficients calculated from this inverse are subject to large errors.
*
*     DET.DEM
*
      DIMENSION A(20,20)
      A(1,1) = 1
      A(2,1) = 0
      A(1,2) = 0
      A(2,2) = 1
      A(3,1) = 0
      A(3,2) = 0
      A(1,3) = 0
      A(2,3) = 0
      A(3,3) = 1
      DET1 = DET(A,1,1)
      DET2 = DET(A,2,2)
      DET3 = DET(A,3,3)
      CALL MPRIN(A,1,1)
      WRITE(*,*)DET1
      CALL MPRIN(A,2,2)
      WRITE(*,*)DET2
      CALL MPRIN(A,3,3)
      WRITE(*,*)DET3
      CALL MPRIN(A,2,3)
      DET4 = DET(A,2,3)
      WRITE(*,*)DET4
      END
$INCLUDE : 'DET.FOR'
$INCLUDE : 'MPRIN.FOR'

   1.0
   1.0000000

   1.0   .00
   .00   1.0
   1.0000000

   1.0   .00   .00
   .00   1.0   .00
   .00   .00   1.0
   1.0000000

   1.0   .00   .00
   .00   1.0   .00
 NOT A SQUARE MATRIX
Stop - Program terminated.
The determinant of a 1 x 1, 2 x 2 or 3 x 3 matrix is calculated in DET.FOR.
*
*     DET.FOR
*
      FUNCTION DET(A,NC,NR)
      PARAMETER (MAX = 20)
      DIMENSION A(MAX,MAX)
      IF (NR .NE. NC) THEN
         WRITE(*,951)
         STOP
      ENDIF
      IF (NR .EQ. 1) THEN
         DET = A(1,1)
      ENDIF
      IF (NR .EQ. 2) THEN
         DET = A(1,1)*A(2,2) - A(1,2)*A(2,1)
      ENDIF
      IF (NR .EQ. 3) THEN
*        Cofactor expansion along the first row
         DET = A(1,1)*(A(2,2)*A(3,3) - A(2,3)*A(3,2))
         DET = DET - A(1,2)*(A(2,1)*A(3,3) - A(2,3)*A(3,1))
         DET = DET + A(1,3)*(A(2,1)*A(3,2) - A(2,2)*A(3,1))
      ENDIF
951   FORMAT(' NOT A SQUARE MATRIX ')
      RETURN
      END
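For reference, the 3 x 3 branch of the function corresponds to the cofactor expansion along the first row. The expansion is written out below together with a small worked example; the numerical matrix is an illustrative choice and does not come from the text.

```latex
\[
\det A = a_{11}\,(a_{22}a_{33} - a_{23}a_{32})
       - a_{12}\,(a_{21}a_{33} - a_{23}a_{31})
       + a_{13}\,(a_{21}a_{32} - a_{22}a_{31})
\]
\[
A = \begin{bmatrix} 2 & 0 & 1 \\ 1 & 3 & 2 \\ 1 & 1 & 2 \end{bmatrix}
\quad\Rightarrow\quad
\det A = 2\,(3 \cdot 2 - 2 \cdot 1) - 0\,(1 \cdot 2 - 2 \cdot 1) + 1\,(1 \cdot 1 - 3 \cdot 1)
       = 8 - 0 - 2 = 6
\]
```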
Initialization of Matrix
Initialization of a third order tensor is given in ZERO3.FOR. Two-dimensional zero, unit and identity matrices can be generated using the subprograms ZEROS.FOR, ONES.FOR and EYE.FOR, respectively (Chapter 2.3).
*
*     ZERO3.FOR
*
      SUBROUTINE ZERO3(A,I,J,K,Z3)
      PARAMETER (MAX = 20)
      DIMENSION A(MAX,MAX,MAX)
      I = 2
      J = 3
      K = 4
      ZERO = 0.0
      DO 10 I1 = 1,I
         DO 20 I2 = 1,J
            DO 30 I3 = 1,K
               A(I1,I2,I3) = ZERO
30          CONTINUE
20       CONTINUE
10    CONTINUE
      END
Addition of Matrices
Two matrices of the same size can be added or subtracted. The elements of the resulting matrix are the sums or differences of the corresponding elements of the augend and addend matrices. MADD1.FOR is a subroutine subprogram for the addition of two matrices.
*
*     MADD1.FOR
*
      SUBROUTINE MADD1(A,B,C,ROWA,COLA,ROWB,COLB)
      INTEGER ROWA,COLA,ROWB,COLB
      DIMENSION A(20,20),B(20,20),C(20,20)
      DO 10 I = 1,ROWA,1
         DO 20 J = 1,COLA
            C(I,J) = A(I,J) + B(I,J)
20       CONTINUE
10    CONTINUE
      RETURN
      END
*
*     MADD1.DEM
*
      DIMENSION A(20,20),B(20,20),C(20,20)
      A(1,1) = 1.1
      A(2,1) = 2.1
      A(1,2) = 1.2
      A(2,2) = 2.2
      A(1,3) = 1.3
      A(2,3) = 2.3
      B(1,1) = 8.8
      B(2,1) = 7.8
      B(1,2) = 8.7
      B(2,2) = 7.7
      B(1,3) = 8.6
      B(2,3) = 7.6
      CALL MPRIN(A,2,3)
      CALL MPRIN(B,2,3)
*
      CALL MADD1(A,B,C,2,3,2,3)
      CALL MPRIN(C,2,3)
      END
$INCLUDE : 'MADD1.FOR'
$INCLUDE : 'MPRIN.FOR'

A
   1.1   1.2   1.3
   2.1   2.2   2.3
B
   8.8   8.7   8.6
   7.8   7.7   7.6
C
   9.9   9.9   9.9
   9.9   9.9   9.9
MADD1.DEM is a driver routine to test the addition operation. The program fails when the matrices are not compatible for addition. In MADD2.FOR the numbers of rows and columns of the addend and augend matrices are tested for equality.
*
*     MADD2.FOR
*
      SUBROUTINE MADD2(A,B,C,ROWA,COLA,ROWB,COLB)
      INTEGER ROWA,COLA,ROWB,COLB
      LOGICAL ROWCOMP,COLCOMP
      DIMENSION A(20,20),B(20,20),C(20,20)
      ROWCOMP = .FALSE.
      COLCOMP = .FALSE.
      IF (ROWA .EQ. ROWB) THEN
         ROWCOMP = .TRUE.
      ELSE
         WRITE(*,1001)
      ENDIF
      IF (COLA .EQ. COLB) THEN
         COLCOMP = .TRUE.
      ELSE
         WRITE(*,1002)
      ENDIF
      IF (ROWCOMP .AND. COLCOMP) THEN
*
*        ------- MADD1.FOR -------
*
         DO 106 I = 1,ROWA,1
            DO 108 J = 1,COLA
               C(I,J) = A(I,J) + B(I,J)
108         CONTINUE
106      CONTINUE
*
      ELSE
         WRITE(*,1000)
         RETURN
      ENDIF
1000  FORMAT(1H0,' SO MATRICES ARE NOT COMPATIBLE FOR ADDITION')
1001  FORMAT(1H0,' ROWS OF A ARE NOT EQUAL TO ROWS OF B')
1002  FORMAT(' COLUMNS OF A ARE NOT EQUAL TO COLUMNS OF B')
      RETURN
      END
The error messages
**** Rows of A are not equal to rows of B
**** Columns of A are not equal to columns of B
are displayed in case of incompatibility. Otherwise matrix addition using MADD1 is performed. This is an example of a tiny knowledge-based numerical program.
Multiplication of Two Matrices
Matrix multiplication is similar to the vector dot product. Two matrices A and B are compatible for multiplication only when the number of columns of A is equal to the number of rows of B. The product A*B is read as A post-multiplied by B or B pre-multiplied by A. The element Cij of the matrix C resulting from A*B is the inner product of the ith row of A with the jth column of B (Chart 3.1).

Chart 3.1 : Matrix Multiplication
\[
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}
*
\begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix}
=
\begin{bmatrix}
a_{11} * b_{11} + a_{12} * b_{21} + a_{13} * b_{31} & a_{11} * b_{12} + a_{12} * b_{22} + a_{13} * b_{32} & \cdots \\
a_{21} * b_{11} + a_{22} * b_{21} + a_{23} * b_{31} & a_{21} * b_{12} + a_{22} * b_{22} + a_{23} * b_{32} & \cdots
\end{bmatrix}
\]
\[
=
\begin{bmatrix}
a(1,:) * b(:,1) & a(1,:) * b(:,2) & a(1,:) * b(:,3) \\
a(2,:) * b(:,1) & a(2,:) * b(:,2) & a(2,:) * b(:,3)
\end{bmatrix}
\]
Each element is obtained in the same way, for i = 1, 2 and j = 1, 2, 3:
\[
C_{ij} = \sum_{k=1}^{ACOL} a_{ik} * b_{kj}
       = [\,a_{i1} \;\; a_{i2} \;\; a_{i3}\,] * \begin{bmatrix} b_{1j} \\ b_{2j} \\ b_{3j} \end{bmatrix}
       = a(i,:) * b(:,j)
\]
The multiplication of two vectors can also be performed using MMUL1.FOR. The dimensions of a column vector are given as (ROWV, 1) and those of a row vector as (1, COLV). The normal equations in linear least squares for bivariate data can be calculated by matrix operations. If X and Y are column vectors of bivariate data and ONE is a vector of the same size containing 1.0 in every element, then the design matrix is [ONE, X].
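The design-matrix idea can be sketched as follows. This is not one of the book's programs: it assumes a Fortran 90 (or later) compiler so that the intrinsics MATMUL and TRANSPOSE can replace MMUL1 and MTRANS, the four data points are made-up values lying exactly on Y = 1 + 2*X, and the names NORMEQ, D, DTD and DTY are illustrative.

```fortran
      PROGRAM NORMEQ
!     Normal equations (D^T D) b = D^T Y for the line Y = B0 + B1*X.
!     The design matrix D has a column of ones and a column of X.
      IMPLICIT NONE
      INTEGER NP, I
      PARAMETER (NP = 4)
      REAL X(NP), Y(NP), D(NP,2), DTD(2,2), DTY(2), DET, B0, B1
!     Illustrative data generated from Y = 1 + 2*X
      DATA X /1.0, 2.0, 3.0, 4.0/
      DATA Y /3.0, 5.0, 7.0, 9.0/
      DO 10 I = 1, NP
         D(I,1) = 1.0
         D(I,2) = X(I)
10    CONTINUE
      DTD = MATMUL(TRANSPOSE(D), D)
      DTY = MATMUL(TRANSPOSE(D), Y)
!     Solve the 2x2 system by Cramer's rule.
      DET = DTD(1,1) * DTD(2,2) - DTD(1,2) * DTD(2,1)
      B0 = (DTY(1) * DTD(2,2) - DTD(1,2) * DTY(2)) / DET
      B1 = (DTD(1,1) * DTY(2) - DTD(2,1) * DTY(1)) / DET
      WRITE(*,*) B0, B1
      END
```

Here D^T D = [4 10; 10 30] and D^T Y = (24, 70), so the program recovers the intercept 1.0 and the slope 2.0.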
*
*     MMUL.DEM
*
      DIMENSION A(20,20),B(20,20),C(20,20)
      INTEGER ROWA,COLA,ROWB,COLB
      WRITE(*,*) 'ROWA,COLA,ROWB,COLB'
      READ(*,*)ROWA,COLA,ROWB,COLB
      READ(*,*) ((A(I,J),J = 1,COLA),I = 1,ROWA)
      READ(*,*) ((B(I,J),J = 1,COLB),I = 1,ROWB)
      CALL MPRIN(A,ROWA,COLA)
      CALL MPRIN(B,ROWB,COLB)
*     The product C has ROWA rows and COLB columns
      CALL MMUL1(A,B,C,ROWA,COLA,ROWB,COLB)
      CALL MPRIN(C,ROWA,COLB)
      CALL MMUL2(A,B,C,ROWA,COLA,ROWB,COLB)
      CALL MPRIN(C,ROWA,COLB)
      CALL MMUL3(A,B,C,ROWA,COLA,ROWB,COLB)
      CALL MPRIN(C,ROWA,COLB)
      END
$INCLUDE : 'MMUL1.FOR'
$INCLUDE : 'MMUL2.FOR'
$INCLUDE : 'MMUL3.FOR'
$INCLUDE : 'MPRIN.FOR'

A
   1.0   2.0
   3.0   4.0
B
   5.0   6.0
   7.0   8.0
C (printed three times, once each for MMUL1, MMUL2 and MMUL3)
   19.   22.
   43.   50.
*
*     MMUL1.FOR
*
      SUBROUTINE MMUL1(A,B,C,ROWA,COLA,ROWB,COLB)
      INTEGER ROWA,COLA,ROWB,COLB
      DIMENSION A(20,20),B(20,20),C(20,20)
      IF (COLA .NE. ROWB) THEN
         WRITE(*,952)
         STOP
      ENDIF
      DO 10 I = 1,ROWA
         DO 20 J = 1,COLB
            C(I,J) = 0.0
            DO 30 K = 1,COLA
               C(I,J) = C(I,J) + A(I,K)*B(K,J)
30          CONTINUE
20       CONTINUE
10    CONTINUE
952   FORMAT(1H0,'MATRICES A AND B ARE NOT COMPATIBLE FOR',
     *       ' MULTIPLICATION '/,' AS COLUMNS OF A ARE NOT',
     *       ' EQUAL TO ROWS OF B')
      RETURN
      END
*
*     MMUL2.FOR
*
      SUBROUTINE MMUL2(A,B,C,ROWA,COLA,ROWB,COLB)
      INTEGER ROWA,COLA,ROWB,COLB
      DIMENSION A(20,20),B(20,20),C(20,20)
      IF (COLA .NE. ROWB) THEN
         WRITE(*,952)
         RETURN
      ENDIF
      DO 50 I = 1,ROWA
      DO 50 J = 1,COLB
50    C(I,J) = 0.
      DO 10 J = 1,COLB
         DO 20 K = 1,COLA
            DO 30 I = 1,ROWA
               C(I,J) = C(I,J) + A(I,K)*B(K,J)
30          CONTINUE
20       CONTINUE
10    CONTINUE
952   FORMAT(1H0,'MATRICES A AND B ARE NOT COMPATIBLE FOR',
     *       ' MULTIPLICATION '//,' AS COLUMNS OF A ARE NOT',
     *       ' EQUAL TO ROWS OF B')
      RETURN
      END
*
*     MMUL3.FOR
*
      SUBROUTINE MMUL3(A,B,C,ROWA,COLA,ROWB,COLB)
      INTEGER ROWA,COLA,ROWB,COLB
      DIMENSION A(20,20),B(20,20),C(20,20)
      IF (COLA .NE. ROWB) THEN
         WRITE(*,952)
         STOP
      ENDIF
      DO 50 I = 1,ROWA
      DO 50 J = 1,COLB
50    C(I,J) = 0.
      DO 10 K = 1,COLA
         DO 20 I = 1,ROWA
            DO 30 J = 1,COLB
               C(I,J) = C(I,J) + A(I,K)*B(K,J)
30          CONTINUE
20       CONTINUE
10    CONTINUE
952   FORMAT(1H0,'MATRICES A AND B ARE NOT COMPATIBLE FOR',
     *       ' MULTIPLICATION '/,' AS COLUMNS OF A ARE NOT',
     *       ' EQUAL TO ROWS OF B')
      RETURN
      END
MMUL1.FOR uses the inner product to calculate the product of two matrices. The programs MMUL2.FOR and MMUL3.FOR are based on the middle product and the outer product, respectively. The three routines differ only in the nesting order of the loops: I, J, K in MMUL1; J, K, I in MMUL2; and K, I, J in MMUL3. All of them give identical numerical results, but their speeds of execution can differ widely depending upon the architecture of the computer.
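Application of Matrix Multiplication
In a two-dimensional coordinate system, point rotation keeping the axes fixed and axis rotation keeping the point fixed are useful in chemical interpretations such as group theory and factor analysis. Formulae for the transformations are given in Chart 3.2.

Chart 3.2
\[ XNEW = T * X ; \qquad X = \begin{pmatrix} 1 \\ 2 \end{pmatrix} ; \qquad \theta = 30^\circ \]
Point rotation (XNEW w.r.t. standard frame):
\[ T = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} , \qquad XNEW = \begin{pmatrix} 1.87 \\ 1.24 \end{pmatrix} \]
Frame/axis rotation (X w.r.t. frame):
\[ T = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} , \qquad XNEW = \begin{pmatrix} -0.13 \\ 2.24 \end{pmatrix} \]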
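The two products of Chart 3.2 can be reproduced with a few lines of code. The sketch assumes a Fortran 90 (or later) compiler for the MATMUL and TRANSPOSE intrinsics and takes the point as X = (1, 2), which is the value consistent with the results quoted in the chart; the program name ROT2D is illustrative.

```fortran
      PROGRAM ROT2D
!     Multiply the first 2x2 matrix T of Chart 3.2, and then its
!     transpose, by the point X = (1, 2) for an angle of 30 degrees.
      IMPLICIT NONE
      REAL T(2,2), X(2), XNEW(2), THETA, PI
      PI = 4.0 * ATAN(1.0)
      THETA = 30.0 * PI / 180.0
      X(1) = 1.0
      X(2) = 2.0
      T(1,1) = COS(THETA)
      T(1,2) = SIN(THETA)
      T(2,1) = -SIN(THETA)
      T(2,2) = COS(THETA)
      XNEW = MATMUL(T, X)
      WRITE(*,*) XNEW
!     The transpose of T is the second matrix of the chart.
      XNEW = MATMUL(TRANSPOSE(T), X)
      WRITE(*,*) XNEW
      END
```

The results are about (1.87, 1.23) and (-0.13, 2.23). The values 1.24 and 2.24 in the chart follow when cos 30° is rounded to 0.87 before multiplying.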
Rotation of the coordinate system through u radians about the z axis can be achieved (XYZROT.FOR) by multiplication of U with the data matrix (X).
\[ U = \begin{bmatrix} \cos(u) & \sin(u) & 0 \\ -\sin(u) & \cos(u) & 0 \\ 0 & 0 & 1 \end{bmatrix} , \qquad XZ = U * X \]
*
*     XYZROT.FOR
*
      DIMENSION U(20,20),V(20,20),W(20,20)
      DIMENSION A(20,20),ARZ(20,20),ARY(20,20),ARX(20,20)
      INTEGER ROWA,COLA
      WRITE(*,*) 'M,N'
      READ(*,*)M,N
      ROWA = M
      COLA = N
      READ(*,*) ((A(I,J),J = 1,N),I = 1,M)
      CALL MPRIN(A,M,N)
      WRITE(*,*) 'ZR,YR,XR'
      READ(*,*)ZR,YR,XR
      WRITE(*,*)ZR,YR,XR
*     Start each rotation matrix as an identity matrix
      CALL EYE(U,3,3)
      CALL EYE(V,3,3)
      CALL EYE(W,3,3)
      U(1,1) = COS(ZR)
      U(1,2) = SIN(ZR)
      U(2,1) = -U(1,2)
      U(2,2) = U(1,1)
      CALL MPRIN(U,N,N)
      V(1,1) = COS(YR)
      V(1,3) = SIN(YR)
      V(3,1) = -V(1,3)
      V(3,3) = V(1,1)
      CALL MPRIN(V,N,N)
      W(2,2) = COS(XR)
      W(2,3) = SIN(XR)
      W(3,2) = -W(2,3)
      W(3,3) = W(2,2)
      CALL MPRIN(W,N,N)
      CALL MMUL1(U,A,ARZ,N,N,ROWA,COLA)
      CALL MMUL1(V,A,ARY,N,N,ROWA,COLA)
      CALL MMUL1(W,A,ARX,N,N,ROWA,COLA)
      CALL MPRIN(ARZ,N,3)
      CALL MPRIN(ARY,N,3)
      CALL MPRIN(ARX,N,3)
      END
Similarly, rotation about the Y axis (through v radians) and the X axis (through w radians) can be performed making use of the V and W matrices.
\[ V = \begin{bmatrix} \cos(v) & 0 & \sin(v) \\ 0 & 1 & 0 \\ -\sin(v) & 0 & \cos(v) \end{bmatrix} \]
A knowledge-based program MATKB.FOR diagnoses the type of matrix.
*
*     MATKB.FOR
*
      SUBROUTINE MATKB(A,ROWA,COLA)
      CHARACTER*40 INF1,INF2,INF3,INF4,FMT
      INTEGER ROWA,COLA
      DIMENSION A(20,20)
      A(1,1) = 1
      INF1 = 'Scalar'
      INF2 = 'Row Vector'
      INF3 = 'Column Vector'
      INF4 = 'Matrix'
*
      FMT = '(12X,A40)'
901   FORMAT('Since No. of Rows: ',I3,' and No. of Columns: ',I3/)
*
      IF (ROWA .EQ. 1 .AND. COLA .EQ. 1) THEN
         WRITE(*,FMT)INF1
      ENDIF
*     Several rows and one column: column vector
      IF (ROWA .GT. 1 .AND. COLA .EQ. 1) THEN
         WRITE(*,FMT)INF3
      ENDIF
*     One row and several columns: row vector
      IF (ROWA .EQ. 1 .AND. COLA .GT. 1) THEN
         WRITE(*,FMT)INF2
      ENDIF
      IF (ROWA .GT. 1 .AND. COLA .GT. 1) THEN
         WRITE(*,FMT)INF4
      ENDIF
      WRITE(*,901)ROWA,COLA
      RETURN
      END
*
*     MATKB.DEM
*
      DIMENSION A(20,20)
      INTEGER ROWA,COLA
      A(1,1) = 1
      ROWA = 1
      COLA = 1
      CALL MATKB(A,ROWA,COLA)
      ROWA = 4
      COLA = 1
      CALL MATKB(A,ROWA,COLA)
      ROWA = 1
      COLA = 4
      CALL MATKB(A,ROWA,COLA)
      ROWA = 5
      COLA = 6
      CALL MATKB(A,ROWA,COLA)
      END
$INCLUDE : 'MATKB.FOR'
Sorting of One- and Two-Dimensional Arrays
Estimation of robust statistics like the median or Inter Quartile Range (IQR) requires sorting of univariate data. Further, finding the minimum and maximum of the data is a prerequisite in the detection of outliers and the preparation of control charts. Printing of X, Y graphs demands sorting of Y in descending order followed by obtaining the X values corresponding to the Y values.

Minimum Value of Vector
Assume the first element X(1) of the array to be the minimum. It is compared with the second element X(2). If the second element is less than the first, then the second one is the minimum. Otherwise the first one is the minimum. The comparison is continued for all the data points. The flow chart (Fig. 3.2), program (MIN1.FOR) and the results for an exhaustive test data set (Table 3.1) follow.
*
*     MIN1.FOR
*     MINIMUM OF TWO NUMBERS
*
      SUBROUTINE MIN1(X,N,IMIN,XMIN)
      PARAMETER (MAX = 50)
      DIMENSION X(MAX)
      XMIN = X(1)
      IF (X(2) .LT. XMIN) THEN
         XMIN = X(2)
         IMIN = 2
      ENDIF
      END
Fig. 3.2 : Minimum of Two Numbers
*
*     MIN1.DEM
*
      PARAMETER (MAX = 50)
      DIMENSION X(MAX)
      WRITE(*,901)
      READ(*,*)X(1),X(2)
      IMIN = 1
      CALL MIN1(X,2,IMIN,XMIN)
      WRITE(*,951)IMIN,XMIN
901   FORMAT(/' GIVE X(1) AND X(2) : '\)
951   FORMAT(/' ELEMENT (',I1,') : ',G10.2,' IS MINIMUM ')
      END
$INCLUDE : 'MIN1.FOR'
Table 3.1 : Output of MIN1.FOR for Some Typical Data
GIVE X(1) AND X(2) : 6.0,8.0
ELEMENT (1) : 6.00 IS MINIMUM
GIVE X(1) AND X(2) : 8.0 6.0
ELEMENT (2) : 6.00 IS MINIMUM
GIVE X(1) AND X(2) : 6.0 6.0
ELEMENT (1) : 6.00 IS MINIMUM
GIVE X(1) AND X(2) : 0.0 0.0
ELEMENT (1) : .00 IS MINIMUM
GIVE X(1) AND X(2) : -1E-30 -2E-30
ELEMENT (2) : -.20E-29 IS MINIMUM
Consider a vector of size NP x 1. The minimum of this array is obtained by the above algorithm, except that the comparison is repeated over all NP elements. The position (index) of the element that is the minimum of the vector is also recorded. A FORTRAN subprogram (MIN2.FOR) implements the algorithm (Table 3.2).
Table 3.2 : Algorithm for Finding the Minimum of an Array
I 1=1
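Since the listing of MIN2.FOR is not reproduced here, the following is only a sketch of the algorithm described above and should not be taken as the book's own subprogram. The names MINDEM and MINVEC and the test data are hypothetical.

```fortran
      PROGRAM MINDEM
      IMPLICIT NONE
      INTEGER NP, IMIN
      PARAMETER (NP = 5)
      REAL X(NP), XMIN
!     Illustrative data; the minimum 2.5 occurs at positions 3 and 5
      DATA X /6.0, 8.0, 2.5, 9.0, 2.5/
      CALL MINVEC(X, NP, IMIN, XMIN)
      WRITE(*,*) IMIN, XMIN
      END

      SUBROUTINE MINVEC(X, N, IMIN, XMIN)
!     Return the smallest element XMIN of X(1..N) and its index IMIN.
!     With ties, the first occurrence is reported.
      IMPLICIT NONE
      INTEGER N, IMIN, I
      REAL X(N), XMIN
      IMIN = 1
      XMIN = X(1)
      DO 10 I = 2, N
         IF (X(I) .LT. XMIN) THEN
            XMIN = X(I)
            IMIN = I
         ENDIF
10    CONTINUE
      RETURN
      END
```

For the data shown, the program reports index 3 and the value 2.5. Using .LT. rather than .LE. is what keeps the first of two equal minima, as in the 6.0, 6.0 case of Table 3.1.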
X1, X2, ... XN are distinct values of X in the interval XL to XU, and Y1, Y2, ... YN are the corresponding values of a continuous function of X. If the argument to be interpolated, XIE, satisfies the condition XL < XIE < XU, calculation of YIE is termed direct interpolation. In inverse interpolation, the value of XIE is computed for a desired value of YIE. The key to the success of interpolation is invoking an appropriate polynomial.
6.1 LINEAR AND QUADRATIC METHODS
6.1.1 Linear Interpolation
Linear interpolation considers a polynomial of first degree, a straight line. Two points (X1, Y1) and (X2, Y2) are chosen such that X1 < XIE < X2. Since the three points are collinear (Fig. 6.1), the slopes calculated from any two of them (X1 and X2, or X1 and XIE) are equal (Deriv. 6.1).
\[ Y = \sum_{i=1}^{2} Y_i * \prod_{\substack{j=1 \\ j \ne i}}^{2} \frac{XIE - X_j}{X_i - X_j} \qquad \ldots 6.1 \]
When the range of X is large, linear interpolation is less reliable. The truncation error in approximating a quadratic function by a linear one is proportional to
\[ E_{trunc} \propto (XIE - X_k) * (XIE - X_{k+1}) \]
where X_k and X_{k+1} are the two points chosen from the data for interpolation.
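Eq. 6.1 written out for two points gives a one-line function. The sketch below is an illustration rather than one of the book's programs; the names LINDEM and YLIN and the two data points are hypothetical.

```fortran
      PROGRAM LINDEM
      IMPLICIT NONE
      REAL YLIN
!     Interpolate between (1, 10) and (3, 20) at XIE = 2.
      WRITE(*,*) YLIN(1.0, 10.0, 3.0, 20.0, 2.0)
      END

      REAL FUNCTION YLIN(X1, Y1, X2, Y2, XIE)
!     Two-point form of Eq. 6.1: the straight line through
!     (X1, Y1) and (X2, Y2), evaluated at XIE.
      IMPLICIT NONE
      REAL X1, Y1, X2, Y2, XIE
      YLIN = Y1*(XIE - X2)/(X1 - X2) + Y2*(XIE - X1)/(X2 - X1)
      RETURN
      END
```

At the midpoint XIE = 2 each point carries a weight of 0.5, so the program prints 15.0.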
Fig. 6.1 : Linear Interpolation
6.1.2 Quadratic Interpolation
The inaccuracy of linear interpolation is surmounted by adopting the polynomial of second degree
\[ P_2(X) = a_2 * X^2 + a_1 * X + a_0 \]
a0, a1 and a2 are calculated such that the polynomial passes through the experimental points. The coefficients are then used to compute the function at XIE (Deriv. 6.2). Obviously, at least three points are needed in quadratic interpolation. Although the choice of the three points is arbitrary, one of them should be smaller than XIE and another larger than XIE (Fig. 6.2).

A smooth function is more accurately interpolated by a higher order polynomial. A piece-wise lower order polynomial (straight line, Fig. 6.3) results in inaccurate values. A function with sharp corners or rapidly changing higher order derivatives is stiff. A higher order polynomial approximates a stiff function less accurately; sometimes lower order polynomials work better. Some exponential or rational functions are smooth, but are badly approximated by higher order polynomials. There are some pathological functions for which any interpolation scheme fails. For example,
\[ f(X) = 3 * X^2 + \frac{1}{\pi^4} * \ln\left[(\pi - X)^2\right] + 1 \]
is singular at X = π. Any interpolation based on the values at X = 3.13, 3.14, 3.15, 3.16 results in a wrong value for X = 3.1416. However, a graph of these points looks very smooth (Fig. 6.4).
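The behaviour of the pathological function can be inspected by tabulating it at the four tabulated abscissae and at X = 3.1416. The sketch below is an illustration, not one of the book's programs; it uses double precision so that the small difference π - X is resolved, and the program name PATHOL is hypothetical.

```fortran
      PROGRAM PATHOL
!     Tabulate f(X) = 3*X**2 + ln((PI - X)**2)/PI**4 + 1 near X = PI.
!     Double precision is used so that PI - X is resolved accurately.
      IMPLICIT NONE
      DOUBLE PRECISION PI, X(5), F
      INTEGER I
      DATA X /3.13D0, 3.14D0, 3.1416D0, 3.15D0, 3.16D0/
      PI = 4.0D0 * ATAN(1.0D0)
      DO 10 I = 1, 5
         F = 3.0D0*X(I)**2 + LOG((PI - X(I))**2)/PI**4 + 1.0D0
         WRITE(*,*) X(I), F
10    CONTINUE
      END
```

The value printed for X = 3.1416 falls well below the smooth trend of its neighbours, because the logarithmic term grows without bound as X approaches π. No polynomial through the other four points can reproduce this dip.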
Fig. 6.2: Choice of Data Points for Quadratic Interpolation

Fig. 6.3: Errors Due to Piecewise Linear Interpolation of a Non-linear Function

Fig. 6.4: Zooming Effect on the Path of Pathological Function in Different X Ranges: (a) Smooth [3.12 to 3.16], (b) Valley [3.135 to 3.145], (c) Break down [3.141 to 3.142]
6.2 LAGRANGE AND MODIFIED PROCEDURES
6.2.1 Lagrange Interpolation
Quadratic interpolation is also inadequate as real data sets follow oblique trends. A cubic polynomial interpolation using four points is a better choice. Lagrange proposed the general formula for interpolation employing an nth order polynomial passing through all n+1 experimental points. The polynomial given in Eq. 6.5 is called the Lagrange polynomial (Deriv. 6.3).

$$Y_{IE} = \sum_{i=1}^{3} \left[ \prod_{\substack{j=1 \\ j \ne i}}^{3} \frac{X_{IE} - X_j}{X_i - X_j} \right] Y_i \qquad \ldots 6.5$$

$$Y_{IE} = \prod_{k=1}^{NPOL+1} (X_{IE} - X_k) \; \sum_{i=1}^{NPOL+1} \frac{Y_i / (X_{IE} - X_i)}{\prod_{\substack{j=1 \\ j \ne i}}^{NPOL+1} (X_i - X_j)} \qquad \ldots 6.6$$
The advantage of the method is that functional relationship between X and Y need not be known. However, knowledge of the order of polynomial is a prerequisite. The limitation of the method is that it does not give any error estimate. Further, different sets of data points produce different interpolated values. For example, the interpolated values (Fig. 6.5) obtained with the first four and last four points are substantially different.
Fig. 6.5: Effect of Data Range on the Interpolated Value
6.2.2 Modified Lagrange Formula
The difficulties in selecting the experimental points are obviated in the modified Lagrange method.

$$Y_{IE} = \prod_{k=M1}^{M2} (X_{IE} - X_k) \; \sum_{i=M1}^{M2} \frac{Y_i / (X_{IE} - X_i)}{\prod_{\substack{j=M1 \\ j \ne i}}^{M2} (X_i - X_j)}$$

M1 and M2 are the initial and final points of the chosen segment of the data set with the interpolated argument as centroid. The product $\prod_{k=M1}^{M2} (X_{IE} - X_k)$ tends to a minimum, resulting in very low error.
The algorithm and Fortran program for any desired order of the polynomial are given in Chart 6.1 and Appendix 6.2.
Chart 6.1: Algorithm of Lagrange Interpolation

Input    X, Y are vectors of size NP x 1
         NPOL : Order of polynomial
         M1 & M2 : first and last points of the segment
         XIE : Interpolated argument
Step 0   YIE <- 0.0
         P   <- 1.0
Step 1   P <- product of (XIE - X(k)) for k = M1 to M2
Step 2   For i = M1 to M2
            calculate TERM1 = Y(i) / (XIE - X(i))
            Evaluate  TERM  = TERM1 / [product of (X(i) - X(j)) for j = M1 to M2, j not equal to i]
            YIE = YIE + TERM
Step 3   YIE = YIE * P
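The steps of Chart 6.1 can be sketched as a short function. This is an illustration in free-form Fortran 90, not the program of Appendix 6.2; the names LAGRAN and LAGRANGE_DEMO are hypothetical, and the data are simulated from Y = X**2. The sketch assumes that XIE does not coincide with any X(i) in the segment, since TERM1 would then involve a division by zero.

```fortran
! Lagrange interpolation over the segment M1..M2, following Chart 6.1.
program lagrange_demo
  implicit none
  integer, parameter :: np = 6
  real :: x(np), y(np), yie
  integer :: i, k
  ! Simulated data for Y = X**2 in the X range 1 to 6
  do i = 1, np
     x(i) = real(i)
     y(i) = x(i)**2
  end do
  ! Interpolate at XIE = 1.5 using the first k+1 points (k = order of polynomial)
  do k = 1, np - 1
     yie = lagran(x, y, np, 1, k + 1, 1.5)
     print '(i3,a,i2,f10.3)', k, '   1 TO', k + 1, yie
  end do
contains
  real function lagran(xs, ys, n, m1, m2, xie)
    integer, intent(in) :: n, m1, m2
    real, intent(in) :: xs(n), ys(n), xie
    real :: p, term, denom
    integer :: i, j
    ! Step 1: P = product of (XIE - X(k)) over the segment
    p = 1.0
    do i = m1, m2
       p = p*(xie - xs(i))
    end do
    ! Step 2: sum of Y(i)/(XIE - X(i)) divided by product of (X(i) - X(j)), j /= i
    lagran = 0.0
    do i = m1, m2
       denom = 1.0
       do j = m1, m2
          if (j /= i) denom = denom*(xs(i) - xs(j))
       end do
       term = ys(i)/(xie - xs(i))/denom
       lagran = lagran + term
    end do
    ! Step 3: scale the accumulated sum by P
    lagran = lagran*p
  end function lagran
end program lagrange_demo
```

With two points the result is the linear estimate; from three points onwards the quadratic data should be reproduced to within rounding.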
Example 6.1
Data for a quadratic function, Y = X^2 (Chart 6.2), are simulated in the X range of 1 to 6. LANIN < L1.DAT at the DOS prompt gave interpolated values without error when at least three points are considered.

Chart 6.2: Interpolation Results for a Quadratic Function

L1.DAT
6
1      1
2      4
3      9
4      16
5      25
6      36
2
1.5    2.25
4.5    20.25

XIE = .1500E+01
K    DATA USED IN INTERPOLATION    YIE
1    1 TO 2                        2.500
2    1 TO 3                        2.250
3    1 TO 4                        2.250
4    1 TO 5                        2.250
5    1 TO 6                        2.250

XIE = .4500E+01
K    DATA USED IN INTERPOLATION    YIE
1    1 TO 2                        11.50
2    1 TO 3                        20.25
3    1 TO 4                        20.25
4    1 TO 5                        20.25
5    1 TO 6                        20.25
Example 6.2 Protonation constants of ligands can be calculated by graphical interpolation of pH on the formation curve (pH versus nbarH) at nbarH = 0.5, 1.5 etc. Since nbarH is a function of pH, the problem turns out to be an inverse interpolation. The interpolation of experimental formation function data (Chart 6.3) indicates that a minimum of four data points are required. The stepwise stability constants of metal ligand complexes can also be calculated by using this program considering nbar versus pL. C:\CAC> Lanin-
Chart 7.8
If      Integrand is sharply concentrated in one or more peaks
Then    Transform the function into differential equations.
If      Location of singularity is known
Then    Divide the interval into two parts at the point of singularity.
If      There are breaks, bumps or singularities at unspecified points
Then    Integration is difficult.
7.4 INTEGRATION OF ORDINARY DIFFERENTIAL EQUATIONS
The change in concentration with time for a first order reaction,

$$\frac{dP}{dt} = k_s (a - x)$$

is a first order differential equation because it contains a first order derivative. Here x is the concentration of P at time t and ks is the specific rate constant. The thermal decomposition of phenyl n-butyl diazirine (A) in DMSO is a two-step consecutive unimolecular reaction and can be represented as a set of first order linear differential equations (Chart 7.9). The solution can be readily sought as the rate constants are of similar magnitude. On the other hand, the reduction of Tl(III) to Tl(I) by Fe(II) (Chart 7.10) can be modeled by a set of first order coupled differential equations.
Chart 7.9

$$A \xrightarrow{k_1} B \xrightarrow{k_2} P$$

$$\frac{d[A]}{dt} = -k_1 [A], \qquad \frac{d[B]}{dt} = k_1 [A] - k_2 [B]$$

B: 1-Phenyl-1-diazopentane
P: Carbene, N2
k1 = 6.8*10^-4 s^-1 ; k2 = 2.2*10^-4 s^-1
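For reference, the consecutive first order scheme A → B → P has a closed form solution against which any numerical integration can be checked. The expressions below are the standard result, written under the assumptions that only A is present initially ([B]0 = [P]0 = 0) and that k1 differs from k2; the symbol [A]0 for the initial concentration of A is introduced here for illustration.

```latex
\begin{aligned}
[A] &= [A]_0 \, e^{-k_1 t} \\
[B] &= [A]_0 \, \frac{k_1}{k_2 - k_1} \left( e^{-k_1 t} - e^{-k_2 t} \right) \\
[P] &= [A]_0 - [A] - [B]
\end{aligned}
```

The last line is simply the mass balance, so the three concentrations always add up to [A]0.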
Chart 7.10

$$Fe(II) + Tl(III) \underset{k_2}{\overset{k_1}{\rightleftharpoons}} Fe(III) + Tl(II)$$

$$Tl(II) + Fe(II) \xrightarrow{k_3} Fe(III) + Tl(I)$$

Kinetic pattern can be modeled by a set of first order coupled differential equations

$$\frac{d[Fe(III)]}{dt} = k_1 [Fe(II)][Tl(III)] - k_2 [Fe(III)][Tl(II)] + k_3 [Fe(II)][Tl(II)]$$

$$\frac{d[Tl(I)]}{dt} = k_3 [Tl(II)][Fe(II)]$$
Since the initial concentrations of reactants, products and the intermediate (at t = 0) and the rate constants are known, the solution of the ordinary differential equations (ODE) gives the concentrations of the species at any desired time. The Schroedinger wave equation is a second order differential equation relating the potential energy and total energy with distance. A vibrating diatomic molecule can be modeled using the harmonic oscillator approximation to the potential energy. Diffusion (u) of a solute in a solvent and movement of blood/nutrients in the human body are represented by partial differential equations. The solution of the differential equation(s) from a knowledge of u and its derivative with time describes the propagation of u with time.

$$\frac{\partial^2 u}{\partial t^2} = v^2 \frac{\partial^2 u}{\partial x^2} \qquad \text{or} \qquad \frac{\partial u}{\partial t} = \frac{\partial}{\partial x}\left( D \frac{\partial u}{\partial x} \right)$$

Classification of differential equations and the methods in vogue are described in Chart 7.11. The solution of a set of ODEs by numerical techniques follows.
Chart 7.11

Differential Equations
    Ordinary Differential Equations
        Initial Value
            Euler
            RK: 4th Order, 7th Order, Modified Gill, RKF
            Bulirsch-Stoer
            Taylor's Series: 15th Order, 20th Order
            Predictor Corrector: ABM, Hamming, Milne
        Two Point Boundary
    Partial Differential Equations
        Elliptic
        Hyperbolic
        Parabolic
7.4.1 EULER METHOD
A simple first order differential equation (Eq. 7.3) can be approximated by the forward difference (Eq. 7.4).

$$\frac{dy}{dx} = F(x, y) \qquad \ldots 7.3$$

$$\frac{dy}{dx} \approx \frac{\Delta y}{\Delta x} = \frac{y_{i+1} - y_i}{x_{i+1} - x_i} \qquad \ldots 7.4$$

where $y_i$ and $y_{i+1}$ are values of the function at $x_i$ and $x_{i+1}$ respectively. When $x_{i+1} - x_i$ (= Δx) approaches zero, the limit of $(y_{i+1} - y_i)/\Delta x$ is equal to the derivative. The finite difference $x_{i+1} - x_i$ is denoted by step size (h). Then Eq. (7.4) becomes Eq. (7.5).

$$y_{i+1} = y_i + h \, F(x_i, y_i) \qquad \ldots 7.5$$
In this case the initial value y0 at x = x0 is known. The features and algorithm are given in Chart 7.12. The solutions of typical ODEs are depicted in Table 7.7. The results for the ODE, dy/dx = x + y, in the x range 0.0 to 1.0 using the Euler method are given in Table 7.8. The values of y by the Euler method (ycal) and those by analytical solution (yexact) are comparable for h values of the order 10^-5, ensuring the applicability of the numerical procedure.

Chart 7.12a

Assumption    First order differential equation is approximated to forward difference formula.
Result        It estimates the value of the function (Y) at any desired value of X.
Applications  It is the basis of other methods like RK, BS etc. It demonstrates the principle by a simple procedure and thus has only pedagogical value.
Limitations   It is least accurate and the method is not stable. Accumulated round off errors result in inaccurate end results.
Chart 7.12b

Step 0   i = 0; step_ = 0
Step 1   The derivative is extrapolated to find the next value y1
         y1 = y0 + h * F(x0, y0)
Step 2   i = i + 1; x(i) = x0 + i*h
         h = h/2
         y(i) = y(i-1) + h * F(x(i-1), y(i-1))
Step 3   If |(y(i) - y(i-1)) / y(i)| > TOL  Then go to step 2
         If |(y(i) - y(i-1)) / y(i)| < TOL  Then go to step 4
Step 4   step_ = step_ + 1; If step_ > 20, then Stop, Else go to step 1
Step 5   it = it + 1; If it > 20, then Stop, Else go to step 0
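The core of the Euler method, Eq. 7.5 applied repeatedly with a fixed step, can be sketched as follows. The program is a simplified illustration in free-form Fortran 90 and does not implement the step-halving and tolerance tests of Chart 7.12b; the program name, the function F and the initial condition y(0) = 0 (the condition implied by the analytical solution y = exp(x) - x - 1) are assumptions.

```fortran
! Euler integration of dy/dx = x + y from x = 0 to x = 0.1 (sketch of Eq. 7.5).
program euler_demo
  implicit none
  double precision :: x, y, h, yexact
  integer :: i, nstep
  nstep = 1024                 ! number of steps needed to reach x = 0.1
  h = 0.1d0/nstep              ! step size, about 9.7656e-005
  x = 0.0d0                    ! assumed initial condition: y(0) = 0
  y = 0.0d0
  do i = 1, nstep
     y = y + h*f(x, y)         ! y(i+1) = y(i) + h*F(x(i), y(i))
     x = i*h                   ! recompute x from i to limit round-off
  end do
  yexact = exp(x) - x - 1.0d0  ! analytical solution
  print '(a,f8.4)',   ' x       = ', x
  print '(a,es13.5)', ' y Euler = ', y
  print '(a,es13.5)', ' y exact = ', yexact
  print '(a,es13.5)', ' RE (%)  = ', 100.0d0*abs(yexact - y)/yexact
contains
  double precision function f(xx, yy)
    double precision, intent(in) :: xx, yy
    f = xx + yy                ! right hand side of the ODE
  end function f
end program euler_demo
```

Reducing NSTEP shows how quickly the relative error grows with the step size, which is the main limitation of the method.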
Table 7.7: Solution of typical ODEs

ODE                 Solution
dy/dx = x + y       y = exp(x) - x - 1
dy/dx = x           y = x^2/2
dy/dx = x^2         y = x^3/3
dy/dx = x^3         y = x^4/4
dy/dx = x^4         y = x^5/5
Table 7.8: Solution of dy/dx = x + y by Euler Method

x      ycal         yexact       RE (%)      h
0.1    0.0051655    0.0051709    0.10435     9.7656e-005
0.2    0.021391     0.021403     0.055726    9.7656e-005
0.3    0.049839     0.049859     0.039656    9.7656e-005
0.4    0.091796     0.091825     0.031729    9.7656e-005
0.5    0.14868      0.14872      0.027063    9.7656e-005
0.6    0.22207      0.22212      0.024031    9.7656e-005
0.7    0.31368      0.31375      0.021936    9.7656e-005
0.8    0.42545      0.42554      0.020428    9.7656e-005
0.9    0.5595       0.5596       0.019313    9.7656e-005
1.0    0.71815      0.71828      0.018477    9.7656e-005
Taylor's Series
It offers a solution to any differential equation, provided the derivatives of the function can be calculated. The value of y(x) at x = x0 is known and that at x = x0 + h is obtained by expanding it by Taylor's infinite series.

$$y(x_0 + h) = y(x_0) + h\,y'(x_0) + \frac{h^2}{2!}\,y''(x_0) + \cdots + \frac{h^n}{n!}\,y^{(n)}(x_0) + TE \qquad \ldots 7.6$$

Here TE is truncation error. If the first and second derivatives are considered, Eq. 7.6 reduces to Eqs. 7.7 and 7.8, respectively.

$$y(x_0 + h) = y(x_0) + h\,y'(x_0) \qquad \ldots 7.7$$

$$y(x_0 + h) = y(x_0) + h\,y'(x_0) + \frac{h^2}{2}\,y''(x_0) \qquad \ldots 7.8$$

If the derivatives are calculated, then y(x0 + h) can be obtained. Chart 7.13 incorporates the applications and limitations of the method. Comparison of Eqs. 7.7 and 7.5 reveals that the Euler method can be considered as Taylor's series truncated to first order terms. The only difference is that Euler's method approximates the derivative to a finite difference.
Chart 7.13

•  Assumption: The differential equation has derivatives and the function is continuous.
•  Result: The solution is accurate only when truncation error is minimum.
•  Application: Using finite difference method, Taylor's series of 15th or 20th order is employed to solve kinetic equations.
•  It is of pedagogical value.
•  Limitation: Higher order derivatives of many non-linear functions are difficult.
•  Remedy: Runge-Kutta and Bulirsch-Stoer methods.
7.4.2 RUNGE-KUTTA METHOD
Fourth or seventh order Runge-Kutta (RK) method is one of the best choices to solve a system of linear differential equations. It is successful even for rough and discontinuous profiles in adaptive step size mode. Further modification is rapid and crosses singular points within the interval of integration.

Algorithm
The relation between the ordinates y0 and y1 is linear in Euler's method (Eq. 7.4). Thus F(x, y) is the slope. RK method employs the principles of modified or improved (polygon) Euler's method (MEUM). In the modified method the average of the slopes at x0 and (x0 + h) is used, while in the polygon method, the average of the points x0 and x0 + h (the midpoint) is considered, assuming a linear relation between the two points. The formula for second order RK method (Eq. 7.10)

$$y_{i+1} = y_i + \frac{1}{2}(k_1 + k_2) \qquad \ldots 7.10$$

can be derived (Chart 7.14) from the modified Euler's formula (Eq. 7.9). The corresponding equations for fourth order (RK4) are given in Chart 7.15. RK4 has lower truncation errors than the second order method.
Chart 7.14: Derivation of RK Method

The modified Euler's formula is

$$y_{i+1} = y_i + \frac{h}{2}\left[ F(x_i, y_i) + F(x_{i+1}, y_{i+1}) \right] \qquad \ldots 7.9$$

Substituting Eq. 7.5 in Eq. 7.9 gives

$$y_{i+1} = y_i + \frac{1}{2}\left[ h\,F(x_i, y_i) + h\,F\big(x_i + h,\; y_i + h\,F(x_i, y_i)\big) \right]$$

Let $k_1 = h\,F(x_i, y_i)$ and $k_2 = h\,F(x_i + h,\; y_i + k_1)$. Then

$$y_{i+1} = y_i + \frac{1}{2}(k_1 + k_2) \qquad \ldots 7.10$$
Chart 7.15: RK4 Method

Basis
•  It employs Modified Euler's steps between two successive points (x(i), x(i+1)). Derivatives are obtained at the initial point (x(i)), at the final point (x(i+1)) and twice at the mid point ((x(i) + x(i+1))/2).

Formulae
f1 = F(x, y);                 t1 = h * f1;
f2 = F(x + h/2, y + t1/2);    t2 = h * f2;
f3 = F(x + h/2, y + t2/2);    t3 = h * f3;
f4 = F(x + h, y + t3);        t4 = h * f4;
y(i+1) = y(i) + (t1 + 2*t2 + 2*t3 + t4)/6.

Result
•  Accuracy depends upon the step size. The solution is fairly reliable.

Features
•  Order of RK4 method corresponds to approximately the order of Taylor's series.
•  Evaluation of derivatives of the function is not required. It uses only the information from the immediate preceding point.
•  It succeeds even for an intransigent problem. In such cases BS method fails.

Advice
•  BS method can be used when a highly accurate solution is needed.
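The formulae of Chart 7.15 translate almost line by line into code. The following is a sketch in free-form Fortran 90 for the test equation dy/dx = x + y with a fixed step; the program name, the function F, the step size h = 0.025 and the initial condition y(0) = 0 are assumptions chosen for illustration, and no adaptive step size control is attempted.

```fortran
! Fourth order Runge-Kutta for dy/dx = x + y (sketch of Chart 7.15).
program rk4_demo
  implicit none
  double precision :: x, y, h, t1, t2, t3, t4, yexact
  integer :: i, nstep
  h = 0.025d0
  nstep = 4                        ! four steps of 0.025 reach x = 0.1
  x = 0.0d0                        ! assumed initial condition: y(0) = 0
  y = 0.0d0
  do i = 1, nstep
     t1 = h*f(x, y)                    ! slope at the initial point
     t2 = h*f(x + h/2, y + t1/2)       ! first slope at the mid point
     t3 = h*f(x + h/2, y + t2/2)       ! second slope at the mid point
     t4 = h*f(x + h, y + t3)           ! slope at the final point
     y = y + (t1 + 2*t2 + 2*t3 + t4)/6
     x = i*h
  end do
  yexact = exp(x) - x - 1.0d0      ! analytical solution
  print '(a,f8.4)',   ' x       = ', x
  print '(a,es13.5)', ' y RK4   = ', y
  print '(a,es13.5)', ' y exact = ', yexact
contains
  double precision function f(xx, yy)
    double precision, intent(in) :: xx, yy
    f = xx + yy                    ! right hand side of the ODE
  end function f
end program rk4_demo
```

Only four function evaluations per step are needed, and no derivative of F is required, which is the practical advantage over a Taylor's series of comparable order.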
The results (yRK4) of RK4 for dy/dx = x + y given in Table 7.9 are close to the analytical solution (yexact) with h values of the order 10^-2. RK4 method converges faster than the Euler procedure, since the step size (h) in RK4 is larger than that in Euler (10^-5).
Table 7.9: Solution of dy/dx = x + y by RK4 Method

x      yRK4         yexact       RE (%)         h
0.1    0.0051709    0.0051709    6.8139e-006    0.025
0.2    0.021403     0.021403     3.6388e-006    0.025
0.3    0.049859     0.049859     2.5894e-006    0.025
0.4    0.091825     0.091825     2.0718e-006    0.025
0.5    0.14872      0.14872      6.9522e-006    0.05
0.6    0.22212      0.22212      9.2428e-006    0.05
0.7    0.31375      0.31375      1.0438e-005    0.05
0.8    0.42554      0.42554      1.1118e-005    0.05
0.9    0.5596       0.5596       1.154e-005     0.05
1.0    0.71828      0.71828      1.1827e-005    0.05
Bulirsch-Stoer (BS) procedure is a hybrid method that exploits the basic philosophies of the midpoint procedure, Richardson's deferred approach to the limit and rational function extrapolation. It is reliable and yields an accurate solution of an ODE with minimum computational effort. The heuristics for selection of this procedure are given in Chart 7.16.

Monte Carlo (MC) procedure finds application in integrating functions and in solving ODEs. Production of high quality pseudo random numbers is the key to the success of the MC method. The differential equation corresponding to a first order process can also be viewed from the probability point of view, and the concentrations of the species at a specified time are computed.

Chart 7.16

If      No singularity in the interval (a, b) &
        Profile is not rough &
        There are no discontinuities &
        High accuracy is needed
Then    Use BS method.
If      Extrapolated function has a pole at any of the evaluation points
Then    Rational function extrapolation fails.
If      Rational function failed &
        Profile is not rough
Then    Use polynomial extrapolation for two steps &
        Resort to rational polynomial.
8.1 EIGEN VALUES
Eigen values, also called hidden roots, find extensive applications in quantum chemistry, multivariate quantitative analysis and in solution processes. Many physico-chemical phenomena in multi-dimensional space can be expressed as eigen problems. Methyl orange exists in two forms (HL and L), which absorb in the visible region. Thus, absorbances in the wavelength range 350 to 600 nm (NWL) at different pH values (NPH) form a matrix ABSORB of size NPH*NWL. Two eigen values for the data matrix indicate the presence of both the species. Similarly, the number of independent reaction paths contributing to the rate law is equal to the number of eigen values of the kinetic data matrix.

Geometric Representation of Matrix
A matrix can be represented as the coordinates of points in N-dimensional space. For a 2 x 2 matrix the space is a flat plane and the two vectors originate from the origin of the coordinate system. The absorbance values of solutions at two different pH values and at two distinct wavelengths are represented by a 2 x 2 matrix (A). The rows correspond to the spectra at a given pH and the columns correspond to absorbance data at a specific wavelength. The mathematical relationships connecting the extinction coefficients (ε1 and ε2) and concentrations (c1 and c2) are given in Eq. 8.1.

... 8.1

The size of the data matrix grows when solutions at different pH values are considered or the absorbance values are measured at several wavelengths, but it contains the same intrinsic information, namely, the presence of two species. Thus, irrespective of the size of the data matrix, only two significant eigen values are obtained. All other eigen values are negligible.
Eigen Values of a 2 x 2 Matrix
Differentiation of the function [exp(a*x)] results in

$$\frac{d}{dx}\left[\exp(a x)\right] = a \exp(a x) \qquad \ldots 8.2$$

where d/dx is an operator resulting in the same function multiplied by the constant 'a'. The constant is called eigen value (λ) and the function, eigen vector (E). In general

$$A E = \lambda E \qquad \ldots 8.3$$

A is an N x N real symmetric matrix (A - A^T = 0), called Hermitian. It has N eigen values (λ1, λ2, ... λN) and eigen vectors [E1, E2, ... EN].

$$A E - \lambda I E = 0$$

$$(A - \lambda I) E = 0 \qquad \ldots 8.4$$

One trivial solution is E = 0, which means that the basis vectors reduce to a point, which is absurd. Hence a non-trivial solution (E ≠ 0) exists only when the determinant |A - λ*I| = 0. The eigen values of A are determined by expanding the determinant in terms of λ and finding the roots of the characteristic equation.

$$|A - \lambda I| = \left| \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right| = 0$$
The eigen values indicate the magnitude of stretch or stretch followed by reflection. The eigen vectors are then calculated by substituting the values of λ one by one in Eq. 8.4. Eigen vectors are the new (basis) directions along which stretch or reflection takes place. Generally it is not solved as a root finding problem but tackled by advanced algorithms like Givens' method or Singular Value Decomposition (SVD). All roots may not be distinct. Some may be nearly zero or relatively small. For a 3 x 3 matrix the characteristic equation is cubic and for an Nth order matrix it is an Nth order polynomial.
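For a 2 x 2 real symmetric matrix the characteristic equation is a quadratic, so the eigen values and eigen vectors can be written down directly. The sketch below, in free-form Fortran 90, is for illustration only and is not one of the advanced algorithms mentioned above; the matrix elements are hypothetical, and the eigen vector formula assumes that the off-diagonal element a12 is not zero.

```fortran
! Eigen values and vectors of a 2 x 2 symmetric matrix from the
! characteristic equation lambda**2 - tr*lambda + det = 0.
program eig2_demo
  implicit none
  real :: a11, a12, a22, tr, disc, lam1, lam2, e1(2), e2(2)
  ! Hypothetical symmetric matrix [a11 a12; a12 a22]
  a11 = 2.0;  a12 = 1.0;  a22 = 2.0
  tr = a11 + a22
  ! tr**2 - 4*det rewritten as (a11-a22)**2 + 4*a12**2, which is never negative
  disc = sqrt((a11 - a22)**2 + 4.0*a12**2)
  lam1 = 0.5*(tr + disc)
  lam2 = 0.5*(tr - disc)
  ! (A - lambda*I)*E = 0 gives E proportional to (a12, lambda - a11), a12 /= 0
  e1 = (/ a12, lam1 - a11 /)
  e2 = (/ a12, lam2 - a11 /)
  ! Normalize each vector to unit length
  e1 = e1/sqrt(sum(e1**2))
  e2 = e2/sqrt(sum(e2**2))
  print '(a,2f9.4)', ' Eigen values   : ', lam1, lam2
  print '(a,2f9.4)', ' Eigen vector 1 : ', e1
  print '(a,2f9.4)', ' Eigen vector 2 : ', e2
end program eig2_demo
```

For the matrix chosen here the roots are 3 and 1, and the two eigen vectors are mutually orthogonal, as expected for a symmetric matrix.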
8.2 EIGEN VECTORS
Normalization of Eigen Vectors
The eigen vectors are normalized by dividing each element by the length of the corresponding column (||Ej||).

For a 2D data matrix, the eigen values are equal to the major and minor axes of the ellipse and the data points lie on the ellipse. The eigen vectors represent the direction of the new orthogonal basis vectors and the product of eigen values and eigen vectors also can be represented within the same ellipse (Fig. 8.1).

Fig. 8.1: Collinear Vectors
Example 8.1
The angle between the column vectors of the data set

$$\begin{bmatrix} 1.0000 & -0.1340 \\ 2.0000 & 2.2321 \end{bmatrix}$$

is 30°. Eigen value analysis shows that the percentage explainabilities (PEs) in the two column spaces are 78.8 and 21.1. A large amount of variance in the data is explained by the first eigen vector, that is, the first column of the eigen vector matrix

$$\begin{bmatrix} 0.7071 & -0.4472 \\ 0.7071 & -0.8944 \end{bmatrix}$$

The residuals with one component are high and they are completely accounted for by the second one. Eigen vectors and eigen values along with the data are pictorially represented in Fig. 8.2.

Fig. 8.2: Eigen Ellipse for Non-orthogonal Vectors

Example 8.2
The vectors of the data set

$$\begin{bmatrix} 1.0 & 0.0 \\ 0.0 & 1.0 \end{bmatrix}$$

are orthonormal (i.e., the angle between them is 90°). The eigen value and eigen vector matrices are the same and numerically equal to the identity matrix. The PE of each eigen value is 50. Hence, the eigen vectors in 2D space reduce to a circle (Fig. 8.3).

Example 8.3
The angle between the vectors of

$$\begin{bmatrix} 1.0 & 1.0 \\ 2.0 & 2.0 \end{bmatrix}$$

is zero, which obviously represents a straight line. The first of the eigen values explains 100% of the variance in the data, inferring that one dimension is sufficient. Fig. 8.1 depicts the E1, E2 axes and data points in the x - y plane. As the minor axis is zero (λ2 = 0) the ellipse collapses to a straight line and both the points lie on the E1 axis. Thus, transformation of the coordinate axes to eigen vector space results in dimension reduction.
Fig. 8.3: Eigen Vectors for Non-orthogonal Vectors
Example 8.4: Methyl Red
The spectra of Methyl red at different pH values (2.0 to 7.0) have one maximum (Fig. 8.4) and eigen vector analysis indicates the presence of two species. The absorbance matrix (Table 8.1a) is highly correlated (Table 8.1b) and the column vectors are all non-orthogonal (Table 8.1c). The first two eigen values explain 73.77 and 25.77% of the variance in the data (Table 8.1d).
Table 8.1a: Spectra of Methyl Red at Different pH Values

        Absorbances at the Wavelengths (nm)
pH      575       525       475       425
3.4     0.4600    0.3420    0.1080    0.0180
4.6     0.9930    0.7420    0.2820    0.0770
5.4     0.4000    0.3650    0.3100    0.2880
6.2     0.0600    0.1520    0.3200    0.4000
Fig. 8.4: (a) Absorption Spectra, (b) Variation of Absorbance with pH for Methyl Red
Table 8.1b: Correlation Matrix

        X1       X2       X3       X4
X1      1.00
X2      0.99     1.00
X3     -0.13    -0.00     1.00
X4     -0.73    -0.64     0.77     1.00

Table 8.1c: Angles between Column Vectors

        C1       C2       C3
C1      0.00
C2      7.78     0.00
C3     41.14    33.37     0.00
Table 8.1d: Eigen Values and Cumulative Per cent Explainability (CPE)

No     λ             PE        CPE
1      2.428         73.77     73.77
2      0.2963        25.77     99.54
3      0.0000582     0.361     99.9
4      4.pSe-006     0.097     100
Example 8.5: Acido Basic Equilibria of Chloranilic Acid
The spectra of chloranilic acid at different acid concentrations, the 3D-surface and the 2D-contour are presented in Fig. 8.5. Contour plots are 2D representations of a 3D surface in variable axes at iso-response values. Each iso-contour represents a line of equal response. It gives quantitative information on the variation of the response with the simultaneous variation of the two variables.
Fig. 8.5: (a) 3D Surface of Chloranilic Acid, (b) Contour Plot, (c) Spectra at Different Concentrations, (d) Spectra at Different Wavelengths
The correlation between column vectors is 1.0 and angle is
Standard Normal Variate
Standard normal variate (zi) has a zero mean and unit SD. It is applicable for a data set with sample size greater than 30. When NP < 30, even if zi is less than 2.5, no inference should be drawn regarding outliers. It is calculated as

$$z_i = \frac{X_i - MEANX}{SD}$$

If      zi > 2.5 & NP > 30
Then    outlier
If      zi < 2.5 & NP > 30
Then    X is one among the set
Example 9.5
For the data set X = [-0.90, 0.67, 0.0, 1125.17, -0.67] the statistics are Mean = 224.85, Median = 0.0, SD = 503.29 and zi = [-0.44855, -0.44543, -0.44677, 1.7889, -0.4481]. The outlier (1125.17) invalidates the mean and standard deviation. Hence no conclusion is possible regarding outliers.
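The standard normal variates of this data set can be recomputed with a few lines of code. The sketch below is in free-form Fortran 90 and uses array intrinsics instead of the SUM1 and MEAN1 subprograms of this chapter; the program name is hypothetical, and the SD is assumed to be the sample value with NP - 1 in the denominator.

```fortran
! Standard normal variates z(i) = (X(i) - mean)/SD for a small data set.
program zscore_demo
  implicit none
  integer, parameter :: np = 5
  real :: x(np), z(np), xmean, sd
  integer :: i
  x = (/ -0.90, 0.67, 0.0, 1125.17, -0.67 /)
  xmean = sum(x)/np
  ! Sample standard deviation (assumed: NP - 1 in the denominator)
  sd = sqrt(sum((x - xmean)**2)/(np - 1))
  z = (x - xmean)/sd
  print '(a,f10.2,a,f10.2)', ' Mean = ', xmean, '   SD = ', sd
  do i = 1, np
     print '(i3,f12.2,f12.5)', i, x(i), z(i)
  end do
end program zscore_demo
```

Because the single large value inflates both the mean and the SD, its own z value stays below 2.5, which is why the test is inconclusive here.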
Mean Deviation
Mean deviation is the average of the absolute values of the deviations. Multiplication of the mean deviation by $\sqrt{\pi/2}$ ensures that the value approaches the SD for a normal distribution.

$$MEDEV = \frac{1}{NP} \sum_{i} |DEV_i|$$
Skewness
It is the third moment about the mean. It is a dimensionless quantity indicating the shape of the distribution. A value of zero for skewness indicates that the underlying distribution is symmetrical. However, for any set of measured values, a skewness of exactly zero is unrealistic.

If      Skewness > 0
Then    Peak of distribution of the data is to the left of the mean &
        there is smaller scatter of lower values than larger ones (Fig. 9.4a)
If      Skewness < 0
Then    Peak is to the right of the mean &
        Scatter of the lower values is larger than that of larger ones (Fig. 9.4b)
If      Skewness = 0
Then    Distribution is symmetrical (Fig. 9.4c)
Else    Distribution is asymmetrical
Fig. 9.4: Skewness is (a) Greater than Zero (b) Less than Zero (c) Zero
* * *
SKEW.FOR
      FUNCTION SKEW(X,N,MAX)
*     X must be sorted in ascending order for the median
      DIMENSION X(MAX)
      REAL MEAN1,MED
      AVE = MEAN1(X,N,MAX)
*     Sum of squared deviations from the mean
      VAR = 0.
      DO 10 I = 1,N
      VAR = VAR + (X(I) - AVE) ** 2
10    CONTINUE
      SD = SQRT(VAR/(N-1))
      ZMED = MED(X,N,MAX)
*     Skewness estimated from the mean, median and SD
      SKEW = 3.0 * (AVE-ZMED)/SD
      WRITE(*,951)AVE,ZMED,SD
      WRITE(*,952)SKEW
951   FORMAT (' Mean ',F10.4,' Median ',F10.4,' S D ',F10.4)
952   FORMAT (' Skewness ',F10.4)
      RETURN
      END
$INCLUDE: 'SUM1.FOR'
$INCLUDE: 'MEAN1.FOR'
* * *
SKEW.DEM
$DEBUG
      DIMENSION X(5)
      DATA X/2.0,3.0,7.0,8.0,10.0/
      N = 5
      CALL WX1(N,X,N)
      Z = SKEW(X,N,5)
      END
$INCLUDE: 'SKEW.FOR'
$INCLUDE: 'MED.FOR'
$INCLUDE: 'WX1.FOR'

X(1)= 2.000     X(2)= 3.000
X(3)= 7.000     X(4)= 8.000
X(5)= 10.00
Mean 6.0000 Median 7.000 S D 3.3912
Skewness -.8847
Kurtosis
It is the fourth moment about the mean indicating the sharpness of the peak. This is a measure of peakedness of the distribution near a modal value. It is also a dimensionless quantity (Fig. 9.5). KURT.FOR calculates kurtosis and indicates the type of distortion of the distribution. Its demonstration routine is KURT.DEM.
* * *
KURT.DEM
$DEBUG
      PARAMETER (MAX = 500)
      REAL KURT
      DIMENSION X(MAX)
      CALL IX1(N,X,MAX)
      CALL WX1(N,X,MAX)
      Z = KURT(X,N,MAX)
      END
$INCLUDE: 'KURT.FOR'
$INCLUDE: 'WX1.FOR'
$INCLUDE: 'IX1.FOR'
Fig. 9.5: Kurtosis (a) Leptokurtic (b) Mesokurtic (c) Platykurtic
* * *
KURT.FOR
      FUNCTION KURT(X,N,MAX)
      REAL*4 KURT
      DIMENSION X(N)
*     Sums of second and fourth powers of the X values as supplied;
*     X is not centred on the mean inside this routine
      SX2 = 0.
      SX4 = 0.
      DO 10 I = 1,N
      SX2 = SX2 + X(I) ** 2
      SX4 = SX4 + X(I) ** 4
10    CONTINUE
      KURT = N * SX4/SX2/SX2
* * *
*     KNOWLEDGE BASE
      IF (KURT .LT. 3.) THEN
      WRITE(*,951)KURT
      ENDIF
      IF (KURT .EQ. 3.) THEN
      WRITE(*,952)KURT
      ENDIF
      IF (KURT .GT. 3.) THEN
      WRITE(*,953)KURT
      ENDIF
      RETURN
951   FORMAT (' Kurtosis =',F8.2,' Platykurtic')
952   FORMAT (' Kurtosis =',F8.2,' Meso or normal kurtic')
953   FORMAT (' Kurtosis =',F8.2,' Lepto kurtic')
      END

X(1)= 1.000     X(2)= 1.000
X(3)= 1.000     X(4)= 1.000
Kurtosis = 1.00; Platykurtic
Limitations of Skewness and Kurtosis Skewness and kurtosis are less robust than lower moments (mean, SD). So they should be interpreted with caution.
Confidence Interval
It is the distance on either side of Xmean (Fig. 9.6a) within which one expects to find the true central value with a specified probability. The range increases with increase in confidence level (Fig. 9.6b). The confidence interval at the significance level (α) is given as -Z to +Z assuming a two tailed distribution (Fig. 9.7). At the 95% confidence level, the mean is expected to lie between the limits -1.959 and +1.959. Z values (Appendix 3) at different confidence levels are given in Table 9.2.
Fig. 9.6: Confidence Interval for Univariate Data
Table 9.2: Z Values at Different Confidence Levels

Alpha (α)       % Confidence level    Z
0.0             100                   ∞
0.1 x 10^-7     99.999999             5.73
0.01            99.0                  2.57
0.05            95.0                  1.95
0.10            90.0                  1.64
0.30            70.0                  1.03
0.50            50.0                  0.67
If probability of Z < 5% then the datum can be deleted or retained as an outlier.
Fig. 9.7: Rejection and Acceptance Regions for Standard Normal (z) Variable
Student t-test
If the samples are from a normal distribution and the sample size is small (NP < 30), then the student t-test value is used to compute the confidence limits of the mean by the formula

LU = Xmean ± t * SD / SQRT(NP)

The value of t is taken from standard t-tables (Appendix 4) and it depends upon the level of probability (α) and the degrees of freedom. If a value of 0.05 is used for α, it means that the mean is expected to lie within the calculated limits LU with 95% confidence (Table 9.3). The distance on either side of Xmean increases as the confidence one bestows increases.
Table 9.3: t-Values at Different Confidence Levels (NP = 4)

t-value (DF, α)    α        % Confidence
5.841              0.005    99.5
4.541              0.01     99.0
2.353              0.05     95.0
1.638              0.10     90.0
Example 9.6
The mean and SD for ten replicate titrations of anhydrous sodium carbonate with 0.1 mol dm^-3 HCl are 0.1008 and 3.38e-04. The 95% confidence interval for the mean is 0.1005 to 0.1010.
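The limits quoted in Example 9.6 follow from the relation limits = mean ± t*SD/SQRT(NP). A minimal sketch in free-form Fortran 90 is given below; the program name is hypothetical, and the t-value of 2.262 (two-sided 95% level, 9 degrees of freedom) is taken from standard t-tables rather than computed.

```fortran
! 95% confidence limits of the mean for the titration data of Example 9.6.
program tconf_demo
  implicit none
  real :: xmean, sd, t, half
  integer :: np
  xmean = 0.1008       ! mean of the replicate titrations
  sd    = 3.38e-4      ! standard deviation
  np    = 10           ! number of replicates
  t     = 2.262        ! tabulated t for NP - 1 = 9 degrees of freedom, 95%
  half  = t*sd/sqrt(real(np))
  print '(a,es11.3)',     ' Half width   = ', half
  print '(a,f8.5,a,f8.5)', ' 95% limits   = ', xmean - half, ' to ', xmean + half
end program tconf_demo
```

The half width comes to roughly 0.00024, so the limits agree with those quoted in the example to within the rounding of the mean.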
9.4 ROBUST METHODS
The statistics discussed so far (mean, SD, skewness and kurtosis) for univariate data are based on the normal distribution of noise and the absence of systematic errors and outliers. The methods resistant to the distribution of errors and outliers are referred to as robust methods, which are classified as distribution free methods (Gnostic) and non-parametric ones (Median, MAD, Box-Whisker plot, etc.).

Percentiles
The 25th and 75th percentiles are called the first (lower) and third (upper) quartiles. The difference between the third and first quartiles is the inter quartile range (IQR), which is a robust estimator of dispersion.

Median
The 50th percentile is the second quartile, popularly known as the median. For a data set in ascending order, the median is the middle value in a data set with an odd number of points, while it is the mean of the middle two values for an even number of observations. 50% of the elements lie below and the other 50% lie above the median. Some of the characteristics of the median are given in Chart 9.7.
Chart 9.7

•  Median detects outliers and is robust to them.
•  It is insensitive to obliqueness of distribution.
•  It is a maximum likelihood estimator of central tendency for Laplace distribution. It is a biased estimator for normal distribution.
•  If      Area of the tail is large (or)
           50% or more of the observations are outliers
   Then    Median fails
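* * *
MED.FOR
      REAL FUNCTION MED(X,N,MAX)
*     Median of X(1..N); X must already be sorted in ascending order
      DIMENSION X(N)
      N2 = N/2
      IF (2*N2 .EQ. N) THEN
*     Even number of points: mean of the two middle values
      MED = 0.5*(X(N2) + X(N2+1))
      ELSE
*     Odd number of points: the middle value
      MED = X(N2+1)
      ENDIF
      RETURN
      END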
Median Absolute Deviation
Median of absolute deviations from the sample median is called median absolute deviation (MAD). MAD is useful to detect outliers and the breakdown point is 50%.

$$MAD = \frac{|X_i - MED|}{MED(Res)} \times \frac{1}{0.6745}$$

IF      Res/MED(Res) > 5.0
Then    Outlier

Here (1.0/0.6745) is the correction factor consistent with the usual scale of the Gaussian distribution. MAD.FOR is the FORTRAN function subprogram for calculation of MAD.
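* * *
MAD.FOR
$DEBUG
      REAL FUNCTION MAD(X,N,MAX)
*     X must be sorted in ascending order; N must not exceed 200
      REAL MED
      DIMENSION X(N),RES(200),WORK(200)
      ZMED = MED(X,N,MAX)
      DO 10 I = 1,N
*     Residual with respect to the median and its absolute value
      RES(I) = X(I) - ZMED
      WORK(I) = ABS(RES(I))
      WRITE(*,*)I,X(I),RES(I)
10    CONTINUE
*     MED needs sorted data; ASOR02 is assumed to sort in ascending
*     order, as it is used in UNIVAR.FOR
      CALL ASOR02(N,WORK,200)
      RESMED = MED(WORK,N,200)
      WRITE(*,*)RESMED
*     Robust estimate of spread on the Gaussian scale
      ROBVAR = RESMED/0.6745
      DO 20 I = 1,N
      R = ABS(RES(I))/RESMED
      IF (R .GT. 5) THEN
      WRITE(*,951)I,X(I),R
951   FORMAT(I3,G15.4,F8.2,' OUTLIER')
      ELSE
      WRITE(*,952)I,X(I),R
952   FORMAT(I3,G15.4,F8.2)
      ENDIF
20    CONTINUE
      WRITE(*,*)RESMED,ROBVAR
      MAD = ROBVAR
      END
$INCLUDE: 'MED.FOR'
$INCLUDE: 'ASOR02.FOR'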
* * *
UNIVAR.FOR
      PARAMETER (MAX=200)
*     KURT must be declared REAL; by default it would be typed INTEGER
      REAL KURT
      DIMENSION X(MAX)
      CALL IX1(N,X,MAX)
      CALL ASOR02(N,X,MAX)
      CALL WX1(N,X,MAX)
      ZSKEW = SKEW(X,N,MAX)
      ZKURT = KURT(X,N,MAX)
      END
$INCLUDE: 'WX1.FOR'
$INCLUDE: 'IX1.FOR'
$INCLUDE: 'SUM1.FOR'
$INCLUDE: 'SD.FOR'
$INCLUDE: 'MEAN1.FOR'
$INCLUDE: 'ZMED.FOR'
$INCLUDE: 'ASOR02.FOR'

c:\>univaruniv1.res
X(1)= 2.000 ; X(2)= 3.000 ; X(3)= 7.000
X(4)= 8.000 ; X(5)= 10.00
Mean 6.0000 Median 7.0000 S D 3.3912
Skewness -.8847
Kurtosis 1.62 Platykurtic
UNIVAR.FOR is useful for exploratory statistical analysis of univariate data. It calls many of the function subprograms discussed earlier and the mode of calling is given in Fig. 9.8.
Hybrid Techniques
Median and IQR are biased estimates of central tendency and dispersion, although they are robust. So the power of these point estimators and that of MAD is used to detect chemical outliers as well as data outliers. The outliers are filtered from the data set and the unbiased estimates (mean and SD) are then calculated. This combination of biased estimators to detect outliers and unbiased ones to estimate the univariate statistics has become popular as a hybrid technique.

[Figure: call tree rooted at UNIVAR with the nodes IX1, ASOR2, MEAN1, SD, SUM1, SKEW, KURT and KURTKD]
Fig. 9.8: Calling Sequence of Subprograms in UNIVAR.FOR
Example 9.7: Simulated Data
In a utopian situation, if there is no error in replicate measurements, the mean and median will be exactly the same and the SD will be zero. The residuals with respect to the mean and the median are also exactly zero. The MAD-scaled residual, |X_i - MED| divided by the median absolute residual (with the 0.6745 correction), is thus Not a Number (NaN), since the denominator is zero.

MEAN = 1.000000    MEDIAN = 1.000000    SD = .000000

NO    X          Residual wrt Mean    Residual wrt Median    MAD
1     1.00000    .00000               .00000                 NaN
2     1.00000    .00000               .00000                 NaN
3     1.00000    .00000               .00000                 NaN
4     1.00000    .00000               .00000                 NaN
Example 9.8: Without Outlier
The mean and standard deviation for the replicate determination of potassium acid phthalate with sodium hydroxide show that the SD is far smaller than the mean. This implies that the variation in the replicate measurements is due only to tolerable random errors. There are no outliers, since the mean and median are very close.

MEAN = .033450    MEDIAN = .033425    SD = .000156

NO    X         Residual wrt Mean    Residual wrt Median    MAD
1     .03322    -.00023              -.00020                1.7826
2     .03338    -.00007              -.00004                0.3913
3     .03340    -.00005              -.00003                0.2174
4     .03345     .00000               .00003                0.2174
5     .03361     .00016               .00019                1.6087
6     .03364     .00019               .00022                1.8696
Example 9.9: Single Outlier
The data are LD-50 activity estimates in log(1/molar concentration). They are assumed to follow a normal distribution with a variance of 0.7 and to be independently distributed. Point 5 is an outlier, and the mean and median differ by 0.3 units.

data = [5.27, 6.85, 4.94, 5.01, 4.62]
MEAN = 5.338000    MEDIAN = 5.010000    SD = .876396

NO    X          Residual wrt Mean    Residual wrt Median    MAD
1     4.62000    -.71800              -.39000                1.5000
2     4.94000    -.39800              -.07000                0.2692
3     5.01000    -.32800               .00000                0.0
4     5.27000    -.06800               .26000                1.0000
5*    6.85000    1.51200              1.84000                7.0769
Example 9.10: Two Outliers
Points 6 and 7 of this data set are outliers. After their removal in the next phase, the mean and median agree to the first decimal place.

MEAN = 1.311428    MEDIAN = 1.1500    SD = .342171

NO    X          Residual wrt Mean    Residual wrt Median    MAD
1     1.05000    -.26143              -.10000                 1.4286
2     1.08000    -.23143              -.07000                 1.0000
3     1.10000    -.21143              -.05000                 0.7143
4     1.15000    -.16143               .00000                 0.
5     1.20000    -.11143               .05000                 0.7143
6*    1.70000     .38857               .55000                 7.8571
7*    1.90000     .58857               .75000                10.7143
After Elimination of Outliers
X = [1.05, 1.08, 1.10, 1.15, 1.20]
MEAN = 1.116000    MEDIAN = 1.100000    SD = .059414

NO    X          Residual wrt Mean    Residual wrt Median    MAD
1     1.05000    -.06600              -.05000                1.0000
2     1.08000    -.03600              -.02000                0.4000
3     1.10000    -.01600               .00000                0
4     1.15000     .03400               .05000                1.0000
5     1.20000     .08400               .10000                2.0000
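The two phases of Example 9.10 (robust detection, then classical estimation) can be sketched in a few lines of Python. The function name `hybrid_estimates` and the cutoff argument are illustrative assumptions; the cutoff of 5 and the MAD ratio follow the rule given earlier in this section.

```python
import statistics


def hybrid_estimates(x, cutoff=5.0):
    """Filter points by their MAD ratio, then return (kept, mean, SD)."""
    med = statistics.median(x)
    abs_res = [abs(v - med) for v in x]
    scale = statistics.median(abs_res)
    # Keep everything if the scale is zero; otherwise drop ratios > cutoff.
    kept = [v for v, r in zip(x, abs_res) if scale == 0 or r / scale <= cutoff]
    return kept, statistics.mean(kept), statistics.stdev(kept)


data = [1.05, 1.08, 1.10, 1.15, 1.20, 1.70, 1.90]   # Example 9.10
kept, mean, sd = hybrid_estimates(data)
print(kept, round(mean, 6), round(sd, 6))
```

With the seven points of Example 9.10, the last two values are discarded and the mean and SD of the remaining five agree with the table above.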
Example 9.11: Three Outliers
There are three outliers out of four data points (> 50 per cent outliers). Thus the breakdown point of even the median has been reached and no inferences can be attempted.

X = [4.62, 5.01, 6.89, 7.02]
MEAN = 5.885000    MEDIAN = 5.950000    SD = 1.246876

NO    X          Residual wrt Mean    Residual wrt Median    MAD
1     4.62000    -1.26500             -1.33000               1.3234
2*    5.01000    -0.87500             -0.94000               0.9353
3*    6.89000     1.00500              0.94000               0.9353
4*    7.02000     1.13500              1.07000               1.0647
Box-Whisker Plots
Data pertaining to industrial production, environmental samples or biosamples are often analyzed by classical statistics, although the errors do not follow the Gaussian model and outliers occur frequently. The mean and SD are then greatly influenced, which demands a method that is distribution-free and insensitive to outliers.
A Box-Whisker plot displays location (Chart 9.8), spread, skewness (Chart 9.9), tail length and outliers even for a small univariate data set. It is a non-parametric method and is thus resistant to a wrong assumption about the distribution of the data.
Chart 9.8: Algorithm of Box-Whisker Plot
Step 0 : X : A vector of univariate data set
Step 1 : Sort X into ascending order
Step 2 : Find minimum [X(1)], maximum [X(n)] and median
Step 3 : Find the value in the middle to the right and left of the median (HU : Upper hinge; HL : Lower hinge)
         RF = HU - HL
         Lower fence (LF) = HL - 1.5 * RF
         Upper fence (UF) = HU + 1.5 * RF
         Median variability or robust confidence interval
         RL = median - 1.57 * RF/SQRT(n)
         RU = median + 1.57 * RF/SQRT(n)
Step 4 : Draw a rectangular box between the hinges (HL, HU)
         Mark the median by a cross bar
         Draw a horizontal line (whisker) from HU to UF
         Draw another horizontal line from HL to LF
         Mark each value falling outside the range (RL to RU) with a star
         Draw vertical lines corresponding to the notches (RL, RU)
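The numerical part of Chart 9.8 (Steps 1 to 3) can be sketched in Python as follows. Two choices here are assumptions rather than the book's prescription: the hinges are taken as the medians of the lower and upper halves (the median itself is counted in both halves when n is odd), and values are flagged when they fall beyond the fences LF and UF, which is the common box-plot convention. The function name `box_whisker` is illustrative. The data are those of Example 9.8, with and without the added value 0.035.

```python
import math
import statistics


def box_whisker(x):
    """Five-number style summary with fences and notch limits."""
    xs = sorted(x)
    n = len(xs)
    med = statistics.median(xs)
    half = (n + 1) // 2            # odd n: the median belongs to both halves
    hl = statistics.median(xs[:half])       # lower hinge
    hu = statistics.median(xs[n - half:])   # upper hinge
    rf = hu - hl
    lf = hl - 1.5 * rf
    uf = hu + 1.5 * rf
    notch = 1.57 * rf / math.sqrt(n)
    return {
        "min": xs[0], "max": xs[-1], "median": med,
        "HL": hl, "HU": hu, "LF": lf, "UF": uf,
        "RL": med - notch, "RU": med + notch,
        # Assumption: outliers are the values beyond the fences.
        "outliers": [v for v in xs if v < lf or v > uf],
    }


clean = box_whisker([0.03322, 0.03338, 0.03340, 0.03345, 0.03361, 0.03364])
spiked = box_whisker([0.03322, 0.03338, 0.03340, 0.03345, 0.03361, 0.03364, 0.035])
print(clean["outliers"], spiked["outliers"])
```

Under these assumptions the original six values produce no flagged points, while the added value 0.035 lies above the upper fence.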
Chart 9.9: Inferences
* Detection of outliers in the lower and upper limits of the data set
* Skewness of the distribution
* Kurtosis from the tail

If the left hand section of the box is longer than the right
Then the distribution is negatively skewed and has a tail along the left.
If the right hand section of the box is longer than the left
Then the distribution is positively skewed and has a long tail towards the right.
Example 9.12
The scatter plot for 41 samples of arsenic indicates no outliers with the confidence interval 2 to 35 ppm. However, the Box-Whisker plot prescribes tolerance bounds of 12 and 26. Now there are more than 20% low-lying outliers. This contradiction results from the wrong assumption of a Gaussian distribution, as indicated by the histogram.
Example 9.13 An outlier (0.035) is incorporated in the dataset given in Example 9.8 and the box-whisker plots are drawn for the data with and without outlier (Fig. 9.9).
[Figure: two box plots; vertical axis from 0.0332 to 0.035, horizontal axis "Column Number"]
Fig. 9.9: Box Plots (a) With and (b) Without an Outlier (0.035)
9.5 COMPARISON OF UNIVARIATE DATA SETS
With Standard Reference Value
Today standard reference materials are available from international agencies for almost all types of materials, such as alloys, timber, bio-fluids, drugs, fertilizers, pesticides and food materials. The analysis of a standard material is performed by a prescribed procedure to check the functioning of the instrument, the skills of the analyst and the adaptability of the method. An accredited laboratory analyses field samples to ascertain whether the value differs significantly from the prescribed one. These tasks involve comparison of the sample characteristics with those of the population. Chart 9.10 describes the comparison of the mean of a data set with a standard value.
Chart 9.10
Input : X = [X1, X2, X3, ..., XNP]; Xstandard
Object of test : H0: Xmean = Xstandard; HA: Xmean ≠ Xstandard
Assumption : No outliers
Test selection KB : If NP > 30, then use Z test, else perform t test
Formulae :
\[ z \ (\text{or } t) = \frac{X_{mean} - X_{standard}}{SD}, \qquad DF = NP - 1 \]
Test Interpretation KB :
If
|Z| (or |t|) < Z_table (or t_table)
Then
Accept H0 [i.e., data is not different from standard]
Else
Reject H0 [i.e., accept HA, i.e., data is different from standard]
Example 9.14
The mean concentration of 10 samples of anhydrous Na2CO3 (standard value: 0.1000) titrated against 0.10 N HCl is 0.1008 with a variance of 1.14 × 10^-7. The t-test is applied.

NP = 10; df = 10 - 1 = 9; Alpha = 0.05

\[ t = \frac{0.1008 - 0.1000}{\sqrt{1.14 \times 10^{-7}}} = \frac{0.0008}{3.376 \times 10^{-4}} = 2.37 \]

t_table = 2.262
t > t_table; H0 is rejected.
Thus the sample mean differs significantly from the standard value at the 5% level.
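The arithmetic of Example 9.14 can be re-run with a short Python sketch. It follows the example in dividing the difference by the square root of the quoted variance, which assumes that the quoted variance already refers to the mean; the function name is illustrative and the table value is simply the one quoted in the example.

```python
import math


def one_sample_statistic(mean, standard, variance):
    """(mean - standard) / sqrt(variance), as computed in Example 9.14."""
    return (mean - standard) / math.sqrt(variance)


t_cal = one_sample_statistic(0.1008, 0.1000, 1.14e-7)
t_table = 2.262                      # alpha = 0.05, df = 9 (from the example)
exceeds_table = abs(t_cal) > t_table
print(round(t_cal, 2), exceeds_table)
```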
With Another Data Set
More often, a judicious choice between two analytical procedures, instruments or laboratories is crucial. The two data sets in this case can be considered as two different populations with means μ1 and μ2 and variances σ1² and σ2². So, the samples of sizes n1 and n2 are considered as representatives of the two populations. Assuming each population to be homogeneous, the chemical task is to arrive at a conclusion on whether the precision and/or accuracy of the two sets are the same or not. A few typical chemical tasks are illustrated in Chart 9.11.
Chart 9.11
• Calibration of a pipette from two manufacturers
• Nicotine in blood samples (comparison of precision of GC for two different concentrations)
• P2O5 in fertilizer (citrate and sulphuric acid methods)
• Dinitro cresol in herbicides (polarographic and titrimetric methods)
• Iodine in soyabean oil (flame ionization and atomic absorption spectroscopy)
• Folic acid in two drugs
• Analysis of europium in radioactive material by two different laboratories
• Efficiency of two catalysts in hydrogenation of oils
• Comparison of a newly developed method of analysis with a standard one
This comparison requires a priori knowledge of the type of distribution, sample size and independence of the data sets. The F test for equality of variances of the two samples is given in Chart 9.12.
Chart 9.12
H0 : Var1 = Var2
HA : Var1 ≠ Var2
Input  : X1 = [X11 X21 X31 ......]T
         X2 = [X12 X22 X32 ......]T
Step 1 : Calculate Mean1, Mean2, SD1, SD2
Step 2 : If Var1 > Var2
         Then F = VAR1/VAR2; DF1 = N1 - 1; DF2 = N2 - 1
         Else F = VAR2/VAR1; DF1 = N2 - 1; DF2 = N1 - 1
         End
Step 3 : If F < Ftable then accept H0, else accept HA
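Step 2 of Chart 9.12 amounts to placing the larger variance in the numerator and swapping the degrees of freedom accordingly. A minimal Python sketch follows; the function name `f_ratio` is an illustrative choice, and the numbers are the variances and sample sizes quoted in Example 9.15.

```python
def f_ratio(var1, n1, var2, n2):
    """Return (F, DF1, DF2) with the larger variance in the numerator."""
    if var1 > var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1


f_cal, df1, df2 = f_ratio(6.75, 36, 4.25, 31)
print(round(f_cal, 4), df1, df2)
```

The computed ratio is then compared with the tabulated F value for (DF1, DF2) at the chosen alpha, as in Step 3.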
If the distribution is normal and outliers are absent, the classical Z and t tests are employed to check the equality of means of the two samples. For data of unknown distribution, the Wilcoxon, Cochran and Mann-Whitney tests (Chart 9.13) can be used.
Chart 9.13
If   small sample, non-normal and independent
Then use u-test of Mann-Whitney

If   small sample, heteroscedastic and non-normal
Then use Cochran test

If   samples are related, small samples and non-normal
Then use Wilcoxon t-test.
However, in recent times, both types of tests are performed and any contradictory conclusions are rationalized by carefully inspecting the data. The implementation of the formulae with test conclusions is given in Chart 9.14.
Chart 9.14a: Z Test for Comparison of Means of Two Samples
Input      : X1 = [X11 X21 X31 ......]T
             X2 = [X12 X22 X32 ......]T
Objective  : H0: Xmean1 = Xmean2
             HA: Xmean1 ≠ Xmean2
Assumption : If large sample, independent & normal
             Then use Z statistic
Options    : One tailed and two tailed test
Formula    :
\[ Z = \frac{X_{mean1} - X_{mean2}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]
If Z < Ztable, then accept H0, else reject H0
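A short Python sketch of the Z statistic for two large samples is given below. The denominator, the square root of the sum of variance over sample size for each sample, is the usual large-sample form and is taken here as an assumption; it reproduces the Z_cal of 4.388 quoted in Example 9.15. The function name is illustrative.

```python
import math


def z_two_means(mean1, var1, n1, mean2, var2, n2):
    """Large-sample Z statistic for the difference of two means."""
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)


z_cal = z_two_means(42.8, 6.75, 36, 40.3, 4.25, 31)   # data of Example 9.15
print(round(z_cal, 3))
```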
Chart 9.14b: t-test for Comparison of Two Samples
Input     : X1 and X2 vectors
Objective : H0 : Xmean1 = Xmean2
            HA : Xmean1 ≠ Xmean2

If   small sample & homogeneous (s1² = s2²) (Fig. 9.3h) & normal
Then use t1 test

If   σ1², σ2² are unknown, n1 > 8 & n2 > 8
Then use t1 test

If   small sample & heterogeneous (s1² ≠ s2²) (Fig. 9.3i) & normal
Then use t2 test

If   n1 ≠ n2 & s1² ≈ s2²
Then use t2 test

If t < t_table then accept H0, else accept HA
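Chart 9.14c: Formula base

\[ t_1 = \frac{X_{mean1} - X_{mean2}}{\sqrt{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}} \sqrt{\frac{n_1 n_2 (n_1 + n_2 - 2)}{n_1 + n_2}}, \qquad df_{t_1} = n_1 + n_2 - 2 \]

t2 = (Xmean1 - Xmean2) / ...

[Figure: decision tree with the branches Normal / Non Normal and Small / Large sample]
Fig. 9.10: Comparison of Means of Two Samples

Example 9.15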
The means and variances of two samples of size 36 and 31 are 42.8, 40.3 and 6.75, 4.25, respectively. It is a case of large samples (NP > 30); hence, the Z-test is performed for comparison of means. The null hypothesis (H0) is not acceptable since Z_cal (4.388) > Z_table (1.645, alpha = 0.05). The variances of the two samples are the same (homoscedastic), as F_cal (1.5882) is less than the table value (1.84; DF1 = 35; DF2 = 30).

Example 9.16
In an international comparative study, duplicate samples of per cent nitrogen (Table 9.4) in whole meal flour were analyzed by the Kjeldahl method. The means and variances are statistically indistinguishable based on t- and F-tests.
ttable
Example 9.18
The estimation of the concentration of a trace element (Table 9.6) with two different methods shows that the content is the same, although one method is more precise (smaller variance) than the other.

Table 9.6 : Concentrations by Different Methods

          Method 1    Method 2
          2.0374      2.3441
          1.8043      2.8163
          1.9276      1.5985
          2.0174      2.4005
          2.0227      2.0297
          2.0476      2.7964
          1.9871      1.8745
          1.9475      2.1487
          1.8495      2.0679
          1.9395      1.9750
          2.0941      1.6584
          1.9796      2.5841
meanx     1.9712      2.1912
var       0.0070      0.1640
np        12          12

tvalue = -1.8431 ; ttable = 2.0740 ; alpha = 0.05 ; df: 22
Since |tvalue| < ttable, xmean1 = xmean2 is acceptable
Fvalue = 23.5591
Fvalue > Ftable
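The t and F values of Example 9.18 can be recomputed from the raw data of Table 9.6. The Python sketch below uses the pooled-variance form of the t statistic with n1 + n2 - 2 degrees of freedom; the assignment of alternate values to the two methods and the function name `pooled_t` are assumptions of this sketch.

```python
import math
import statistics

method1 = [2.0374, 1.8043, 1.9276, 2.0174, 2.0227, 2.0476,
           1.9871, 1.9475, 1.8495, 1.9395, 2.0941, 1.9796]
method2 = [2.3441, 2.8163, 1.5985, 2.4005, 2.0297, 2.7964,
           1.8745, 2.1487, 2.0679, 1.9750, 1.6584, 2.5841]


def pooled_t(a, b):
    """Pooled-variance t statistic and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


t_value, df = pooled_t(method1, method2)
v1, v2 = statistics.variance(method1), statistics.variance(method2)
f_value = max(v1, v2) / min(v1, v2)
print(round(t_value, 4), df, round(f_value, 2))
```

The means come out equal within error (|t| below the table value), while the large F ratio reflects the much smaller variance of the first method.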
Comparison of Means of More than Two Samples
When more than two methods or instruments are compared, a simple F-test (Chart 9.15) shows whether the means are equal or not. However, it does not indicate which sample is different from the rest.

Chart 9.15 : Comparison of Means of More than Two Samples
H0: μ1 = μ2 = μ3 ... = μk; HA: μ1 ≠ μ2 ≠ μ3 ... ≠ μk

\[ \bar{X} = \frac{\sum_{i=1}^{k} n_i \, XMEAN_i}{\sum_{i=1}^{k} n_i}, \qquad DF = k - 1 \]

F is the ratio of
\[ \frac{\sum_{i=1}^{k} n_i \left( XMEAN_i - \bar{X} \right)^2}{k - 1} \]
to a weighted combination of the sample variances \( s_i^2 \).

If F < Ftable
Then accept H0 (Means are equal)
Else reject H0 (Means are not equal)
Example 9.19
The means and variances of three data sets with different sample sizes (np = [10, 12, 14]) are [2 3 4] and [0.6 0.8 1.2]. As F_cal (15.3191) is less than the table value (19.0) at the chosen level (α = 0.05, DF1 : 2; DF2 : 2), the means are statistically indistinguishable.
Comparison of Variances of More than Two Samples
A procedure or laboratory with lower precision is discarded in intercomparison studies. Further, the validity of the t and Z tests is based on the assumption of equality of variances of the data sets. Bartlett proposed a parameter following the χ² distribution (with (k - 1) degrees of freedom) for testing the equality of variances (Chart 9.16).
Chart 9.16: Bartlett Test
Data
Mean     = [Xmean1 Xmean2 ... Xmeank]
Variance = [Var1 Var2 ... Vark]
NP       = [NP1 NP2 ... NPk]
Hypothesis:
H0 : Var1 = Var2 = ..... = Vark
HA : Var1 ≠ Var2 ≠ ....... ≠ Vark (one or more of the equalities fail)

\[ \mathrm{BARTLETT} = \sum_{i=1}^{p} (n_i - 1) \ln\left(s^2\right) - \sum_{i=1}^{p} (n_i - 1) \ln\left(s_i^2\right) \]

s_i² : Variance of the i-th procedure with n_i determinations
s²   : Pooled variance
n_i - 1 : DF

The BARTLETT parameter follows a χ² distribution.
If BARTLETT < Table, then accept H0, else accept HA.
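The Bartlett parameter of Chart 9.16 can be sketched in Python as below. The sketch takes the first logarithm to be that of the pooled variance (weighted by the degrees of freedom) and omits the small-sample correction factor found in some texts, since the chart gives none; both points are assumptions. The sample sizes in the usage lines are hypothetical.

```python
import math


def bartlett_statistic(variances, sizes):
    """Uncorrected Bartlett parameter from variances and sample sizes."""
    dfs = [n - 1 for n in sizes]
    pooled = sum(d * v for d, v in zip(dfs, variances)) / sum(dfs)
    return sum(dfs) * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances)
    )


# Hypothetical sample sizes of 5 for each set, for illustration only.
equal = bartlett_statistic([0.01, 0.01, 0.01], [5, 5, 5])
unequal = bartlett_statistic([0.01, 0.04], [5, 5])
print(round(equal, 6), round(unequal, 4))
```

The parameter is zero when all the variances are equal and grows as they diverge; it is compared with the χ² table value for k - 1 degrees of freedom.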
Example 9.20
The SDs for three simulated data sets are tested for their statistical equality. Since the χ² value is less than the table value, the Bartlett test infers the homogeneity of all the variances.

SD        Variance
0.1000    0.0100
0.1050    0.0110
0.1020    0.0104

χ² = 0.0096615 ; χ²_table = 0.103 ; df: 2
Since χ² < χ²_table, var1 = var2 = var3 is acceptable
Example 9.22
In the determination of molybdenum in molybdite ore by AAS, the percentage SDs of five groups containing different concentration ranges are 6.82, 8.44, 4.44, 7.13 and 5.80. The Bartlett parameter (5.03) is lower than the table value at 95% confidence limits. Thus, there is no significant difference among the variances of the groups covering different concentration ranges.
9.6 ANALYSIS OF VARIANCE
Analysis of a standard material by a number of analysts in different laboratories does not produce identical numerical values, even if the same standard analytical procedure is adopted, due to systematic and random factors (Chart 9.17).

Chart 9.17
* Sample
  · Sampling
  · Sample preparation
* Pretreatment
  · Separation of some compounds
  · Conversion of a compound into a derivative
* Calibration
  · Concentration range
  · Instrument
  · Calibration standards
* Calculation
  · Algorithm
  · Software
* Report
  · Traditions/Conventions
  · Human factor
The variation can sometimes be ignored by inspecting the numbers. But a foolproof and unbiased approach for accreditation purposes is a sound statistical procedure like Analysis of Variance (ANOVA). In chemical analysis, ANOVA is popular under different heads, viz., one way (ANOVA I), two way (ANOVA II) and multiway (MANOVA), depending upon whether the number of influential factors considered is one, two or many, respectively. A few of the recent applications of ANOVA I in food science, environmental pollution, natural raw materials, chemical industry etc. are given in Chart 9.18.
Chart 9.18
* Standardization
  · Titration of NaOH with oxalic acid
* Quality of a chemical compound
  · AgNO3 from different manufacturers
* Efficiency of catalysts
  · Yield of an organic compound using catalysts
* Analysis of a trace metal/drug/food product
* Replicate samples by different laboratories with same analytical method
* Food Science
  · Aflatoxin M1 in milk
  · Lead in milk powder
  · Fat in dried meat product
  · Nitrogen in cereal product
  · Commercial agar powder
  · Compounded animal feed stuff
  · Soya protein
  · Whole meal flour
  · Skimmed milk powder
  · Fish meal
* Trace metals
  · Soil samples (137Cs, 134Cs, 40K)
  · Treated wood (Cu, Cr, As)
  · Radioactive reference material (Eu)
Features of ANOVA
ANOVA separates the variation in the response into explainable factors (μ) and random effects (ε) using a linear additive model

\[ Y = \sum_{i=1}^{NF} \mu_i + \varepsilon \]

It indicates the influential factors operating on the system.
The mean of means (grand mean) of the data matrix and the total sum of squares (TSS) of deviations of the data points from the grand mean are calculated. TSS is an algebraic sum of the sum of squares due to factors (SSFACT), attributable to an identifiable source of variation, and the sum of squares of residuals (SSR) due to random factors. SSFACT represents the variability between different levels of given factors, while SSR is the variability within the factors. The repeatability of unexplained variances (SR2) and that between explainable factors (SL2) are calculable, the sum of which is termed the reproducibility of the total system. The relevant formulae to compute the sum of squares due to factors (Chart 9.19) and a FORTRAN program ANOVA.FOR are given.
Chart 9.19a: Data structure for one way ANOVA

Y1,1     Y1,2     Y1,3     ....  Y1,k
Y2,1     Y2,2     Y2,3     ....  Y2,k
....
YNP,1    YNP,2    YNP,3    ....  YNP,k

NP : Number of replicates; k : Number of factors
Factors (treatments) : Laboratories, methods, analysts
Model : Yij = μ + μj + eij
μ : Overall mean; μj : Effect of jth factor
eij : Random error
Yij : Normally distributed with mean (μ + μj) and variance σ²
Result : μ + μj is estimated by Xmean for jth factor

Chart 9.19b : Formulae of one way ANOVA
Algebraic Notation

\[ SUMC_j = \sum_{i=1}^{NP} Y_{ij}, \qquad TSUM = \sum_{j=1}^{K} SUMC_j, \qquad TOTOBS = \sum_{j=1}^{K} n_j \]
\[ GMEAN = TSUM / TOTOBS \]
\[ TSS = \sum_{i}\sum_{j} Y_{ij}^2 - \frac{\left(\sum_{i}\sum_{j} Y_{ij}\right)^2}{TOTOBS} \]
\[ SSFACT = \sum_{j} \frac{\left(\sum_{i} Y_{ij}\right)^2}{NP} - \frac{\left(\sum_{i}\sum_{j} Y_{ij}\right)^2}{TOTOBS} \]
\[ ERSS = \sum_{i}\sum_{j} Y_{ij}^2 - \sum_{j} \frac{\left(\sum_{i} Y_{ij}\right)^2}{NP} \]
\[ MSSFACT = \frac{SSFACT}{K - 1}, \qquad MERSS = \frac{ERSS}{TOTOBS - K}, \qquad F = \frac{MSSFACT}{MERSS} \]

Matrix Notation

(r,c) = size(Y)
SUMC = SUM(Y)
TSUM = SUM(SUMC)
TOTOBS = r*c
GMEAN = TSUM/TOTOBS
TSS = SUM(SUM(Y.^2)) - TSUM.^2/TOTOBS
SSFACT = SUM(SUMC.^2)/r - TSUM.^2/TOTOBS
ERSS = SUM(SUM(Y.^2)) - SUM(SUMC.^2)/r
MSSFACT = SSFACT/(k - 1)
MERSS = ERSS/(TOTOBS - K)
F = MSSFACT/MERSS
*
*     ANOVA.DEM
*
$DEBUG
      PARAMETER (MAX=4)
      DIMENSION X(MAX),Y(MAX)
      DATA X,Y/1.,2.,3.,4.,1.,2.,3.,4.02/
      NP = 4
      CALL WXY1(NP,X,Y,MAX)
      CALL ANOVA(X,Y,NP,MAX)
      END
$INCLUDE : 'ANOVA.FOR'
$INCLUDE : 'WXY1.FOR'
*
*     ANOVA.FOR
*
      SUBROUTINE ANOVA(X,Y,N,MAX)
      REAL X(MAX),Y(MAX),YCAL(20),R(20),MRSS,MTSS
      CHARACTER*1 IA0,IA1
      REAL*4 T1(10),T2(10),CHIL1(10),CHIL2(10)
      REAL SUMX,SUMY,SUMXX,SUMYY,SUMXY,XX,XY,YY,XMEAN,YMEAN
      REAL A0,A1,CC,THETA,TSS,RSS,SREG,SX2,C1,SD
      REAL F,T,CCF,CCT,CCC,SI,SY2,C9,FEHR,A0L,A0U,A1L,A1U,TA0,TA1,SYI
      REAL YCALL,YCALU,SDL,SDU
      COMMON /Q/A1,SD,CC,CCC,THETA,SI,F,FEHR
      DATA T1/63.66,9.92,5.84,4.60,4.03,3.71,3.50,3.36,3.25,3.17/
      DATA T2/31.82,6.96,4.54,3.75,3.36,3.14,3.00,2.90,2.82,2.76/
      DATA CHIL1/7.88,10.6,12.8,14.9,16.7,18.5,20.3,22.0,23.6,25.2/
      DATA CHIL2/6.63,9.21,11.3,13.3,15.1,16.8,18.5,20.1,21.7,23.2/
      SUMX = 0.
      SUMY = 0.
      SUMXY = 0.
      SUMXX = 0.
      SUMYY = 0.
      DO 10 J=1,N
         SUMX = SUMX+X(J)
         SUMY = SUMY+Y(J)
         SUMXY = SUMXY+X(J)*Y(J)
         SUMXX = SUMXX+X(J)*X(J)
         SUMYY = SUMYY+Y(J)*Y(J)
   10 CONTINUE
      XX = SUMXX-SUMX**2/N
      XY = SUMXY-SUMX*SUMY/N
      YY = SUMYY-SUMY**2/N
      YMEAN = SUMY/N
      XMEAN = SUMX/N
C.... ANOVA
      TSS = 0.0
      RSS = 0.0
      SREG = 0.0
      SX2 = 0.0
      SD = 0.0
      DO 20 I=1,N
         YCAL(I) = A0+A1*X(I)
         R(I) = YCAL(I)-Y(I)
         TSS = TSS+(Y(I)-YMEAN)**2
         RSS = RSS+(Y(I)-YCAL(I))**2
         SREG = SREG+(YCAL(I)-YMEAN)**2
         SX2 = SX2+(X(I)-XMEAN)**2
         SD = SD+R(I)*R(I)
   20 CONTINUE
      NM1 = N-1
      NM2 = N-2
      C1 = RSS/TSS
      SD = SQRT(SD/(N-1))
      MTSS = TSS/(N-1)
      MRSS = RSS/(N-2)
      F = MTSS/MRSS
      IF (F.LT.1.) F=1./F
      T = SQRT(F)
      WRITE(*,951)
      WRITE(*,952)NM1,MTSS,F,T,SREG,NM2,MRSS
   24 CONTINUE
      RETURN
  951 FORMAT(' ANOVA I'/)
  952 FORMAT(75(1H.)/' SUM OF SQUARES',5X,'DEGREES OF FREEDOM',
     *  5X,'MEAN SUM OF SQUARES',5X,1HF,9X,1HT,4X/78(1H.)/' CORRE',
     *  'CTED FOR',10X,'N-1=',I3,T40,E15.4,T62,2E8.3/3X,'MEAN',
     *  '(MSS)'//' DUE TO FACTOR',10X,'P-1 = 1',T40,E15.4/
     *  5X,' (SREG)'//' RESIDUAL (RSS)',9X,'N-P =',I3,T40,E15.4
     *  /78(1H.)///)
      END
Example 9.23
The output of the ANOVA program for a simulated data set is given in Table 9.7.

Table 9.7: One Way ANOVA for Simulated Data

X(1)= 1.000 ; Y(1)= 1.000 ; X(2)= 2.000 ; Y(2)= 2.000
X(3)= 3.000 ; Y(3)= 3.000 ; X(4)= 4.000 ; Y(4)= 4.020

SUM OF SQUARES              DEGREES OF FREEDOM    MEAN SUM OF SQUARES    F           T
CORRECTED FOR MEAN (MSS)    N-1 = 3               .1687E+01              .894E+01    .299E+01
DUE TO FACTOR (SREG)        P-1 = 1               .2510E+02
RESIDUAL (RSS)              N-P = 2               .1508E+02
Example 9.24
The data for the estimation of the concentration of sodium hydroxide by titrating standard oxalic acid with phenolphthalein indicator are analyzed by one-way ANOVA. The data (Table 9.8) consist of three samples, each in duplicate, and the F-test shows that the concentration is reliable, as the error due to random fluctuations is insignificant.

Table 9.8a : Titration Data for Standardization of Sodium Hydroxide (TITRANT) with Oxalic Acid (TITRAND)
                                          Volume in ml
K   L   Conc. of      Conc. of    Titrant   Titrand   Initial   Final
        std. soln.    NaOH                            Volume    Volume
1   1   .101594       .390746     1.820      7.000    .000      1.820
1   2                 .386499     1.840      7.000    .000      1.840
2   1                 .388986     2.220      8.500    .000      2.220
2   2                 .388986     2.220      8.500    .000      2.220
3   1                 .390746     2.600     10.000    .000      2.600
3   2                 .390746     2.600     10.000    .000      2.600

Conc.: Concentration; std. soln.: standard solution
No. of Samples (k) : 3; No. of observations in the kth sample: 6.
Table 9.8b: One Way ANOVA for Standardization of Alkali

Source of Variation                Sum of Squares    DF    Mean Sum of Squares    F
Between means (Treatment) SST      .5186e-05         2     .2593e-05              .8586e+00
Within Samples (ERROR) SSE         .9060e-05         3     .3020e-05
Total TSS                          .1425e-04         5

F is less than F_table(5,1,0.05) = 6.61. Therefore, error within the samples is negligible.
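The entries of Table 9.8b can be recomputed from the NaOH concentrations of Table 9.8a with a short Python sketch of the sum-of-squares formulae of one way ANOVA. The function and dictionary key names are illustrative assumptions; because the tabulated concentrations are rounded to six decimals, the recomputed sums of squares agree with the table only to about two significant figures.

```python
def one_way_anova(groups):
    """One way ANOVA from a list of groups (one list of replicates each)."""
    k = len(groups)
    total_obs = sum(len(g) for g in groups)
    tsum = sum(sum(g) for g in groups)
    correction = tsum ** 2 / total_obs
    tss = sum(y * y for g in groups for y in g) - correction
    ssfact = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    erss = tss - ssfact
    mssfact = ssfact / (k - 1)
    merss = erss / (total_obs - k)
    return {"TSS": tss, "SSFACT": ssfact, "ERSS": erss,
            "DF": (k - 1, total_obs - k), "F": mssfact / merss}


naoh = [[0.390746, 0.386499],
        [0.388986, 0.388986],
        [0.390746, 0.390746]]
result = one_way_anova(naoh)
print(result["DF"], round(result["F"], 3))
```

The F ratio comes out below one, so the between-sample variation is smaller than the within-sample variation.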
The heuristics for the magnitude of SR2 and SL2 are given in Chart 9.20. ANOVA is insensitive to moderate deviations from the model assumptions. However, the results are in error if the samples are interdependent. Thus it should not be taken for granted that ANOVA can be applied to any type of problem.

Chart 9.20 : Heuristics for SR2, SL2
If   Method of analysis is not specified and interlaboratory comparison is attempted
Then SR2 and SL2 are of high magnitude.

If   Method of analysis is specified and interlaboratory comparison is adopted
Then SR2 and SL2 are of low magnitude.
Fixed and Random Effect Models
In interlaboratory comparison, two ANOVA models are in vogue, viz., fixed and random effect models. When a few accredited laboratories are chosen, the mean values of analysis are almost the same; the grand mean is thus considered as fixed and the corresponding ANOVA is called fixed effect ANOVA. Comparison of the effect of two or more factors is possible if a factor has a fixed effect and one-way ANOVA is performed. On the other hand, when a large number of laboratories, procedures or analysts is considered for accreditation purposes, the mean of each set is a random variable and the model is referred to as the random effect ANOVA model. The assumptions of these models and the KB for validation of results are given in Charts 9.21 and 9.22.
Chart 9.21
* Influence of a single factor on a single response for a sample
* Each column (k) of data is from a normal distribution
* All observations are uncorrelated
* Random errors are mutually independent and normally distributed
* Random samples are selected from k factors.
Chart 9.22: KB for ANOVA
If   Ratio of mean sum of squares to residual sum of squares (F) > Table value
Then Reject H0 [effect of the factor is significant]
Else Effect of factor is insignificant.

If   Variances are heterogeneous & ANOVA performed
Then Results are perturbed.

If   NSAMP > 20 & Fixed one way ANOVA performed & Normality of data is not satisfied
Then Results are not perturbed

If   Fixed effect model and SSFACT/SST is low
Then Model is invalid & At least one influential uncontrollable factor is not randomized during the course of experiments.

If   NSAMP > 20 & Random one way ANOVA performed & Normality of data is not satisfied
Then Results are in large error

If   Random effect model of one way ANOVA performed & Assumptions are not satisfied
Then Results are in error [Remedy: Transform data to adhere to the assumptions].
10.1 COVARIANCE, CORRELATION COEFFICIENT
The effect of the concentration of the analyte or of pH on the absorbance of a coloured compound is monitored at the wavelength corresponding to the maximum absorbance in the visible spectrum. The concentration and the corresponding absorbance vectors form a bivariate data set. The change in absorbance is explained by the variation of concentration through Beer's law. The concentration vector (CON) is thus an explanatory factor and the absorbance corresponds to the response (RESP). They are also called independent and dependent variables, respectively.
The variations in response and concentration are expressed as variances in the statistical literature. The variation of one variable with the other is called covariance. It is the expectancy that the measured data (CON, RESP) deviate from the respective means. Thus the covariance indicates the relation between the two variables, in particular linear dependence. A few bivariate variables in different fields of chemical investigation are given in Chart 10.1. The equations and source code in FORTRAN are given.

Chart 10.1

Explainable Factor (X)    Response (Y)    Comment
Concentration             Absorbance      Both are primary data
Time                      log(a - x)      Y is log transformed
σ                         log k           log k is estimated from primary data
1/T                       log k           The range of T is generally small
The range of covariance is -∞ to +∞. It depends on the measurement scale. For example, the covariances are very different when the concentration of the analyte is expressed in μg or mg. The electronic (σ) and steric (E) factors for the compounds given in Table 10.1 are independent and the covariance has a very low magnitude. Thus, if the sources of the two variables are different (independent), the covariance between them is zero. However, a zero covariance does not imply that the two variables are independent.
BIVARIATE ANALYSIS
*
*     COV.FOR
*
      FUNCTION COV(X,Y,N,MAX)
      REAL MEAN1
      DIMENSION X(MAX),Y(MAX)
      XMEAN = MEAN1(X,N,MAX)
      YMEAN = MEAN1(Y,N,MAX)
      COV = 0.
      DO 10 I = 1,N
         COV = COV + (X(I) - XMEAN) * (Y(I) - YMEAN)
   10 CONTINUE
      COV = COV/REAL(N-1)
      WRITE(*,951) COV
  951 FORMAT(5X,'COVARIANCE: 'G10.4)
      RETURN
      END

Table 10.1: Covariance between Electronic and Steric Factors
Covariance matrix
        σ          E
σ       0.1878     -0.0387
E       -0.0387    0.7029

Correlation matrix
        σ          E
σ       1.0000     -0.1065
E       -0.1065    1.0000

σ         E
1.1       0.24
1.05      0.24
1         0.27
0.85      0.37
0.6       2.55
0.52      0.19
0.49      -1.24
0.41      1.89
0.405     1.76
0.385     0.9
0.36      1.63
0.215     0.38
0.11      1.19
0.08      0
0         0
-0.1      0.07
-0.115    0.36
-0.125    0.93
-0.13     0.39
-0.165    1.74
-0.19     0.47
-0.2      0.51
-0.3      1.54
Correlation Coefficient
The correlation coefficient (CC) was introduced to obviate the limitation of scale dependency of covariance. It is the ratio of the covariance to the square root of the product of the variances of the two variables. The correlation coefficient between X and Y is the projection of data onto a plane parallel to XY. The computation of covariance and correlation coefficient is described in Chart 10.2.

Chart 10.2

\[ \mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{N} (X_i - X_{mean}) (Y_i - Y_{mean})}{N - 1} \]

\[ r = \frac{\mathrm{cov}(X, Y)}{\sqrt{\dfrac{\sum_{i=1}^{N} (X_i - X_{mean})^2}{N - 1} \cdot \dfrac{\sum_{i=1}^{N} (Y_i - Y_{mean})^2}{N - 1}}} \]
X(1)= 1.000    Y(1)= 2.000
X(2)= 2.000    Y(2)= 4.000
X(3)= 3.000    Y(3)= 6.000
COVARIANCE : 2.000
CORRELATION COEFFICIENT (r) : 1.000

X(1)= -1.000   Y(1)= -1.000
X(2)= .0000    Y(2)= .0000
X(3)= 1.000    Y(3)= 1.000
COVARIANCE : 1.000
CORRELATION COEFFICIENT (r) : 1.000
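The two outputs above can be reproduced with a short Python sketch of the covariance and correlation formulae of Chart 10.2. The function names are illustrative; the N - 1 divisor follows COV.FOR.

```python
import math


def covariance(x, y):
    """Sample covariance with the N - 1 divisor, as in COV.FOR."""
    n = len(x)
    xm = sum(x) / n
    ym = sum(y) / n
    return sum((a - xm) * (b - ym) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


cov1 = covariance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r1 = correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
cov2 = covariance([-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0])
print(cov1, r1, cov2)
```

Doubling Y doubles the covariance but leaves the correlation coefficient at one, which illustrates the scale dependence of the former.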
*
*     CC.FOR
*
      FUNCTION CC(X,Y,NP,MAX)
      DIMENSION X(MAX),Y(MAX)
      SDX = SD(X,NP,MAX)
      SDY = SD(Y,NP,MAX)
      COVXY = COV(X,Y,NP,MAX)
      CC = COVXY/(SDX*SDY)
      IF (NP .EQ. 2) THEN
         WRITE(*,*) 'NP:2'
         WRITE(*,951) CC
      ENDIF
C     Corrected correlation coefficient for small data sets;
C     REAL() avoids integer division of (NP-1) by (NP-2)
      IF (NP .LT. 10 .AND. NP .NE. 2) THEN
         CCC = 1. - REAL(NP-1)/REAL(NP-2)*CC
         WRITE(*,951) CC
      ENDIF
      IF (ABS(CC) .NE. 1) THEN
         IF (NP .LE. 30) THEN
            T = CC * SQRT((NP-2)/(1.-CC*CC))
            WRITE(*,952) T
         ENDIF
         IF (NP .GT. 30) THEN
            Z = 0.5 * ALOG((1+CC)/(1-CC))
            WRITE(*,953) Z
         ENDIF
      ELSE
      ENDIF
  951 FORMAT(5X,'CORRELATION COEFFICIENT (r) : 'G10.4/)
  952 FORMAT(5X,' t value: ',G12.4)
  953 FORMAT(5X,' z value: ',G12.4)
      END

*
*     CC.DEM
*
$DEBUG
      PARAMETER (MAX=200)
      DIMENSION X(MAX),Y(MAX)
      CALL IXY1(N,X,Y,MAX)
      CALL WXY1(N,X,Y,MAX)
      R = CC(X,Y,N,MAX)
      END
$INCLUDE: 'IXY1.FOR'
$INCLUDE: 'WXY1.FOR'
$INCLUDE: 'COV.FOR'
$INCLUDE: 'CC.FOR'
$INCLUDE: 'SD.FOR'
$INCLUDE: 'MEAN1.FOR'
$INCLUDE: 'SUM1.FOR'
Properties of Correlation Coefficient
The correlation coefficient is a scale independent scalar quantity. It is calculated on the assumption that the two vectors x and y are random variables. The range of the correlation coefficient is -1 to +1 through zero and is expressed by the Schwarz inequality -1 ≤ CC ≤ 1. It indicates the linear dependency between the two data sets.
Hypothesis Testing of Correlation Coefficient
Statistical hypothesis testing establishes whether CC is significantly different from zero or not. The null and alternate hypotheses in this connection are H0: CC = 0 and HA: CC ≠ 0, respectively. When the number of data points is greater than 30, 0.5 * ln((1 + r)/(1 - r)) follows a Z distribution. For a small sample set (NP < 30), r * sqrt((NP - 2)/(1 - r²)) adheres to a t-statistic with NP - 2 degrees of freedom. In kinetic and equilibrium studies the number of data points is less than 10 and the t-statistic was found to be inadequate. Exner proposed the corrected correlation coefficient (CCC) given by
CCC = 1 - [(NP - 1)/(NP - 2)] * CC.
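The two test statistics computed in CC.FOR can be sketched in Python as follows. The function names are illustrative assumptions; the formulae are those coded in the FORTRAN listing (the t statistic for NP ≤ 30 and the logarithmic z transform for NP > 30).

```python
import math


def t_from_r(r, n):
    """t statistic with n - 2 degrees of freedom for a correlation r."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def z_from_r(r):
    """z transform 0.5 * ln((1 + r)/(1 - r)) used for large samples."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


t_small = t_from_r(0.5, 5)
z_large = z_from_r(0.5)
print(t_small, round(z_large, 4))
```

Both expressions are undefined for |r| = 1, which is why CC.FOR skips them in that case.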
The correlation coefficients (Fig. 10.1a and 10.1b) for data sets 1 and 2 are +1 and -1, respectively. When the absolute value of the correlation coefficient is one, all the points lie on a straight line, indicating that the data set is error free. In real experimental data, the magnitude of CC decreases as the random error increases in either or both of the variables. In Fig. 10.1c, the CC is 0.97 because a few points lie slightly off the line. In Fig. 10.1d a linear relationship exists, but the low value (0.71) is due to significant scatter around the line. A value of 0.26 for CC indicates that prediction of Y from knowledge of X with a linear model is impossible (Fig. 10.1e). The correlation coefficient is zero for the data set following the functional relationship y = ±sqrt(9 - x^2) (Fig. 10.2a). CC changes with the range of X for data following the non-linear relation y = cos(tan(x)) (Fig. 10.2b).
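That a zero correlation coefficient does not mean the variables are unrelated can be checked numerically. The sketch below samples both branches of y = ±sqrt(9 - x^2) on an assumed grid of x values (the grid and the helper name `correlation` are illustrative choices, not taken from the text).

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Points on the circle x^2 + y^2 = 9: upper branch, then lower branch.
xs = [-3.0 + 0.5 * i for i in range(13)]
x = xs + xs
y = [math.sqrt(9.0 - v * v) for v in xs] + [-math.sqrt(9.0 - v * v) for v in xs]

cc_circle = correlation(x, y)
print(abs(cc_circle) < 1e-12)
```

Although y is completely determined by x (up to sign), the linear dependency measured by CC vanishes.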
Example 10.1
The physicochemical parameters of substituted pyridine carboxylic acids (Table 10.2) show that π is uncorrelated with σ and R, as the CC is less than 0.1. However, σ is highly correlated with R (CC = 0.78).

Table 10.2 : Physicochemical Parameters

Hydrophobic     Molar           Substituent    Electronic      Electronic
parameter (π)   refractivity    constant (σ)   parameter (F)   parameter (R)
                (MR)
 0.42            1.081          -0.34           0.22           -0.64
 1.17            1.688           0.04           0.03            0.01
-1.48           -0.211           0.73           0.65            0.15
 ?               0.617 (?)      -0.27           0.26           -0.51
 0.14            0.016           0.06           0.43            0.34
-0.26            0.726           0.78           0.67            0.16
-1.23            0.369          -0.66           0.02           -0.68
-0.98            1.332           0.0            0.28           -0.26
 0.2             1.296          -0.83           0.1            -0.92
 1.09            2.224          -0.9            0.01           -0.91
 0.5             0.464          -0.07          -0.04           -0.13
[Figure: data form, scatter diagram and least squares fit for data sets with CC = 1.00, 0.97, 0.71, 0.26 and 0.00]
Fig. 10.4 : Least Squares Fit and Residuals of a Simulated Data Set

Example 10.3
The data set has two duplicate points and the points are distributed on either side of the least squares line (Fig. 10.5). The absolute value of the residuals is around 0.5. If the reproducibility of the data is more than 0.5, then the data adheres to the model.

X(1)= .0000; Y(1)= 1.000; X(2)= .0000; Y(2)= 2.000
X(3)= 1.000; Y(3)= .0000; X(4)= 2.000; Y(4)= .0000

NP     CEPT     SLOPE      CC         THETA
4      1.364    -0.8182    -0.8182    .0000

y = 1.364 - 0.8182 * x
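The values printed for Example 10.3 can be checked with a short Python sketch of the linear least squares formulas. The function name `linear_least_squares` is a hypothetical stand-in for the book's LLS1 routine, whose source is not shown in this section.

```python
import math

def linear_least_squares(x, y):
    """Return (intercept, slope, cc) for the model y = a0 + a1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    cc = sxy / math.sqrt(sxx * syy)
    return intercept, slope, cc

# Data of Example 10.3.
x = [0.0, 0.0, 1.0, 2.0]
y = [1.0, 2.0, 0.0, 0.0]
cept, slope, cc = linear_least_squares(x, y)
print(round(cept, 3), round(slope, 4), round(cc, 4))
```

For this particular data set the sums of squares of x and y about their means are equal, which is why the slope and the correlation coefficient coincide.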
Fig. 10.5 : Least Squares Fit and Residuals of a Data Set with Noise
Residuals in Response
The calculated response (YCAL_i) at each experimental point is computed from the estimated least squares parameters (a0, a1) as
YCAL_i = a0 + a1 * X_i
The residuals, RES_i, are the differences between the measured responses (Y_i) and those calculated (YCAL_i) from the model. They indicate the variation in the response that is not explained by the regression equation. The residuals are of paramount importance in concluding whether the data fit the proposed model. The statistical tests used to analyze these residuals and validate the model are discussed under residual analysis (section 10.4).
* * * RESID.FOR
      SUBROUTINE RESID(X,Y,NP,AO,A1,MAX)
      REAL X(MAX),Y(MAX),YCAL(500),RES(500)
C...  RESIDUAL SUM OF SQUARES MUST START FROM ZERO
      RSS = 0.0
      DO 20 I=1,NP
         YCAL(I) = AO+A1*X(I)
         RES(I) = YCAL(I)-Y(I)
         RSS = RSS+RES(I)**2
         WRITE(*,951)I,X(I),Y(I),YCAL(I),RES(I)
951      FORMAT(1X,I3,2F8.2,F10.4,E15.4)
20    CONTINUE
C...  TWO PARAMETERS (AO,A1) ARE ESTIMATED, HENCE NP-2
      SD = SQRT(RSS/(NP-2))
      WRITE(*,*)SD
      RETURN
      END
* * * RESID.DEM
      PARAMETER (MAX=4)
      DIMENSION X(MAX),Y(MAX)
      DATA X,Y/1.,2.,3.,4.,1.,2.,3.,4.004/
      NP = 4
      CALL LLS1(X,Y,NP,A1,AO,CC,MAX)
      WRITE(*,*)AO,A1,CC
      CALL RESID(X,Y,NP,AO,A1,MAX)
      END
$INCLUDE: '\F77\FOR\LLS1.FOR'
Standard Deviation of Slope and Intercept
In the case of univariate data, the SD describes the spread of replicate measurements and is used to calculate the confidence interval of the central tendency parameter, the mean. Similarly, for bivariate data following a linear relationship, the SDs of the regression parameters are given in Chart 10.5. They are used to validate the model and to calculate the correlation between the parameters and their confidence intervals.

Chart 10.5
NPAR corresponds to the number of estimated parameters, a0 and a1.
* * * SDA.FOR
      SUBROUTINE SDA(X,Y,N,MAX)
      REAL X(MAX),Y(MAX),MEAN1
      REAL YCAL(500)
      REAL*4 T1(10),T2(10),CHIL1(10),CHIL2(10)
C...  FLAGS ARE PRINTED WITH AN A1 EDIT DESCRIPTOR
      CHARACTER*1 IAO,IA1
      DATA T1/63.66,9.92,5.84,4.60,4.03,3.71,3.50,3.36,3.25,3.17/
      DATA T2/31.82,6.96,4.54,3.75,3.36,3.14,3.00,2.90,2.82,2.76/
      DATA CHIL1/7.88,10.6,12.8,14.9,16.7,18.5,20.3,22.0,23.6,25.2/
      DATA CHIL2/6.63,9.21,11.3,13.3,15.1,16.8,18.5,20.1,21.7,23.2/
      CALL LLS1(X,Y,N,A1,AO,CC,MAX)
      YMEAN = MEAN1(Y,N,MAX)
      XMEAN = MEAN1(X,N,MAX)
      SX2 = 0.0
      SY2 = 0.0
      DO 20 I = 1,N
         YCAL(I) = AO+A1*X(I)
         SX2 = SX2+(X(I)-XMEAN)**2
         SY2 = SY2+(Y(I)-YCAL(I))**2
20    CONTINUE
      SY2 = SY2/(N-2)
C...  STANDARD DEVIATION IN AO,A1
      SDA1 = (SY2/SX2)**0.5
      C9 = SY2*(1.0/N+XMEAN**2/SX2)
      SDAO = SQRT(C9)
      WRITE(*,*)'SDAO,SDA1'
      WRITE(*,*)SDAO,SDA1
C...  RANGE OF AO,A1
      AOL = AO-T2(N-2)*SDAO
      AOU = AO+T2(N-2)*SDAO
      A1L = A1-T2(N-2)*SDA1
      A1U = A1+T2(N-2)*SDA1
C...  DO AO,A1 SIGNIFICANTLY DIFFERENT FROM AOE,A1E
C...  (AOE,A1E,SD,SDL,SDU ARE NOT SET IN THIS LISTING)
      TAO = (AO-AOE)/SDAO
      N2 = N-2
      IAO = 'N'
      IA1 = 'N'
      IF(TAO.LT.T2(N2)) IAO='Y'
      TA1 = (A1-A1E)/SDA1
      IF(TA1.LT.T2(N2)) IA1='Y'
23    WRITE(*,957)
      WRITE(*,958)AO,SDAO,AOL,AOU,TAO,IAO,A1,SDA1,A1L,A1U,
     *            TA1,IA1,SD,SDL,SDU
957   FORMAT(65(1H-)/T8,'VALUE',T19,2HSD,T30,1HL,T40,1HU,T50,
     *       1HT,T59,2HAO/65(1H-))
958   FORMAT(2X,2HAO,1X,5F10.4,T60,A1/2X,2HA1,1X,5F10.4,T60,A1/
     *       2X,2HSD,1X,F10.4,10X,2F10.4,/65(1H-)///)
      RETURN
      END
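The standard deviations computed in SDA.FOR can be sketched in Python as follows. The formulas are the ones visible in the listing (SDA1 = sqrt(SY2/SX2) and SDAO = sqrt(SY2 * (1/N + XMEAN^2/SX2))); the function name and the use of the Example 10.3 data are illustrative assumptions.

```python
import math

def slope_intercept_sd(x, y):
    """Return (a0, a1, sd_a0, sd_a1) for the straight line y = a0 + a1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx2 = sum((xi - mx) ** 2 for xi in x)
    a1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sx2
    a0 = my - a1 * mx
    # Residual variance with N - 2 degrees of freedom (two parameters).
    sy2 = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    sd_a1 = math.sqrt(sy2 / sx2)
    sd_a0 = math.sqrt(sy2 * (1.0 / n + mx ** 2 / sx2))
    return a0, a1, sd_a0, sd_a1

# Data of Example 10.3, used here only as a convenient test case.
a0, a1, sd_a0, sd_a1 = slope_intercept_sd([0.0, 0.0, 1.0, 2.0],
                                          [1.0, 2.0, 0.0, 0.0])
print(round(sd_a0, 4), round(sd_a1, 4))
```

Confidence limits then follow as a0 ± t * sd_a0 and a1 ± t * sd_a1, with t taken from the table for N - 2 degrees of freedom.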
Statistical Significance of Slope and Intercept
In the calibration of a coloured compound by Beer's law, the straight line passes through the origin when the blank solution has no absorbance or the absorbances are measured against the blank. In practice, however, the intercept will not be exactly 0.0 but of small magnitude. In order to establish statistically that the intercept is not different from zero, point hypothesis testing is used. The expected slope of the Hammett equation for log k versus substituent constant is one. A large deviation is explained in terms of other effects. In this case, a regression parameter is to be tested against a fixed value. Further, the regression model is valid only when the parameters are different from zero. The null hypothesis for the significance of the slope and intercept of the straight line is tested parameter-wise (Chart 10.6).

Chart 10.6
H0-a0 : a0 = a0_expect        HA-a0 : a0 ≠ a0_expect
H0-a1 : a1 = a1_expect        HA-a1 : a1 ≠ a1_expect
(a0 - a0_expect)/(SD_a0) or (a1 - a1_expect)/(SD_a1) follows a t-distribution with (NP - 2) degrees of freedom.

Example 10.4
The slope, intercept and their standard deviations for a data set with 18 points are

Parameter   Value    SD       t      t-table (α, DF)   Null hypothesis valid
a0          1.415    0.218    6.47   1.75              False
a1          0.6987   0.0089   7.78   1.75              False

DF = NP - 2 = 16, α = 0.05
The t-value calculated for the data set is greater than the table value at the 95% confidence level. The last column gives the statistical validity of the null hypothesis. The regression coefficients are statistically different from zero.
Standardized Regression Coefficient
The magnitudes of regression parameters are scale dependent. Thus their numerical magnitudes do not reflect the relative importance of the explanatory factors. The standardized regression coefficients are scale independent and are useful to interpret the relative importance of slope and intercept in explaining the total variation in y.
std(a0) = a0 * sd(a0) / sd(y)
std(a1) = a1 * sd(a1) / sd(y)
Correlation Coefficient Between Slope and Intercept
In linear regression, the slope and intercept are estimated simultaneously, so several combinations of numerical values may satisfy the least squares criterion. The correlation coefficient between slope and intercept is obtained from
H = inv(X^T * X) * X^T
h (NPAR x 1) = diag{inv(X^T * X)}
r_par = (sqrt(h))^-1 * inv(X^T * X) * (sqrt(h))^-1
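For a straight line the matrix X^T * X is only 2 x 2, so the correlation between intercept and slope can be written out explicitly. The sketch below is a minimal illustration under that assumption (design matrix with a column of ones and a column of x); the function name is hypothetical.

```python
import math

def parameter_correlation(x):
    """Correlation between intercept and slope from inv(X^T X)."""
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx
    # Inverse of [[n, sx], [sx, sxx]].
    inv = [[sxx / det, -sx / det],
           [-sx / det, n / det]]
    h = [inv[0][0], inv[1][1]]          # diagonal elements
    return inv[0][1] / math.sqrt(h[0] * h[1])

r_a0_a1 = parameter_correlation([0.0, 0.0, 1.0, 2.0])
r_centered = parameter_correlation([-1.0, 0.0, 1.0])
print(round(r_a0_a1, 4), abs(r_centered) < 1e-12)
```

The result depends only on the x values: when x is centred about zero the two parameters are uncorrelated, which is one reason for mean-centring the explanatory variable.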
Confidence Intervals of Slope and Intercept
The confidence interval (LU) for the intercept and the slope can be calculated separately only when the CC between a0 and a1 is negligible. The Z or t statistic is used depending upon the number of points (Chart 10.7). However, Joint Parameter Uncertainty Intervals (JPUI) are calculated when the absolute CC (a0, a1) is significant.

Chart 10.7 : Confidence Interval of Slope and Intercept
If NP > 30 Then
   LU_a0 = a0 ± Z (CL) * SD_a0
   LU_a1 = a1 ± Z (CL) * SD_a1
If NP [text missing]

k = (1/t) * log [a0/(a0 - x)]
a0 : concentration of reactant at t = 0
x  : concentration of product at time t
From a paired data set of x vs t, the rate constant is calculated from the slope of the linear least squares analysis of the equation
log (a0 - x) = log (a0) - k * t
where Y is log (a0 - x) and the explanatory variable is t. The objective is to obtain the Best Linear Unbiased Estimator (BLUE) of the rate constant (KINET2.FOR), which has chemical significance. Here, the SD of the regression parameters should be as small as possible. The concentration of the reactant is followed up to a hundred minutes in a kinetic study.
* * * KINET2.FOR
      DIMENSION T(100),X(100)
      MAX = 100
      WRITE(*,901)
      READ(*,*)NP
      WRITE(*,902)
      READ(*,*)AO
      DO 10 I = 1,NP
         WRITE(*,903)I,I
C...     READ TIME AND CONCENTRATION; STORE LOG10 OF CONCENTRATION
         READ(*,*)T(I),CONC
         X(I) = ALOG(CONC)/ALOG(10.)
10    CONTINUE
      CALL LLS1(T,X,NP,SLOPE,CEPT,CC,MAX)
      WRITE(*,*)SLOPE,CEPT,CC
901   FORMAT(' GIVE NO OF POINTS: '\)
902   FORMAT(' GIVE INITIAL CONC OF REACTANT '\)
903   FORMAT(' TIME(',I2,'),X(',I2,'): '\)
      END
$INCLUDE: 'LLS1.FOR'
Example 10.5
The primary data, log (a0 - x) and its residuals are given in Table 10.4. For the least squares line, the residuals (Fig. 10.6) are less than 1%, indicating a good fit of the data into a first order kinetic model. The SD in k is also less than 1%, indicating it to be reliable. This model is not used to predict the concentration of the product or reactant at any specified time, so the aim is not to fit all the data points into the curve. The correlation coefficient and the angle between the time and Y vectors are -0.99998 and 0.0, indicating a good linear trend.

Table 10.4: Kinetic Data and Residuals

Time        Response     Y                 Residuals
            (a0 - x)     [log (a0 - x)]    in Y
  0.6000    1.6540        0.2185           -0.0023
  2.6400    1.5240        0.1830           -0.0017
  6.1800    1.3200        0.1206           -0.0013
 13.9200    0.9660       -0.0150            0.0002
 19.3800    0.7740       -0.1113            0.0007
 26.9400    0.5700       -0.2441            0.0018
 37.6800    0.3680       -0.4342            0.0021
 84.1200    0.0560       -1.2518            0.0074
107.3400    0.0210       -1.6778           -0.0071
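The fit behind Table 10.4 can be retraced with a short Python sketch. It assumes, as in KINET2.FOR, that Y is the base-10 logarithm of the reactant concentration, so the rate constant follows from the slope as k = -slope * ln(10); the helper name `fit_line` is hypothetical.

```python
import math

# Time and (a0 - x) columns of Table 10.4.
time = [0.60, 2.64, 6.18, 13.92, 19.38, 26.94, 37.68, 84.12, 107.34]
conc = [1.654, 1.524, 1.320, 0.966, 0.774, 0.570, 0.368, 0.056, 0.021]

def fit_line(x, y):
    """Least squares intercept and slope of y = a0 + a1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

y = [math.log10(c) for c in conc]
intercept, slope = fit_line(time, y)
k = -slope * math.log(10.0)   # first order rate constant, per unit of time
residuals = [yi - (intercept + slope * ti) for ti, yi in zip(time, y)]
print(round(intercept, 4), round(slope, 4), round(k, 4))
```

The intercept and slope agree with the model quoted with Fig. 10.6 to the number of digits printed there.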
Model: y = 0.2314 - 0.0177 * x
       (8.0243e-006) (1.6472e-007)
sdy = 0.0041972

Fig. 10.6: Least Squares Fit of Kinetic Data Set and its Residuals
Calibration Model using Beer's Law
The change in absorbance of a coloured species (exhibiting a maximum in the visible spectrum) with the concentration of the analyte adheres to Beer's law
ABSORB = blank + (ε * path length) * CONC
The calibration involves measuring the absorbance of a series of solutions of different analyte concentrations at the wavelength corresponding to the maximum in the spectrum. When the concentration is expressed on the molar scale, the slope is equal to the extinction coefficient of the analyte at λmax. In practice, however, calibration equations are developed in mg, µg or ng scales depending upon the extinction coefficient of the coloured species. The intercept corresponding to the blank should be zero. Even when it is of a small magnitude, the intercept is of no significance. The confidence interval in the range of concentrations used for calibration is of importance.
* * * BEER.FOR
      DIMENSION CONC(100),ABSORB(100)
      MAX = 100
      WRITE(*,901)
      READ(*,*)NP
      DO 10 I = 1,NP
         WRITE(*,902)I,I
         READ(*,*)CONC(I),ABSORB(I)
10    CONTINUE
      CALL LLS1(CONC,ABSORB,NP,SLOPE,CEPT,CC,MAX)
      WRITE(*,*)SLOPE,CEPT,CC
901   FORMAT(' GIVE NO OF POINTS: '\)
902   FORMAT(' CONC(',I2,'),ABSORB(',I2,') '\)
      END
$INCLUDE: 'LLS1.FOR'

Example 10.6
The absorbances at λmax of the spectrum of a coloured species and the corresponding concentration data are tested for adherence to Beer's law. The residuals are very low, confirming the model (Fig. 10.7).

Table 10.5: Data for Beer's Law and Residuals

Concentration   Absorbance   LLS Residuals
 2.8000         0.1530       -0.0139
 4.0000         0.2160        0.0070
 4.8000         0.2310       -0.0060
 5.6000         0.2970        0.0320
 7.6000         0.3120       -0.0231
14.8000         0.6110        0.0236
16.8000         0.6370       -0.0205
17.6000         0.6690       -0.0165
18.4000         0.7310        0.0175

y = 0.068836 + 0.035038 * x
    (0.00031523) (2.637e-005)
sdy = 0.022004; cc = 0.99593; angx (x, y) = 0
Fig. 10.7 : Least Squares Fit of Spectrophotometric Data
Hammett Equation
The variation of log k or log K with the substituent constant for a series of homologous compounds follows a linear model
log k = log k0 + ρ * σ

Example 10.7
The variation of the logarithm of the rate constant with the substituent parameter is fitted into the linear model (Table 10.6). The graphical representation of the least squares analysis is depicted in Fig. 10.8.

Table 10.6: Hammett Equation Data

σ (x)        log k (y)    Residual
-2.7e-001    2.241        -0.00069203
-1.7e-001    2.398        -0.0030394
 2.3e-001    3.045         0.006571
 7.8e-001    3.912        -0.0028396

y = 2.6719 + 1.5935 * x
    (1.6122e-005) (3.6911e-005)
sdy = 0.0055208; cc(x,y) = 0.99998; angx(x,y) = 0

Fig. 10.8: Hammett Model for Variation of Rate Data.
Regression Through Origin
In Beer's law the absorbance versus concentration plot passes through the origin. The statistical model is
ABSORB = a1 * CONC + ε
and conforms to the chemical law. It is essential to establish that the intercept is statistically insignificant before estimating the slope of the model (1). Otherwise, forcing the line through the origin results in biased values of the slope and its standard error, and in erroneous confidence intervals.
Consider the case where the true model is Y_i = a1 * X_i. If the data is fitted into a two-parameter linear model, part of the random errors is also fitted. Then Y_i = a0 + a1 * X_i is referred to as an over-ambitious model, although the SD in Y decreases. If the true model is Y_i = a0 + a1 * X_i and that used for fitting the data is Y_i = a1 * X_i, it is called a constrained model. In other words, one of the regression coefficients (a0) is assumed to be zero, or the straight line is forced to pass through the origin.
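The effect of the constraint can be seen by fitting the same data both ways. The sketch below uses the concentration and absorbance values of Table 10.5; the constrained slope is the standard least squares result sum(x*y)/sum(x*x) for a line through the origin, and the function names are illustrative.

```python
# Concentration and absorbance columns of Table 10.5.
conc = [2.8, 4.0, 4.8, 5.6, 7.6, 14.8, 16.8, 17.6, 18.4]
absorb = [0.153, 0.216, 0.231, 0.297, 0.312, 0.611, 0.637, 0.669, 0.731]

def slope_through_origin(x, y):
    """Slope of the constrained model y = a1 * x."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def slope_and_intercept(x, y):
    """Intercept and slope of the two-parameter model y = a0 + a1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    return my - a1 * mx, a1

a1_origin = slope_through_origin(conc, absorb)
a0_free, a1_free = slope_and_intercept(conc, absorb)
print(round(a1_origin, 4), round(a0_free, 4), round(a1_free, 4))
```

Because the unconstrained intercept of this data set is not negligible, forcing the line through the origin shifts the slope noticeably, which is the bias the text warns about.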
Least Squares Solution of Slope and Intercept in Matrix Notation The estimation of slope and intercept (par) in matrix notation is described in Chart 10.8.
Chart 10.8
Since X is a rectangular matrix, it is pre-multiplied by X^T, rendering the product (X^T * X) a square matrix.
X^T * y = (X^T * X) * par
Multiplying both sides by (X^T * X)^-1
(X^T * X)^-1 * (X^T * y) = (X^T * X)^-1 * (X^T * X) * par = I * par = par
par = (X^T * X)^-1 * (X^T * y)
It is the least squares solution of the model y = X * par + Ey where Ey is the vector of random errors in y. This is however, not the derivation of least squares in the matrix notation.
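For a straight line the matrix expression par = inv(X^T * X) * (X^T * y) can be written out by hand, since X^T * X is a 2 x 2 matrix. The following Python sketch does this without any matrix library and applies it to the Hammett data of Table 10.6; the function name `normal_equations` is hypothetical.

```python
def normal_equations(x, y):
    """Solve par = inv(X^T X) (X^T y) for X = [1, x] (two parameters)."""
    rows = [[1.0, xi] for xi in x]                 # design matrix X
    # X^T X (2 x 2) and X^T y (2 x 1), built element by element.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(2)] for i in range(2)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(2)]
    det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
    inv = [[xtx[1][1] / det, -xtx[0][1] / det],
           [-xtx[1][0] / det, xtx[0][0] / det]]
    return [inv[i][0] * xty[0] + inv[i][1] * xty[1] for i in range(2)]

sigma = [-0.27, -0.17, 0.23, 0.78]      # substituent constants (x)
log_k = [2.241, 2.398, 3.045, 3.912]    # log k (y)
par = normal_equations(sigma, log_k)
print([round(p, 4) for p in par])
```

The two elements of `par` are the intercept (log k0) and the slope (ρ); they agree with the model quoted for Example 10.7.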
10.2 POLYNOMIAL REGRESSION
Polynomial Models in One Explanatory Variable
A quadratic model is invoked when the residuals in y for a linear model show a trend and their magnitudes are far greater than the accuracy of the measurement and the reproducibility of the data. If the residuals are still not acceptable, third and higher order polynomials are used until the distribution of the residuals is random and their magnitude is comparable with the accuracy and precision of the data. Such non-linear trends are common in calibration and in the variation of chemical parameters with dielectric constant, ionic strength or temperature. However, the least squares solution becomes increasingly unstable as the order of the polynomial rises, because the values of X, X^2, X^3 etc. are interdependent.
Example 10.8
For a weakly quadratic model Y = 0 + X + 0.2 * X^2, if the x range is 0 to 0.9, x and x^2 are highly correlated (0.97) and the angle between the vectors is very low (14.4°). A linear fit gives very low residuals, of the order of ±0.025, but a perusal of the residual plot shows a quadratic trend. When a quadratic model is tried, the residuals are of the order of 10^-15. This is an adequate model (Fig. 10.9a-d). When the x range is from -3 to +3, the correlation between x and x^2 is 0.0 and the angle between the column vectors is 90°. Thus, the two variables are independent and appropriate for polynomial regression. A quadratic fit is proposed since a linear fit is inadequate (Fig. 10.9e-h).

Table 10.7 : Y = 0 + X + 0.2 * X^2

Par        Range [0 to 0.9]                                Range [-3 to 3]
(SdPar)    Linear                Quadratic                 Linear        Quadratic
a0         -0.0255 (8.80e-005)   2.7756e-017 (3.81e-031)   0.7 (0.12)    1.1102e-015 (3.29e-031)
a1         1.18 (0.00016)        1 (1.96e-030)             1 (0.067)     1 (1.16e-031)
a2                               0.2 (2.10e-030)                         0.2 (7.05e-032)
Sdy        0.014124              7.8328e-016               0.67454       8.88e-016
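The dependence of the collinearity between x and x^2 on the range of x can be checked directly. In the sketch below the sampling grids (ten points from 0 to 0.9 and the integers from -3 to 3) are assumptions, since the text gives only the ranges; with these grids the angle for the first range comes out close to the 14.4° quoted above.

```python
import math

def correlation(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return suv / math.sqrt(sum((a - mu) ** 2 for a in u)
                           * sum((b - mv) ** 2 for b in v))

def angle_degrees(u, v):
    """Angle between two column vectors (not mean-centred)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return math.degrees(math.acos(dot / norm))

narrow = [i / 10 for i in range(10)]          # 0, 0.1, ..., 0.9
wide = [float(i) for i in range(-3, 4)]       # -3, -2, ..., 3

angle_narrow = angle_degrees(narrow, [v * v for v in narrow])
corr_narrow = correlation(narrow, [v * v for v in narrow])
angle_wide = angle_degrees(wide, [v * v for v in wide])
corr_wide = correlation(wide, [v * v for v in wide])
print(round(angle_narrow, 1), round(angle_wide, 1))
```

A range that is symmetric about zero makes x and x^2 orthogonal, so the linear and quadratic coefficients can be estimated independently of each other.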
Example 10.9
The results for a strongly quadratic model Y = 0 + 0.1 * X + 0.9 * X^2 are given in Table 10.8. The y is poorly correlated with x (0.07) while highly correlated with x^2 (1.00), which is reflected in the angle between the vectors.

Table 10.8: Y = 0 + 0.1 * X + 0.9 * X^2

Par        Range [0 to 0.9] (Fig. 10.10a-d)                Range [-3 to 3] (Fig. 10.10e-h)
(Sd Par)   Model 1             Model 2                     Model 1        Model 2
a0         -0.11475 (0.0017)   1.6653e-016 (2.58e-032)     3.15 (2.55)    1.1102e-016 (7.28e-031)
a1         0.91 (0.0033)       0.1 (1.32e-031)             0.1 (1.36)     0.1 (2.58e-031)
a2                             0.9 (1.5584e-031)                          0.9 (1.42e-031)
Sdy        0.06356             2.03e-016                   3.0354         1.32e-015
Fig. 10.9 : Linear and Quadratic Fit of Weakly Quadratic Data

Fig. 10.10: Linear and Quadratic Fit of Strongly Quadratic Data
Example 10.10
The per cent control of grasses after use of herbicides such as substituted styrene derivatives (Table 10.9) is analyzed by polynomial models. From the magnitudes and trends in the residuals (Fig. 10.11), a quadratic model was found to be adequate. Models with only a quadratic, cubic or quartic term are rejected because their SDy values are higher than that of the full quadratic model.

Table 10.9

Substituent                      π       Log (Activity)
R1            R2
CHCl2         -                  1.2     2
CCl3          -                  1.8     1.9777
CCl3          4-CH3              2.53    1.8865
CHCl2         4-isopropyl        3.1     1.8633
CHClCOCH3     -                  0.48    1.716
CCl2          -                  3       1.6902
CH2OH         -                  0.5     1.9031

Par    Linear      Quadratic   Cubic       Quartic     y = a0+a2*x^2   y = a0+a3*x^3   y = a0+a4*x^4
a0     1.8938      1.6543      1.4356      1.6322      1.8997          1.9015          1.9012
a1     -0.01744    0.39151     0.9953      0.30851
a2                 -0.11536    -0.52324    0.197       -0.0086488
a3                             0.077047    -0.2143                     -0.0033887
a4                                         0.040035                                    -0.0011959
sdy    0.12906     0.099668    0.10489     0.12748     0.12514         0.12187         0.11995
Fig. 10.11: Polynomial Fit of log Activity with π
Example 10.11
The variation of the dielectric constant of aquo-DMSO mixtures in the range of 10-65% (w/w) is non-linear. It is fitted into a quadratic model and the residuals are of the order of 0.02 to 0.10, which is lower than the measurement accuracy. Thus, any value interpolated in the composition range is valid.

x        y       Residual
10       78.2    -0.054
18.58    77.9     0.215
20       77.5    -0.109
32.56    76.9    -0.062
40       76.4    -0.022
52.01    74.9     0.0016
60       73.3     0.099
65.01    71.7    -0.067

Par    Linear     Quadratic   Cubic       Quartic      y = a0+a2*x^2   y = a0+a3*x^3   y = a0+a4*x^4
a0     -8.1036    -105.6      -364.27     20850        -2.439          -0.554          0.3851
a1     828.49     15179       72348       -6.20e+6
a2                -5.26e+5    -4.73e+6    6.92e+8      30209
a3                            1.03e+8     -3.42e+10                    1.46e+6
a4                                        6.33e+11                                     7.98e+007
sdy    0.18894    0.05567     0.05990     0.06138      0.19904         0.209           0.2189
The cases discussed in this section are of curve-fitting, viz., fitting the data to a mathematical model. In such cases the residuals should be low, unlike parameterization where the statistical significance of model parameters is essential.
10.3 ROBUST REGRESSION
The objective of regression is to obtain the trend of the majority of points and not to explain an outlier by fitting it into the model. Least Squares (LS) is applicable if the errors in Y are normal and homoscedastic. An outlier in y attracts the least squares line, resulting in an incorrect slope and intercept. Hence, methods robust to outliers are desirable. The median is a successful statistical parameter for estimating the central tendency in the presence of outliers. It is useful even in a linear model.
Single Median Method
Usually, a large number of experimental points are obtained to minimize the effect of measurement errors on the model parameters. However, a pair of points is sufficient to estimate the slope and intercept of a linear model. For three data points, 3^2 pairs (Chart 10.9) of slopes and intercepts are possible. Regression is not possible for the diagonal pairs as they consist of the same point. It leaves 3^2 - 3 = 6 sets in the upper and lower triangles. Since the slope and the intercept for a pair of points 1, 2 or 2, 1 are the same, they can be estimated from either the upper or the lower triangle. Thus, one is left with three {(3^2 - 3)/2} pairs. The single median method of Theil is given in Chart 10.10.

Chart 10.9

Chart 10.10: Single Median Algorithm
Object function: Median of parameters
Step 1: Calculation of slope for pairs of points in the upper triangular matrix
        Slope = (y_i - y_j)/(x_i - x_j)
Step 2: Sorting of vector of slopes
        Calculation of median of slopes
        slope_sma = median (slopes)
Step 3: Calculation of intercept for all points using slope_sma
        Calculation of median of intercepts
        intercept_sma = med (intercepts)
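The three steps of the single median algorithm translate almost line for line into Python. The sketch below is an illustration using the standard library only; the function name is hypothetical and the sample data are those of Table 10.11a (Example 10.12), which contain one outlier in y.

```python
from itertools import combinations
from statistics import median

def theil_single_median(x, y):
    """Return (intercept, slope) by the single median (Theil) method."""
    # Step 1: slopes for every pair of points in the upper triangle.
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[i] != x[j]]
    # Step 2: median of the slopes.
    slope = median(slopes)
    # Step 3: median of the intercepts obtained with that slope.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 1.1, 2.0, 3.1, 3.8, 10.0]
cept_theil, slope_theil = theil_single_median(x, y)
print(round(cept_theil, 3), round(slope_theil, 3))
```

The outlying last point contributes five large pairwise slopes out of fifteen, which is not enough to move the median.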
Repeated Median Estimator
For six data points, the unique set (Table 10.10) of pairs of points ((6^2 - 6)/2 = 15) can be represented as an upper triangular matrix. Calculation of the slope and the intercept is given in Chart 10.11.

Table 10.10: Sets of Points and the Slopes

Row (i)   Point numbers   Slope         Median of slopes of ith row
1         1,2             slope (1,2)   slope (1,4)
          1,3             slope (1,3)
          1,4             slope (1,4)
          1,5             slope (1,5)
          1,6             slope (1,6)
2         2,3             slope (2,3)   [slope (2,4) + slope (2,5)]/2
          2,4             slope (2,4)
          2,5             slope (2,5)
          2,6             slope (2,6)
3         3,4             slope (3,4)   slope (3,5)
          3,5             slope (3,5)
          3,6             slope (3,6)
4         4,5             slope (4,5)   [slope (4,5) + slope (4,6)]/2
          4,6             slope (4,6)
5         5,6             slope (5,6)   slope (5,6)
Median of medians                       slope (3,5)
Chart 10.11: RME (Siegel) Algorithm
Object function: Median (median of slopes in each row of upper triangular matrix)
Step 1: For every row in the upper triangular matrix
           For each pair of points
              Calculate slope
           End
           Calculate median of slopes
        End
Step 2: Slope_Siegel = median (median (slopes of all rows))
Step 3: Intercept_Siegel (same as Step 3 of Single Median Algorithm)
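A compact Python sketch of the repeated median estimator is given below. Note the assumption: it follows the commonly used formulation of Siegel's estimator, in which the inner median for point i is taken over the slopes to all other points, rather than only over the pairs in row i of the upper triangle as the chart describes. With the data of Table 10.11a this formulation gives the intercept and slope listed for the repeated median in Table 10.11b. The function name is hypothetical.

```python
from statistics import median

def repeated_median(x, y):
    """Return (intercept, slope) by the repeated median estimator."""
    n = len(x)
    inner_medians = []
    for i in range(n):
        # Slopes from point i to every other point with a different x.
        slopes = [(y[j] - y[i]) / (x[j] - x[i])
                  for j in range(n) if j != i and x[j] != x[i]]
        inner_medians.append(median(slopes))
    slope = median(inner_medians)          # median of the medians
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 1.1, 2.0, 3.1, 3.8, 10.0]
cept_rm, slope_rm = repeated_median(x, y)
print(round(cept_rm, 3), round(slope_rm, 3))
```

Taking the median twice limits the influence of any single point to one inner median, so a lone outlier cannot dominate the final slope.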
LEAST MEDIAN SQUARES Least squares results in the best linear unbiased estimator (BLUE) of intercept and slope of a linear model. In least median squares (LMS), the median of the squares of residuals is calculated for each pair of points. The slope and intercept corresponding to the minimum of medians of squares of residuals are the LMS estimates of that model. They are robust to outliers but are biased. The algorithm of LMS procedure is given in Chart 10.12. Chart 10.12 : Algorithm of Least Median Squares
Object function: Minimum of the median of squares of residuals in 'y'
Step 1: For every pair of points (i, j) in the upper triangular matrix, calculate
           [a0; a1] = inv([1 x_i; 1 x_j]) * [y_i; y_j]
           RES = y - X * a
           RES2 = RES * RES
           Med_RES2 = Med (RES2)
        End
Step 2: Minimum of Med_RES2
        LMS estimates are the corresponding slope and intercept
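The pairwise search described in the chart can be sketched in Python as follows. This is a minimal illustration, not an optimized LMS implementation: the line through each pair of points is a candidate, and the candidate with the smallest median of squared residuals is retained. The ordinary median (mean of the two middle values for an even number of points) is an assumption about the definition used. The two data sets are those of Table 10.11a and Table 10.13.

```python
from itertools import combinations
from statistics import median

def least_median_squares(x, y):
    """Return (intercept, slope, median of squared residuals)."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue                      # no line through a vertical pair
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        med_res2 = median((yk - (intercept + slope * xk)) ** 2
                          for xk, yk in zip(x, y))
        if best is None or med_res2 < best[2]:
            best = (intercept, slope, med_res2)
    return best

lms_a = least_median_squares([0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
                             [0.0, 1.1, 2.0, 3.1, 3.8, 10.0])
lms_b = least_median_squares([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                             [1.0, 2.0, 3.0, 4.0, 5.0, 9.0])
print([round(v, 4) for v in lms_a], [round(v, 4) for v in lms_b])
```

In both cases the selected line passes through two of the regular points and ignores the outlier, whose residual then stands out clearly.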
In the presence of an outlier, the residuals by LLS are high but are within the 3SD limits, while those by LMS are very high at the outlying points only. When the outliers are deleted, the parameters by LMS and LLS are nearly equal. This combination of LMS to detect outliers and LLS to calculate the BLUE of slope and intercept is a popular hybrid method.

Example 10.12
The performance of the methods discussed above is illustrated with a data set from an industrial process (Table 10.11a). Single Median, Repeated Median and LMS behaved similarly, and differently from LLS (Table 10.11b).

Table 10.11a: A Data Set with One Outlier
x    0    1.0    2.0    3.0    4.0    5.0
y    0    1.1    2.0    3.1    3.8    10.0

Table 10.11b: Comparison of Regression Parameters from Different Methods
Method                       a0            a1
LLS                          -0.8952       1.6914
Single Median (Theil)        -2.220e-16    1.033
Repeated Median (Siegel)     0.0250        1.016
LMS                          0.000         1.033
Example 10.13
Data sets with one outlier in y at different positions are simulated. The data points with the regression lines from LLS and LMS, along with the residual plots, are given in Fig. 10.12. The position of the outlier does not affect the regression parameters in LMS, but its influence is drastic in LLS (Table 10.12).

Table 10.12: Regression Parameters of LLS with Outliers at Different Positions

Parameter   Position of outlier
            1         2         3         4         5         6
a0          2.0000    1.4000    0.8000    0.2000    -0.4000   -1.0000
a1          0.5714    0.7429    0.9143    1.0857    1.2571    1.4286
SDy         1.0351    1.2593    1.3575    1.3575    1.2593    1.0351
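The entries of Table 10.12 can be regenerated with a few lines of Python. The simulated data are not listed in the text, so the sketch assumes points lying on y = x for x = 1 to 6 with the outlier raised by 3 units (as in Table 10.13, where the sixth point is 9 instead of 6); under this assumption the computed values match the table. The function name is hypothetical.

```python
import math

def lls_with_sd(x, y):
    """Return (a0, a1, SDy) for the least squares line y = a0 + a1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    a0 = my - a1 * mx
    rss = sum((b - (a0 + a1 * a)) ** 2 for a, b in zip(x, y))
    return a0, a1, math.sqrt(rss / (n - 2))

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rows = []
for position in range(1, 7):
    y = list(x)                 # error-free line y = x
    y[position - 1] += 3.0      # assumed size of the outlier
    rows.append((position,) + lls_with_sd(x, y))

for position, a0, a1, sdy in rows:
    print(position, round(a0, 4), round(a1, 4), round(sdy, 4))
```

An outlier at either end of the x range tilts the line most strongly, whereas an outlier near the centre mainly shifts the intercept and inflates SDy.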
Fig. 10.12: Effect of Position of Outlier on LLS and LMS Regression Lines and the Residuals in LMS Fit
Example 10.14
A simulated data set with one outlier in 'y' at the sixth point is given in Table 10.13.

Table 10.13: Comparison of LLS with LMS

x         y         Residual
                    LMS        LLS
1.0000    1.0000    0          0.5714
2.0000    2.0000    0          0.1429
3.0000    3.0000    0          -0.2857
4.0000    4.0000    0          -0.7143
5.0000    5.0000    0          -1.1429
6.0000    9.0000    -3.0000    1.4286

Regression Parameters   LMS       LLS
Intercept               0         -1.0000
Slope                   1.0000    1.4286
SDy                     2.2500    1.0351
The residuals give the misleading impression that the least squares estimators are reliable, since the SDy by LLS is far less than that by LMS. The plot of residuals versus y shows a trend, indicating the insufficiency of a linear model. The residual by LMS is very high (3.0) for the outlier, while those for all other points are zero (Fig. 10.12m, n). This demonstrates the robustness of LMS to the outlier. The results of the analysis after removal of the outlier (Table 10.14) show that the parameters, residuals and SDy are identical in both methods.

Table 10.14: Comparison of LLS with LMS after Removal of Outlier

x       y       Residual
                LMS    LLS
1.00    1.00    0      0
2.00    2.00    0      0
3.00    3.00    0      0
4.00    4.00    0      0
5.00    5.00    0      0

Regression Parameters   LMS         LLS
Intercept               0           0
Slope                   1.00        1.00
SDy                     1.0e-014    0.11e-014
Example 10.15
The results for a data set with normal noise in y and an outlier (Table 10.15) show that the residuals by LLS are higher than those obtained by LMS. The high residuals in LLS are due to the pull of the regression line towards the outlier (sixth point).

Table 10.15: Comparison of LLS with LMS

x         y          Residual
                     LMS        LLS
0         0          0.0000     0.8952
1.0000    1.1000     -0.0667    0.3038
2.0000    2.0000     0.0667     -0.4876
3.0000    3.1000     0.0000     -1.0790
4.0000    3.8000     0.3333     -2.0705
5.0000    10.0000    -4.8333    2.4381

Regression Parameters   LMS       LLS
Intercept               0.0000    -0.8952
Slope                   1.0333    1.6914
SDy                     5.8703    1.7697

Chemical Tasks
Outliers, from a statistical point of view, result from data with asymmetric distributions and/or higher values of cumulative probability far away from the central values. The chemical reasons for their occurrence are insufficient concentrations of reagent in calibration, the presence of an ortho-compound in a Hammett relationship, solute-solvent interactions in the Born dielectric model and the onset of a different mechanism in kinetic order.
Example 10.16
The data set consists of the instrument signals for different concentrations of a pollutant (Table 10.16a). The extinction coefficient (slope) and blank (intercept) are vitiated in LLS but not in LMS (Table 10.16b).

Table 10.16a: Residuals by LLS and LMS

Concentration   Signal   Residual
                         LLS      LMS
1               1.1      0.32     -0.01
2               2.0      -0.04    -0.03
3               3.1      -0.20    0.15
4               3.8      -0.76    -0.07
5               6.5      0.68     1.71

Table 10.16b: Regression Parameters

Method   Intercept   Slope
LLS      -0.48       1.26
LMS      0.19        0.92
Example 10.17
In the determination of the order of a reaction, a plot of log concentration versus log k is linear, with the slope corresponding to the order of the reaction. Hence, it is a least squares problem. For a typical kinetic data set without outliers (Table 10.17), the SD in log k is of the order of 10^-3 and the residuals are comparable in the LLS and LMS methods. The slope and intercept are also the same.

Table 10.17 : LLS and LMS Results for the Kinetic Data

Concentration   log k      Residuals
                           LMS             LLS
-0.9590         -3.1146    8.8818e-016     1.0623e-003
-0.8218         -2.8416    3.3194e-003     -6.5402e-004
-0.7095         -2.6253    1.3379e-002     -9.3999e-003
-0.6307         -2.4486    -4.6506e-003    9.5503e-003
-0.5605         -2.3116    8.8818e-016     5.7213e-003
-0.4845         -2.1713    1.2890e-002     -6.2799e-003

Intercept                  -1.18           -1.19
Slope                      2.01            2.00
SD in log k                0.009718        0.007958
Example 10.18 The solute-solvent and solvent-solvent interactions have pronounced effect on log k. The trend may become non-linear when these interactions are predominant. A data set oflog k versus liD (dielectric constant) with point numbers I, 9 and 10 as outliers (marked with asterisk, Table 10.18) has the slopes -4.64 and -2.04 obtained from LLS and LMS. When point 1 is eliminated, the slope by LLS is increased to -3.29 and SD decreased by 50%. Subsequent deletion of ninth and tenth points further increased the slope to -2.35 with a SD of 0.11. • Table 10.18 : Elimination of Outliers through LMS Analysis
            Residuals                                                      Data set
Point No.   All points        Point 1 eliminated   Points 1, 9, 10 eliminated
            LMS      LLS      LMS      LLS         LMS      LLS            1/D      log (k)
1            1.74*   -0.91    Eliminated           Eliminated              0.526    -5.301
2            0.26     0.28     0.26    -0.10        0.00    -0.1           0.416    -3.596
3            0.00     0.62     0.00     0.19        0.28     0.0           0.448    -3.400
4           -0.04     0.41    -0.04     0.11        0.25     0.1           0.349    -3.154
5            0.05    -0.06     0.05    -0.16        0.03    -0.0           0.204    -2.954
6            0.07    -0.13     0.07    -0.21        0.00    -0.0           0.185    -2.935
7           -0.08    -0.18     0.08    -0.15        0.09     0.0           0.101    -2.602
8            0.00    -0.41     0.00    -0.30        0.03    -0.0           0.046    -2.576
9           -0.61*            -0.61*    0.29       Eliminated              0.039    -1.950
10          -0.67*    0.21    -0.67*    0.34       Eliminated              0.027    -1.860

Intercept   -2.48    -1.94    -2.48    -2.11       -2.40    -2.44
Slope       -2.04    -4.64    -2.05    -3.29       -2.85    -2.35
SD in y      0.70     0.47     0.36     0.25        0.17     0.11
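The effect of removing the outliers on the LLS fit in Example 10.18 can be checked with a minimal sketch. It refits the straight line to the 1/D and log k values of Table 10.18 three times: with all points, without point 1, and without points 1, 9 and 10. The helper `lls_fit` and the use of sqrt(RSS/(n - 2)) as the SD in y are assumptions made for illustration.

```python
# 1/D and log k for points 1 to 10 of Table 10.18
inv_d = [0.526, 0.416, 0.448, 0.349, 0.204, 0.185, 0.101, 0.046, 0.039, 0.027]
log_k = [-5.301, -3.596, -3.400, -3.154, -2.954, -2.935, -2.602, -2.576, -1.950, -1.860]

def lls_fit(x, y):
    """Least squares straight line; returns (intercept, slope, sd_in_y)."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ym - slope * xm
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    # ASSUMPTION: SD in y taken as sqrt(RSS / (n - 2))
    return intercept, slope, (rss / (n - 2)) ** 0.5

def fit_without(points_to_drop):
    """Refit after dropping the given 1-based point numbers."""
    keep = [i for i in range(len(inv_d)) if i + 1 not in points_to_drop]
    return lls_fit([inv_d[i] for i in keep], [log_k[i] for i in keep])

fit_all = fit_without(set())
fit_no_1 = fit_without({1})
fit_no_1_9_10 = fit_without({1, 9, 10})

for label, fit in [("all points", fit_all),
                   ("point 1 eliminated", fit_no_1),
                   ("points 1, 9, 10 eliminated", fit_no_1_9_10)]:
    print(f"{label:28s} intercept = {fit[0]:.2f}  slope = {fit[1]:.2f}  SD = {fit[2]:.2f}")
```

The LLS slope moves steadily towards the LMS value as the flagged points are removed, which is the behaviour the example describes.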
10.4 RESIDUAL ANALYSIS

Data analysis with least squares is a common modeling technique. The distribution of the experimental error is unknown, but we assume that it follows a normal distribution. The residuals obtained are analyzed to judge whether the proposed model is adequate. The purpose of residual analysis is
• to understand the distribution of errors in the response,
• to detect outliers in the data,
• to rule out inadequate models,
• to avoid over-fitting, and
• to deal with model errors.
The relationship between the error in the response and the residual is given in Chart 10.13.

Chart 10.13
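\[
\begin{aligned}
\text{Model:}\quad y_i &= a_0 + a_1 x_i + e_i \\
e_i &\sim N(0, \sigma^2) \\
y_{\mathrm{cal},i} &= a_0 + a_1 x_i \\
\mathrm{Res}_i &= y_i - y_{\mathrm{cal},i}
\end{aligned}
\]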
The residuals are analyzed for normal distribution. If the statistical measures of the residuals and of the errors assumed in the model are not significantly different from each other, the model is taken to be adequate, since the necessary conditions of least squares (LS) are satisfied. One can then calculate the confidence contours of the regression coefficients, Y cal, etc. Further, the regression coefficients are BLUE (best linear unbiased estimates).

A model is considered adequate only if the residuals do not show any trend. When the data fit the model, the residuals should ideally be zero; in practice they are of small magnitude compared to the response. The absence of trend and auto-correlation means only that the residuals are random numbers; they can, in fact, follow any distribution. Since least squares analysis rests on the hypothesis that the errors are random and normally distributed, the residual vectors are tested for normality. The half-normal plot and the normal plot are popular under 'VEDA', while the χ², skewness and kurtosis tests belong to 'SEDA'. A discussion of some popular methods of residual analysis employed in the chemical sciences, viz., the χ² test, R-factor, Exner ψ and Ehreusen F-tests, follows.
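As a small illustration of the numerical checks mentioned above, the sketch below summarizes a residual vector by its mean, skewness and excess kurtosis. The residuals are the LLS values of Table 10.17. The moment-based formulas and the function name `residual_summary` are assumptions chosen for illustration, and with only six points the numbers are indicative rather than a formal test of normality.

```python
# LLS residuals in log k, copied from Table 10.17
residuals = [1.0623e-3, -6.5402e-4, -9.3999e-3, 9.5503e-3, 5.7213e-3, -6.2799e-3]

def residual_summary(res):
    """Return (mean, skewness, excess kurtosis) from the central moments."""
    n = len(res)
    mean = sum(res) / n
    m2 = sum((r - mean) ** 2 for r in res) / n
    m3 = sum((r - mean) ** 3 for r in res) / n
    m4 = sum((r - mean) ** 4 for r in res) / n
    skewness = m3 / m2 ** 1.5            # zero for a symmetric distribution
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # zero for a normal distribution
    return mean, skewness, excess_kurtosis

mean, skew, kurt = residual_summary(residuals)
print(f"mean = {mean:.2e}, skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")
```

A mean close to zero is expected for any least squares fit that includes an intercept; skewness and excess kurtosis near zero are consistent with, but do not prove, normally distributed errors.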
10.4.1 χ² Test for Analysis

χ² is a special case of the Gamma distribution, whose probability density function (PDF) is unsymmetrical. When the errors in the response follow a normal distribution, the sum of squares of the weighted residuals follows a χ² distribution (Chart 10.14) with (np - npar) degrees of freedom (df), where np is the number of points and npar the number of parameters. The statistic measures how probable it is that the weighted residuals belong to the standard normal distribution, which has zero mean and unit standard deviation.

Chart 10.14: χ² test
If      χ² calculated < χ² table
Then    H0: Model is acceptable (valid)
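The decision rule of Chart 10.14 can be sketched as follows, using the LLS residuals of Table 10.17. Two inputs are assumptions and not taken from the text: the standard deviation of the error in log k (`sigma = 0.01`) used to weight the residuals, and the 5% significance level, for which the tabulated χ² with 4 degrees of freedom is 9.488.

```python
# LLS residuals in log k from Table 10.17
residuals = [1.0623e-3, -6.5402e-4, -9.3999e-3, 9.5503e-3, 5.7213e-3, -6.2799e-3]

sigma = 0.01        # ASSUMED standard deviation of the error in log k
n_points = len(residuals)
n_params = 2        # intercept and slope of the straight line
chi2_table = 9.488  # tabulated chi-square for 4 degrees of freedom at the 5% level

def chi_square(res, sigma):
    """Sum of squares of the weighted residuals (residual / sigma)."""
    return sum((r / sigma) ** 2 for r in res)

dof = n_points - n_params
chi2_calc = chi_square(residuals, sigma)
model_acceptable = chi2_calc < chi2_table

print(f"chi-square = {chi2_calc:.3f} with {dof} degrees of freedom")
print("H0 (model acceptable):", model_acceptable)
```

The outcome depends directly on the assumed `sigma`: halving it multiplies the calculated χ² by four, which is why a realistic estimate of the experimental precision is needed before the test is meaningful.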
A value of χ² higher than the table value arises from either
• an inadequate model for good data, or
• an adequate model for imprecise data.

The χ² statistic is used to
• compare experimental results with expected data belonging to a statistical distribution,
• check the adequacy of a model whose parameters are estimated by linear or non-linear least squares,
• assess the association between two variables, and
• estimate the parameters of a model by invoking it as an object function for minimization.
Limitation
The χ² statistic is also a point estimate. Although the χ² test may indicate that the observed values are not significantly different from the expected values, data accuracy is an important component of planned experiments. For example, the χ² test is passed even for a difference of 0.1 pH between the observed and calculated values in alkalimetric titrations. But when an instrument of 0.01 readability and 0.03 precision is used, such residuals are on the high side, and remedial measures to improve the experimental conditions should be implemented.

10.4.2 Exner and Ehreusen Parameters

The number of compounds studied in LFER is generally in the range of 5 to 10. The variation of log k or log K with the substituent in a basic moiety, with co-solvent composition or even with non-aqueous organic solvents belongs to this category. It has been recognized that the correlation coefficient, standard deviation and t-test are inadequate here. Exner and Ehreusen proposed new statistics (SIEXN and FEHR) applicable when the number of compounds is as small as five. Further, the correlation coefficient is modified as the corrected correlation coefficient (CCC). These parameters again depend upon the residuals and the deviations from the mean. An empirical rule is invoked to arrive at the best model based on the range of these parameters. The formulae and heuristics are given in Chart 10.15.

Chart 10.15
\[ \mathrm{CCC} = 1.0 - \frac{(NP - 1)\,(RSS/TSS)}{NP - 2} \]
\[ \text{Exner parameter} = \left( \frac{RSS}{(NP - 2)\,TSS} \right)^{1/2} \]
\[ \text{Ehreusen parameter (FEHR)} = \left( \frac{RSS}{TSS} \right)^{1/2} \]
If 0.0 ? @
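The three statistics of Chart 10.15 can be evaluated with a minimal sketch, here for the kinetic data of Table 10.17. Several points are assumptions: RSS is read as the residual sum of squares and TSS as the total sum of squares about the mean, NP as the number of points, and the Exner parameter as the square root of RSS/((NP - 2)*TSS). The function name `lfer_statistics` is illustrative.

```python
# log(concentration) and log k from Table 10.17
x = [-0.9590, -0.8218, -0.7095, -0.6307, -0.5605, -0.4845]
y = [-3.1146, -2.8416, -2.6253, -2.4486, -2.3116, -2.1713]

def lfer_statistics(x, y):
    """Return (CCC, Exner parameter, FEHR) for a straight-line LLS fit."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ym - slope * xm
    # ASSUMPTION: RSS = residual sum of squares, TSS = total sum of squares
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - ym) ** 2 for yi in y)
    ccc = 1.0 - (n - 1) * (rss / tss) / (n - 2)
    exner = (rss / ((n - 2) * tss)) ** 0.5  # ASSUMED grouping of the chart formula
    fehr = (rss / tss) ** 0.5
    return ccc, exner, fehr

ccc, exner, fehr = lfer_statistics(x, y)
print(f"CCC = {ccc:.5f}, Exner parameter = {exner:.4f}, FEHR = {fehr:.4f}")
```

With this reading of the chart the Exner parameter is simply FEHR divided by the square root of (NP - 2), so both shrink towards zero as the fit improves while CCC approaches one.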
Value   ASCII Character      Value   ASCII Character
 65     A                     97     a
 66     B                     98     b
 67     C                     99     c
 68     D                    100     d
 69     E                    101     e
 70     F                    102     f
 71     G                    103     g
 72     H                    104     h
 73     I                    105     i
 74     J                    106     j
 75     K                    107     k
 76     L                    108     l
 77     M                    109     m
 78     N                    110     n
 79     O                    111     o
 80     P                    112     p
 81     Q                    113     q
 82     R                    114     r
 83     S                    115     s
 84     T                    116     t
 85     U                    117     u
 86     V                    118     v
 87     W                    119     w
 88     X                    120     x
 89     Y                    121     y
 90     Z                    122     z
 91     [                    123     {
 92     \                    124     |
 93     ]                    125     }
 94     ^                    126     ~
 95     _                    127     DEL
 96     `
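The value-to-character correspondence in this part of the table can be regenerated with the built-in `chr` and `ord` functions of Python. The sketch below is only a convenience for checking entries; the helper name `ascii_rows` is hypothetical, and code 127 is labelled DEL because it has no printable symbol.

```python
def ascii_rows(first=65, last=127):
    """Return (value, character) pairs for the ASCII codes first..last."""
    rows = []
    for code in range(first, last + 1):
        label = "DEL" if code == 127 else chr(code)
        rows.append((code, label))
    return rows

for code, label in ascii_rows():
    print(f"{code:3d}  {label}")

# The reverse lookup uses ord(): ord("A") gives the value of the character A.
print(ord("A"), ord("a"))
```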
Appendix 2 Object Oriented Representation of Hardware
Computer System       [Analog, digital, hybrid]
                      [Micro, work-station, mainframe, supercomputer]
                      [I, II, III, IV, V generation], [Offline, online]
                      [Hardware, Software, Firmware]
Hardware              [Micro Processor, Memory, I/O device]
Microprocessors       [Intel, Zilog, Motorola, NS]
Intel                 [80X86, Pentium]
80X86                 [8086, 8088, 80286, 80386, 80486]
Pentium               [Pentium, Pentium MMX, Pentium II, Pentium III, Pentium III Xeon, Pentium IV]
Computer Classification
Memory                [Primary, Secondary, Cache, Virtual]
Primary               [ROM, RAM, Virtual, RAM DISK]
ROM                   [EEROM, UVEROM, CDROM]
RAM                   [Expanded, Extended]
I/O device            [Input, Output]
Input                 [Keyboard, Mouse, Magnetic media, OCR, Digitizer, Voice]
Output                [VDU, Printer, Magnetic media, CD-ROM-W, Multimedia]
VDU                   [MDA, CGA, ECGA, VGA, MCGA, SVGA]
Printer               [Impact, Non impact]
Impact                [Dot matrix, Line printer]
Non impact            [Jet, Thermal]
Jet                   [Laser, Ink]
Magnetic Media        [Disk, Tape]
Disk                  [FDD, HD, Zip Drive]
FDD                   [8", 3½", 5¼"]
Hard disk             [10 MB, 40 MB, 1 GB, 4 GB, 8 GB, 80 GB]
Tape                  [Paper, Magnetic]
Optical               [CD, DVD]
CD                    [CD ROM, CD-RW]
Software              [System, OS, Application]
Application Software  [Compiler, Interpreter]
Compiler              [C, Fortran 77, C++]
Interpreter           [BASIC, VB]
OS                    [MS DOS, UNIX, WINDOWS]
WINDOWS               [WINDOWS 3.1, WINDOWS 95, WINDOWS 98, WINDOWS NT, WINDOWS 2000]
Applications          [Languages, Packages, User programs]
Languages             [Low level, High level]
Low level             [Binary, Octal, Hexadecimal, Assembly]
High level            [Basic, FORTRAN, C, C++]
Package               [DBASE, GRAPHER, SPSS]
User Programs         [Source, Object, Executable]
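One way to picture this representation in a program is as a mapping from each object to the list of its members. The sketch below stores a small part of the hierarchy in a Python dictionary and finds the chain of objects leading to a given item. The dictionary layout and the function `path_to` are assumptions made for illustration, only a subset of the entries is included, and the key "Micro Processor" is used for the microprocessor entry so that it matches the member name listed under Hardware.

```python
# A subset of the appendix: each object maps to the list of its members
hierarchy = {
    "Computer System": ["Hardware", "Software", "Firmware"],
    "Hardware": ["Micro Processor", "Memory", "I/O device"],
    "Micro Processor": ["Intel", "Zilog", "Motorola", "NS"],
    "Intel": ["80X86", "Pentium"],
    "80X86": ["8086", "8088", "80286", "80386", "80486"],
    "I/O device": ["Input", "Output"],
    "Output": ["VDU", "Printer", "Magnetic media", "CD-ROM-W", "Multimedia"],
    "Printer": ["Impact", "Non impact"],
    "Impact": ["Dot matrix", "Line printer"],
}

def path_to(item, root="Computer System"):
    """Return the list of objects from root down to item, or None if absent."""
    if root == item:
        return [root]
    for child in hierarchy.get(root, []):
        sub_path = path_to(item, child)
        if sub_path:
            return [root] + sub_path
    return None

print(" > ".join(path_to("80486")))
print(" > ".join(path_to("Dot matrix")))
```

Each printed chain reads from the most general object to the most specific one, which mirrors the way the bracketed lists are nested in the appendix.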
Appendix 3 Z-table
α      0.00       0.01       0.02       0.03       0.04       0.05       0.06       0.07       0.08       0.09
0.0    ∞          2.575829   2.326348   2.17009    2.053749   1.959964   1.880794   1.811911   1.750686   1.695398
0.1    1.644854   1.598193   1.554774   1.514102   1.475791   1.439531   1.405072   1.372204   1.340755   1.310579
0.2    1.281552   1.253565   1.226528   1.200359   1.174987   1.150349   1.126391   1.103063   1.080319   1.058122
0.3    1.036433   1.015222   0.994458   0.974114   0.954165   0.934589   0.915365   0.896473   0.877896   0.859617
0.4    0.841621   0.823894   0.806421   0.789192   0.772193   0.755415   0.738847   0.722479   0.706303   0.690309
0.5    0.67449    0.658838   0.643345   0.628006   0.612813   0.59776    0.582842   0.568051   0.553385   0.538836
0.6    0.524401   0.510073   0.49585    0.481727   0.467699   0.453762   0.439913   0.426148   0.412463   0.398855
0.7    0.38532    0.371856   0.358459   0.345126   0.331853   0.318639   0.305481   0.292375   0.279319   0.266311
0.8    0.253347   0.240426   0.227545   0.214702   0.201893   0.189118   0.176374   0.163658   0.150969   0.138304
0.9    0.125661   0.113039   0.100434   0.087845   0.07527    0.062707   0.050154   0.037608   0.025069   0.012533

α      0.002      0.001      0.0001     0.00001    0.000001   0.0000001  0.00000001  0.000000001
z      3.090232   3.29053    3.89059    4.41717    4.89164    5.32672    5.73073     6.10941
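The tabulated values agree with the two-tailed critical values of the standard normal distribution, that is, z such that the area in both tails together equals α (for example α = 0.05 gives z = 1.959964). Under that reading, which is inferred from the numbers and not stated in the table itself, any entry can be recomputed with the standard library of Python (version 3.8 or later). The function name `z_two_tailed` is illustrative.

```python
from statistics import NormalDist

def z_two_tailed(alpha):
    """z value leaving a total area alpha in the two tails of N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

# Row label plus column label gives alpha, e.g. 0.0 + 0.05 = 0.05
for alpha in (0.01, 0.05, 0.10, 0.50, 0.001):
    print(f"alpha = {alpha:<6}  z = {z_two_tailed(alpha):.6f}")
```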