Advances in Geosciences
Volume 6: Hydrological Science (HS)
ADVANCES IN GEOSCIENCES
Editor-in-Chief: Wing-Huen Ip (National Central University, Taiwan)

A 5-Volume Set
Volume 1: Solid Earth (SE) ISBN-10 981-256-985-5
Volume 2: Solar Terrestrial (ST) ISBN-10 981-256-984-7
Volume 3: Planetary Science (PS) ISBN-10 981-256-983-9
Volume 4: Hydrological Science (HS) ISBN-10 981-256-982-0
Volume 5: Oceans and Atmospheres (OA) ISBN-10 981-256-981-2

A 4-Volume Set
Volume 6: Hydrological Science (HS) ISBN-13 978-981-270-985-1, ISBN-10 981-270-985-1
Volume 7: Planetary Science (PS) ISBN-13 978-981-270-986-8, ISBN-10 981-270-986-X
Volume 8: Solar Terrestrial (ST) ISBN-13 978-981-270-987-5, ISBN-10 981-270-987-8
Volume 9: Solid Earth (SE), Ocean Science (OS) & Atmospheric Science (AS) ISBN-13 978-981-270-988-2, ISBN-10 981-270-988-6
Advances in Geosciences
Volume 6: Hydrological Science (HS)
Editor-in-Chief
Wing-Huen Ip
National Central University, Taiwan
Volume Editor-in-Chief
Namsik Park
Dong-A University, Korea
World Scientific
New Jersey • London • Singapore • Beijing • Shanghai • Hong Kong • Taipei • Chennai
Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
ADVANCES IN GEOSCIENCES
A 4-Volume Set
Volume 6: Hydrological Science (HS)

Copyright © 2007 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN-13 978-981-270-781-9 (Set)
ISBN-10 981-270-781-6 (Set)
ISBN-13 978-981-270-985-1 (Vol. 6)
ISBN-10 981-270-985-1 (Vol. 6)
Typeset by Stallion Press
Email: [email protected]

Printed in Singapore.
EDITORS

Editor-in-Chief: Wing-Huen Ip

Volume 6: Hydrological Science (HS)
Editor-in-Chief: Namsik Park
Editors: Chunguang Cui, Eiichi Nakakita, Simon Toze, Chulsang Yoo

Volume 7: Planetary Science (PS)
Editor-in-Chief: Anil Bhardwaj
Editors: C. Y. Robert Wu, Francois Leblanc, Paul Hartogh, Yasumasa Kasaba

Volume 8: Solar Terrestrial (ST)
Editor-in-Chief: Marc Duldig
Editors: P. K. Manoharan, Andrew W. Yau, Q.-G. Zong

Volume 9: Solid Earth (SE), Ocean Science (OS) & Atmospheric Science (AS)
Editor-in-Chief: Yun-Tai Chen
Editors: Hyo Choi, Jianping Gan
CONTENTS

Stochastic Generation of Multi-Site Rainfall Occurrences
Ratnasingham Srikanthan and Geoffrey G. S. Pegram   1

A Spatial–Temporal Downscaling Approach for Construction of Intensity–Duration–Frequency Curves in Consideration of GCM-Based Climate Change Scenarios
Tan-Danh Nguyen, Van-Thanh-Van Nguyen and Philippe Gachon   11

Development and Applications of the Advanced Regional Eta-Coordinate Numerical Heavy-Rain Prediction Model System in China
Cui Chunguang, Li Jun and Shi Yan   23

Method of Correcting Variance of Point Monthly Rainfall Directly Estimated Using Low Frequent Observations From Space
Eiichi Nakakita, Syunsuke Okane and Lisako Konoshima   35

Monte Carlo Simulation for Calculating Drought Characteristics
Chavalit Chaleeraktrakoon and Supamit Noikumsin   47

On Regional Estimation of Floods for Ungaged Sites
Van-Thanh-Van Nguyen   55

Determination of Confidence Limits for Model Estimation Using Resampling Techniques
N. K. M. Nanseer, M. J. Hall and H. F. P. Van Den Boogaard   67

Real-Time High-Volume Data Transfer and Processing for e-VLBI
Yasuhiro Koyama, Tetsuro Kondo, Moritaka Kimura, Masaki Hirabaru and Hiroshi Takeuchi   81

A Comparison of Support Vector Machines and Artificial Neural Networks in Hydrological/Meteorological Time Series Prediction
Dulakshi S. K. Karunasingha and Shie-Yui Liong   91

Long-Term Water and Sediment Change Detection in a Small Mountainous Tributary of the Lower Pearl River, China
S. Zhang and X. X. Lu   97

Flow Structure and Air Entrainment in Riparian Riffles in Seomjin River
Jin-Hong Kim   109

Application of a Sediment Information System to the Three Gorges Project on Yangtze River, China
Shuyou Cao, Xingnian Liu, Kejun Yang and Changzhi Li   119

Spatial Distribution of Nitrate in Mizoro-Ga-Ike, a Pond With Floating Mat Bog
Tetsuya Shimamura, Yasuhiro Takemon, Ken’ichi Osaka, Masayuki Itoh and Nobuhito Ohte   129

Effect of Population Growths on Water Resources in Dubai Emirate, United Arab Emirates
Hind S. Al-Nuaimi and Ahmed A. Murad   139

Effects of Sand Dune and Vegetation in the Coastal Area of Sri Lanka at the Indian Ocean Tsunami
Norio Tanaka, Yasushi Sasaki and M. I. M. Mowjood   149

Using Managed Aquifer Recharge to Remove Contaminants from Water
Simon Toze   161

Finite-Difference Method Three-Dimensional Model for Seepage Analysis Through Fordyce Dam
Samuel Sangwon Lee and Takeshi Yamashita   171

Development of Unsaturated-Zone Leaching and Saturated-Zone Mixing Model
Samuel Sangwon Lee   181

Analytical Estimation of Potential Groundwater Resources in Coastal Areas
Namsik Park, Sung-Hun Hong, Kyung-Soo Seo and Lei Cui   195

Verification of the Combined Model of a Geyser (a Periodic Bubbling Spring) by Underground Investigation of Kibedani Geyser
Hiroyuki Kagami   203
Advances in Geosciences Vol. 6: Hydrological Science (2006)
Eds. Namsik Park et al.
© World Scientific Publishing Company
STOCHASTIC GENERATION OF MULTI-SITE RAINFALL OCCURRENCES

RATNASINGHAM SRIKANTHAN
Hydrology Unit, Bureau of Meteorology, GPO Box 1289, Melbourne, Australia
[email protected]

GEOFFREY G. S. PEGRAM
Civil Engineering, University of KwaZulu-Natal, Durban 4041, South Africa
[email protected]
Daily rainfall is a major input to water resources and agricultural systems. As the historical record provides only a single realization of the underlying climate, stochastically generated data are used to assess the impact of climate variability on water resources and agricultural systems. Daily rainfall data generation at a single site is a well-researched area in the hydrological and climatological literature. However, for assessing hydrological and land management changes over larger regions, the spatial dependence between the weather inputs at different sites has to be accommodated. In a recent study, Wilks’ approach20 was found to perform well in comparison with a hidden Markov model and a nonparametric k-nearest neighbor model. In Wilks’ approach, the precipitation occurrence is generated by using a correlated set of normally distributed random numbers. The spatial correlations between the normal random numbers were obtained by the method of bisection using simulation. This is not only a cumbersome procedure but also takes a lot of computer time when the number of stations is large. In this paper, a root-finding algorithm is used to obtain the hidden correlation between the normal variates from the estimated binary correlation between the rainfall occurrence processes. In addition, the hidden covariance model is validated by comparing the cumulative distribution functions of observed and generated wetness counts for 10 stations (thereby checking the moments up to the tenth order), a property not guaranteed by the second-order nature of the model. The procedure is applied to three catchments, with the number of rainfall stations varying from 3 to 30, to model the rainfall occurrences; the results show that the rainfall occurrence process was satisfactorily modeled in each case.
1. Introduction

Daily rainfall is a major input to water resources and agricultural systems. As the historical record provides a single realization of the underlying
climate, stochastically generated data are used to assess the impact of climate variability on water resources and agricultural systems. Daily rainfall data generation at a single site is a well-researched area in the hydrological and climatological literature.1–13 However, for assessing hydrological and land management changes over larger regions, the spatial dependence between the weather inputs at different sites has to be accommodated. This is particularly important for the simulation of rainfall, which displays the largest variability in time and space. The models used to generate daily rainfall at a number of sites can be broadly grouped into four categories: conditional models, extensions of Markov chain models, random cascade models, and nonparametric models. Conditional models generate the occurrence and the amount of rainfall using surface and upper air data.14–19 Wilks20 extended the familiar two-part model, consisting of a two-state, first-order Markov chain for rainfall occurrences and a mixed exponential distribution for rainfall amounts, to generate rainfall simultaneously at multiple locations by driving a collection of individual models with serially independent but spatially correlated random numbers. He applied the model to 25 sites in the New York area. Jothityangkoon et al.21 constructed a space–time model to generate synthetic fields of space–time daily rainfall. The model has two components: a temporal model based on a first-order, four-state Markov chain, which generates a daily time series of the regionally averaged rainfall, and a spatial model based on a nonhomogeneous random cascade process, which disaggregates the regionally averaged rainfall to produce spatial patterns of daily rainfall. The cascade used to disaggregate the rainfall spatially is a product of stochastic and deterministic factors; the latter enables the model to capture systematic spatial gradients exhibited by measured data.
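For reference, the single-site two-part model just described (a two-state, first-order Markov chain for occurrence plus a mixed exponential distribution for wet-day amounts) can be sketched in a few lines. This is an illustrative implementation with made-up parameter values, not Wilks' code:

```python
import numpy as np

def generate_daily_rainfall(n_days, p_wd, p_ww, alpha, beta1, beta2, rng):
    """Single-site two-part model: a two-state first-order Markov chain
    decides wet/dry, and wet-day amounts follow a mixed exponential with
    means beta1 and beta2 mixed with weight alpha."""
    rain = np.zeros(n_days)
    wet = False
    for t in range(n_days):
        p = p_ww if wet else p_wd          # P(wet | previous day's state)
        wet = rng.random() < p
        if wet:
            mean = beta1 if rng.random() < alpha else beta2
            rain[t] = rng.exponential(mean)
    return rain

# Made-up parameters: P(W|D) = P(W|W) = 0.3; amounts mix 2 mm and 10 mm means
rng = np.random.default_rng(0)
rain = generate_daily_rainfall(50000, 0.3, 0.3, 0.5, 2.0, 10.0, rng)
```

With p_ww > p_wd the series develops the wet-spell persistence real rainfall shows; here the two probabilities are made equal only so the marginal wetness is easy to check.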
Buishand and Brandsma22 used nearest-neighbor resampling for multi-site generation of daily precipitation and temperature at 25 stations in the German part of the Rhine basin. Mehrotra and Sharma23 applied the k-nearest neighbor technique to simulate rainfall conditional upon atmospheric variables simultaneously at 30 stations around Sydney. Conditional models are both data and computationally intensive. Besides, they are usually applied to one location and not tested adequately. The random cascade models also require a large amount of data to characterize the spatial dependence at different levels in the cascade, as they generate rainfall data over a grid. A nonparametric model was developed at the University of New South Wales by Mehrotra and Sharma.23 The extended two-part model of Wilks,20 which is an extension of the Markov
chain model, appears to be a relatively simple model and at the same time has the potential to perform well. A comparison of the extended two-part model of Wilks20 with two other approaches (a hidden-state Markov model and the k-nearest neighbor model) for modeling rainfall occurrence has shown that this approach performed the best.24 In Wilks’ approach, the precipitation occurrence was generated by using a correlated set of normally distributed random numbers. The spatial correlations between the normal random numbers were obtained by the method of bisection using simulation. This is not only a cumbersome procedure but also takes a lot of computer time if the number of stations is large. In this paper, a combination of numerical quadrature of the bivariate normal density function and a bisection root-finding algorithm is used to obtain the hidden correlation between the normal variates from the estimated correlation between the rainfall occurrence processes. In addition, it is demonstrated that the “hidden covariance model” preserves the higher order (≥3) binary correlations satisfactorily, an indirect validation of the procedure. The model is applied to three Australian catchments/regions, with the number of rainfall stations varying from 3 to 30, to model the daily rainfall occurrences, and the results are presented.
2. Multi-Site Rainfall Occurrence Model

A first-order two-state Markov chain is used to determine the occurrence of daily rainfall at each site. For each site k, the Markov chain has two transition probabilities, pW|D and pW|W, respectively the conditional probabilities of a wet day given that the previous day was dry or wet. The individual models are driven by serially independent but cross-correlated random numbers to preserve the spatial correlation in the rainfall occurrence process. Given a network of N locations, there are 2^N possible joint wet/dry outcomes and, as a minimum requirement, N(N − 1)/2 pairwise correlations should be maintained in the generated rainfall occurrences. This is achieved by using correlated uniform random numbers (u_t) in simulating the occurrence process. The uniform variates u_t(k) can be derived from standard Gaussian variates w_t(k) through the transformation

u_t(k) = Φ[w_t(k)]    (1)
where Φ[·] denotes the standard normal cumulative distribution function. Let the correlation between the Gaussian variates w_t for the station pair (k, l) be

ω(k, l) = Corr[w_t(k), w_t(l)]    (2)

Together with the transition probabilities for stations k and l, a particular ω(k, l) will yield a corresponding correlation between the synthetic binary series (X_t) for the two sites:

ξ(k, l) = Corr[X_t(k), X_t(l)]    (3)
Let ξ̂0(k, l) denote the observed value of ξ(k, l), estimated from the observed binary series X0_t(k) and X0_t(l) at stations k and l. The problem thus reduces to finding the N(N − 1)/2 correlations ω(k, l) which, together with the corresponding pairs of transition probabilities, reproduce ξ(k, l) = ξ̂0(k, l) for each pair of stations. Direct computation of ω(k, l) from ξ̂0(k, l) is not possible. In practice, one can invert the relationship between ω(k, l) and ξ(k, l) using a nonlinear root-finding algorithm, or obtain ω(k, l) by simulation as suggested by Wilks.20 In an earlier study,25 the correlation between the corresponding normal variates was obtained by an iterative method using simulation and the method of bisection. However, this procedure is time-consuming and cumbersome. Hence an efficient root-finding algorithm, the hidden covariance model, is proposed and tested in this paper. The hidden covariance model uses the bisection root-finding algorithm to search for the equivalent correlation in the normal space that gives the correct bivariate probabilities. The bivariate probability is calculated by Gaussian quadrature of the binormal density function.26

Realizations of the vector w_t may be generated from the multivariate normal distribution having mean vector 0 and variance–covariance matrix Ω, whose elements are the correlations ω(k, l). The multivariate normal variates are generated from

w_t = B ε_t    (4)

where B is a coefficient matrix and ε_t an independent standard normal vector. The coefficient matrix is obtained from

B B^T = Ω    (5)
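Equations (1), (4), and (5) can be strung together into a working occurrence generator. The sketch below uses an SVD square root for B, with made-up transition probabilities and a made-up Ω; it is our illustration, not the authors' code:

```python
import numpy as np
from math import erf, sqrt

def phi(x):
    # Standard normal CDF, as in Eq. (1)
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def simulate_occurrence(p_wd, p_ww, Omega, n_steps, rng):
    """Drive one two-state Markov chain per site with serially
    independent but spatially correlated uniforms u_t = phi(w_t),
    where w_t = B eps_t and B B^T = Omega, as in Eqs. (4) and (5)."""
    n_sites = len(p_wd)
    # SVD square root of Omega: robust even if Omega is ill-conditioned
    U, s, _ = np.linalg.svd(Omega)
    B = U * np.sqrt(np.clip(s, 0.0, None))
    X = np.zeros((n_steps, n_sites), dtype=int)
    state = np.zeros(n_sites, dtype=int)
    for t in range(n_steps):
        w = B @ rng.standard_normal(n_sites)
        u = np.array([phi(v) for v in w])
        state = (u < np.where(state == 1, p_ww, p_wd)).astype(int)
        X[t] = state
    return X

# Made-up example: 3 sites, wetness probability 0.12, hidden correlation 0.8
rng = np.random.default_rng(42)
p_wd = np.full(3, 0.12)
p_ww = np.full(3, 0.12)
Omega = np.full((3, 3), 0.8) + 0.2 * np.eye(3)
X = simulate_occurrence(p_wd, p_ww, Omega, 20000, rng)
```

Counting X.sum(axis=1) per time step gives the wet-gauge totals of the kind used for validation later in the paper.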
The elements of B can be obtained by Cholesky decomposition for a small number of rainfall stations (up to 5). For a larger number of rainfall stations the Cholesky decomposition fails, and the elements of B can be obtained by singular value decomposition, a method that is robust even if the matrix Ω is ill-conditioned. The seasonality in daily rainfall occurrence is taken into account by considering each month separately.

2.1. Hidden covariance model

There are several relationships which have to be maintained between the four probabilities linking the two binary variables X1 and X2, {P[X1 = i, X2 = j] = pij, i, j = 0, 1}: Σij pij = 1; m1 = E[X1] = p1. = Σj p1j; m2 = E[X2] = p.1 = Σi pi1; and E[X1X2] = p11; so that the standard deviations and the cross-correlation are specified as s1 = √(m1(1 − m1)), s2 = √(m2(1 − m2)), and the binary (or Bernoulli) correlation is ρB = (E[X1X2] − m1m2)/(s1s2).

When three binary variables are involved, the condition Σijk pijk = 1 (i, j, k = 0, 1), together with a specification of the three means and three binary cross-correlations, is sufficient to specify only seven of the eight probabilities pijk. The triple correlation (or equivalently p111) has to be specified in addition. There is no obvious functional form, between p111 and the remaining pijk values, which can be specified in terms of first- and second-order moments. The matter is even worse for higher-dimensional constructs, where for a p-dimensional binary process 2^p parameters have to be specified. This is manageable for p ≤ 8, say, but when p = 30 (a reasonable network of raingauges) there are in excess of 1.074 × 10^9 parameters/relationships for a complete specification, a point that has not been raised in previous literature.

Turning to the multi-normal distribution, its p-dimensional standardized p.d.f. is specified by p(p − 1)/2 cross-correlations. Thus if each individual dimension (i = 1, 2, . . . , p) is partitioned at zi so that P[z > zi] = mi, the mean of a corresponding Bernoulli variable, and the p(p − 1)/2 multi-normal cross-correlations ρN are chosen judiciously, we can guarantee that all pairwise second-order probabilities will have the correct values. Although the multi-normal moments of dimension higher than 2 are completely specified by those of first and second order, there is no guarantee that this ploy will hold for binary probabilities of dimension higher than 2.

To determine if the hidden covariance model can capture the higher order probabilities, 3 years of hourly rainfall data at 10 stations in the Reno
River catchment in Italy were modeled as a complementary exercise. Their individual probabilities of wetness varied considerably, ranging from 0.076 to 0.166 with a mean of 0.121. A fair test is to see whether the correct proportion of raingauges is wet at any one time; 300 years of simulated multi-site hourly binary data were generated and compared to the observed. The cumulative distribution functions of the two sets were computed and are compared in Fig. 1, where an excellent match is observed, validating the hidden covariance model.

Fig. 1. Cumulative density functions of observed and simulated number of wet gauges out of 10 in any hour, for the Reno catchment pluviometer data.
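The hidden-correlation search at the heart of the model can be sketched as follows; this is our illustrative implementation (Φ and its inverse from the Python standard library, Gauss–Legendre quadrature of the binormal density truncated at eight standard deviations), not the authors' code:

```python
import numpy as np
from statistics import NormalDist

_nd = NormalDist()

def binary_corr(omega, m1, m2, n=64):
    """Binary (Bernoulli) correlation implied by a bivariate normal with
    correlation omega, each margin thresholded to be wet with probability
    m1 and m2 respectively."""
    z1, z2 = _nd.inv_cdf(1.0 - m1), _nd.inv_cdf(1.0 - m2)
    # Gauss-Legendre quadrature of the binormal density over [z1,8]x[z2,8]
    x, w = np.polynomial.legendre.leggauss(n)
    u = 0.5 * (8.0 - z1) * x + 0.5 * (8.0 + z1)
    v = 0.5 * (8.0 - z2) * x + 0.5 * (8.0 + z2)
    U, V = np.meshgrid(u, v)
    det = 1.0 - omega**2
    dens = np.exp(-(U**2 - 2*omega*U*V + V**2) / (2*det)) / (2*np.pi*np.sqrt(det))
    p11 = 0.25 * (8.0 - z1) * (8.0 - z2) * np.sum(w[None, :] * w[:, None] * dens)
    return (p11 - m1*m2) / np.sqrt(m1*(1-m1)*m2*(1-m2))

def hidden_corr(xi_obs, m1, m2, tol=1e-6):
    """Bisection root find: the normal-space correlation omega whose
    implied binary correlation matches the observed xi_obs."""
    lo, hi = -0.995, 0.995
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_corr(mid, m1, m2) < xi_obs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because binary_corr is monotone in omega, plain bisection converges; for a network, the search is simply repeated for each of the N(N − 1)/2 station pairs.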
3. Daily Rainfall Data

Daily rainfall data from three catchments in Australia were used to develop and test the multi-site daily rainfall model. The number of stations varies from 3 to 30. The Woady Yaloak River Catchment is located in southwest Victoria. The area of the Catchment is 1157 km². There are three rainfall stations near the catchment, but none is within the catchment. The rainfall data used are 83 years long, covering the period 1919–2001. The Yarra River Catchment is located close to Melbourne and is one of the water supply catchments for Melbourne Water. There are 10 rainfall stations located within the Catchment, which has an area of 3957 km². The rainfall data used are 41 years long, covering the period 1955–1995. The mean annual rainfall varies from 635 mm to 1437 mm, while the number of wet days varies from 90 to 252. The Murrumbidgee River Catchment is located in southern New South Wales, and the Australian Capital Territory lies within the catchment. The area of the catchment is 81,563 km². Thirty stations are selected and
the rainfall data used are 110 years long covering the period 1890–1999. The mean annual rainfall varies from about 340 to 970 mm while the average number of wet days per year varies from 57 to 106.
4. Discussion of Results

The performance of the second-order properties of the model of the Australian monthly data was evaluated by using the log-odds ratio and the wet fraction. The log-odds ratio (a measure of the pairwise correlation for a pair of sites) is

lor = log[ p(D, D) p(W, W) / ( p(D, W) p(W, D) ) ]    (6)

where
p(D, D) is the probability of both sites being dry;
p(W, W) is the probability of both sites being wet;
p(D, W) is the probability of the first site being dry and the second site wet; and
p(W, D) is the probability of the first site being wet and the second site dry.
The wet fraction is the ratio of rain days to the total number of days at each site and is equivalent to the probability of wetness — a first-order property. One hundred replicates, each of length equal to the length of historical data were generated for rainfall occurrence. The log-odds ratio and the wet fraction were calculated for each month separately from each replicate and averaged. The results are presented in Fig. 2. It can be seen from this figure that the spatial rainfall occurrence model satisfactorily preserves the spatial correlation of the occurrence process and the wet fraction.
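Both diagnostics are easy to compute from a pair of 0/1 occurrence series; a minimal sketch follows (the paper evaluates them per month and averages over the 100 replicates):

```python
import numpy as np

def log_odds_ratio(x1, x2):
    # Eq. (6): pairwise dependence between two 0/1 occurrence series
    x1, x2 = np.asarray(x1), np.asarray(x2)
    p_dd = np.mean((x1 == 0) & (x2 == 0))
    p_ww = np.mean((x1 == 1) & (x2 == 1))
    p_dw = np.mean((x1 == 0) & (x2 == 1))
    p_wd = np.mean((x1 == 1) & (x2 == 0))
    return float(np.log((p_dd * p_ww) / (p_dw * p_wd)))

def wet_fraction(x):
    # First-order property: the probability of wetness at a site
    return float(np.mean(x))
```

Note that the log-odds ratio is undefined when any joint cell is empty, so in practice it is computed on series long enough that all four wet/dry combinations occur.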
5. Conclusions

A multi-site Markov chain model based on Wilks’ model20 is used to generate daily rainfall occurrence at a number of sites. The correlation between the rainfall occurrences is handled by using correlated random numbers. However, the methods used in the literature were cumbersome, involving iterative simulations to find the correlations in the normal domain from the binary correlations. An efficient root-finding algorithm is proposed in this paper and evaluated using daily rainfall data from three catchments with the number of rainfall sites varying from 3 to 30. The results show that
Fig. 2. Comparison of observed and generated log-odds ratio and wet fraction for the gauge networks on three Australian catchments: (a) Woady Yaloak Catchment; (b) Yarra Catchment; (c) Murrumbidgee Catchment.
the model performed well for all three catchments. In addition, because pairwise binary results are in themselves not sufficient to guarantee proper patterns of rainfall, a subsidiary experiment was undertaken. In that experiment, the use of the hidden covariance model was fully validated by
comparing probabilities of the number of gauges in a network of 10 being wet at any one time.
Acknowledgments

The second author would like to thank the Department of Civil and Environmental Engineering at the University of Melbourne for its support during several visits for research. In addition, he gives special thanks to Professor Ezio Todini of the University of Bologna for hosting him on sabbatical in 1996, when the hidden covariance model was born.
References

1. T. A. Buishand, J. Hydrol. 36 (1978) 295–308.
2. D. A. Woolhiser and G. G. S. Pegram, J. Appl. Meteorol. 5(1) (1979) 34–42.
3. G. G. S. Pegram, J. Appl. Probab. 17 (1980) 350–362.
4. T. G. Chapman, Water Down Under 94, Inst. Eng. Aust. 3 (1994) 7–12.
5. T. G. Chapman, Environ. Modell. Software 13 (1998) 317–324.
6. T. G. Chapman, MODSIM 2001 Int. Congr. Modell. Simul. 1 (2001) 287–292.
7. T. I. Harrold, A. Sharma and S. J. Sheather, Water Resour. Res. 39(10) (2003) 1300, doi:10.1029/2003WR002182.
8. T. I. Harrold, A. Sharma and S. J. Sheather, Water Resour. Res. 39(12) (2003) 1343, doi:10.1029/2003WR002570.
9. B. Rajagopalan, U. Lall and D. G. Tarboton, J. Hydrol. Eng. 1(1) (1996) 33–40.
10. A. Sharma and U. Lall, Math. Comput. Simul. 48 (1999) 367–371.
11. R. Srikanthan and T. A. McMahon, Technical Paper No. 84 (Australian Water Resources Council, Canberra, 1985).
12. R. Srikanthan and T. A. McMahon, Hydrol. Earth Syst. Sci. 5(4) (2001) 653–656.
13. D. A. Woolhiser, in Statistics in the Environmental and Earth Sciences, eds. A. T. Walden and P. Guttorp (Edward Arnold, London, 1992), p. 306.
14. W. Zucchini and P. Guttorp, Water Resour. Res. 27(8) (1991) 1917–1923.
15. A. Bardossy and E. J. Plate, J. Hydrol. 122 (1991) 33–47.
16. A. Bardossy and E. J. Plate, Water Resour. Res. 28 (1992) 1247–1259.
17. L. L. Wilson and D. P. Lettenmaier, J. Geophys. Res. 97 (1993) 2791–2809.
18. J. P. Hughes, P. Guttorp and S. P. Charles, Appl. Statist. 48 (Part 1) (1999) 15–30.
19. S. P. Charles, B. C. Bates and J. P. Hughes, J. Geophys. Res. 104(D24) (1999) 31657–31669.
20. D. S. Wilks, J. Hydrol. 210 (1998) 178–191.
21. C. Jothityangkoon, M. Sivapalan and N. R. Viney, Water Resour. Res. 36(1) (2000) 267–284.
22. T. A. Buishand and T. Brandsma, Water Resour. Res. 37(11) (2001) 2761–2776.
23. R. Mehrotra and A. Sharma, 29th Hydrology and Water Resources Symposium, Canberra, 2005.
24. R. Mehrotra, R. Srikanthan and A. Sharma, 29th Hydrology and Water Resources Symposium, Canberra, 2005.
25. R. Srikanthan, Report 05/7, CRC for Catchment Hydrology (Monash University, Melbourne, 2005), 66pp.
26. N. L. Johnson and S. Kotz, Distributions in Statistics: Continuous Multivariate Distributions (Wiley, New York, 1972).
27. W. C. Boughton, Report 99/9, CRC for Catchment Hydrology (Monash University, Melbourne, 1999), 21pp.
28. J. D. Salas, D. C. Boes, V. Yevjevich and G. G. S. Pegram, J. Hydrol. 44(1/2) (1979) 1–15.
29. A. Sharma, D. G. Tarboton and U. Lall, Water Resour. Res. 33(2) (1997) 291–308.
Advances in Geosciences Vol. 6: Hydrological Science (2006)
Eds. Namsik Park et al.
© World Scientific Publishing Company
A SPATIAL–TEMPORAL DOWNSCALING APPROACH FOR CONSTRUCTION OF INTENSITY–DURATION–FREQUENCY CURVES IN CONSIDERATION OF GCM-BASED CLIMATE CHANGE SCENARIOS

TAN-DANH NGUYEN and VAN-THANH-VAN NGUYEN∗
Department of Civil Engineering and Applied Mechanics, McGill University, Montreal, Quebec, Canada H3A 2K6
∗ [email protected]

PHILIPPE GACHON
Environment Canada and OURANOS Consortium, Montreal, Quebec, Canada

This paper presents an innovative spatial–temporal downscaling approach for the construction of intensity–duration–frequency (IDF) relations based on an objective description of the direct linkage between general circulation model (GCM) simulations and annual maximum precipitation (AMP) statistical characteristics at a local site. The proposed approach is a two-step procedure which combines a statistical (spatial) downscaling method, to link large-scale climate variables as provided by GCM simulations with daily extreme precipitations at a local site, and a temporal downscaling procedure, to describe the relationships between daily and sub-daily extreme precipitations using the scale-invariance (or scaling) generalized extreme value distribution. The feasibility of the proposed downscaling method has been tested based on climate simulation outputs from two GCMs under the A2 scenario (HadCM3A2 and CGCM2A2) and using available AMP data for durations ranging from 5 min to 1 day at 15 raingage stations in Quebec (Canada) for the 1961–2000 period. Results of this numerical application have indicated that it is feasible to link large-scale climate predictors for the daily scale given by GCM simulation outputs with daily and sub-daily AMPs at a local site. Furthermore, it was found that AMPs at a local site downscaled from the HadCM3A2 displayed a small change in the future, while those values estimated from the CGCM2A2 indicated a large increasing trend for future periods.
1. Introduction

Design rainfall, which is the maximum amount of rainfall for a given duration and for a given return period, is always required for the design of various hydraulic structures. At a site where adequate annual maximum
precipitation (AMP) data records are available, frequency analysis is commonly used to estimate the design rainfall for a given duration and for a selected return period. Results from the frequency analysis are usually presented under the form of intensity–duration–frequency (IDF) curves. Traditionally, to build IDF curves, a selected probability distribution is independently fitted to observed AMP for various durations. This traditional estimation method, however, has certain limitations. For example, it cannot take into consideration characteristics of precipitation for different durations (the time scaling problem); it is based on AMP data available at a local site only (the spatial scaling problem); and it is unable to account for the potential impacts of climate change and variability for future periods. Recent developments in the modeling of precipitation processes have indicated the successful application of the scale-invariance (or scaling) concept that could permit statistical inference of precipitation properties across various durations. The practical implication of this concept is that statistical properties of precipitation for certain duration within a scaling regime can be derived from statistical properties of precipitation for other durations within the same scaling regime. The scaling concept has been successfully applied to relate statistics of AMPs for different durations.1,2 In particular, Nguyen3 developed a method for the construction of IDF curves using the scaling generalized extreme value (GEV) distribution that can take into account the scaling behavior of the non-central moments (NCMs) of AMP for different durations. More recently, general circulation models (GCMs) have been proved to be able to simulate global climate variables for current period as well as for future periods under various global climate and environmental change scenarios. 
However, the spatial scale of these GCMs’ data is quite large (hundreds of square kilometers) and does not satisfy the local-scale (point, or at-site) climate conditions usually required for climate-related impact assessments. Fortunately, various spatial downscaling techniques have been developed to provide the linkage between GCMs’ large-scale climate simulations and precipitation characteristics at a local site. One of the most commonly used spatial downscaling techniques is the method based on the statistical downscaling model (SDSM) proposed by Wilby et al.4 The SDSM relates global atmospheric variables (called “predictors”) to a local weather variable (called “predictand”) using a multiple linear regression relation. Once such a relation is established, the predictand can be generated from the predictors that are simulated by a GCM for different
climate scenarios. Results of the application of SDSM using data in Quebec have suggested that it could be used to generate local weather variables adequately from global predictors.5 However, GCMs and SDSM produce data for the daily time scale, while the construction of the IDF relations requires data at sub-daily scales (e.g. hourly duration). Hence, a temporal downscaling method is necessary to provide the linkages between daily AMPs and sub-daily values. In view of the above-mentioned issues, the present study therefore proposes a statistical downscaling approach that can be used to link the climate change scenarios given by GCMs to AMPs at a local site. More specifically, the proposed approach is based on a combination of a spatial downscaling method, to link large-scale climate variables as provided by GCM simulations with daily extreme precipitations at a local site, and a temporal downscaling procedure, to describe the relationships between daily and sub-daily extreme precipitations using the scaling GEV distribution. The proposed spatial–temporal downscaling method was tested using AMP data at 15 raingages in Quebec (Canada) and based on A2 climate change scenario simulation results (denoted by CGCM2A2 and HadCM3A2, respectively) provided by the Canadian and UK GCMs for the current 1961–2000 period as well as for the future 2020s, 2050s, and 2080s periods. Results of this numerical application have indicated that, after some bias correction, it is feasible to develop an accurate linkage between the daily AMPs spatially downscaled from GCM simulations and the observed daily AMPs at local stations. These results suggest that it is possible to use the climate predictors given by GCM simulations under the A2 scenario for projecting the variability of daily AMPs for future periods.
On the basis of these results for daily AMPs, the IDF curves for the current 1961–1990 period and for future periods (2020s, 2050s, and 2080s) were constructed using the proposed temporal GEV-scaling method for sub-daily AMPs. In general, it was found that the IDF curves based on HadCM3A2 simulations for future periods are quite similar to those for the current period while those using CGCM2A2 indicated a large increasing trend in the future.
2. The Spatial–Temporal Downscaling Method

As mentioned above, the proposed downscaling approach consists of two basic components: (1) a spatial downscaling method to link large-scale
T.-D. Nguyen et al.
climate variables as provided by GCM simulations with daily extreme precipitations at a local site using the popular SDSM4; and (2) a temporal downscaling procedure to describe the relationships between daily and sub-daily extreme precipitations using the scaling GEV distribution2,3 for the construction of the IDF curves at the site of interest. Brief descriptions of the two components are presented in the following subsections.
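The regression backbone of the spatial downscaling component can be illustrated with a minimal sketch. The data below are synthetic and this is not the SDSM code itself, only the multiple linear regression principle that it is built on:

```python
import numpy as np

# Illustrative sketch (not the actual SDSM software): relate large-scale
# predictors (e.g. GCM fields at the grid box over the site) to a local
# predictand (daily precipitation) by multiple linear regression.
rng = np.random.default_rng(0)
n_days, n_pred = 1000, 3
X = rng.normal(size=(n_days, n_pred))            # standardized GCM predictors
beta_true = np.array([1.5, -0.7, 0.3])
y = 2.0 + X @ beta_true + rng.normal(scale=0.5, size=n_days)  # local series

# Least-squares calibration: y ~ b0 + b1*x1 + b2*x2 + b3*x3
A = np.column_stack([np.ones(n_days), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Once calibrated, predictors from a GCM scenario run generate the
# corresponding local predictand series.
y_downscaled = A @ coef
```

Feeding the calibrated relation with predictors from a future scenario run is what produces the downscaled local series used in the rest of the procedure.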
2.1. Spatial downscaling technique using SDSM

In general, a spatial downscaling technique is based on the view that the regional climate is conditioned by two factors: the large-scale (global) climatic state and local physiographic features.6 From this perspective, local information is derived by first determining a statistical model which relates global atmospheric variables (called predictors) to a local weather variable (called the predictand). Predictors given by GCM simulations are then fed into this statistical model to estimate the corresponding predictand. In particular, the linear regression-based spatial downscaling technique, called SDSM, proposed by Wilby et al.4 has been commonly used in practice for constructing climate scenarios for various climate-related impact studies. The SDSM can provide a linkage between surface climate variables at individual sites at the daily time scale (e.g. precipitation and temperature extremes) and grid-resolution daily GCM climate simulation outputs. A detailed description of the SDSM can be found in Wilby et al.4 As expected, the daily AMPs that are extracted from daily precipitation series given by the spatial downscaling of GCM outputs using the SDSM method are often not comparable to the observed daily AMPs at a local site. Therefore, an adjustment procedure is needed in order to improve the accuracy of the spatial downscaling SDSM technique in the estimation of local daily AMPs. The proposed adjustment can be described in more detail as follows. Let

yτ = ŷτ + eτ
(1)
in which yτ is the adjusted daily AMP at a probability level τ, ŷτ is the corresponding GCM–SDSM estimated daily AMP, and eτ is the residual associated with ŷτ. The estimated residual eτ can be computed using the following equation:

eτ = m0 + m1 ŷτ + m2 ŷτ² + ε
(2)
in which m0 , m1 , and m2 are parameters of the regression function, and ε is the resulting error term.
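The adjustment of Eqs. (1) and (2) amounts to fitting a second-order polynomial to the quantile residuals and adding the fitted residual back to the downscaled quantiles. A minimal sketch with synthetic numbers (the residual pattern below is invented for illustration):

```python
import numpy as np

# Sketch of the adjustment in Eqs. (1)-(2): fit a second-order polynomial
# to the residuals between observed and GCM-SDSM downscaled daily AMP
# quantiles, then correct the downscaled values.
y_hat = np.linspace(30.0, 90.0, 13)            # downscaled AMP quantiles (mm)
res = 0.002 * y_hat**2 - 0.3 * y_hat + 8.0     # synthetic residual pattern
y_obs = y_hat + res                            # "observed" AMP quantiles

# Eq. (2): e_tau = m0 + m1*y_hat + m2*y_hat**2, fitted by least squares
m2, m1, m0 = np.polyfit(y_hat, y_obs - y_hat, deg=2)

# Eq. (1): adjusted quantile = downscaled quantile + estimated residual
y_adj = y_hat + (m0 + m1 * y_hat + m2 * y_hat**2)
```

With real data the fit is not exact, but the same two lines (polyfit on the residuals, then Eq. (1)) implement the correction.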
2.2. A temporal downscaling method using the scaling GEV distribution

The proposed temporal downscaling method is based on the concept of scale invariance (or scaling). By definition, a function f(x) is scaling if f(x) is proportional to the scaled function f(λx) for all positive values of the scale factor λ. That is, if f(x) is scaling then there exists a function C(λ) such that

f(x) = C(λ) f(λx)
(3)
It can be readily shown that

C(λ) = λ^(−β)
(4)
in which β is a constant, and that

f(x) = x^β f(1)
(5)
Hence, the relationship between the NCM of order k, µk, and the variable x can be written in a general form as follows:

µk = E{f^k(x)} = α(k) x^(β(k))
(6)
in which α(k) = E{f^k(1)} and β(k) = βk. Application of the GEV distribution to model the annual series of extreme rainfalls has been advocated by several researchers.7 The cumulative distribution function, F(x), for the GEV distribution is given as

F(x) = exp{−[1 − κ(x − ξ)/α]^(1/κ)},  κ ≠ 0

(7)
where ξ, α, and κ are, respectively, the location, scale, and shape parameters. It can be readily shown that the kth-order NCM, µk , of the
GEV distribution (for κ ≠ 0) can be expressed as

µk = (ξ + α/κ)^k + (−1)^k (α/κ)^k Γ(1 + kκ) + k Σ_{i=1}^{k−1} (ξ + α/κ)^(k−i) (−1)^i (α/κ)^i Γ(1 + iκ)

(8)
where Γ(·) is the gamma function. Hence, on the basis of Eq. (8), it is possible to estimate the three parameters of the GEV distribution using the first three NCMs. Consequently, the quantiles XT can be computed using the following relation:

XT = ξ + (α/κ){1 − [− ln(p)]^κ}

(9)

in which p = 1/T is the exceedance probability of interest.
The scaling behavior of AMPs can be examined based on the power-form relationship between the kth-order NCMs of AMP data for durations t and λt. Consequently, the NCMs (and the corresponding GEV parameters) of AMPs for duration λt can be computed from those for duration t, provided that these NCMs are within the same scaling regime. Hence, the proposed scaling GEV distribution can be used to derive the IDF relationships for AMPs of different durations.

3. Numerical Application

To illustrate the application of the proposed spatial–temporal downscaling approach, a case study is carried out using both global GCM climate simulation outputs and at-site AMP data available at 15 raingage stations in Quebec (Canada). The selected global GCM predictors are given by the CGCM2A2 and HadCM3A2 simulations for the 1961–2000 period as well as for the future periods 2020s, 2050s, and 2080s, while the at-site AMP series for durations ranging from 5 min to 1 day used in this study are available only for the 1961–2000 period. Furthermore, data for the 1961–1990 period were used for model calibration and data for the remaining 1991–2000 period were used for validation purposes. The computational procedure for the proposed downscaling method can be summarized as follows:
(i) Calibrate and validate the (spatial) downscaling SDSM model using the at-site daily precipitation as predictand and global GCM atmospheric variables as predictors5;
(ii) Generate 100 samples of 30-year daily precipitation series at a given site using the calibrated SDSM and the corresponding GCM predictors, and extract daily AMP series from these generated samples;
(iii) Perform the necessary adjustment of the GCM-downscaled daily AMP series using Eqs. (1) and (2) in order to obtain a good agreement between the mean series of GCM-downscaled daily AMPs and the at-site observed daily AMPs;
(iv) Investigate and establish the scaling relations between the NCMs of observed at-site AMPs for various durations;
(v) Calibrate the scaling GEV model using observed at-site AMPs in order to establish the linkage between daily and sub-daily AMPs;
(vi) Construct IDF curves using the adjusted GCM-downscaled AM daily precipitations and the estimated sub-daily AMP amounts given by the calibrated scaling GEV model;
(vii) Repeat steps (ii) to (vi) to construct IDF curves for future periods (2020s, 2050s, and 2080s) based on the corresponding GCM predictors for these periods.
For purposes of illustration, Fig. 1 presents the probability plots of AMPs downscaled from CGCM2A2 and HadCM3A2 as compared to those of observed at-site AMPs at Dorval station for the 1961–1990 calibration period. It can be seen that the GCM-downscaled AMPs do not agree well with the observed at-site amounts. Figure 2 shows the good fit of the second-order correction function (Eq. 2) to these differences (or residuals) for both GCMs. Hence, as indicated in Fig. 1, after adjusting the downscaled AMPs using the fitted correction function, a very good agreement can be achieved between the adjusted GCM-downscaled amounts and the observed at-site values. The adjustment functions developed from data for the 1961–1990 calibration period were then applied to the downscaled AMPs for the 1991–2000 period to assess their validity.
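As a side note, the empirical return periods behind probability plots of this kind can be computed from a ranked AMP sample. The Weibull plotting position used below is an assumption made for illustration, since the chapter does not state which formula was adopted:

```python
import numpy as np

# Empirical return periods for an AMP sample via the Weibull plotting
# position T = (n + 1) / m, where m is the rank of a value when the
# sample is sorted in descending order (an illustrative assumption).
def probability_plot(amp):
    x = np.sort(np.asarray(amp, dtype=float))[::-1]  # largest value first
    m = np.arange(1, x.size + 1)                     # rank m = 1 for largest
    T = (x.size + 1) / m                             # return period in years
    return T[::-1], x[::-1]                          # ascending return period

T, x = probability_plot([42.0, 55.0, 38.0, 61.0, 47.0])
# the largest AMP (61.0 mm) is assigned T = (5 + 1) / 1 = 6 years
```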
Figure 3 shows the improved closeness between the adjusted downscaled AMPs and the observed values as compared to the unadjusted downscaled AMP amounts. Hence, it is feasible to apply the adjustment function derived from the 1961–1990 calibration data to other time periods, including future periods. Similar results were found for the other stations. To assess the scaling behavior of the at-site AMP series, log–log plots of the first three rainfall NCMs against duration were prepared for all
Fig. 1. Probability plots of daily AMPs downscaled from HadCM3A2 and CGCM2A2 before and after adjustment for Dorval station for the calibration 1961–1990 period.
Fig. 2. Error-adjustment functions for daily AMPs downscaled from HadCM3A2 and CGCM2A2 for Dorval station for the calibration 1961–1990 period.
15 stations. For purposes of illustration, Fig. 4 shows the plot for Dorval station. The log-linearity exhibited in the plot indicates the power-law dependency (i.e. scaling) of the rainfall statistical moments on duration (Eq. 5) for two time intervals: from 5 min to 1 h, and from 1 h to 1 day. Hence, for a given location, it is possible to determine the NCMs and the
Fig. 3. Probability plots of daily AMPs downscaled from HadCM3A2 and CGCM2A2 before and after adjustment for Dorval station for the validation 1991–2000 period.
distribution of rainfall extremes for short durations (e.g. 1 h) using available rainfall data for longer time scales (e.g. 1 day) within the same scaling regime (β is known). For purposes of illustration, Fig. 5 shows the plots of 5-min AMPs at Dorval station for the 1961–1990 period and for future periods (2020s, 2050s, and 2080s) using the proposed spatial–temporal downscaling method. It can be seen that the HadCM3A2 scenario suggested a small change of AMPs in the future, while the CGCM2A2 model indicated a large increasing trend for future periods.
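The temporal downscaling step can be sketched numerically as follows. The moments and the scaling exponent β = 0.5 below are synthetic stand-ins (the real values come from the observed at-site NCMs), and only the moment-scaling part of the procedure is shown, since solving Eq. (8) for the GEV parameters requires a numerical root search:

```python
import numpy as np

# Sketch of the temporal downscaling (Eq. 6): within a scaling regime the
# k-th non-central moment (NCM) of AMPs follows mu_k(t) = a(k) * t**(beta*k),
# so sub-daily NCMs can be obtained from daily ones.
beta_true = 0.5
durations = np.array([1.0, 2.0, 6.0, 12.0, 24.0])  # hours (1 h - 1 day regime)
mu1 = 3.0 * durations**beta_true                   # synthetic first NCM

# Estimate beta as the slope of the log-log plot (cf. Fig. 4)
beta_hat = np.polyfit(np.log(durations), np.log(mu1), deg=1)[0]

# Scale the daily (24 h) moments down to 1 h: mu_k(lam*t) = lam**(beta*k) * mu_k(t)
lam = 1.0 / 24.0
mu_daily = np.array([mu1[-1], 2.0 * mu1[-1]**2, 6.0 * mu1[-1]**3])  # k = 1, 2, 3 (illustrative)
mu_hourly = mu_daily * lam ** (beta_hat * np.arange(1, 4))
# The three hourly NCMs then determine the hourly GEV parameters via Eq. (8),
# and the hourly quantiles follow from Eq. (9).
```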
4. Conclusions

A spatial–temporal downscaling approach was proposed in the present study to describe the linkage between large-scale daily climate variables and AM precipitations at daily and sub-daily scales at a local site. The feasibility of the proposed downscaling method has been tested based on climate simulation outputs from two GCMs under the A2 scenario (HadCM3A2 and CGCM2A2) and using available AM precipitation data for durations ranging from 5 min to 1 day at 15 raingage stations in Quebec (Canada) for the 1961–2000 period. Results of this numerical application have indicated that it is feasible to link daily large-scale climate variables
Fig. 4. The log–log plot of maximum rainfall non-central moments (NCMs) versus rainfall duration for Dorval station.
to daily AM precipitations at a given location using a second-order error-correction function. Furthermore, it was found that the AM precipitation series in Quebec displayed a simple scaling behavior within two different time intervals: from 5 min to 1 h, and from 1 h to 1 day. Based on this scaling property, the scaling GEV distribution has been shown to provide accurate estimates of sub-daily AM precipitations from GCM-downscaled daily AM amounts. Therefore, it can be concluded that it is feasible to use the proposed spatial–temporal downscaling method to describe the relationship between large-scale daily climate predictors given by GCM simulation outputs and the daily and sub-daily AM precipitations at a local site. This relationship would be useful for various climate-related impact assessment studies in a given region. Finally, the proposed downscaling approach was used to construct the IDF relations for a given site for the 1961–1990 period and for future
Fig. 5. Probability plots of 5-min AMPs projected from HadCM3A2 and CGCM2A2 scenarios for the 1961–1990 period and for future periods (2020s, 2050s, and 2080s) for Dorval station.
periods (2020s, 2050s, and 2080s) using climate predictors given by the HadCM3A2 and CGCM2A2 simulations. It was found that AMPs at a local site downscaled from the HadCM3A2 displayed a small change in the future, while those values estimated from the CGCM2A2 indicated a large increasing trend for future periods. This result has demonstrated the presence of high uncertainty in climate simulations provided by different GCMs. Further studies are planned to assess the feasibility and reliability of the suggested downscaling approach using other GCMs and data from regions with different climatic conditions.
References
1. P. Burlando and R. Rosso, J. Hydrol. 187 (1996), 45.
2. V.-T.-V. Nguyen, T.-D. Nguyen and F. Ashkar, Water Sci. Technol. 45 (2002), 75.
3. T.-D. Nguyen, Doctoral Thesis (Department of Civil Engineering and Applied Mechanics, McGill University, Montreal (Quebec), Canada, 2004), 221 pp.
4. R. L. Wilby, C. W. Dawson and E. M. Barrow, Environ. Model. Software 17 (2002), 147.
5. V.-T.-V. Nguyen, T.-D. Nguyen and P. Gachon, in Advances in Geosciences, Vol. 4: Hydrological Sciences, eds. N. Park et al. (World Scientific Publishing Company, Singapore, 2006), pp. 1–9.
6. H. von Storch, E. Zorita and U. Cubasch, J. Climate 6 (1993), 1161.
7. M. D. Zalina, M. N. M. Desa, V.-T.-V. Nguyen and K. Amir, Water Sci. Technol. 45 (2002), 63.
Advances in Geosciences Vol. 6: Hydrological Science (2006) Eds. Namsik Park et al. © World Scientific Publishing Company
DEVELOPMENT AND APPLICATIONS OF THE ADVANCED REGIONAL ETA-COORDINATE NUMERICAL HEAVY-RAIN PREDICTION MODEL SYSTEM IN CHINA CUI CHUNGUANG∗ , LI JUN and SHI YAN Institute of Heavy Rain, CMA, Wuhan, China ∗ No.3 Donghudong Road, Hongshan District, Wuhan Hubei, China 430074 [email protected]
The Regional Eta Model (REM) is a limited-area numerical heavy-rain prediction model, adapted to Chinese climate and topography features, developed by the Institute of Atmospheric Physics of the Chinese Academy of Sciences in the 1990s, in which a step-like (ladder) topography coordinate is adopted. A two-step shape-preserving advection scheme was designed to ensure the computational precision of the water vapor transfer directly associated with rainstorms. In recent years, the Wuhan Institute of Heavy Rain has made improvements to the REM in the areas of resolution, model standardization, lateral-boundary conditions, physical parameterization, and data assimilation, and established the Advanced Regional Eta Model (AREM). This model has gone through several updates, from AREM V2.1 and AREM V2.3 to the present AREM V3.0. AREM has shown good performance in operational applications in recent years. It depicted well the rainband features, intensity, vertical structure of precipitation, and evolution of hourly rainfall for a variety of heavy-rain events in the regions east of the Qinghai-Tibet Plateau in China. The next step will focus on research into data assimilation using observations from radar, the global positioning system, satellites, and rain gauges, and on the improvement of the AREM physics schemes. Applications of quantitative precipitation forecasts in hydrology are also under investigation and development.
1. Development of the Advanced Regional Eta Model

Facing the challenge of dealing with abrupt terrain in numerical models, so as to lessen the great impacts of the complex terrain around the Qinghai-Tibet Plateau on Chinese weather and climate, Chinese scientists have done much effective work on restraining the spurious effects resulting from terrain. In 1963, Zeng1 advanced the standard stratification stationary deducting method, in which the weather change is represented as a deviation from a “standard state” atmosphere, so that the calculation error of the atmospheric motion equations, called
“big terms-small difference,” resulting from abrupt terrain can be reduced and the calculation precision improved. Yan and Qian2 in the late 1970s and early 1980s discussed the problem of calculating the pressure-gradient force in numerical models with terrain. Qian and Zhong3 advanced an algorithm for calculating the pressure-gradient force, based on which a new method for calculating it over abrupt terrain, called the error-deduction method, was designed. To resolve the problems of the σ-coordinate, Zeng advanced a kind of modified σ-coordinate, which is basically in accordance with the η-coordinate advanced by Mesinger.4 The η-coordinate keeps the advantage of the σ-coordinate, with its simple lower boundary condition, while getting rid of its disadvantages by keeping a quasi-horizontal coordinate plane. Under the direction of Zeng, the Institute of Atmospheric Physics, Chinese Academy of Sciences (IAP, CAS) began to develop a numerical prediction model framework taking abrupt terrain into account in 1986.5 In view of the difficulty of dealing with water vapor advection, a two-step shape-preserving advection scheme,6,7 computationally efficient enough for the computer resources then available in China, was designed, which guarantees the calculation precision of the water vapor transfer directly associated with rainstorms. This model successfully simulated the leeward cyclones of the Qinghai-Tibet Plateau and the most typical terrain-affected heavy-rain phenomenon in China, the “leaking sky over Ya’an.”8–10 Since 1993, a great deal of effective work has been done on operational application of the model. A preprocessing system was built, consisting of decoding, error checking, quality control, and objective analysis.
In the flood seasons of 1993 and 1994, this model was used at the Weather and Climate Forecast Center, IAP, CAS and in Hunan province, and the forecast experiments obtained good results.11,12 As the simulation and prediction abilities of this model became well recognized, IAP, CAS held a generalization training course for the model in the spring of 1995, and it was then named the Regional Eta-coordinate Model (REM). Soon REM was widely used in research and operational organizations in meteorology, hydrology, environment, and military affairs. The relevant research areas include heavy-rain simulation and forecasting,13–29 snow simulation,30,31 drought estimation,32,33 flood monitoring and forecasting,34 environmental pollution simulation,35–37 meso-scale systems (e.g. squall lines),38,39 typhoon-induced rainstorms,40,41 etc. REM became one of the main tools for meso-scale studies,42–46 including heavy-rain research, in China. From November 9 to December 5, 1998, the International Center for Climate and Environment Sciences (ICCES), IAP, CAS held the Third World Academy of Sciences numerical model training class in Beijing, through which REM was introduced to countries
such as Pakistan, Syria, Sri Lanka, and Thailand. REM has already been applied well in some neighboring countries and has had some impact on the international research community.47–49 During 1995–1998, with the expanding application area of REM, some users made necessary modifications and developments according to their own research or application needs.22 Due to a lack of funding support and cooperation, however, there was no radical change in the overall model framework or in the simulation/forecast ability. From 1999 to 2003, under the support and organization of the national key basic research program project “Research on Chinese significant weather disaster mechanisms and forecasting theory,” great progress and improvements were made in the following aspects: resolution, model standardization, lateral-boundary conditions, physical parameterization, and the objective analysis system. Thus a more advanced limited-area η-coordinate rainstorm forecasting system (AREM) was established. With the support of this project, the first advanced version, AREM 2.1, was first put into real-time operational forecast experiments at the Anhui and Hubei Provincial Meteorological Bureaus in 2002. According to the prediction results reported by the Hubei Provincial Meteorological Bureau, AREM 2.1 is clearly improved over REM in heavy-rain forecast ability. New versions of AREM (AREM 2.3 and 3.0) were developed in the last two years. In order to make the new versions serve Chinese heavy-rain research and forecasting, the Laboratory of Atmospheric Sciences and Geophysical Fluid Dynamics (LASG), IAP, CAS, together with the Institute of Heavy Rain, CMA, Wuhan and the Beijing Applied Meteorology Institute, held a generalization training class in Beijing during April 22–23, 2004. About 50 researchers from more than 10 provinces or cities participated in the communication and discussion.
During the development from REM to AREM, in order to establish a state-of-the-art heavy-rain numerical forecast model, the original REM program was completely restructured into a modularized form, which facilitates further development and generalization. The REM mainly focused on making the model framework fit abrupt or complex terrain, while its physical processes were not well selected or tuned. The key points in developing AREM were therefore to renew and refine the physical processes, with enhancement of the resolution as a necessary accompanying task. Table 1 compares the three AREM versions in terms of resolution, model top, main physical processes, and initial-value processing. All versions of AREM have a modularized structure and a job-sheet function. This is an essential improvement compared to the former REM
Table 1. Main features of the different AREM versions.

AREM V2.1: resolution 37 km, 20–25 layers; fixed lateral boundary, model top at 100 hPa; non-local boundary-layer scheme; saturation condensation with Betts convective adjustment; bulk transfer method with surface diurnal cycle; advanced Barnes initial analysis.

AREM V2.3: resolution 37 km, 20–32 layers; fixed or time-varying lateral boundary, model top at 100–10 hPa; non-local boundary-layer scheme; warm-cloud microphysics with Betts convective adjustment; multi-stratification flux-profile method with surface diurnal cycle; advanced Barnes initial analysis plus three-dimensional variational (3DVAR) analysis.

AREM V3.0: resolution 18 km, 20–35 layers; fixed or time-varying lateral boundary, model top at 100–10 hPa; non-local boundary-layer scheme; cold-cloud microphysics with Betts convective adjustment; bulk transfer method, CLM land-surface process, and unabridged radiation process; advanced Barnes initial analysis plus three-dimensional variational (3DVAR) analysis.
(AREM 2.1). Besides this, both the vertical and horizontal resolutions were doubled. As for the model boundary layer, a non-local boundary scheme replaces the former local K-diffusion scheme, and NCEP re-analysis data or T106 or T213 data were added as background fields for the objective analysis. The main improvement of AREM 2.3 over AREM 2.1 is that the model top has been raised from 100 to 10 hPa. An explicit cloud and rain scheme replaces the original implicit grid-point saturation condensation large-scale precipitation scheme, and the calculation methods for surface sensible and latent heat have been improved. AREM 3.0 is the version of the model with the highest resolution at present. Compared to AREM 2.3, the main changes are as follows: the resolution is doubled, the Common Land Model (CLM) land surface process and time-varying lateral boundaries can additionally be selected, and an unabridged radiation process is introduced. Eventually, AREM 3.0 will include a complete physics suite and a self-contained initial assimilation system, which can satisfy the main requirements of short-range weather forecast operations and regional weather/climate simulation research.
2. Application of AREM in China

Influenced by the East Asian monsoon, summer precipitation in China can be divided into three phases: early flood-season precipitation in South China from May to the first 10 days of June, Meiyu-front precipitation in the Yangtze valley from the middle 10 days of June to the middle 10 days of July, and the North China rainband from the middle 10 days of July to the middle 10 days of August. Forecast experiments on these three kinds of precipitation can reveal the prediction ability of AREM for Chinese summer precipitation.

2.1. Precipitation forecast experiments of different operational models

At present, several kinds of quantitative precipitation forecast models are in use, such as the global spectral model T213 of the National Meteorological Center, the East Asian regional spectral model of Japan, and the MM5 grid-point model of PSU/NCAR, USA. To examine the practical forecast ability of AREM, the quantitative precipitation forecasts of these four models (including AREM) during June, July, and August of 2003 and 2004 were tested (data not shown). The results show that the AREM model has an advantage over the other models for Meiyu-front precipitation forecasts. According to the results for 2004, AREM obtains the highest “shine-rain” (clear vs. rain) test score of 65.3, a much better result than those obtained from the other models. These results show that this model can describe well the alternation between clear and rainy weather in the flood season of Hubei, and the simulated rain-area distribution is close to the observed weather. Besides, AREM obtains the highest scores for precipitation forecasts of 2–10 and 10–25 mm. For precipitation of 25–50 mm, the Japanese Regional Spectral Model (RSM) gives the best forecast result, with AREM second best. For precipitation of 50–100 mm, AREM obtains the highest test score of 8.8, and the simulated centers of the rainfall areas are close to the observed ones, showing AREM’s good forecast ability for severe rainfall.

2.2. Temporal and spatial evolution forecast experiments of AREM

In order to evaluate the overall simulation ability for precipitation and examine whether the model can describe the main rain events during the flood season of 2003, a 24-h post-analysis experiment lasting for the 3 months from June to
Fig. 1. Time-meridional cross section of daily mean precipitation rate in JJA 2003 (averaged between 110 and 122.5◦ E): (a) observed; (b) post-analyzed by AREM (unit: mm).
August 2003 was conducted using AREM, with daily 00:00 UTC NCEP re-analysis data (1◦ × 1◦ horizontal resolution) as the initial condition. Figure 1b shows the simulated time-meridional cross section of the daily mean precipitation rate in East China (110–122.5◦ E). Compared with Fig. 1a, it can be seen that the north–south undulation of the precipitation post-analyzed by AREM 2.3 accords with the observations. Figures 2a–2d show, respectively, the post-analyzed and observed daily mean precipitation rates in Northeast China (121–128◦ E, 38–45◦ N), North China (115–120◦ E, 35–40◦ N), Southeast China (105–120◦ E, 21–33◦ N), and Southwest China (108–112◦ E, 28–31◦ N). It can be seen that AREM describes well the strengthening and weakening course of daily precipitation, and the post-analyzed daily rain rate basically accords with the observations.

2.3. Simulation experiments of some important rain events

The severe-rainfall forecast ability of AREM can be revealed by simulation experiments on some important rain events. Figure 3 is the AREM
Fig. 2. Regional mean daily precipitation rates (mm/day).
simulation result of a special heavy-rain event in the middle reaches of the Yangtze River in June 1998. The broken line with crosses is the observed maximum precipitation, while the solid line is the simulated result and broken line with rings is the observed mean precipitation. It can be seen from the figure that AREM successfully describes this rain event.
Fig. 3. Contrast of the observed condition and the simulated result.
Figure 4 shows a local severe rainfall event in Yangjiang city, Guangdong province, simulated by AREM. Because the spatial scale of the event is small, a triple self-nesting simulation was used, with resolutions of 37, 12, and 6 km, respectively. The results show that AREM successfully simulates this event.

2.4. Data assimilation experiments based on AREM

In order to improve the quality of the initial field, assimilation experiments based on AREM have been carried out, for instance, variational analysis based on the Grapes/3DVAR system, 1DVAR assimilation adjusting humidity and temperature profiles, 3DVAR assimilation of radar wind fields, and 3DVAR assimilation of ATOVS data. These experiments show that the assimilation of multiple kinds of data has a positive effect on the forecast results (data not shown). A 3DVAR system for AREM has been established.

3. Considerations for Further Development

Judging from its development and wide application at present, AREM is an important numerical model tool that has gained much attention from users in Chinese meteorological, hydrological, and environmental research and operations. The distinguished simulation and forecast ability of AREM for Chinese heavy rain shows that AREM captures some key factors of Chinese weather events and has great potential for continued development and wider application. The possible directions of its further development are as follows:
1. Improvements to the physics schemes, including the cloud and rain processes, boundary-layer scheme, radiation processes, surface-flux parameterization, and so on.
Fig. 4. Three-self-nesting of AREM (heavy-rain event in Yangjiang city, Guangdong province on May 7, 2004).
2. Research on data assimilation techniques: to develop schemes capable of assimilating multiple kinds of high-resolution data, such as radar reflectivity and wind, satellite data, wind profiles, GPS water vapor content, rain gauge data, surface station data, etc., and to learn from the RUC and LAPS systems in building a meso-scale re-analysis system.
3. Research on short-term ensemble forecast techniques based on initial-field perturbations.
4. Exploratory research on numerical model applications in other fields, for instance, flood forecasting based on hydro-meteorology and forecasting of geological disasters triggered by heavy rain.
References
1. Q. C. Zeng, Acta Meteorol. Sin. 33 (1963), 472.
2. H. Yan and Y. F. Qian, Chin. J. Atmos. Sci. 5 (1981), 175.
3. Y. F. Qian and Z. J. Zhong, Meteorol. Soc. Jpn. Special Volume (1986), 743.
4. F. Mesinger, Riv. Meteor. Aeronau. 44 (1984), 195.
5. R. C. Yu, Chin. J. Atmos. Sci. (in English) 13 (1989), 145.
6. R. C. Yu, Adv. Atmos. Sci. 11 (1994), 479.
7. R. C. Yu, Adv. Atmos. Sci. 12 (1995), 13.
8. G. K. Peng, F. X. Cai, Q. C. Zeng, et al., Chin. J. Atmos. Sci. 18 (1994), 466.
9. R. C. Yu, Q. C. Zeng, G. K. Peng, et al., Chin. J. Atmos. Sci. 18 (1994), 535.
10. Q. C. Zeng, R. C. Yu, G. K. Peng, et al., Chin. J. Atmos. Sci. 18 (1994), 649.
11. R. C. Yu, Chin. J. Atmos. Sci. 18 (1994), 284.
12. R. C. Yu, Chin. J. Atmos. Sci. 18 (Suppl.) (1994), 801.
13. P. M. Dong and S. X. Zhao, Climatic Environ. Res. 8 (2003), 230.
14. W. C. Zhang, J. M. Zheng and Z. N. Xiao, J. Guizhou Meteorol. 27 (2003), 19.
15. C. G. Cui, A. R. Min and B. W. Hu, Acta Meteorol. Sin. 60 (2002), 602.
16. W. Y. Ma and L. P. Zhang, Meteorol. J. Hubei 2 (2002), 1.
17. X. P. Zhong and Q. T. Qing, J. Appl. Meteorol. Sci. 12 (2001), 167.
18. Q. T. Qing, X. P. Zhong and C. G. Wang, Meteorol. Monthly 26 (2000), 19.
19. Q. T. Qing and X. P. Zhong, J. Sichuan Meteorol. 2 (2000), 29.
20. Z. G. Zhou, W. H. Zhang and Y. Q. Jiang, Sci. Meteorol. Sin. 20 (2000), 453.
21. X. P. Zhong and Q. T. Qing, J. Sichuan Meteorol. 2 (2000), 7.
22. Z. G. Zhou, W. H. Zhang, W. F. Hao, et al., Chin. J. Atmos. Sci. 23 (1999), 597.
23. Z. G. Zhou, W. H. Zhang, X. X. Cheng, et al., Plateau Meteorol. 18 (1999), 171.
24. Z. G. Zhou, W. H. Zhang, N. S. Lin, et al., J. Trop. Meteorol. 15 (1999), 146.
25. C. A. Fang, X. N. Mei and G. X. Mao, Meteorol. Monthly 25 (1999), 15.
26. X. L. Tang and G. An, J. Jilin Meteorol. 1 (1998), 30.
27. H. Chen, J. H. Sun, N. F. Bei, et al., Climatic Environ. Res. 3 (1998), 382.
28. S. X. Zhao, Chin. J. Atmos. Sci. 22 (1998), 503.
29. Y. G. Wang, Y. Yang and Y. P. Xu, Martial Meteorol. 3 (1996), 13.
30. Q. T. Qing, Y. H. Xu and X. P. Zhong, J. Sichuan Meteorol. 2 (1999), 6.
31. Q. T. Qing, Y. H. Xu and X. P. Zhong, J. Sichuan Meteorol. 3 (1999), 12.
32. L. P. Zhang, W. C. Chen, J. Xia, et al., Eng. J. Wuhan Univ. (Eng. Ed.) 36 (2003), 24.
33. J. R. Dong and A. S. Wang, J. Nat. Disast. 6 (1997), 70.
34. K. J. Zhang, K. J. Zhou, P. Liu, et al., J. Nanjing Inst. Meteorol. 24 (2001), 587.
35. L. X. Ju, X. E. Lei and Z. W. Hang, J. Grad. Sch. Chin. Acad. Sci. 20 (2003), 470.
Development and Applications of the AREM
33
36. Z. W. Hang, S. Y. Du, X. E. Lei, et al., Chin. Environ. Sci. 22 (2002), 202. 37. H. W. Gao, M. Y. Huang and F. Q. Yu, Environ. Sci. 19 (1998), 1. 38. Z. G. Zhou, W. M. Wang, Y. Q. Jiang, et al., Sci. Meteorolo. Sin. 22 (2002), 474. 39. H. Z. Li, Z. Y. Cai and Y. T. Xu, Chin. J. Atmos. Sci. 23 (1999), 713. 40. Z. F. Tian, Q. B. Teng and Z. S. Wang, J. Trop. Meteorol. 14 (1998), 163. 41. Y. Q. Jiang, C. Y. Wang, W. H. Zhang, et al., Acta Meteorol. Sin. 61 (2003), 312. 42. Z. Y. Cai and R. C. Yu, Chin. J. Atmos. Sci. 21 (1997), 459. 43. Y. B. Huang, H. C. Lei, Z. H. Wang, et al., J. Nanjing Inst. Meteorol. 26 (2003), 668. 44. Y. B. Huang, H. C. Lei, G. Xueliang, et al., Plateau Meteorol. 22 (2003), 574. 45. C. G. Cui, Meteorol. Monthly, 26 (2000), 3. 46. W. H. Zhang, Y. Q. Jiang, Z. G. Zhou, et al., J. PLA Univ. Sci. Technol. 1 (2000), 77. 47. S. X. Zhao, International Workshop on NWP Model for Pakistan and Bangladesh, (GIK Institute, Topi, Pakistan, 1997). 48. K. R. FawzHaq and Z. R. Siddiqui, Proceedings of the 6th APC— MCSTA(Asia-Pacific Multilateral Cooperation in Space Technology and Applications), Beijing, P. R. China, September 18–21 (2001). 49. S. Nakapan and J. Kreasuwan, The 28th Congress on Science and Technology of Thailand, Bangkok, Thailand, October 22–25 (2002).
Advances in Geosciences Vol. 6: Hydrological Science (2006) Eds. Namsik Park et al. © World Scientific Publishing Company
METHOD OF CORRECTING VARIANCE OF POINT MONTHLY RAINFALL DIRECTLY ESTIMATED USING LOW FREQUENT OBSERVATIONS FROM SPACE

EIICHI NAKAKITA*, SYUNSUKE OKANE and LISAKO KONOSHIMA
Disaster Prevention Research Institute, Kyoto University
Gokasho Uji, Kyoto 611-0011, Japan
*[email protected]
The purpose of this paper is to estimate climatologically important stochastic information, such as the variance of monthly rainfall and the time and spatial correlation lengths of point rainfall intensity, using Tropical Rainfall Measuring Mission (TRMM)/PR observations. First, the expectation of the sample variance of point monthly rainfall was formulated as a function of record length m and observation frequency n, as well as of the time correlation length ζ and spatial correlation length λ of instantaneous point rainfall. A method of identifying the population time correlation length ζ and the population spatial correlation length λ was then developed. Validation using ground-based radar and application to TRMM/PR are shown. The result is a method of estimating not only the population variance of point monthly rainfall but also the population time and spatial correlation lengths using only TRMM/PR and GPM. Using only TRMM/PR observations, the standard deviation of point monthly rainfall and the time and spatial correlation lengths of instantaneous rainfall were estimated for several regions in Asia.
1. Introduction

The significance of measuring precipitation in the tropics lies in understanding the causes of global-scale climate change. Japan has a rain gauge network of the highest standard; in many other areas of the world, however, there are not enough observation facilities to measure rainfall. The Tropical Rainfall Measuring Mission (TRMM) was proposed to observe climatological values over such tropical areas. A major purpose of TRMM is to capture the change of region-averaged precipitation over 5° × 5° latitude-longitude areas. To examine the feasibility, estimation accuracy over a sea surface area,1 over the Gobi desert,2 and over Japan has been considered. Besides that,
Nakakita et al.3 proposed a dependency of rainfall on topographic elevation after the launch of TRMM. These studies, however, estimate a priori the stochastic parameters, such as the variance of point monthly rainfall and the time and spatial correlation lengths of point rainfall intensity, from long-term observations of available radar and/or rain gauges. Meanwhile, Nakakita et al.4 showed the possibility of finding a relation between the observation frequency and the computed sample variance of monthly rainfall. Under these circumstances, the purpose of this research is to propose a method of estimating not only the variance of point monthly rainfall but also the time and spatial correlation lengths of point rainfall intensity.
2. Correction of the Variance of Point Monthly Rainfall

2.1. Feasibility of correction

The first purpose of TRMM, unlike that of other satellites, is to estimate climatological values of tropical precipitation; its orbital inclination is therefore 35°, so that it observes mostly the tropical area, with special emphasis on monthly rainfall estimation over 5° square regions. However, at low latitudes (below 20°) the number of observations per month is around 30, and even at high latitudes (around 33°) it is only around 60, which means the observation is very intermittent. Using the continuous observations (every 5 min) of the rain gauge-adjusted Miyama radar of the MLIT (Ministry of Land, Infrastructure and Transport, Japan), the relationship between the spatial average of the sample variance of point monthly rainfall and the number of observations was drawn by Nakakita et al.,4 as depicted in Fig. 1. The data observed every 5 min from July to October 1998 and from June to October 1999 were used, thinned out at equal time intervals within each month. The mean over a 3-km square mesh is taken as the point precipitation, and the sample variance of monthly rainfall is averaged spatially over a 240-km square region. As expected from sampling considerations, the spatially averaged sample variance calculated from low-frequency observations is overestimated. Since the number of observations per month is around 30-60 in TRMM, it is clear that the variance is liable to be overestimated. We can also see that there is a relationship between the spatial average of the sample variance of point monthly rainfall and the number of observations, which
[Figure 1 plots the sample variance of point monthly rainfall (mm²) against the observation frequency (number of observations in a month), indicating the range of sample variance, the range of observation frequency of TRMM/PR, and the population variance.]

Fig. 1. Relationship between area-averaged sample variance of point monthly rainfall and observation frequency in a month.
converges to a certain value as the number of observations increases. Accordingly, if a formula relating the spatial average of the sample variance of point monthly rainfall to the number of observations can be developed, it becomes possible to correct the stochastic parameters of the precipitation field estimated from the low-frequency observations of TRMM.
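The inflation of the sample variance under sparse sampling can be reproduced with a toy simulation. The sketch below is illustrative only: it substitutes a truncated AR(1) series for the 5-min intensity record, and all parameters (φ, record length) are assumptions rather than values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Truncated AR(1) stand-in for correlated 5-min point rainfall intensity.
n_months = 120                       # 10 years of 30-day months (assumed)
per_month = 30 * 288                 # number of 5-min steps in one month
noise = rng.standard_normal(n_months * per_month)
phi = 0.98                           # lag-1 correlation of the 5-min series
z = np.empty(noise.size)
z[0] = noise[0]
for t in range(1, noise.size):
    z[t] = phi * z[t - 1] + noise[t]
months = np.maximum(z, 0.0).reshape(n_months, per_month)

def monthly_variance(n_obs):
    """Sample variance of monthly totals estimated from n_obs equally
    spaced observations per month (thinning out the full record)."""
    idx = np.linspace(0, per_month - 1, n_obs).round().astype(int)
    totals = months[:, idx].mean(axis=1) * per_month
    return totals.var()

v_sparse = monthly_variance(30)        # TRMM-like observation frequency
v_full = monthly_variance(per_month)   # the complete record
```

The sparse (n = 30) estimate comes out larger than the full-record value, reproducing the tendency shown in Fig. 1.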
2.2. Modeling the relationship between the sample variance of the monthly precipitation and the number of observations

Treating the instantaneous point rainfall intensity as a stochastic variable with temporal correlation, the equation relating the mean of the sample variance of monthly rainfall to the number of observations is derived as follows. First, assuming that the temporal correlation can be approximated by an exponential function, the autocorrelation function can be written as

    c(\tau) = \mu_{i2}\, e^{-\nu|\tau|} - \mu_{i1}^2    (1)

where \mu_{i1} is the mean (first moment) and \mu_{i2} is the mean square (second moment) of the instantaneous point rainfall intensity. As a relation among the expectation of the sample variance of monthly rainfall, the number of observations n in a month, and the number of months m (record length),

    E[S_{n,m}^2] = \mu_{i2} \sum_{i=1}^{n}\sum_{k=1}^{n} e^{-\nu|(k-i)\Delta T|}\,\frac{\Delta T}{T}\frac{\Delta T}{T} - \frac{\mu_{i2}}{m^2}\sum_{j=1}^{m}\sum_{i_j=1}^{n}\sum_{l=1}^{m}\sum_{k_l=1}^{n} e^{-\nu|(l-j)T+(k_l-i_j)\Delta T|}\,\frac{\Delta T}{T}\frac{\Delta T}{T}    (2)

is derived, where T is the length of the month and \Delta T = T/n is the (constant) interval of observation. Letting m and n tend to infinity, the expectation of the sample variance of the monthly rainfall becomes

    \lim_{n\to\infty,\,m\to\infty} E[S_{n,m}^2] = \frac{\mu_{i2}}{T^2}\int_0^T\!\!\int_0^T e^{-\nu|t-t'|}\,dt\,dt'    (3)

which shows that the expectation of the variance converges to a certain value.
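Equation (2) can be evaluated numerically. The sketch below is a minimal Python rendering under assumed illustrative units (ν in h⁻¹, T = 720 h per month, μ_{i2} = 1); it shows the expectation decreasing toward the limiting value of Eq. (3) as n grows.

```python
import numpy as np

def expected_sample_variance(nu, n, m, mu_i2=1.0, T=720.0):
    """Numerical evaluation of Eq. (2): expectation of the sample variance
    of monthly rainfall for n observations per month over m months.
    Units are illustrative: nu in 1/h, T = length of a month in hours."""
    dT = T / n
    obs_times = np.arange(m * n) * dT              # all observation instants
    lags = np.abs(obs_times[:, None] - obs_times[None, :])
    G = mu_i2 * np.exp(-nu * lags) * (dT / T) ** 2
    within = G[:n, :n].sum()                       # within-month double sum
    overall = G.sum() / m ** 2                     # all month pairs / m^2
    return within - overall

# The expectation shrinks toward the n -> infinity limit of Eq. (3).
v_sparse = expected_sample_variance(nu=0.4, n=30, m=6)
v_dense = expected_sample_variance(nu=0.4, n=240, m=6)
```

The n = 30 value exceeds the n = 240 value, mirroring the overestimation in the TRMM observation-frequency range of Fig. 1.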
3. Verification of the Model Equation of the Sample Variance

3.1. Verification using information from the ground-based radar

First, to verify the model formula itself, data from the Miyama radar are used, because by thinning out the observed time series at equal time intervals we can virtually obtain a wide range of observation frequencies n. The statistical expectation of the sample variance is replaced by averaging over the Miyama observation area. The data used are Miyama observations for the rainy seasons, 34 months in total (June to October of 1988-1994, excepting August 1991). The climatological characteristics of rainfall are assumed uniform from June to October. As the model formula depends on the number of months m, validation with various values of m was conducted, taking the most recent m months of data. For example, for m = 5 the data from June to October 1994 were used, and for m = 7 the data from June to October 1994 and from September to October 1993. The identified model parameter ν is the one minimizing the sum of squared differences between the sample and model variances. As the model formula depends on both the
Fig. 2. Spatially averaged sample variance and optimized value.
number of observations n and the number of months m, the identification is conducted using all values of n and m; hereafter, this is called global identification in this paper. As shown in Fig. 2, in global identification the minimum square sum is taken over the n-m space. Therefore, even when the range of n values is small, as with TRMM data, we can still obtain a stable optimized ν, because a range of record lengths m remains available. On the other hand, for small record lengths the expectation of the sample variance risks having a large variance, by the central limit theorem. Fortunately, TRMM has already accumulated about 10 years of data, so there are more than 9 months of data even for estimating the sample variance of each calendar month. In the validation, record lengths of less than 5 months were excluded from the identification for reliability. Figure 2 shows the result of global identification: the sample values and the model values are nearly equal, so this method is valid for identifying ν.
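The global identification can be sketched as a grid search minimizing the squared misfit jointly over the n-m grid. As a noise-free sanity check, the "sample" variances below are generated from the model itself with a true ν of 0.05 h⁻¹; all values are illustrative assumptions, not the paper's data.

```python
import numpy as np

def model_variance(nu, n, m, mu_i2=1.0, T=720.0):
    # Expectation of the sample variance from Eq. (2) (see Sec. 2.2).
    dT = T / n
    obs = np.arange(m * n) * dT
    G = mu_i2 * np.exp(-nu * np.abs(obs[:, None] - obs[None, :])) * (dT / T) ** 2
    return G[:n, :n].sum() - G.sum() / m ** 2

# "Sample" variances over an n-m grid, generated from the model itself
# with a true nu of 0.05 per hour (noise-free sanity check).
ns, ms = [15, 30, 60], [6, 12]
sample = {(n, m): model_variance(0.05, n, m) for n in ns for m in ms}

# Global identification: choose nu minimising the squared misfit summed
# jointly over all (n, m) pairs.
candidates = np.linspace(0.01, 0.15, 29)
cost = [sum((sample[nm] - model_variance(nu, *nm)) ** 2 for nm in sample)
        for nu in candidates]
nu_hat = candidates[int(np.argmin(cost))]
```

Because the misfit is summed over the whole n-m grid, the minimum is well defined even when only a narrow range of n is available, which is the point of the global identification.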
3.2. Verification using ground-based radar considering the observation frequency of TRMM/PR

In Sec. 3.1, ν was identified using a wide range of n. As a next step, whether the parameter can still be identified was verified for the TRMM observation frequency. We assumed that the largest n is 60 in the high-latitude areas (latitude 33°-35°) and 40 in the low- and middle-latitude areas (latitude lower than 33°). Hereafter, the largest n is denoted N.
In this validation, by thinning out the time series of observations, time series with observation frequencies less than N are also virtually produced. In addition, the logarithm of the sum of squared differences between the sample and model variances is used in the identification of ν. Figure 3 shows the verification for the high-latitude case with a record length of 34 months. The plots with solid lines represent the values from the model formula with the chosen parameters, and the plot with no line represents the sample values. Although all plots for observation frequencies greater than N are shown in the figure, only those with frequencies less than N were used for the identification. By excluding the sample values computed with record lengths of less than 9 months, the model value converges close to the sample convergence value. Similarly, Fig. 4 shows the result for the low- and middle-latitude case; the model converges to the sample value with the same accuracy as in the high-latitude case. The value of ν calculated directly from the Miyama radar data using Eq. (1) was 0.0036; comparison with the values in Figs. 3 and 4 indicates that this parameter estimation method is highly valid.
3.3. Validation using TRMM/PR observation

Finally, validation using only TRMM/PR observations was conducted. The target area is a 2.5° square region at almost the same location and
[Figure 3 plots the sample variance of point monthly rainfall (mm²) against the number of samples for m = 34: sample values, a model curve with ν = 0.0036 (record lengths of less than 6 months excluded), and a model curve with ν = 0.0040 (record lengths of less than 10 months excluded).]

Fig. 3. Correction for high latitude (N = 60).
[Figure 4 shows the same comparison for the low- and middle-latitude case: sample values for m = 34, a model curve with ν = 0.0036 (record lengths of less than 6 months excluded), and a model curve with ν = 0.0040 (record lengths of less than 10 months excluded).]

Fig. 4. Correction for low latitude (N = 40).
spatial scale as the Miyama observation area, as shown in Fig. 5, which indicates the 20-month averaged observation frequency n of TRMM/PR in a month. The TRMM/PR data used are the 3-km-height 2A25 product observed from June to October in 1998, 1999, 2000, and 2003. Although the statistical expectation was replaced by averaging over the target area in Secs. 3.1 and 3.2, here it is replaced by averaging over latitudinal layers discretized every five observation frequencies n, because the observation frequency is almost constant along a given latitude. Using
Fig. 5. Observation frequency in a month (20-month average).
this modification of the expectation operation, we can obtain various observation frequencies n without any thinning out of the original observed time series. With this procedure, a validation was conducted using data with a record length m of 20. Figure 6 shows the results of global identification. Here, m less than 12 was not used in the identification because of the large variance of the sample values. The top graph shows the section for m = 15, and the bottom graph the section for m = 20. Both graphs in Fig. 6 show that the identified model values follow the sample values well for various observation frequencies n.
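The replacement of the expectation operator by averaging within latitudinal layers, discretized every five observation frequencies, can be sketched with toy per-pixel data; the arrays below are synthetic stand-ins for TRMM/PR pixels, not actual observations.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy per-pixel data: monthly observation count n and a sample variance
# that shrinks with n (both entirely illustrative).
n_pix = rng.integers(20, 60, size=1000)
var_pix = 5000.0 / n_pix + rng.normal(0.0, 5.0, size=1000)

# Group pixels into bands of width 5 in n and average within each band,
# standing in for the expectation operator of Sec. 3.3.
bands = (n_pix // 5) * 5
centers = np.unique(bands)
band_mean = np.array([var_pix[bands == b].mean() for b in centers])
```

Each band average plays the role of one "sample variance versus n" point, so a curve over n is obtained from a single, unthinned set of observations.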
[Figure 6 plots the sample variance of point monthly rainfall (mm²) against the number of samples in two panels, m = 15 (top) and m = 20 (bottom), each showing sample values and model values with ν = 0.0054.]

Fig. 6. Correction using only TRMM/PR observation (m = 15, m = 20).
4. Introducing Spatial Correlation and Estimation of Temporal and Spatial Correlation Lengths

To evaluate the effect of the observational footprint, spatial correlation in the instantaneous precipitation field must be introduced. Supposing the precipitation field can be represented by a marked Poisson model and extending Eq. (1), the temporal and spatial correlation function becomes

    c(r, \tau) = \mu_{i2}\, e^{-|\tau/\zeta|}\left(1 + \frac{r}{2\lambda}\right) e^{-r/\lambda} - \mu_{i1}^2    (4)

where \zeta = 1/\nu is the temporal correlation length, \lambda is the spatial correlation length, and r is the radius of the observational footprint. Based on Eq. (4) and parallel to Eq. (2), the relationship among the number of observations n, the record length m, and the expectation of the sample variance of monthly rainfall at the observed point can also be derived analytically. Similarly to Eq. (3), letting m and n tend to infinity, the limiting value of the mean of the sample variance of monthly rainfall can be derived analytically, although the specific equations are omitted for lack of space. As a result, the expectation of the sample variance becomes a function of the observation frequency n, the record length m, the temporal correlation length \zeta, and the spatial correlation length \lambda. Therefore, by fitting it to the expectation of the sample variance of point monthly rainfall obtained from TRMM observation data, the temporal length \zeta and the spatial length \lambda of the instantaneous precipitation field can be identified simultaneously. Table 1 shows the standard deviation of point monthly rainfall, \sqrt{\lim_{n\to\infty,\,m\to\infty} E[S_{n,m}^2]}, together with the temporal correlation length \zeta and the spatial correlation length \lambda of the instantaneous rainfall intensity in several Asian regions, estimated only from TRMM/PR data. In the estimation, these values are assumed uniform within each 2.5° × 2.5° latitude-longitude grid cell. The radius of the TRMM/PR footprint here is 4 km.
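The correlation model of Eq. (4) can be coded directly. In the sketch below, ζ = 2.5 h and λ = 32 km are of the order of the Table 1 estimates, while μ_{i1} and μ_{i2} are assumed placeholder moments.

```python
import math

def correlation(r, tau, mu_i1=1.0, mu_i2=4.0, zeta=2.5, lam=32.0):
    """Spatio-temporal correlation function of Eq. (4): tau in hours,
    r (footprint radius) in km; zeta and lam are the temporal and
    spatial correlation lengths (illustrative values)."""
    return (mu_i2 * math.exp(-abs(tau / zeta))
            * (1.0 + r / (2.0 * lam)) * math.exp(-r / lam)
            - mu_i1 ** 2)

# At zero lag and zero separation this reduces to the point variance
# mu_i2 - mu_i1^2; it decays with both tau and r.
c0 = correlation(0.0, 0.0)
```

Evaluating the function over a grid of (r, τ) is what allows ζ and λ to be fitted simultaneously from footprint-averaged TRMM/PR data.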
The calculated values are of the same order as values previously estimated from ground rain gauges around the world. The temporal and spatial correlation lengths in the Kyushu area of Japan, however, are larger than those of the other regions. In the next research step, these differences must be clarified through comparison with ground rain gauge data, and it is also necessary to verify them against the climate characteristics
Table 1. Estimated rainfall characteristics in Asia.

                                          Kyushu   Funan,   Northern  Southern
                                                   China    Mekong    Mekong
Standard deviation of point monthly
  rainfall (mm/month)                     149.21   76.194   72.827    101.45
Temporal correlation length of point
  rainfall intensity (h)                    5.5      2.5      2.1       2.5
Spatial correlation length of point
  rainfall intensity (km)                  46       32       32        32
of the regions of the world, and also from the differences between estimates over land and over sea.
5. Conclusions

The purpose of this paper was to estimate climatologically important stochastic information, such as the variance of monthly rainfall and the time and spatial correlation lengths of point rainfall intensity, using TRMM/PR observations. First, the expectation of the sample variance of point monthly rainfall was formulated as a function of record length m and observation frequency n, as well as of the time correlation length ζ and spatial correlation length λ of instantaneous point rainfall. A method of identifying the population time correlation length ζ and the population spatial correlation length λ was developed. Validation using ground-based radar and application to TRMM/PR were shown. The result is a method of estimating not only the population variance of point monthly rainfall but also the population time and spatial correlation lengths using only TRMM/PR and GPM. Using only TRMM/PR observations, the standard deviation of point monthly rainfall and the time and spatial correlation lengths of instantaneous rainfall were estimated for several regions in Asia. In the next research step, the estimated parameters should be validated more deeply against ground rain gauge data, against the climate characteristics of the regions of the world, and against the differences between estimates over land and over sea.
Acknowledgments

The authors thank the Japan Aerospace Exploration Agency (JAXA) for furnishing the observed data and the Ministry of Land, Infrastructure and Transport (MLIT) for providing the rain gauge information of the Miyama radar.
References

1. T. L. Bell, J. Geophys. Res. 92 (1987) 9631-9643.
2. S. Ikebuchi, E. Nakakita, K. Kakimi and T. Adachi, Proceedings of the International Symposium on HEIFE (1993), pp. 216-226.
3. E. Nakakita, T. Okimura, Y. Suzuki and S. Ikebuchi, Annu. J. Hydraul. Eng. JSCE 46 (2002) 25-30.
4. E. Nakakita, T. Okimura, Y. Suzuki and S. Ikebuchi, Disaster Prev. Res. Inst. Annu. 45(B2) (2002) 687-704.
Advances in Geosciences Vol. 6: Hydrological Science (2006) Eds. Namsik Park et al. © World Scientific Publishing Company
MONTE CARLO SIMULATION FOR CALCULATING DROUGHT CHARACTERISTICS

CHAVALIT CHALEERAKTRAKOON* and SUPAMIT NOIKUMSIN
Department of Civil Engineering, Thammasat University, Klong Luang,
Pathumthani 12120, Thailand
*[email protected]
This paper investigates whether a simplified Monte Carlo simulation approach is feasible for calculating the drought characteristics (frequency, magnitude, and duration) of a water resource system. The simplified approach uses the seasonal averages of observed climate (rainfall and evaporation) data to represent their stochastic processes, and is based on the combination of the popular HEC-3 model with a stochastic flow procedure, MAR(1). The simplified approach was applied to a medium-scale and a large-scale water resource system. The results indicate that the approach is applicable to the medium system: it closely approximates the average, the one-standard-deviation bounds about the mean, and the maximum of the water deficit properties. For the large system, the stochastic properties of the climate phenomena should be taken into consideration.
1. Introduction

Drought characteristics (frequency, magnitude, and duration) are usually necessary for evaluating the design solutions and operating rules of a water resource system. The assessment is generally performed by Monte Carlo simulation of the system against many synthetic samples of seasonal flow, rainfall, and evaporation records. Constructing the approach involves setting up the chosen simulation software for the system, developing stochastic models for the seasonal flow and climate phenomena, and integrating all of these models together. The stochastic simulation approach may be simplified in some practical applications1,2 by assuming the rainfall and evaporation processes to be deterministic. This paper investigates the feasibility of a simplified approach combining the HEC-3 simulation model3 with
a stochastic flow MAR(1) model.4 Results of its applications show that it is feasible for a medium-scale water resource system; however, the deterministic assumption on the climate processes needs to be removed if the system considered is large.
2. Simplified Monte Carlo Simulation Approach

Consider a water resource system. The Monte Carlo simulation approach for the system is constructed on the HEC-3 model, a popular simulation package for the analysis of reservoir conservation purposes.3 The HEC-3 model computes the discharge at each control point of the system (e.g. the location of a hydraulic structure or of a river branch junction) where the amount of flow changes. Starting at the most upstream node m = 1 (the outflow structure of a storage reservoir in the system), the model calculates the reservoir release \hat{X}_{ij}^{1} during year i and season j based on a standard operating rule.5 The rule aims to keep the amount of stored water within its lower and upper bounds (L_j and U_j) as far as possible. That is,

    \hat{X}_{ij}^{1} = \begin{cases} D_j + \hat{S}_{ij} - U_j, & \text{for } \hat{S}_{ij} \ge U_j + D_j \\ D_j, & \text{for } L_j \le \hat{S}_{ij} < U_j + D_j \\ D_j + \hat{S}_{ij} - L_j, & \text{for } L_j - D_j \le \hat{S}_{ij} < L_j \\ 0, & \text{otherwise} \end{cases}    (1)

where D_j is the total or partial water requirement of the system served by this reservoir, and \hat{S}_{ij} is the reservoir storage calculated from the principle of reservoir water balance. Based on this water balance, the storage \hat{S}_{ij} generally depends on three stochastic phenomena: evaporation E_{ij}, precipitation P_{ij}, and inflow Q_{ij}. Among these, the variability of the inflow is much greater than that of the others, because the inflow Q_{ij} is the resultant of all flow variations over every sub-drainage area of the reservoir, while the climate phenomena E_{ij} and P_{ij} are localized at the reservoir site. Hence, to simplify the computation of \hat{S}_{ij}, only the inflow Q_{ij} is considered stochastic; the remaining climate phenomena are assumed deterministic, as in the following equation:

    \hat{S}_{ij} = \hat{S}_{i(j-1)} + \hat{Q}_{ij} + \bar{P}_j - \bar{E}_j    (2)

where \hat{Q}_{ij} is the synthetic reservoir inflow generated using a decomposed MAR(1) model,4 \bar{E}_j is the average of E_{ij}, and \bar{P}_j is the mean of P_{ij}. For the next downstream node (m = 2), the simulation approach computes the discharge \hat{X}_{ij}^{2} as

    \hat{X}_{ij}^{2} = \hat{X}_{ij}^{1} + \hat{Y}_{ij}^{2} - \hat{Z}_{ij}^{2}    (3)

subject to

    \hat{X}_{ij}^{2} \ge Q_{\min}^{2}, \qquad \hat{Z}_{ij}^{2} \le D_j^{2}    (4)

where \hat{Y}_{ij}^{2} is the amount of reservoir-regulated inflow calculated using Eqs. (1) and (2), or the natural inflow of a river branch computed by the decomposed MAR(1) model; \hat{Z}_{ij}^{2} is the quantity of diverted water; and Q_{\min}^{2} is the observed minimum flow of the main river considered. To satisfy the water requirement D_j^{2} as far as possible while keeping the discharge at least equal to the historic minimum-flow record Q_{\min}^{2}, the maximum value of \hat{Z}_{ij}^{2} that yields a feasible discharge \hat{X}_{ij}^{2} is used in Eq. (3). The computation continues until the approach gives the solution \hat{Z}_{ij}^{m} at the control point of interest. The flow \hat{Z}_{ij}^{m} can then be used to calculate several popular drought properties for assessing the performance of the system, as described in the following section.

3. Drought Characteristics

To compute the drought properties, let \hat{V}_{ij}^{m} = (D_j^{m} - \hat{Z}_{ij}^{m}) > 0 be a water deficit flow. Also denote by \ell the index of a sequence of successive water shortage flows \hat{W}_{\ell}^{m}, where
ˆ K
(Vˆijm )ℓ
(5)
j=1
ˆ ℓ is the total period of the ℓth consecutive flows W ˆ m. in which K ℓ m ˆ ˆm The occurrence frequency f of the consecutive water deficit flow W ℓ can be calculated as pˆm fˆm = n
(6)
where \hat{p}^{m} is the overall number of sequences \ell and n is the total number of years considered. The average and maximum magnitudes (\hat{W}_{\mathrm{ave}}^{m} and \hat{W}_{\max}^{m}) of the phenomenon are computed by

    \hat{W}_{\mathrm{ave}}^{m} = \frac{1}{\hat{p}^{m}} \sum_{\ell=1}^{\hat{p}^{m}} \hat{W}_{\ell}^{m} \quad \text{and} \quad \hat{W}_{\max}^{m} = \max(\hat{W}_{\ell}^{m}) \ \text{for } \ell = 1, 2, \ldots, \hat{p}^{m}    (7)

Similarly, the average and longest durations (\hat{K}_{\mathrm{ave}}^{m} and \hat{K}_{\max}^{m}) of the process can be determined as

    \hat{K}_{\mathrm{ave}}^{m} = \frac{1}{\hat{p}^{m}} \sum_{\ell=1}^{\hat{p}^{m}} \hat{K}_{\ell}^{m} \quad \text{and} \quad \hat{K}_{\max}^{m} = \max(\hat{K}_{\ell}^{m}) \ \text{for } \ell = 1, 2, \ldots, \hat{p}^{m}    (8)
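The operating rule of Eq. (1) and the run-based drought statistics of Eqs. (5)-(8) can be sketched as follows. This is an illustrative rendering: the function and variable names are assumptions, and the toy series stands in for HEC-3 output.

```python
def reservoir_release(S, D, L, U):
    """Standard operating rule of Eq. (1): release for storage S, demand D,
    and lower/upper storage bounds L and U (names are assumptions)."""
    if S >= U + D:
        return D + S - U        # meet demand and spill the surplus above U
    if S >= L:
        return D                # storage within bounds: meet demand fully
    if S >= L - D:
        return D + S - L        # partial supply once storage drops below L
    return 0.0

def drought_statistics(demand, supplied, n_years):
    """Run-based drought characteristics of Eqs. (5)-(8)."""
    deficits = [max(d - z, 0.0) for d, z in zip(demand, supplied)]
    runs, current = [], []
    for v in deficits:                # split the deficit series into runs
        if v > 0:
            current.append(v)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    W = [sum(run) for run in runs]    # Eq. (5): magnitude of each run
    K = [len(run) for run in runs]    # duration of each run
    p = len(runs)
    return {
        "frequency": p / n_years,                  # Eq. (6)
        "W_ave": sum(W) / p if p else 0.0,         # Eq. (7), average
        "W_max": max(W, default=0.0),              # Eq. (7), maximum
        "K_ave": sum(K) / p if p else 0.0,         # Eq. (8), average
        "K_max": max(K, default=0.0),              # Eq. (8), longest
    }

# Toy check: constant demand of 5 units over 8 seasons spanning 2 years.
stats = drought_statistics([5.0] * 8,
                           [5.0, 3.0, 3.0, 5.0, 5.0, 2.0, 5.0, 5.0],
                           n_years=2)
```

In the toy series there are two deficit runs (magnitudes 4 and 3, durations 2 and 1), so the frequency is one event per year.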
4. Assessment of the Simulation Approach

Two types of water resource systems were considered to investigate the ability of the presented simulation approach to calculate the drought characteristics within them. The first is a medium-scale single-purpose (i.e. irrigation) system, in which the storage capacity and retention surface area of the reservoir are less than 100 million cubic meters (MCM) and 15 square kilometers (km²), respectively, or the irrigation area is generally smaller than 128 km² (http://www.rid.go.th/index kw1.htm). Such a reservoir is usually full of water at the end of the flooding season. The second type is a large-scale multi-purpose (e.g. agriculture, water supply, flood control, industry, hydropower generation, navigation, fishery, recreation) system. It has a larger storage capacity and retention surface area, or a greater irrigation area, than the medium-scale system. Because of its scale, the stored water seldom reaches the maximum retention level at the beginning of the dry season, unlike in the medium-scale system. In the following, the assessment of the simulation approach for the two systems is presented.

4.1. Medium-scale system

The Kiwlom Irrigation System, situated within the Wang River Basin of the North Region of Thailand, was selected to represent the case of the
medium-scale system. The system approximately satisfies the criteria on physical reservoir characteristics: it has a reservoir capacity of 106 MCM, a retention surface area of 16 km², and an irrigation area of 178.56 km². The investigation used observed monthly series of 23 years (1980-2002) of reservoir inflow, 30 years (1973-2002) of precipitation, and 21 years (1982-2002) of evaporation at the Kiwlom Reservoir. The collected hydrologic data and other related information, such as the irrigation water requirement, the reservoir capacity-elevation curve, and the relationship between retention surface area and elevation, can be found in Noikumsin.6 The simulation approach described above (Eqs. (1)-(4)) was then used to generate many samples of diverted flow \hat{Z}_{ij}^{m} of the same size as the historic record (here, 500 sets of the 23-year monthly flow \hat{Z}_{ij}^{m}). Since a medium-scale reservoir is often full of water at the beginning of the dry period, it is adequate in practice to consider only the frequency, average magnitude, and mean duration of the water deficit phenomenon when assessing the drought situation of the system. These were estimated for each sample of the flow \hat{Z}_{ij}^{m} using Eqs. (5)-(8). Next, the frequency distributions of the 500 estimated properties were characterized approximately in the form of box plots. The box plots of the assessed simulation approach (case 1) were then compared with those calculated by two further simulation schemes in which the deterministic assumptions on rainfall (case 2), and on rainfall and evaporation (case 3), are removed. That is, the average rainfall \bar{P}_j and the mean evaporation \bar{E}_j in Eq. (2) are replaced by the corresponding synthetic values \hat{P}_{ij} and \hat{E}_{ij}.
In this study, the climate data \hat{P}_{ij} and \hat{E}_{ij} were generated using the method of fragments and AR(1) developed by Srikanthan et al.7 A computer program implementing that approach is available for download from http://www.toolkit.net.au. Table 1 presents the box plots of all considered water shortage properties of the Kiwlom Irrigation System for the three simulation approaches (case 1: generated flow; case 2: generated flow and rainfall; case 3: generated flow, rainfall, and evaporation). The average µ, the mean ± standard deviation µ ± σ, and the maximum statistics of the drought properties for the investigated simulation approach (case 1) are close to those of the others (cases 2 and 3). The minima of the water shortage properties are, however, overestimated by the approach. This is
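The box-plot summary applied to the simulated samples can be sketched as below; the gamma generator is a placeholder for the HEC-3 plus decomposed MAR(1) machinery, and all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder for 500 simulated 23-year monthly diverted-flow samples.
samples = rng.gamma(shape=2.0, scale=50.0, size=(500, 23 * 12))

# One drought-related statistic per sample (here simply the sample mean),
# then the five box-plot statistics reported in the tables.
per_sample = samples.mean(axis=1)

def box_summary(x):
    mu, sigma = x.mean(), x.std()
    return {"Min.": x.min(), "mu-sigma": mu - sigma, "mu": mu,
            "mu+sigma": mu + sigma, "Max.": x.max()}

summary = box_summary(per_sample)
```

In the paper, the per-sample statistic would be one of the drought properties of Eqs. (5)-(8) rather than the sample mean used here for brevity.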
Table 1. Box plots of frequency, average magnitude, and average duration of drought at the Kiwlom Water Resource System (case 1: generated flow; case 2: generated flow and rainfall; case 3: generated flow, rainfall and evaporation).

            Frequency               Magnitude (MCM)          Duration (months)
Statistic   Case 1  Case 2  Case 3  Case 1  Case 2  Case 3   Case 1  Case 2  Case 3
Min.        0.9     0.05    0.05    272.6   0.8     0.1      7.3     2.0     2.0
µ − σ       0.97    0.94    0.94    305.8   298.3   293.9    8.6     8.4     8.4
µ           0.99    0.99    0.99    323.6   321.4   317.3    9.2     9.1     9.1
µ + σ       1.0     1.0     1.0     341.4   344.5   340.7    9.8     9.9     9.8
Max.        1.0     1.0     1.0     372.8   371.8   372.7    10.7    10.6    10.6

Note: Min. = minimum, µ = mean, σ = standard deviation, and Max. = maximum.
perhaps because the deterministic assumption loses a key system characteristic (the reservoir of the system is usually full at the start of the dry period). Fortunately, accurate reproduction of the minimum is unnecessary in practice: the statistics essential for assessing design solutions and operating rules of the system are those the examined approach approximates well. It can therefore be used for the medium-scale system.
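The comparison above can be reproduced in outline: for each synthetic flow sample, the deficit events are identified and their frequency, average magnitude, and mean duration computed, and each property is then summarized over the 500 samples by the five statistics tabulated in this paper. A minimal sketch, assuming a deficit month is one in which simulated supply falls short of the monthly requirement (function names are illustrative, not from the paper):

```python
import numpy as np

def drought_properties(supply, demand):
    """Frequency, average magnitude (MCM), and mean duration (months) of
    water deficits for one simulated monthly series (cf. Eqs. 5-8)."""
    deficit = np.maximum(demand - supply, 0.0)          # monthly shortfall
    dry = deficit > 0
    # group consecutive deficit months into drought events
    starts = np.flatnonzero(dry & ~np.r_[False, dry[:-1]])
    ends = np.flatnonzero(dry & ~np.r_[dry[1:], False])
    durations = ends - starts + 1
    magnitudes = [deficit[s:e + 1].sum() for s, e in zip(starts, ends)]
    freq = starts.size / (supply.size // 12)            # events per year
    mean_mag = float(np.mean(magnitudes)) if magnitudes else 0.0
    mean_dur = float(durations.mean()) if durations.size else 0.0
    return freq, mean_mag, mean_dur

def box_plot_stats(values):
    """The five statistics tabulated: min, mu - sigma, mu, mu + sigma, max."""
    v = np.asarray(values, float)
    mu, sd = v.mean(), v.std(ddof=1)
    return float(v.min()), mu - sd, mu, mu + sd, float(v.max())
```

Applying `drought_properties` to each of the 500 samples and `box_plot_stats` to each resulting property yields the rows of the tables below.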
4.2. Large-scale system

The Ubolratana Water Resource System was chosen as the large water resource system. It lies in the Chi River Basin in the Northeast Region of Thailand. The storage and retention surface area of its reservoir are 2263 MCM and 182 km², respectively. The Ubolratana Reservoir is used mainly for hydropower generation, irrigation, and flood control; it produces 25,200 kW and supplies water to an irrigation area of 480 km². The examination considered the historic monthly sequences of 21-year (1982–2002) reservoir inflow, 23-year (1980–2002) rainfall, and 21-year (1982–2002) evaporation at the reservoir. Noikumsin6 has reported these hydrologic data and all other information necessary for the investigation. The three simulation approaches were applied to these data sets to calculate the drought properties of the system, as described earlier. The water-deficit characteristics considered in this case are the frequency, the average and greatest magnitudes, and the mean and longest durations, since the reservoir
Monte Carlo Simulation for Calculating Drought Characteristics
of the system is rarely full of water. Results of the investigations are as follows. Tables 2 and 3 show the box-plot statistics of the considered drought properties of the Ubolratana Water Resource System based on the three Monte Carlo simulation approaches (case 1: generated flow; case 2: generated flow and rainfall; case 3: generated flow, rainfall, and evaporation). It is evident that the simplified simulation approach (case 1) estimates the mean and one-standard-deviation bounds of the drought properties in good agreement with those of the case 2 and case 3 schemes. However, in this case its approximation of the maximum statistic is generally poor (see Tables 2 and 3) because of the deterministic assumption of the climate processes: the assumption cannot adequately describe the carry-over storage of the reservoir system, on which this statistic relies. Thus, in this case, the investigated simulation approach with the simplified hypothesis (case 1) is inapplicable for approximating the water-deficit characteristics.
Table 2. Box plots of frequency, average magnitude, and average duration of drought at the Ubolratana Water Resource System (case 1: generated flow, case 2: generated flow and rainfall, and case 3: generated flow, rainfall and evaporation).

            Frequency                 Magnitude (MCM)            Duration (months)
Statistic   Case 1  Case 2  Case 3    Case 1  Case 2  Case 3     Case 1  Case 2  Case 3
Min.        0.05    0.05    0.05      73.9    43.4    29.6       1       1       1
µ−σ         0.2     0.2     0.19      383.5   387.6   381.3      3.1     3.1     3.1
µ           0.29    0.3     0.29      624.9   630.2   627.8      4.4     4.5     4.5
µ+σ         0.39    0.39    0.39      866.4   872.7   874.3      5.7     5.8     5.8
Max.        0.57    0.71    0.67      1,851   1,832   1,828      9.3     9.3     9.3
Note: Min. = minimum, µ = mean, σ = standard deviation, and Max. = maximum.
Table 3. Box plots of greatest magnitude and longest duration of drought at the Ubolratana Water Resource System (case 1: generated flow, case 2: generated flow and rainfall, and case 3: generated flow, rainfall and evaporation).

            Greatest magnitude (MCM)      Longest duration (months)
Statistic   Case 1   Case 2   Case 3      Case 1   Case 2   Case 3
Min.        114.2    2.3      29.6        1        1        1
µ−σ         754.1    732.1    724.8       5.5      5.5      5.5
µ           1,424    1,446    1,443       9.4      9.7      9.7
µ+σ         2,095    2,160    2,160       13.3     13.8     13.8
Max.        4,873    4,731    4,683       25       31       28
Note: Min. = minimum, µ = mean, σ = standard deviation, and Max. = maximum.
5. Summary and Conclusions

The main objective of the present paper was to investigate the ability of a simplified Monte Carlo simulation approach to calculate the drought characteristics (e.g. frequency, magnitude, and duration) of a water resource system. The simplified approach assumes localized periodic rainfall and evaporation to be deterministic, using their seasonal averages to represent the climate phenomena, and combines the popular HEC-3 simulation model with an MAR(1) seasonal-flow procedure. The approach was applied to calculate the water-shortage properties of two different water resource systems. The first is a single-purpose, medium-scale system (the Kiwlom Irrigation Project in the Wang River Basin), whose storage is usually full at the beginning of the dry period. The other, the Ubolratana Water Resource System in the Chi River Basin, is multi-purpose and large-scale; its reservoir water level at the end of the flood season seldom reaches maximum retention. Results have shown that the simplified simulation approach using the deterministic assumption of the rainfall and evaporation processes is feasible for approximating the drought characteristics of the medium-scale project: it provides good approximations of the average, one-standard-deviation bounds, and maximum of the water-shortage properties. When dealing with the large system, however, it gives only a rough approximation of the maximum statistic, and the deterministic hypothesis should be removed in that case.
References
1. C. Chaleeraktrakoon, Proceedings of the Seventh National Convention on Civil Engineering, Bangkok, Thailand (2001), pp. WRE83–WRE88.
2. C. Chaleeraktrakoon and N. Peerawat, Res. Dev. J. Eng. Inst. Thailand 14(4) (2003) 44–52.
3. US Army Corps of Engineers, HEC-3 Reservoir System Analysis for Conservation: User Manual (Hydrologic Engineering Center, US Army Corps of Engineers, Davis, CA, 1974).
4. C. Chaleeraktrakoon, J. Hydrol. Eng. 4(4) (1999) 337–343.
5. C. Chaleeraktrakoon and A. Kangrang, Can. J. Civil Eng. 34(2) (2007) 170–176.
6. S. Noikumsin, Impact of Stochastic Climate Process on Reservoir Simulation Study, Master Thesis (Faculty of Engineering, Thammasat University, 2006).
7. R. Srikanthan, C. Francis and F. Andrew, Stochastic Climate Library: User Guide (CRC for Catchment Hydrology, Melbourne, Australia, 2005).
Advances in Geosciences Vol. 6: Hydrological Science (2006) Eds. Namsik Park et al. © World Scientific Publishing Company
ON REGIONAL ESTIMATION OF FLOODS FOR UNGAGED SITES

VAN-THANH-VAN NGUYEN
Department of Civil Engineering and Applied Mechanics, McGill University, Montreal, Quebec, Canada H3A 2K6
[email protected]

This paper presents an innovative approach to the estimation of floods at locations where flow data are not available. The approach is based on the assumption that the statistical properties of floods scale with the basin characteristics. More specifically, analysis of the physiographic and hydrologic data for 180 watersheds of different sizes (ranging from 0.8 to 86,929 km²) in Canada has indicated scaling behavior of the non-central moments of flood series with the basin areas. Based on this empirical evidence, a new definition of regional homogeneity of watersheds has been formulated. It was found that the grouping of homogeneous basins proposed in this study formed well-defined geographical regions with distinct climatic characteristics. Furthermore, it was recommended that the regional probability distribution for the estimation of flood quantiles, and the corresponding parameter estimation method, be selected such that the scaling properties of the flood series are preserved. In particular, the generalized extreme value distribution was found to be quite suitable for regional flood estimation, since the scaling properties of flood series can be accounted for in the estimation of its parameters. Finally, it has been demonstrated that, based on the proposed definition of regional homogeneity, the estimates of floods for ungaged sites can be more consistent and more accurate than those provided by existing methods.
1. Introduction

For planning and design of various hydraulic structures, streamflow characteristics (e.g. peak discharge or flood volume) for a given return period are often required. However, in most cases flow data are not available at the location of interest (an ungaged site). Hence, regionalization methods, such as the index flood method,1 are frequently used to transfer hydrological information from one location of the region to another where data are needed but not available. Such regionalization methods require, however, a basic understanding of watershed similarity (or regional homogeneity) and detailed knowledge of the space–time
variability of various physical processes (e.g. streamflow, precipitation) in the region. In other words, identification of hydrologically similar basins is the first and most important step in the regional estimation of floods. Traditional techniques delineate homogeneous groups of watersheds based on their geographical locations or on administrative and political boundaries; these methods have been criticized for their obvious subjectivity and lack of physical interpretation. Recent techniques, such as discriminant analysis,2 the region of influence approach,3 and the discordancy measure,4 also involve a great deal of subjectivity in determining the grouping of homogeneous basins. Further, all previous classification techniques define watershed similarity using criteria that are not directly related to the purpose of flood estimation, so the accuracy of flood estimates for an ungaged site based on these techniques is rather limited. In view of the above problems, the present study proposes a new method for estimating floods at ungaged sites using the recently developed "scale-invariance" (or "scaling") concept.5 Scale invariance implies that the statistical properties of floods for different basin scales are related to each other by a scale-changing operator involving only the scale ratio. Physiographic and hydrologic data of 180 watersheds in Canada were used to illustrate the application of the proposed method, and the generalized extreme value (GEV) distribution6 was used to estimate the flood quantiles at different locations. It was observed that the non-central moments (NCMs) of order 1 to 3 of the flood series scale with the basin area. Further, it was found that the groups of homogeneous basins, as delineated based on the scaling behavior of the flood series, lie in distinct geographical regions and possess distinct climatic characteristics.
Finally, results of this numerical application have indicated that the proposed scaling approach can provide accurate flood estimates for ungaged sites as compared to those given by existing procedures.
2. The Scaling Approach to Regional Estimation of Floods

2.1. The scaling process

By definition,7 a function f(x) is scaling (or scale-invariant) if f(x) is proportional to the scaled function f(λx) for all positive values of the scale factor λ. In the present study, the flood quantile function QT(A) is assumed to be scaling. That is, if QT(A) is scaling then there exists a function C(λ)
such that

Q_T(A) = C(λ) Q_T(λA)    (1)

in which A is the basin area and Q_T(A) is the corresponding flood quantile for a given return period T. It can readily be shown that

C(λ) = λ^(−β)    (2)

in which β is a constant, and that

Q_T(A) = A^β Q_T(1)    (3)

where Q_T(1) is the flood quantile for a basin of unit scale (A = 1). Thus, under the scaling hypothesis, Q_T(A) has the same distribution as A^β Q_T(1). Hence, the relationship between the NCM of order k, µ_{i,k}, and the basin area A_i can be written in the general form

µ_{i,k} = E{Q_T^k(A_i)} = α(k) A_i^β(k)    (4)

in which α(k) = E{Q_T^k(1)} and β(k) = βk. This relation indicates that the NCMs of flood series have a simple dependence on the basin area, so it is possible to transfer the NCMs from one site to any other site based on the basin scale. Notice that if the exponent β(k) is not a linear function of k, the flood series are said to be "multiscaling".5
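Because α(k) cancels when the moments of two basins in the same homogeneous region are compared, Eq. (4) lets a known moment be transferred between basins using only their areas and the exponent β(k). A minimal sketch (the function name is illustrative, not from the paper):

```python
def transfer_ncm(mu_ik, area_i, area_j, beta_k):
    """Transfer the k-th order non-central moment from basin i to basin j
    using Eq. (4), mu = alpha(k) * A**beta(k): the coefficient alpha(k)
    cancels in the ratio of the two moments."""
    return mu_ik * (area_j / area_i) ** beta_k
```

For instance, with β(1) = 0.756 (the Quebec estimate reported later in Table 1), a mean flood observed on a 500 km² basin scales to a basin four times larger by the factor 4^0.756 ≈ 2.85.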
2.2. The scaling GEV distribution

Application of the GEV distribution to model annual flood series has been advocated by several researchers (see Refs. 6, 8 and 9). The cumulative distribution function F(q) of the GEV distribution is given by

F(q) = exp{−[1 − κ(q − ξ)/α]^(1/κ)},  κ ≠ 0    (5)

where ξ, α, and κ are, respectively, the location, scale, and shape parameters of the distribution. It can readily be shown that the kth-order NCM, µ_k, of the GEV distribution (for κ ≠ 0) can be expressed as12

µ_k = (ξ + α/κ)^k + (−1)^k (α/κ)^k Γ(1 + kκ) + Σ_{i=1}^{k−1} C(k,i) (ξ + α/κ)^(k−i) (−1)^i (α/κ)^i Γ(1 + iκ)    (6)

in which C(k,i) denotes the binomial coefficient.
where Γ(·) is the gamma function. Hence, on the basis of Eq. (6), it is possible to estimate the three parameters of the GEV distribution from the first three NCMs. Consequently, the quantiles Q_T can be computed using the relation

Q_T = ξ + (α/κ){1 − [−ln(p)]^κ}    (7)
in which p = 1 − 1/T is the non-exceedance probability corresponding to the return period T of interest.
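Inverting Eq. (5) gives the quantile relation of Eq. (7) directly. A small sketch, assuming κ ≠ 0 and the standard GEV convention of evaluating the quantile at the non-exceedance probability 1 − 1/T (`gev_quantile` is an illustrative name):

```python
import math

def gev_quantile(xi, alpha, kappa, T):
    """Flood quantile for return period T from GEV parameters (cf. Eq. 7).
    Assumes kappa != 0, as in Eq. (5)."""
    F = 1.0 - 1.0 / T                     # non-exceedance probability
    return xi + (alpha / kappa) * (1.0 - (-math.log(F)) ** kappa)
```

Quantiles grow with the return period; for example, with (ξ, α, κ) = (0, 1, 0.1) the 100-year flood exceeds the 10-year flood.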
3. Numerical Application

To illustrate the application of the proposed approach, a case study was carried out using annual peak flow series from 180 watersheds in Canada (71 basins in the Quebec region and 109 basins in the Ontario region). The length of the flood series varies from 20 to 76 years, and the basin area ranges from 0.8 to 86,929 km². Since the majority of floods in the region occur during the spring season (i.e. from May to September), only spring floods were considered in this study. The stations were selected such that the observed series are stationary in the mean. The first three NCMs of the flood series were computed for each station in the region. The log-linearity between the NCMs and the basin area indicates a power-law dependence (scaling) of the moments of the flood series on the basin area. For purposes of illustration, Fig. 1 shows the log–log plots of the relations between the first-order NCM (i.e. the mean) of floods and the basin areas in the Quebec and Ontario regions. Using Eq. (4), the regression estimates a(k) and b(k) of the coefficients α(k) and β(k) for the sample NCMs (m_k for k = 1, 2, and 3) are given in Table 1. It can be seen that the exponent b(k) is approximately linear in k, i.e. b(k) ≈ k·b(1). This linearity supports the assumption that the spring flood series in Quebec and Ontario can be described by a simple scaling model.
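Since Eq. (4) is linear after taking logarithms, the regression estimates a(k) and b(k) can be obtained by ordinary least squares in log space. A sketch with hypothetical basins obeying an exact simple-scaling law (data and function name are illustrative):

```python
import numpy as np

def fit_scaling(areas, ncm_k):
    """Fit log(m_k) = log(a(k)) + b(k) * log(A) by least squares
    and return (a(k), b(k))."""
    b, log_a = np.polyfit(np.log(areas), np.log(ncm_k), 1)
    return float(np.exp(log_a)), float(b)

# Hypothetical basins following m_1 = 0.776 * A**0.756 exactly
areas = np.array([10.0, 100.0, 1000.0, 10000.0])
m1 = 0.776 * areas ** 0.756
a1, b1 = fit_scaling(areas, m1)
# Simple scaling predicts b(k) = k * b(1) for the higher-order moments
```

With real moments, comparing b(2)/b(1) and b(3)/b(1) to 2 and 3 checks the simple-scaling assumption; the Quebec exponents 0.756, 1.502, and 2.238 in Table 1 give ratios of about 1.99 and 2.96.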
3.1. Delineation of homogeneous regions

In the present study, a homogeneous region is defined as a region in which all annual flood peak series have similar scaling properties. More specifically, the power-law relation between the NCMs and the basin area (Eq. 4) for the watersheds within a homogeneous region can be represented by a straight line on a log–log plot, giving a unique set of parameters a(k) and b(k). Hence, one can group similar basins in the log(m_{i,k})–log(A_i) plane into a number of subgroups based upon the equality of these parameters. For example, Fig. 1 shows the log–log plots of the at-site sample mean of floods versus the basin areas for 71 watersheds

Fig. 1. Relations between at-site mean floods and basin areas for (a) Quebec and (b) Ontario regions.
Table 1. Sample estimates of the parameters in the model representing the relation between non-central moments (NCMs) of floods and basin area.

Order of the NCM (k)    Quebec a(k)    Quebec b(k)    Ontario a(k)    Ontario b(k)
1                       0.776          0.756          0.633           0.715
2                       0.716          1.502          0.505           1.421
3                       0.773          2.238          0.487           2.121
in Quebec and 109 basins in Ontario. In general, it can be seen that the points in the plots are better organized within each subgroup than within the region as a whole. Therefore, in the present study, the underlying nonlinear relationship between the NCMs of floods and the basin area is described by first dividing the watersheds into homogeneous subgroups and then using a log-linear approximation (i.e. the power-law function) within each subgroup. With this grouping objective, the region was separated into n subgroups by drawing n straight lines in the log(m_{i,1})–log(A_i) plane, drawn such that the total residual within the subgroups was minimum, that is,

min SSE = min Σ_{j=1}^{n} Σ_{i=1}^{m_j} (y_{i,j} − ŷ_{i,j})²    (8)

where SSE is the sum of squared errors; m_j is the number of basins in the jth group; y_{i,j} = log(m_{i,1}) is the log of the mean annual flood of the ith basin when assigned to the jth subgroup; and ŷ_{i,j} is the corresponding predicted value, given by the model

ŷ_{i,j} = log(a(1)) + b(1) log(A_i)    (9)

The objective function in Eq. (8) was computed using the clusterwise linear regression technique.10 Hence, it was found that the basins can be divided into two distinct homogeneous groups in Quebec and three in Ontario based on the similar scaling behavior of the statistical properties of the flood series (Fig. 1).
The validity of any method of delineating homogeneous regions should be supported by physical evidence, such as proximity of geographical locations and similar climatic features. To this end, the geographical locations of the stations in each homogeneous subgroup are shown in Fig. 2. It can be seen that, irrespective of basin size, the newly delineated homogeneous groups fall in distinct geographical regions with distinct climatic features. Some basins lie on the boundary between two groups; such basins may be considered members of either group. Further, since basin area is the most significant variable in the estimation of floods (see Ref. 1), delineating homogeneous groups based on the scaling of flood statistical moments with basin area can be considered acceptable.
3.2. Estimation of quantiles for ungaged sites

The principal objective of regional flood frequency analysis is to estimate flood quantiles for ungaged sites. To this end, the jackknife procedure was used to simulate the ungaged condition: one basin was removed from the database and the model was developed using data from the remaining stations. The model was then used to predict the quantiles for the withheld site, and the process was repeated until every station had been removed once. For comparison purposes, flood quantiles were also estimated using the GEV/PWM Index Flood Method,11 selected for its popularity and its superior performance in regional flood estimation.6,8,9 In estimating flood quantiles with the proposed scaling (GEV/NCM) approach, only data belonging to the specific homogeneous group delineated earlier were used in model development. For each homogeneous group, the first three NCMs (µ_{i,k} for k = 1, 2, and 3) were estimated from Eq. (4) for an ungaged site (i.e. a site not included in the estimation of the parameters α(k) and β(k) of Eq. 4). Those predicted moments were in turn equated with the first three NCMs of the GEV distribution (Eq. 6 with k = 1, 2, and 3) to determine the parameters ξ, α, and κ. Finally, with these parameters known, the flood quantiles at the ungaged site can be computed using Eq. (7).
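The jackknife loop can be sketched as follows. For brevity only the first moment and the power law of Eq. (4) are shown; the paper additionally solves Eq. (6) for the three GEV parameters from the first three predicted NCMs. Names are illustrative:

```python
import numpy as np

def jackknife_predictions(areas, mean_floods):
    """Leave-one-out test of the regional scaling model: each site is
    treated as ungaged, the power law mu1 = a * A**b is fitted to the
    remaining sites, and the withheld site's mean flood is predicted."""
    areas = np.asarray(areas, float)
    mu1 = np.asarray(mean_floods, float)
    preds = np.empty_like(mu1)
    for i in range(mu1.size):
        keep = np.arange(mu1.size) != i
        b, log_a = np.polyfit(np.log(areas[keep]), np.log(mu1[keep]), 1)
        preds[i] = np.exp(log_a) * areas[i] ** b
    return preds
```

Comparing `preds` against the at-site values over all stations is exactly the quantile–quantile comparison presented in Figs. 3 and 4.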
Fig. 2. Geographical location of the gaging stations in (a) Quebec and (b) Ontario regions.
Fig. 3. Quantile–quantile plots between at-site fitted and regional estimated 100-year floods using GEV/PWM and GEV/NCM methods for Quebec region: (a) real space and (b) log space.
Fig. 4. Quantile–quantile plots between at-site fitted and regional estimated 100-year floods using GEV/PWM and GEV/NCM methods for Ontario region: (a) real space and (b) log space.
For purposes of illustration, Figs. 3 and 4 show the quantile–quantile plots between the at-site (fitted) and regional (predicted) 100-year flood estimates based on the GEV/PWM and GEV/NCM methods for the Quebec and Ontario regions, respectively. Further, to assess the accuracy of the flood estimates for small and large basins, the results are presented in both the real and log flow domains. It can be seen that the quantile estimates obtained from the proposed GEV/NCM method are more accurate than those given by the standard GEV/PWM Index Flood Method for both small and large floods.
4. Conclusions

The major findings of the present study can be summarized as follows:
(i) The NCMs of the regional flood series scale with the basin area.
(ii) A key step in regional flood frequency analysis is the definition of hydrologically similar basins. To this end, instead of using an arbitrary criterion as in most previous investigations, it has been shown that the similarity of basins can be defined based on the scaling of the statistical properties of the flood series. Further, the grouping of similar basins into homogeneous groups as suggested in this study was supported by physical evidence such as the basins' geographical locations.
(iii) Considering the scaling of the NCMs of regional flood series, a new method has been developed to estimate floods for ungaged sites in a region. This method can provide better flood estimates than the traditional GEV/Index Flood Method.
References
1. D. M. Thomas and M. A. Benson, US Geological Survey, Water Supply Paper 1975 (1970).
2. S. E. Wiltshire, Hydrol. Sci. J. 31(3) (1986).
3. D. H. Burn, Water Resour. Res. 26(10) (1990).
4. J. R. M. Hosking and J. R. Wallis, Water Resour. Res. 29(2) (1993).
5. V. K. Gupta and E. Waymire, J. Geophys. Res. 95(D3) (1990).
6. J. R. M. Hosking, J. R. Wallis and E. F. Wood, Hydrol. Sci. J. 30(1) (1985).
7. J. Fedder, Fractals (Plenum Press, New York, 1988).
8. D. P. Lettenmaier and K. W. Potter, Water Resour. Res. 21(12) (1985).
9. K. W. Potter and D. P. Lettenmaier, Water Resour. Res. 26(3) (1990).
10. H. Spath, Mathematical Algorithms for Linear Regression (Academic Press, San Diego, 1982).
11. J. R. M. Hosking, Research Report RC17097 (IBM Research Division, Yorktown Heights, NY, 1994).
12. G. R. Pandey, Doctoral Thesis (Department of Civil Engineering and Applied Mechanics, McGill University, Montreal, 1995).
Advances in Geosciences Vol. 6: Hydrological Science (2006) Eds. Namsik Park et al. © World Scientific Publishing Company
DETERMINATION OF CONFIDENCE LIMITS FOR MODEL ESTIMATION USING RESAMPLING TECHNIQUES

N. K. M. NANSEER
Senior Research Engineer, Lanka Hydraulic Institute, 177 John Rodrigo Mawatha, Katubedda, Moratuwa, Sri Lanka
[email protected]

Late Prof. M. J. HALL
Former Head, Dept. of Hydrology and Water Resources, UNESCO-IHE, P.O. Box 3015, 2601 DA Delft, The Netherlands

H. F. P. VAN DEN BOOGAARD
Research Specialist, Delft Hydraulics, P.O. Box 177, 2600 MH Delft, The Netherlands
Over the last few decades, a variety of models have been used in the field of hydrology to estimate hydrological components such as runoff, infiltration, and base flow. There is, however, uncertainty in these model results owing to the stochastic nature of hydrological processes and the limited amount of data available for assessing their true random mechanism. Predicting such uncertainty is important since it determines the reliability of the outputs that models produce. Uncertainty can be presented in the form of confidence limits, but theoretical confidence intervals are not readily available in many cases. In this situation, resampling techniques, which draw statistical inferences directly from the observed data, can be adopted to solve the problem. This paper deals with how confidence limits can be constructed for the uncertainty of observed data and model structure for linear models.
1. Introduction

Estimation of uncertainties in model prediction has become increasingly important during the last few decades as modeling has spread to almost all professional fields. The field of hydrology is likewise equipped with a variety of models for predicting hydrological variables. However, the stochastic nature of hydrological processes and the limited amount of data available for assessing their true random mechanism cause uncertainty in these model results.1 Even physically-based systems, known as the "skilful forecast" or the "good
guess," are also associated with some degree of uncertainty. In other words, it is almost impossible to model physically-based systems with complete accuracy because they are generally random processes. Apart from the quality of the data required for a particular model, the accuracy of model results also depends on the model structure that represents the mechanism of the hydrological processes. Uncertainty analyses of model structure and model parameters are therefore very important in hydrology, since they determine the reliability of the outputs that models predict. In this study, a resampling technique is used to optimize model predictions and to attach some degree of confidence to them with respect to model structure. The aim of determining the confidence intervals is to minimize the risk associated with a predicted value: the analyst can report the value (the mean) together with its upper and lower limits. In simple terms, there is an x% (say, 95%) probability that the estimated value falls between the upper and lower limits. The user can choose a suitable value for the confidence level; in this study it was set at 95%. The user can then visualize how the estimated values are distributed within this range. Since theoretical confidence intervals are not readily available in many cases, determination of the upper and lower limits is difficult, and resampling techniques, which draw statistical inferences directly from the data, can be adopted to solve these problems. A spreadsheet model was selected for the applications of one-, two-, and three-parameter models.

2. Application 1 — Spreadsheet Models

The confidence limits for the model predictions were first analyzed using simple spreadsheet models with one to three parameters. The common feature of these models is that all are rainfall–runoff models.
The different types of model used, their parameter(s), and the results are described below.

2.1. Spreadsheet models

2.1.1. Model 1 — Single linear reservoir

This is a single-parameter model. If the catchment is considered to behave as a linear and time-invariant system, the unit hydrograph (UH) method can
be used. This approach assumes that the UH does not change in time (time invariance) and that the principle of superposition is valid. Runoff can then be simulated by convolving the UH with the observed effective rainfall. The instantaneous unit hydrograph (IUH) of a single linear reservoir (SLR), denoted u(0, t), is given by

u(0, t) = (1/k) e^(−t/k),  k = constant    (1)

and the S-curve can be derived as

S(t) = ∫_0^t (1/k) e^(−τ/k) dτ = [−e^(−τ/k)]_0^t = 1 − e^(−t/k)    (2)

The D-hour UH, u(D, t), is composed of two parts: the first follows the S-curve, and the second is the difference of S-curves shifted by D hours:

(i)  u(D, t) = (1/D) S(t) = (1/D)(1 − e^(−t/k))    (3)
(ii) u(D, t) = (1/D)[S(t) − S(t − D)] = (1/D)(e^(−(t−D)/k) − e^(−t/k))    (4)

Therefore, the UH for D equal to one time step can be derived as

for t ≤ 1:  u(1, t) = 1 − e^(−t/k)    (5)
for t ≥ 1:  u(1, t) = e^(−(t−1)/k) − e^(−t/k)    (6)

The distributed unit hydrograph (DUH) can be derived by integrating the UH, u(D, t), over the time steps. It is composed of two parts, DUH_I for t ≤ D and DUH_II for t > D:

DUH_I = (1/D) ∫_{t−1}^t S(τ) dτ = (1/D)[1 − k e^(−(t−1)/k)(1 − e^(−1/k))]    (7)

DUH_II = (1/D) ∫_{t−1}^t [S(τ) − S(τ − D)] dτ = (k/D) e^(−(t−1)/k)(1 − e^(−1/k))(e^(D/k) − 1)    (8)
The estimate produced by this model can be altered by changing the k-value; thus, the k-value has to be optimized to obtain a better estimate.
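A sketch of the SLR response with D equal to one time step: the DUH ordinates are computed and convolved with effective rainfall, and the ordinates sum to one so that volume is conserved. The function names are illustrative, and the closed-form ordinates are those obtained by integrating the S-curve over unit time steps:

```python
import numpy as np

def slr_duh(k, n_ord=50, D=1.0):
    """DUH of a single linear reservoir integrated over unit time steps.
    Ordinates sum to ~1 (unit volume), so convolution conserves mass."""
    t = np.arange(1, n_ord + 1, dtype=float)
    duh_i = (1.0 / D) * (1.0 - k * np.exp(-(t - 1.0) / k) * (1.0 - np.exp(-1.0 / k)))
    duh_ii = (k / D) * np.exp(-(t - 1.0) / k) * (1.0 - np.exp(-1.0 / k)) * (np.exp(D / k) - 1.0)
    return np.where(t <= D, duh_i, duh_ii)

def simulate_runoff(effective_rain, k):
    """Convolve effective rainfall with the DUH to obtain runoff ordinates."""
    duh = slr_duh(k)
    return np.convolve(effective_rain, duh)[: len(effective_rain)]
```

Optimizing k then amounts to minimizing the misfit between the output of `simulate_runoff` and the observed runoff, which is what SOLVER does in the spreadsheet described below.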
2.1.2. Model 2 — Two parallel linear reservoirs, equally distributed

The expressions for the two-parallel-linear-reservoir model are the same as those derived for the SLR. However, each reservoir is assumed to receive half of the unit input, and a linear channel is added to the second reservoir in order to shift its DUH by one time step. The estimate produced by this model depends on the parameters k1 and k2 of the IUH (Eq. 1) for each reservoir; the values of k1 and k2 therefore have to be optimized for a better estimate.

2.1.3. Model 3 — Two parallel linear reservoirs, unequally distributed

This model is similar to Model 2, and the expressions derived for the SLR remain valid. However, each ordinate of the input is distributed unequally between the two reservoirs. The constants k1 and k2 of the IUH for each reservoir and the fraction of the unit input assigned to each therefore govern this model, and a better estimate can be obtained by adjusting these three parameters.
2.2. Methodology

Rainfall data synthesized as input to a runoff routing program (RORB) model for the study "Artificial Neural Networks as Rainfall Runoff Models"2 were treated as input data for Models 1–3. These synthetic rainfall data consisted of 743 ordinates at hourly intervals. The runoff generated by the same model, when set to act as an SLR, was used to verify spreadsheet Model 1. Figure 1 shows the distribution of rainfall and runoff generated by the RORB model. The general procedure adopted was as follows:
(a) Input data preparation: Rainfall data were totaled for every 6 h, since the number of columns in the spreadsheet was limited.
(b) First optimization of model parameter(s): The model was run for the first optimization of the model parameter(s) using SOLVER in Excel, and the estimated output (runoff) was recorded.
(c) Random number generation: Random numbers were generated using the Excel command RAND() and were transformed to normal deviates using the Box–Muller3 method.
Fig. 1. Rainfall and runoff distribution — RORB model.
(d) Introducing noise to output: Normalized random numbers with a deviation of d mm were added to the estimated output.
(e) Second optimization of model parameter(s): The model was run again to estimate new model parameter(s) for the noisy output, and the corresponding output was recorded.
(f) Residual calculation: The residuals were calculated by subtracting the estimated output of step (e) from that of step (b).
(g) Resampling noisy outputs: New noisy outputs were produced by bootstrapping the series of residuals and adding them to the originally estimated output of step (b).
(h) Calculating new model parameter(s): Model parameters were optimized by running the model for all new noisy outputs, and these parameters were recorded each time.
(i) Analyzing model parameters: The recorded parameter values were ranked and grouped into intervals for plotting.
(j) Confidence limits of model parameters: Based on the distribution of the model parameters, their values at the 95% confidence level (the 2.5% and 97.5% limits) were calculated.
(k) Constructing confidence limits: Confidence limits for the model estimate were constructed using both the percentile and the Gaussian methods.
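Steps (c) and (g)-(h) can be sketched generically. The sketch below stands in for the spreadsheet: `calibrate` and `predict` are placeholders for the SOLVER optimization and the model run, and for brevity the residuals are taken directly from the first fit rather than from the separate noisy re-optimization of steps (d)-(f):

```python
import math
import random
import numpy as np

def box_muller(n):
    """Step (c): standard normal deviates from uniform pairs (Box-Muller)."""
    out = []
    for _ in range((n + 1) // 2):
        u1, u2 = 1.0 - random.random(), random.random()   # u1 in (0, 1]
        r = math.sqrt(-2.0 * math.log(u1))
        out += [r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)]
    return np.array(out[:n])

def bootstrap_parameters(x, y_obs, calibrate, predict, n_boot=200, seed=1):
    """Steps (g)-(h): bootstrap the residual series, add it to the fitted
    output to create new noisy outputs, and re-calibrate for each."""
    rng = np.random.default_rng(seed)
    p0 = calibrate(x, y_obs)                 # step (b): first optimisation
    y_fit = predict(x, p0)
    resid = y_obs - y_fit                    # residual series (cf. step f)
    params = []
    for _ in range(n_boot):
        y_new = y_fit + rng.choice(resid, size=resid.size, replace=True)
        params.append(calibrate(x, y_new))
    return np.array(params)
```

With 200 or more bootstrap re-calibrations, the recorded parameter values form the distribution from which the 2.5% and 97.5% limits of steps (i)-(k) are read.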
72
N. K. M. Nanseer et al.
2.3. Analysis of results

2.3.1. Distribution of model parameters

The noise introduced for model estimation consisted of normalized random numbers of 1- and 4-mm deviation. The minimum number of resamples required for the parameter distributions to approach normality was investigated by plotting the parameter distribution for each 100 samples separately and then combining the results. It was found that the model parameter distributions become normal at 200 resamples. The distributions of the model parameters for models 1–3 with 200 samples are shown in Figs. 2(a)–(c), respectively. Figures 2(a)–(c) show that the distributions of the model parameters become normal when the number of resamples is 200, for both 1- and 4-mm/6 h deviations. Therefore, the lower and upper uncertainty limits of a model parameter, the 2.5% and 97.5% values, respectively, can be calculated from the normal distribution (x̄ ± t97.5,n−1 · σ); the percentile method (ranking the parameter values in ascending order and picking the 2.5% and 97.5% values) was also used to compute the same limits for comparison. The distributions of the model parameters k1 and k2 of model 2 (two parallel reservoirs, equally distributed) and k1, k2, and α of model 3 (two parallel reservoirs, unequally distributed) are approximately normal for both 1- and 4-mm deviations, and the distributions improve when the results of the first and second hundred runs are combined. However, the means of the distributions are shifted by a small amount for the 4-mm deviation. The uncertainty limits of the model parameters with respect
[Figure: two histogram panels, percentage (%) vs. parameter-value intervals, of model parameter K for the single linear reservoir — 1-mm deviation and 4-mm deviation, combined runs.]
Fig. 2a. Parameter distribution, model 1.
[Figure: four histogram panels, percentage (%) vs. parameter-value intervals, of model parameters K1 and K2 for two parallel reservoirs (TPRED) — 1-mm and 4-mm deviations, combined runs.]
Fig. 2b. Parameter distribution, model 2.
to model structure at the 95% confidence level were calculated for the combined results using both the Gaussian approximation and the percentile interval approach; these values are shown in Table 1.

2.3.2. Determination of uncertainty limits for model estimation

There remains the question of how to construct confidence intervals for the model estimation. Two different techniques can be used for this purpose: the percentile interval approach and the Gaussian approximation.

Percentile interval approach. The percentile interval approach is very useful when the distribution of the parameters is non-normal: it estimates the upper and lower confidence limits independently of the type of parameter distribution, so the form of the distribution does not affect the interval estimation.
[Figure: six histogram panels, percentage (%) vs. parameter-value intervals, of model parameters K1, K2, and α for two parallel linear reservoirs, unequally distributed — 1-mm and 4-mm deviations, combined runs.]
Fig. 2c. Parameter distribution, model 3. TPRUED, two-parallel reservoirs unequally distributed.
Table 1. Uncertainty limits for model parameters.

Model  Parameter  Deviation  Mean  Std.       Gaussian mtd      Percentile mtd
                  (mm)             deviation  2.5%    97.5%     2.5%    97.5%
1      k          1          3.21  0.107      3.00    3.42      2.99    3.40
                  4          3.25  0.237      2.78    3.71      2.75    3.62
2      k1         1          3.21  0.356      2.51    3.91      2.54    3.90
                  4          3.54  1.194      1.20    5.88      1.12    5.74
       k2         1          3.22  0.425      2.38    4.05      2.28    3.93
                  4          3.22  1.120      1.03    5.41      1.02    5.30
3      α          1          0.50  0.054      0.40    0.61      0.40    0.60
                  4          0.47  0.192      0.10    0.85      0.13    0.83
       k1         1          3.20  0.138      2.93    3.47      2.95    3.45
                  4          3.12  0.855      1.44    4.80      1.50    4.88
       k2         1          3.24  0.154      2.93    3.54      2.94    3.50
                  4          3.31  0.804      1.74    4.89      1.72    4.80
[Figure: model 1 outputs for combined runs, 4-mm deviation — mean, 97.5%, and 2.5% discharge (mm/6 h) plotted against time steps 50–100 (6 h).]
Fig. 3a. Upper and lower limits for model 1 estimation, 4-mm deviation.
To determine confidence intervals using the percentile interval approach, the model estimates of all samples for each time step are ranked in ascending order, and the 2.5% and 97.5% confidence limit values are picked for each time step. Parts of the model estimations for each model under 4-mm deviation, with lower and upper confidence limits calculated by the percentile method, are shown in Figs. 3(a)–(c).
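The per-time-step ranking described above can be sketched as follows; the ensemble of bootstrap model estimates is hypothetical (random numbers standing in for the 200 model runs), so only the ranking-and-picking mechanics are shown.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical ensemble: 200 bootstrap model estimates x 124 six-hour steps
estimates = 10.0 + rng.normal(0.0, 1.5, size=(200, 124))

# rank the 200 estimates at every time step in ascending order and pick
# the 2.5% and 97.5% values (5th and 195th of 200), step by step
ranked = np.sort(estimates, axis=0)
lower = ranked[4, :]        # ~2.5% confidence limit per time step
upper = ranked[194, :]      # ~97.5% confidence limit per time step
mean = estimates.mean(axis=0)
```

`np.percentile(estimates, [2.5, 97.5], axis=0)` gives essentially the same limits with interpolation between order statistics.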
[Figure: model 2 outputs for combined runs, 4-mm deviation — mean, 2.5%, and 97.5% discharge (mm/6 h) plotted against time steps 50–100 (6 h).]
Fig. 3b. Upper and lower limits for model 2 estimation, 4-mm deviation.

[Figure: model 3 outputs for combined runs, 4-mm deviation — mean, 97.5%, and 2.5% discharge (mm/6 h) plotted against time steps 50–100 (6 h).]
Fig. 3c. Upper and lower limits for model 3 estimation, 4-mm deviation.
From the figures it can be seen that the upper and lower confidence limits are significant for the 4-mm deviation and that uncertainty has mainly affected the peaks and recessions; the effect on the peaks is greater than the effect on the recessions. Another notable feature is that the confidence limits widen as the number of model parameters increases, as might be expected, since the effects of uncertainty accumulate with a larger number of model parameters.

Gaussian approximation. In this technique, the upper and lower uncertainty limits for the nth time step of the model estimation are defined,
respectively, as

    Xn^2.5 = X̄n − t97.5 × σ  and  Xn^97.5 = X̄n + t2.5 × σ,    (9)
where X̄n is the mean of the model estimations for the nth time step, t2.5 is the critical value from Student’s t-distribution for (n − 1) degrees of freedom, σ is the standard deviation of the model estimations for the nth time step, and n is the number of resamples. Figures 4(a)–(c) show parts of the confidence limits computed by the Gaussian method; the confidence intervals constructed by the percentile method are included in the same figures for comparison (P, percentile method; G, Gaussian method). From the figures it is clear that the two methodologies (Gaussian and percentile) used to compute the confidence intervals for the model estimation with respect to model structure produce similar results.
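A sketch of Eq. (9) alongside the percentile limits, using a symmetric t critical value and a hypothetical random ensemble in place of the paper's model outputs:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# hypothetical ensemble: 200 resamples x 124 six-hour time steps
estimates = 10.0 + rng.normal(0.0, 1.5, size=(200, 124))

n = estimates.shape[0]                     # number of resamples (200)
xbar = estimates.mean(axis=0)              # X-bar_n, mean per time step
sigma = estimates.std(axis=0, ddof=1)      # sigma, std per time step
tcrit = stats.t.ppf(0.975, df=n - 1)       # Student's t, (n - 1) d.o.f.

gauss_lo = xbar - tcrit * sigma            # Eq. (9), lower limit
gauss_hi = xbar + tcrit * sigma            # Eq. (9), upper limit

# percentile limits on the same ensemble, for comparison
perc_lo, perc_hi = np.percentile(estimates, [2.5, 97.5], axis=0)
```

For normally distributed estimates the two sets of limits agree closely, which is the behavior reported for Figs. 4(a)–(c).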
3. Conclusions

In this research, the possibility of applying resampling techniques to determine confidence limits for model estimation has been investigated. Linear spreadsheet models were used for this purpose. The uncertainties considered in this study are those related to model structure. The main conclusions from the present applications may be summarized as follows:
[Figure: comparison of confidence limits for model 1, 4-mm deviation — mean, P-2.5%, P-97.5%, G-2.5%, and G-97.5% discharge (mm/6 h) plotted against time steps 60–100 (6 h).]
Fig. 4a. Comparison of confidence limits — model 1, 4-mm deviation.
[Figure: comparison of confidence limits for model 2, 4-mm deviation — mean, P-2.5%, P-97.5%, G-2.5%, and G-97.5% discharge (mm/6 h) plotted against time steps 60–100 (6 h).]
Fig. 4b. Comparison of confidence limits — model 2, 4-mm deviation.
[Figure: comparison of confidence limits for model 3, 4-mm deviation — mean, P-2.5%, P-97.5%, G-2.5%, and G-97.5% discharge (mm/6 h) plotted against time steps 60–100 (6 h).]
Fig. 4c. Comparison of confidence limits — model 3, 4-mm deviation.
(i) A minimum of 200 resamples is required for the construction of proper model parameter distributions. This limit applies to all the models used in this research.
(ii) The range of parameter values varies from parameter to parameter; some are widely scattered and some are narrowly distributed, depending on how strongly the model estimation depends on the model parameters, the model structure, and the degrees of freedom of the model parameters. From these ranges of parameter values, it is possible to obtain different combinations of parameter values that are equally likely for a particular catchment.
(iii) Both the percentile and the Gaussian methods were used to construct confidence limits for the model estimation. The confidence limits were much more significant for the higher deviation, and uncertainty mainly affected the peaks and recessions; the effect on the peaks was greater than the effect on the recessions. The confidence limits widened as the number of model parameters increased, since the effects of uncertainty accumulate with a larger number of parameters, and the confidence limits based on the percentile approach coincided with those calculated from the Gaussian approximation.
(iv) According to the results of the analyses, the resampling technique is well suited to the determination of confidence limits for model estimation with respect to model structure.
References
1. R. I. Bras, A Channel Network Evolution Model with Subsurface Saturation Mechanism and Analysis of the Chaotic Behaviour of the Model (MIT Press, Cambridge, 1990), p. 135.
2. A. W. Minns and M. J. Hall, Artificial neural networks as rainfall-runoff models, Hydrological Sciences Journal 41(3) (1996) 399–405.
3. G. E. P. Box and M. E. Muller, Ann. Math. Statist. 29 (1958) 610–611.
4. R. L. Anderson, Ann. Math. Statist. 13 (1942) 1–13.
5. J. M. Bernier, in Engineering Reliability and Risk in Water Resources, eds. L. Duckstein and E. J. Plate (NATO ASI Series, Series E: Applied Sci., No. 124, Nijhoff, Dordrecht, 1987).
6. K. J. Beven, J. Hydrol. 105 (1989) 157–172.
7. K. Beven and A. Binley, Hydrol. Process. 6 (1992) 279–298.
8. K. J. Beven, Adv. Water Resour. 16 (1993) 41–51.
9. B. Efron and R. Tibshirani, An Introduction to the Bootstrap (Chapman & Hall, 1993).
10. C. T. Haan, Trans. ASAE 32(1) (1989) 49–55.
11. B. M. Troutman, Water Resour. Res. 21(8) (1995) 1195–1222.
Advances in Geosciences Vol. 6: Hydrological Science (2006) Eds. Namsik Park et al. © World Scientific Publishing Company
REAL-TIME HIGH-VOLUME DATA TRANSFER AND PROCESSING FOR e-VLBI YASUHIRO KOYAMA∗ , TETSURO KONDO and MORITAKA KIMURA Kashima Space Research Center, National Institute of Information and Communications Technology, 893-1 Hirai, Kashima, Ibaraki 314-8501, Japan ∗ [email protected] MASAKI HIRABARU New Generation Network Research Center, National Institute of Information and Communications Technology, 4-2-1 Nukui Kita, Koganei, Tokyo 184-8795, Japan HIROSHI TAKEUCHI Institute of Space and Astronautical Science, Japan Aerospace Exploration Agency 3-1-1 Yoshinodai, Sagamihara, Kanagawa 229-8510, Japan
In conventional very long baseline interferometry (VLBI) measurements, radio signals received at radio telescopes are digitized and recorded with high-speed data recorders, and the recorded media are transported to a correlator site for data processing. Because of rapid developments in high-speed network and communication technologies, it is becoming possible to perform real-time high-volume transfer of VLBI observation data from the radio telescopes to the correlator; this mode of operation is called e-VLBI. To ensure inter-operability among heterogeneous observing systems, it is very important to define standard specifications. For this purpose, a series of standard specifications have been and are being defined as the VLBI standard interface (VSI) specifications, including the data transfer protocol VSI electric (VSI-E). Efforts to support these standardizations and the current status of e-VLBI developments are introduced.
1. Introduction

Research and development of the very long baseline interferometry (VLBI) technique began in the late 1960s,1 and this technique has been extensively used in various fields of science and engineering such as geophysics, astronomy, astrometry, and space science. VLBI observation technology applies the principles of interferometry to celestial radio signals received by multiple radio telescopes, enabling high-resolution imaging of celestial radio sources
82
Y. Koyama et al.
and high-precision determination of the time delay between the received signals. Although each radio telescope alone offers relatively coarse resolution, forming an array of multiple radio telescopes can provide resolution corresponding to a virtual radio telescope with an aperture comparable to the maximum distance between the radio telescopes in the array. This allows imaging of celestial bodies at resolutions that cannot be achieved by other techniques. Another powerful aspect of geodetic VLBI is its ability to determine precise earth orientation parameters2 and to construct the celestial reference frame.3,4 VLBI is recognized as the fundamental method for determining nutation parameters and the difference between coordinated universal time (UTC) and the time defined by the earth’s rotation (UT1). The construction of the current celestial reference frame also relies on data obtained by global VLBI observations. However, in past international VLBI observations, observation data had to be recorded on magnetic media, and the recorded media had to be transported to a site equipped with a correlator system for correlation processing. Thus, several days to weeks were required before the results of the processing and analysis of observation data became available, and applications requiring real-time earth orientation parameters had to use values predicted by extrapolation of past observation data, which led to inevitable errors in measurement. Recent research and development in network technology has resulted in an environment in which the network data-transmission rate far exceeds the data-recording rate of magnetic tape recorders (typically 1024 Mbps) and of recent disk-based VLBI recording systems.
Therefore, significant improvement in the sensitivity of VLBI observation can be expected if real-time correlation processing of observation data can be performed without the data-recording procedure at observing sites. In addition, with the near-real-time processing of VLBI observation data enabled by e-VLBI, high-precision estimation of the irregular variations of earth orientation parameters becomes theoretically possible, which in turn can lead to improved precision in tracking deep-space probes and more precise determination of the satellite orbital information required for high-precision satellite-based positioning systems such as GPS. These developments are expected to make a significant overall contribution to the fields of space exploration and geodesy. Nevertheless, technical problems remain in high-speed transmission of massive volumes of data over the Internet. Some of these problems form
Data Transfer and Processing for e-VLBI
83
interesting themes for research and development in network technology, with topics including maximizing use of available data-transmission capacity in the presence of other types of traffic, or effective congestion control under conditions of significant network transmission delay. Accordingly, numerous researchers are currently focusing concerted efforts on these and similar challenges. In the past, the Tokyo Metropolitan Wide Area Crustal Deformation Monitoring Project (also known as the Keystone Project, or KSP), launched by the Communications Research Laboratory (CRL, presently NICT), represented the initial effort to introduce the e-VLBI concept into daily VLBI operations. System development for the project had as its goal high-precision, high-frequency measurements of the relative positions of four VLBI stations, and was structured to override the existing framework of VLBI observation systems at several points.5 One such innovation involved the realization of real-time processing of VLBI observation data for the four-station, six-baseline array via an asynchronous transfer mode (ATM) network. This system, developed through collaboration between CRL and NTT Communications, Inc., dramatically reduced the time required for data processing relative to existing systems, which relied on magnetic tape recording. Further, this was the first system of its kind to be completely automated from observation to data processing and analysis, with the results of nearly non-stop VLBI observation automatically made available to the public on the Internet.6 The development of this system proved that e-VLBI technology could enable near-real-time data processing with virtually no delay between observation and processing. However, this observation and processing system relied on a dedicated ATM network.
As such, the system could not be used to connect multiple VLBI stations throughout the world, since it was not feasible to construct an international dedicated ATM network just for e-VLBI. To increase the versatility of this system, development of the K5 observation and processing system began in 2000, eventually resolving the earlier problems by enabling data transmission via the Internet Protocol (IP) under network conditions involving the presence of other types of traffic.

2. Developments of the K5 System

Development of the K5 VLBI system has been pursued to perform real-time or near-real-time VLBI observations and correlation processing using IP over commonly used shared network lines. Various components have
been developed to realize the target goal at various sampling modes and speeds. The entire system is designed to cover various combinations of sampling rates, numbers of channels, and numbers of sampling bits. All of the conventional geodetic VLBI observation modes are supported, and other applications such as single-dish spectroscopic measurements and pulsar timing observations are supported as well. Table 1 compares the K3, K4, and K5 systems in various aspects to identify the characteristics of the K5 system. As shown in Table 1, the K5 system is characterized by the use of a disk-based recording method and of the IP protocol for e-VLBI. In the K5 system, data-correlation processing is performed by software correlator programs running on multiple PC systems. Similarly, the K4 system can be characterized by the use of rotary-head, cassette-type magnetic tape recorders, and the K3 system by the use of open-reel magnetic tape recorders. Even at the time of the K3 system, a very primitive form of e-VLBI was examined using a telephone line and an acoustic modem. The data transfer speed was very low, and it took a very long time to transfer the observed data. Although it was not feasible to transfer all the observed data, a small fraction of the data was transferred and processed in order to verify that the observations were successful. With the K4 system, real-time e-VLBI processing was realized using the ATM network. This attempt was quite ambitious and demonstrated the unique capabilities of real-time e-VLBI. However, it was necessary to realize e-VLBI over shared IP
Table 1. Comparisons of the K3, K4, and K5 systems.

                K3                       K4                         K5
Data recorders  Magnetic tapes,          Magnetic tapes,            Hard disks
                longitudinal recorders   rotary-head recorders
e-VLBI          Telephone line           ATM                        IP
Correlator      Hardware                 Hardware                   Software
Years in use    1983–                    1990–                      2002–
Systems         M96 Recorder,            DIR-1000, -L, -M,          K5/VSSP, K5/VSSP32,
                K3 Formatter,            DFC1100, DFC2100,          K5/VSI, ADS1000,
                K3 Video Converter,      K4 VC (Type-1, 2),         ADS2000, ADS3000,
                K3 Correlator            TDS784, ADS1000,           K5 Correlator
                                         GBR1000, GBR2000D,
                                         K4 Correlator, GICO, GICO2
networks in order to expand the e-VLBI network to international baselines. For this purpose, development of the K5 VLBI system was initiated. With the K5 system, it became possible to transfer observation data over shared IP networks. In contrast to the real-time correlation-processing unit developed for the KSP, which realized high-speed digital data processing through the use of field programmable gate arrays (FPGA), a software correlator has been developed for the K5 system to perform distributed processing using PCs running a versatile operating system. While a hardware correlator lacks flexibility because of the extended development time it requires, a software correlator may be modified flexibly to add new functions or to revise processing modes. In addition, software is under development to enable distributed processing that uses available computer resources (consisting of multiple CPUs) to the maximum extent, in order to provide the processing capacity needed for data collected at numerous VLBI stations in large-scale VLBI experiments. The concept of the entire K5 system is shown in Fig. 1. The ADS1000, ADS2000, and ADS3000 shown in Fig. 1 are high-speed A/D sampler units. Output digital signals from these A/D sampler units are interfaced to the PCI bus of a conventional PC system by the PC-VSI board, which complies with the VLBI standard interface hardware (VSI-H) specifications. Whereas the ADS1000 is designed as a single-channel high-speed sampler unit, the ADS2000 is designed for geodetic VLBI observations, supporting 16 channels at sampling rates of up to 64 Msps per channel. The ADS3000 A/D sampler unit is currently under development; it will support various observing modes by using an FPGA and a 2048-Msps high-speed A/D sampler chip.
The prototype unit of the ADS3000 has been completed, and the first fringe was detected between the 34-m and 11-m stations at Kashima in the single-channel observation mode at 2048 Msps with 2 bits per sample. When relatively low sampling rates are required, PC systems with the IP-VLBI units are used, as shown at the bottom of Fig. 1. Originally, specially designed PCI boards called IP-VLBI boards were used in the K5/VSSP system. The boards are installed in PCI expansion bus slots and support four channels of input signals sampled at rates of up to 16 Msps. Recently, the external K5/VSSP32 units have been developed. The K5/VSSP32 units are connected to PC systems using USB 2.0 interface cables. The new units support 32 and 64 Msps sampling rates, which were not supported by the PCI-type IP-VLBI boards. By using four K5/VSSP32 units, 16-channel geodetic VLBI observations can be performed. VSSP is an acronym of
Fig. 1. Concept of the entire K5 system.
the Versatile Scientific Sampling Processor. This name is used because the system is designed for general scientific measurements. The system has the capability to sample an analog data stream using an external frequency standard signal, together with precise information on the sampling timing, and it is also used to process the sampled data. For geodetic VLBI observations, the software correlation program runs on the K5/VSSP and K5/VSSP32 systems, each of which consists of four UNIX PC systems; the functions of the formatter, the data recorder, and the correlator are therefore combined into a single system. Table 2 shows the characteristics of the A/D sampling components developed for the K5 system.
3. e-VLBI Data Transfer Using VSI-E

Using the developed K5 system, we have started efforts to transfer the observed data over the network using the VSI-E protocol. VSI-E is under discussion to formalize a standard protocol aimed at massive
Table 2. Comparisons of A/D sampling systems in the K5 system.

                 K5/VSSP             K5/VSSP32             ADS1000    ADS2000   ADS3000
Sampling speed   40, 100, 200,       40, 100, 200,         1024 Msps  64 Msps   2048 Msps
                 500 kHz; 1, 2,      500 kHz; 1, 2, 4,
                 4, 8, 16 MHz        8, 16, 32, 64 MHz
No. of bits      1, 2, 4, 8          1, 2, 4, 8            1, 2       1, 2      8
No. of channels  1, 4, 16            1, 4, 16              1          16        Programmable
                 (with 4 units)      (with 4 units)                             (FPGA)
Max. data rate   512 Mbps            1024 Mbps             2048 Mbps  2048 Mbps 4096 Mbps
                 (with 4 units)      (with 4 units)
real-time data transfer for e-VLBI. The current draft of VSI-E (Revision 2.7, revised on January 28, 2004) is available at http://www.haystack.mit.edu/tech/vlbi/vsi/docs/VSI-E-2-7.pdf. Based on the current draft proposal of VSI-E, VLBI Transport Protocol (vtp) libraries have been developed at Haystack Observatory.7 The libraries interface the data stream from the transmitting sites to the receiving sites over the Internet using either TCP/IP or UDP/IP. We developed an interface program for the K5 data stream at the transmitting site, and the program was used with the vtp libraries. Using these programs, a file format conversion demonstration was performed in July 2005, as shown in Fig. 2. We are now continuing our efforts to use these programs for real-time correlation with the Mark-4 correlator at Haystack Observatory, converting the K5 format data stream at the transmitting site and sending the converted data stream over the network using the VSI-E protocol, as shown in Fig. 3. As the next step, we are planning to develop an interface on the receiving side to convert the stream into the K5 data format. When that program is completed, it will become
Fig. 2. File format conversion by using VSI-E.
Fig. 3. Real-time data transfer by using VSI-E.
possible to correlate the transmitted data with the K5 software correlator in real time, using the data stream signals transmitted with the VSI-E protocol.
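The vtp library interfaces are not reproduced here; the following is only a generic sketch of the transmit/receive pattern such a transfer involves — framed blocks of sampled data streamed over a TCP socket — with an illustrative header layout that is not the actual VSI-E packet format. All names and sizes are assumptions.

```python
import socket
import struct
import threading

FRAME_SAMPLES = 4096           # hypothetical frame size in bytes
HEADER = struct.Struct("!IQ")  # frame length + sequence number
                               # (illustrative only, not the VSI-E layout)

def send_frames(sock, frames):
    """Transmit each frame with a small header, a stand-in for vtp's sender."""
    for seq, frame in enumerate(frames):
        sock.sendall(HEADER.pack(len(frame), seq) + frame)

def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("stream closed early")
        buf += chunk
    return buf

def recv_frames(sock, count):
    """Reassemble (sequence, payload) pairs on the receiving side."""
    frames = []
    for _ in range(count):
        length, seq = HEADER.unpack(recv_exact(sock, HEADER.size))
        frames.append((seq, recv_exact(sock, length)))
    return frames

# loopback demonstration of the transmit/receive pipeline
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

data = [bytes([i % 256]) * FRAME_SAMPLES for i in range(8)]
client = socket.socket()
client.connect(("127.0.0.1", port))
conn, _ = server.accept()

t = threading.Thread(target=send_frames, args=(client, data))
t.start()
received = recv_frames(conn, len(data))
t.join()
```

A UDP variant, as VSI-E also allows, would tolerate dropped frames instead of guaranteeing delivery; the sequence number then identifies which frames were lost.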
4. Conclusions

As discussed in the previous sections, we have been pursuing methods of establishing a standard data-transmission format, which will reduce the time required to convert among the various file formats of the different observation systems. Our goal is to enable estimation of all earth orientation parameters, especially UT1, immediately after observation, through the development of a real-time correlator system offering increased processing speed, allowing observation data to be transmitted in real time, and through correlation processing that does not require prior hard-disk storage. To accomplish this goal, technologies must be established for high-speed transmission of wide-band e-VLBI observation data over extremely long distances between points featuring large time delays. Thus, it is important to promote research on the application of a research-dedicated high-speed Internet to scientific instrumentation, such as advanced congestion-control technologies and dynamic-control technologies that ensure efficient use of transmission bandwidth without inhibiting concurrent network traffic. In typical VLBI observations, a common celestial radio source is observed by many radio telescopes for a certain amount of time, such as a few minutes. The cross-correlation function, and then the amplitude and the phase of its peak, are used for data analysis. In this calculation, a fractional data loss affects the signal-to-noise ratio, but it does not cause a serious problem for the important observables. Therefore, it is usually acceptable to drop a fraction of the VLBI observation data when the data traffic on the network is very high. The currently proposed VSI-E protocol is designed on top of the Real-time Transport Protocol, and it is possible to implement such
a feature to limit the data traffic of e-VLBI when traffic for other purposes is using the same network. Additional associated themes for future research and development include a software correlator system for distributed and time-sharing processing of observation data, to maximize the use of available computer resources, as well as the development of a distributed multicast processing system capable of sending identical sets of data to multiple CPUs for distributed processing, in order to perform large-scale e-VLBI experiments in which multiple stations observe at the same time. In the near future, we need to design a large-scale software correlator system for large-scale e-VLBI observations. When many radio telescopes are involved in an e-VLBI session, the multicasting capability of VSI-E is considered a very suitable feature for constructing the large-scale software correlator effectively. Such research and development efforts are ultimately expected to produce new technologies useful not only in e-VLBI development, but also in the application of research-dedicated high-speed networks to various fields of scientific instrumentation and further areas of research and development.
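The correlation observables mentioned above (the delay from the peak of the cross-correlation function, plus its amplitude and phase) can be illustrated with a toy software correlator on synthetic data; the signal model, noise level, and delay below are invented for the demonstration and are not related to any K5 processing mode.

```python
import numpy as np

rng = np.random.default_rng(3)

# common 'celestial' noise signal seen by two stations, one a delayed copy
n, true_delay = 4096, 37
sky = rng.normal(size=n + true_delay)
st1 = sky[:n] + 0.5 * rng.normal(size=n)                        # station 1
st2 = sky[true_delay:true_delay + n] + 0.5 * rng.normal(size=n) # station 2

# FFT-based circular cross-correlation, the core of a software correlator
spec = np.fft.fft(st1) * np.conj(np.fft.fft(st2))
xcorr = np.fft.ifft(spec)

lag = int(np.argmax(np.abs(xcorr)))          # peak position gives the delay
delay = lag if lag <= n // 2 else lag - n    # unwrap negative lags
amplitude = float(np.abs(xcorr[lag]))        # peak amplitude
phase = float(np.angle(xcorr[lag]))          # peak phase
```

Dropping a fraction of the samples before correlating lowers the peak amplitude (and hence the signal-to-noise ratio) but, as the text notes, the peak position and phase remain usable observables.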
Acknowledgments

The e-VLBI research and development has been conducted jointly with a group of researchers from various institutes, including the Haystack Observatory of the Massachusetts Institute of Technology. In Japan, NICT is collaborating with the National Astronomical Observatory of Japan, the Geographical Survey Institute, the Japan Aerospace Exploration Agency, Gifu University, and Yamaguchi University. We are deeply grateful for the support of NTT Laboratories, KDDI R&D Laboratories, and NTT Communications, Inc. in the use of the research-dedicated high-speed network, and to the staff members managing the research networks of JGNII, TransPAC2, Internet2, and Super SINET.
References
1. T. A. Clark, B. E. Corey, J. L. Davis, G. Elgered, T. A. Herring, H. F. Hinteregger, C. A. Knight, J. I. Levine, G. Lundqvist, C. Ma, E. F. Nesman, R. B. Phillips, A. E. E. Rogers, B. O. Rönnäng, J. W. Ryan, B. R. Schupler, D. B. Shaffer, I. I. Shapiro, N. R. Vandenberg, J. C. Webber and A. R. Whitney, IEEE Trans. Geosci. Remote Sens. GE-23 (1985) 438–449.
Y. Koyama et al.
2. T. M. Eubanks, in Contributions of Space Geodesy to Geodynamics: Earth Dynamics, eds. D. E. Smith and D. L. Turcotte, Geodynamics Series, 24 (American Geophysical Union, Washington, DC, 1993), pp. 1–54.
3. C. Ma and M. Feissel (eds.), Definition and Realization of the International Celestial Reference System by VLBI Astrometry of Extragalactic Objects, IERS Tech. Note 23 (Central Bureau of IERS, Obs. de Paris, Paris, 1997).
4. C. Ma, E. F. Arias, T. M. Eubanks, A. L. Fey, A.-M. Gontier, C. S. Jacobs, O. J. Sovers, B. A. Archinal and P. Charlot, Astron. J. 116 (1998) 516–546.
5. T. Yoshino, J. Comm. Res. Lab. 46 (1999) 3–6.
6. Y. Koyama, N. Kurihara, T. Kondo, M. Sekido, Y. Takahashi, H. Kiuchi and K. Heki, Earth Planets Space 50 (1998) 709–722.
7. D. Lapsley and A. Whitney, in Proceedings of the Seventh European VLBI Network Symposium, eds. R. Bachiller, F. Colomer, J. F. Desmurs and P. de Vicente (Toledo, Spain, October 2004).
Advances in Geosciences Vol. 6: Hydrological Science (2006) Eds. Namsik Park et al. © World Scientific Publishing Company
A COMPARISON OF SUPPORT VECTOR MACHINES AND ARTIFICIAL NEURAL NETWORKS IN HYDROLOGICAL/METEOROLOGICAL TIME SERIES PREDICTION DULAKSHI S. K. KARUNASINGHA Department of Engineering Mathematics, University of Peradeniya Peradeniya, Sri Lanka [email protected] SHIE-YUI LIONG Tropical Marine Science Institute, National University of Singapore No. 14, Kent Ridge Road, Singapore 119223, Singapore [email protected]
Artificial Neural Networks (ANN) and Support Vector Machines (SVM) are the two most popular machine learning techniques now used in time series prediction across many different areas. ANN relies mostly on heuristics, whereas SVM is mathematically well founded, and the general perception is that SVM may outperform ANN. Many theoretical advantages of SVM over ANN have been identified. However, the corresponding drawbacks of ANN do not always manifest as practical issues, and remedial measures to overcome them are available. This study compares the two techniques, SVM and ANN, in hydrological/meteorological time series prediction with respect to the most important practical aspects: prediction accuracy and the computational time required. The results of this study, and evidence from other published research, show that in practice the performance of ANN and SVM is similar in terms of prediction accuracy and computational effort.
1. Introduction

Machine learning techniques are becoming very popular alternatives to conventional approaches for time series prediction in many different areas. Hydrological/meteorological time series analysis is one such area. The two most popular machine-learning techniques are (i) artificial neural networks (ANN) and (ii) support vector machines (SVM). ANN has been in practice for about two decades, whereas the relatively new SVM came into practice only about 6–8 years ago. SVM is mathematically well
founded1,2 compared with ANN, which is based mostly on heuristics. The general perception is, therefore, that SVM may outperform ANN, although no direct comparisons have been conducted. Many theoretical advantages of SVM over ANN have been listed in many references; however, how effective they are in practice is not clear. For example, it is now known that ANN's convergence to local optima, the major criticism of ANN, is not a practical issue,1 since these local optima are found to be close to optimal solutions most of the time. Vapnik,2 in the first introductory literature on SVM and statistical learning theory, lists two further drawbacks of ANN. One is the rather slow convergence of the gradient-based methods used in ANN; however, Vapnik also agreed that several heuristics are available to speed up the rate of convergence. The other is that the sigmoid function used in ANN has a scaling factor that affects the quality of approximation: the choice of the scaling factor is a trade-off between the quality of approximation and the rate of convergence. In prediction problems this is the trade-off between prediction accuracy and the computational time required. This study compares the two techniques, SVM and ANN, in hydrological/meteorological time series prediction with respect to prediction accuracy and the computational time required.

2. Methodology

Most of the time, the aim of prediction problems is to make accurate predictions with minimal computational time and effort. This paper compares the two techniques, ANN and SVM, with respect to the following practical considerations: (i) prediction accuracy and (ii) the computational time and effort needed to tune the model parameters. Several data sets with record lengths varying from approximately 600 to 7000 are considered. In this study, the tests are conducted on a theoretical chaotic data set and a river flow time series.
Some results from other published studies are also presented. A noise-free and a Gaussian noise-added chaotic Lorenz time series, each of record length 6000, are used as benchmarks. The noise level of the noisy series is 5%, where the noise level is the ratio of the standard deviation of the noise to the standard deviation of the noise-free signal. A mean daily river flow time series of record length 6900 (19 years of data for the Mississippi River measured at Vicksburg) is then analyzed. Multilayer perceptron models are used for the ANN predictions, implemented with the MATLAB Neural Network Toolbox. For details of the data used and the implementation of ANN, readers are referred to Karunasingha
and Liong.3 In this study, the SVM with an epsilon-insensitive cost function and a Gaussian kernel function, which is powerful in approximating nonlinear relationships, is used. The standard SVM solution method is not efficient when a large amount of data has to be handled. Two approaches may be used for such large-scale problems: (i) decomposition methods,4,5 where the optimization problem of SVM is decomposed into small problems of manageable size, and (ii) least squares support vector machines (LSSVM),6 where the regression is carried out in the feature space. Applications of these techniques to hydrological time series can be found in Yu et al.7 and Yu.8 This study uses the decomposition technique: SVMTorch II of Collobert and Bengio,9 a C++ implementation of a decomposition technique, is used for the simulations. The SVM with epsilon-insensitive cost function and Gaussian kernel has three parameters whose optimal values have to be found. Although different guidelines are available, finding optimal values for these parameters is still a topic of research.10 It has been observed that the parameter values suggested in the SVM literature do not necessarily provide the best predictions.7,11 Sivapragasam11 used a trial-and-error approach, whereas Yu et al.7 used an evolutionary algorithm to find optimal values for the SVM parameters; this study uses a micro-genetic algorithm.12 Measuring computational time and effort for different techniques is not a straightforward task for several reasons: different techniques have different phases in their application, which are not directly comparable, and the software implementing the techniques and the speed of the computers also affect the computational time. In this study, the ANN and SVM are implemented in different programming languages on different platforms.
However, the results are finally transformed to comparable figures (details can be found in Ref. 13) and the values that will be shown in this paper are to be taken only as indicative values for the comparison of relative performance. It should also be noted that the times measured are not the times taken to train a single SVM model or a single ANN model. They are the times taken to train a series of models in order to determine the best model/parameters for prediction with the respective method.
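The two quantities used throughout the comparison, the 5% noise level and the MAE of Eq. (1) in the next section, are simple to compute. The sketch below is our own illustration (the function names and the NumPy dependency are assumptions, not part of the paper):

```python
import numpy as np

def add_noise(series, noise_level, seed=0):
    """Add Gaussian noise scaled so that std(noise)/std(series) equals noise_level."""
    rng = np.random.default_rng(seed)
    return series + rng.normal(0.0, noise_level * np.std(series), size=len(series))

def mae(desired, predicted):
    """Mean absolute error, Eq. (1): MAE = (1/N) * sum |x_i - xhat_i|."""
    desired, predicted = np.asarray(desired), np.asarray(predicted)
    return float(np.mean(np.abs(desired - predicted)))

# Stand-in for a noise-free series; the paper uses the chaotic Lorenz series.
clean = np.sin(np.linspace(0.0, 20.0, 6000))
noisy = add_noise(clean, 0.05)   # 5% noise level, as for the noisy Lorenz series
error = mae(clean, noisy)        # zero only for a perfect prediction
```

By construction the ratio of the noise standard deviation to that of the clean signal is close to 0.05 for a long series.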
3. Results

The prediction accuracies and the computational times for the two methods are shown in Table 1. Results shown are for validation data sets, and the prediction accuracy is measured in terms of the mean absolute error (MAE):

    MAE = (1/N) Σ_{i=1}^{N} |x_i − x̂_i|    (1)

where x_i is the desired value and x̂_i is the predicted value. For a perfect prediction the MAE is zero.

Table 1. Prediction errors (MAE) of ANN and SVM on the validation set and computation time.

Time series         Lead time   MAE (ANN)       MAE (SVM)       Improvement of    Approx. time (h):
                                                                SVM over ANN      ANN / SVM
Noise-free Lorenz   1           0.0032          0.0044          −38%              26* / 213
                    3           0.0036          0.0064          −78%
                    5           0.0042          0.0088          −110%
5% noisy Lorenz     1           0.6395          0.6392          0%                4.3* / 5
                    3           0.6761          0.6847          −1%
                    5           0.7167          0.7231          −1%
Mississippi river   1           207.31 m3/s     206.99 m3/s     0%                0.8* / 10
flow                3           767.93 m3/s     792.65 m3/s     −3%
                    5           1465.24 m3/s    1483.97 m3/s    −1%

* Times taken to derive the optimal parameters (the number of hidden neurons and epochs, etc.) are not included.

Results show that both techniques are equally effective in terms of prediction accuracy except in the case of the noise-free Lorenz data. The poor performance of SVM on noise-free data could be due to numerical problems in solving the SVM optimization problem with the decomposition method. The computational times are also comparable, again with the exception of the noise-free Lorenz case. It should be noted that the times reported are the times taken to train a series of SVM/ANN models in order to determine the best model/parameters for prediction with the respective method. The increased time taken by SVM on the noise-free series could be due to the algorithm requiring a large number of iterations to reach the very high accuracy that is possible with noise-free data. Although the computational times needed to tune the ANN parameters are not included in Table 1 (they amount to another couple of hours, depending on the experience of the user), the total computational effort does not differ significantly between the two methods. For a given set of model parameters, training an SVM is much faster than training an
ANN. However, for good prediction accuracy a search for optimal parameters is necessary. Therefore, in the absence of accurate methods to determine optimal parameters, longer times for parameter selection and training are inevitable for SVM as well.

No study exploring the computational times taken by ANN and SVM has appeared in hydrological/meteorological time series prediction. However, trends similar to this study have been reported for prediction accuracy in other published literature. Yu8 applied an LSSVM approach to the Mississippi time series prediction; comparison with ANN predictions of the same series3 shows no significant difference between the two methods. Sivapragasam11 reported that SVM predictions are as good as ANN predictions for a Bangladesh water-level prediction problem. In addition, the present authors have used the two techniques in the prediction of sea surface anomaly data in the South China Sea region (record length of approximately 600) and observed the same level of effectiveness in terms of prediction accuracy (Internal Report; Tropical Marine Science Institute, National University of Singapore). Overall, the results show that, in practice, the performance of ANN and SVM is similar in terms of prediction accuracy and computational effort.

4. Discussion

ANN has been in practice for about two decades and research on the ANN technique is almost saturated; ANN may now be considered a standard prediction tool. The heuristic techniques ANN uses have been implemented in many ANN software packages and are easily available to the user. Research related to SVM, however, is still ongoing, and this study considered SVM in its present form. The study showed that SVM in its present form does not outperform ANN in terms of prediction accuracy and computational effort, at least in hydrological/meteorological time series prediction.
Rather than appreciating SVM as a "novel" technique, it is time to explore the "real" strengths of SVM compared with other techniques. The sound theoretical foundation is unarguably the biggest strength of SVM. However, how these theoretical insights can be exploited in practice, to achieve better performance than other techniques, is what matters most to practitioners. It appears more advisable to identify which prediction technique is most suitable for a given problem with respect to the criteria that matter for that particular problem, such as, for example, ease of implementation.
References
1. V. N. Vapnik, The Nature of Statistical Learning Theory, 1st Ed. (Springer, New York, 1995).
2. V. N. Vapnik, The Nature of Statistical Learning Theory, 2nd Ed. (Springer, New York, 1999).
3. D. S. K. Karunasingha and S. Y. Liong, J. Hydrol. 323 (2006) 92–105.
4. J. C. Platt, in Advances in Kernel Methods — Support Vector Learning, eds. B. Schölkopf, C. Burges and A. Smola (MIT Press, Cambridge, MA, 1999), pp. 185–208.
5. T. Joachims, in Advances in Kernel Methods — Support Vector Learning, eds. B. Schölkopf, C. Burges and A. Smola (MIT Press, Cambridge, MA, 1999), pp. 169–183.
6. J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor and J. Vandewalle, Least Squares Support Vector Machines (World Scientific, Singapore, 2002).
7. X. Y. Yu, S. Y. Liong and V. Babovic, J. Hydroinform. 6(3) (2004) 209–223.
8. X. Y. Yu, Support Vector Machine in Chaotic Hydrological Time Series Forecasting, PhD Thesis (National University of Singapore, Singapore, 2004).
9. R. Collobert and S. Bengio, J. Machine Learn. Res. 1 (2001) 143–160.
10. V. Cherkassky and Y. Ma, Neural Networks 17(1) (2004) 113–126.
11. C. Sivapragasam, Multi-objective Evolutionary Techniques in Defining Optimal Policies for Real-time Operation of Reservoir Systems, PhD Thesis (National University of Singapore, Singapore, 2003).
12. K. Krishnakumar, SPIE Conference on Intelligent Control and Adaptive Systems, Philadelphia, PA, Vol. 1196 (1989).
13. D. S. K. Karunasingha, Efficient and Prediction Enhancement Schemes in Chaotic Hydrological Time Series Analysis, PhD Thesis (National University of Singapore, Singapore, 2006).
14. C. Sivapragasam, S. Y. Liong and M. F. K. Pasha, J. Hydroinform. 3(2) (2001) 141–152.
Advances in Geosciences Vol. 6: Hydrological Science (2006) Eds. Namsik Park et al. © World Scientific Publishing Company
LONG-TERM WATER AND SEDIMENT CHANGE DETECTION IN A SMALL MOUNTAINOUS TRIBUTARY OF THE LOWER PEARL RIVER, CHINA S. ZHANG and X. X. LU Department of Geography, National University of Singapore Arts Link One, Singapore 117570, Singapore [email protected]
Hydrological regimes of river systems have been changing both qualitatively and quantitatively due to profound human disturbances such as river diversions, damming, and land use change. In this study, a mountainous tributary (the Luodingjiang River) of the lower Pearl River, China, was investigated to illustrate the impacts of human activities on river systems during the period 1959–2002. The Mann–Kendall and Spearman tests for gradual trends and the Pettitt test for abrupt changes were employed to investigate the hydrological characteristics of the Luodingjiang River. The annual minimum water discharge and annual sediment yield series show significant increasing and decreasing trends, respectively, and significant upward and downward shifts, respectively, were detected for these two series by the abrupt change test. Neither statistically significant trends nor abrupt shifts were found for the annual maximum and annual mean water discharge series. The detected changes in both water and sediment point to the impacts of reservoir construction, water diversion programs, and land use change. However, the sediment-increasing impacts of other anthropogenic disturbances, such as road construction and mining, cannot be discerned from the recent hydrological responses.
1. Introduction

River water discharge and sediment load have changed worldwide due to both natural factors (climate change) and anthropogenic factors (land use change, river regulation, and water abstraction).1 Detection of changes in water discharge and sediment load in long-term records is of scientific and practical importance for the planning of water resources and flood protection.2 In this study, both gradual trends and abrupt changes of water discharge and sediment yield in a mountainous tributary of the lower Pearl River were examined based on long-term
records (1959–2002). Potential factors contributing to these changes are discussed.

2. Study Area

The study area, the Guanliang Catchment with an area of 3164 km², is located between 22°14′54″–23°01′30″N and 111°10′25″–111°48′08″E in Guangdong Province of South China. The location of the study area is shown in Fig. 1. The Luodingjiang River is a main tributary of the lower Xijiang River, the main river of the Pearl River system, China. The main stem of the Luodingjiang River originates from Jilongshan of Xinyi County in Guangdong Province and flows 201 km from south to north before entering the Xijiang River; the river length at the Guanliang gauging station is 180 km. The annual water discharge and sediment yield at Guanliang station (averaged for the period 1959–2002) are 2.67 × 10⁹ m³ and 460.7 t/km², respectively. There are four climate stations in and around the catchment, where long-term precipitation records for the period 1959–2002 are also available.
Fig. 1. The location of the study area.
The elevation in the catchment ranges from less than 50 m to 1350 m, decreasing from south to north and from west to east. The catchment is located in a subtropical monsoon zone, south of the Tropic of Cancer. The annual average temperature in the catchment is 18.3–22.1°C, and the maximum and minimum average monthly temperatures are 28.7°C in July and 13.3°C in January, respectively. The annual average precipitation is between 1260 and 1600 mm, with a distinct spatial pattern. Precipitation in the period from April to September (the wet season) accounts for 88.8% of the annual total. The annual average evaporation at the Luoding weather station is 1640 mm, which is 251 mm more than the annual average precipitation of 1389 mm.

3. Data Series and Methodology
In this study, the series of annual maximum water discharge (Qmax), annual minimum water discharge (Qmin), annual mean water discharge (Qmean), and annual sediment yield (SY) during the period 1959–2002 at the Guanliang hydrological station were examined to represent the long-term hydrological characteristics in the study area (Fig. 2).

Fig. 2. Annual series of water discharge and sediment yield at Guanliang hydrological station. Fitted linear trends: Qmax, y = −8.2027x + 17724.2 (R² = 0.024, p = 0.312); Qmin, y = 0.1968x − 375.1 (R² = 0.120, p = 0.021); Qmean, y = 0.2474x − 405.7 (R² = 0.017, p = 0.398); SY, y = −7.3685x + 15036.7 (R² = 0.149, p = 0.010).

Both the Mann–Kendall (MK) test3,4 and Spearman's rho (SR) test for gradual trends, and the Pettitt test5 for abrupt changes, were employed to examine changes in this study. Before applying these non-parametric change detection methods, simple linear regression was used
to show the linear trend in the hydrological time series, as a preliminary step to show the general behavior of the hydrological variables over the long-term period (Fig. 2). The significance of the linear trends was tested using the t-statistic applied to the standard error of the regression coefficient.

The non-parametric MK statistical test has been widely used to assess the significance of trends in hydro-meteorological time series due to its robustness against non-normally distributed, censored, and missing data, as well as its power being comparable to that of parametric competitors.6 The null hypothesis H0 is that a sample of data Xi (i = 1, 2, ..., n) is independent and identically distributed. The alternative hypothesis H1 is that a monotonic trend exists in Xi. The statistic S of Kendall's τ is defined as:

    S = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} sgn(Xj − Xi)    (1)

where sgn(θ) = 1 if θ > 0, 0 if θ = 0, and −1 if θ < 0.

The standardized test statistic Z is defined as:

    Z = (S − 1)/√var(S)  if S > 0;    Z = 0  if S = 0;    Z = (S + 1)/√var(S)  if S < 0    (2)

where var(S) = n(n − 1)(2n + 5)/18 in the absence of ties. The null hypothesis is rejected at significance level α if |Z| > Z(1−α/2), where Z(1−α/2) is the value of the standard normal distribution with an exceedance probability of α/2.

The Pettitt test is used to detect a single unknown change point by considering a sequence of random variables X1, X2, ..., XT which have a change point at τ: Xt for t = 1, 2, ..., τ have a common distribution function F1(x), and Xt for t = τ + 1, ..., T have a common distribution function F2(x), with F1(x) ≠ F2(x). The null hypothesis H0 (no change, i.e., τ = T) is tested against the alternative hypothesis Ha (change, i.e., 1 ≤ τ < T) using the non-parametric statistic

    KT = max_{1≤t≤T} |U_{t,T}| = max(KT⁺, KT⁻),    U_{t,T} = Σ_{i=1}^{t} Σ_{j=t+1}^{T} sgn(Xi − Xj)

where sgn(·) is as defined above, KT⁺ = max_{1≤t≤T} U_{t,T} detects a downward shift, and KT⁻ = −min_{1≤t≤T} U_{t,T} detects an upward shift. The significance level associated with KT⁺ or KT⁻ is determined approximately by

    ρ = exp(−6KT² / (T³ + T²))

When ρ is smaller than the specified significance level (0.05 in this study), the null hypothesis is rejected. The time t at which KT occurs is the change-point time.

However, neither the MK test nor the Pettitt test is robust against autocorrelation (serial correlation), which may be statistically significant in some hydrological series. In this study, a preliminary check for autocorrelation was conducted by examining the autocorrelation coefficients of the time series, calculated using the equation of Haan7:

    r_k = [ (1/(n − k)) Σ_{t=1}^{n−k} (Xt − X̄)(X_{t+k} − X̄) ] / [ (1/(n − 1)) Σ_{t=1}^{n} (Xt − X̄)² ]    (3)

where X̄ = (Σ_{t=1}^{n} Xt)/n, n is the sample size, and k is the lag. The critical value for a given confidence level (e.g., 95%) is calculated following Salas et al.8:

    r_k(95%) = (−1 ± 1.96 √(n − k − 1)) / (n − k)    (4)

If the autocorrelation coefficients fall within the 95% confidence interval at the different lags, the null hypothesis of independence cannot be rejected; otherwise, the alternative hypothesis of dependence is accepted at the 5% significance level.
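The MK statistic of Eqs. (1)–(2) and the Pettitt statistic can be sketched in a few lines. The following is an illustrative pure-Python implementation, not the software used in the study; var(S) assumes no ties:

```python
import math

def sgn(theta):
    """Sign function used by both tests."""
    return (theta > 0) - (theta < 0)

def mann_kendall(x):
    """Return (S, Z) of the Mann-Kendall test, Eqs. (1)-(2); var(S) assumes no ties."""
    n = len(x)
    s = sum(sgn(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z

def pettitt(x):
    """Return (K_T, 0-based change-point index, rho) of the Pettitt test."""
    n = len(x)
    u = [sum(sgn(x[i] - x[j]) for i in range(t + 1) for j in range(t + 1, n))
         for t in range(n - 1)]
    k_t = max(abs(v) for v in u)
    t_change = max(range(n - 1), key=lambda t: abs(u[t]))
    rho = math.exp(-6.0 * k_t ** 2 / (n ** 3 + n ** 2))
    return k_t, t_change, rho
```

For a strictly increasing series of length n, S equals n(n − 1)/2; for a clean step change, the Pettitt change point falls on the last index before the step.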
Fig. 3. Autocorrelation coefficient (ACF) of annual minimum water discharge (Qmin) and annual sediment yield (SY) before and after pre-whitening using an ARIMA model, with 95% confidence limits: (a) ACF of Qmin before pre-whitening; (b) ACF of Qmin after pre-whitening using ARIMA(1, 0, 0); (c) ACF of SY before pre-whitening; (d) ACF of SY after pre-whitening using ARIMA(2, 0, 0).
The results of the serial correlation analysis indicate that the Qmin and SY series have significant serial correlation at lags 1 and 2, respectively (Fig. 3a, c). The residuals of the Qmin and SY series after pre-whitening with ARIMA(1, 0, 0) and ARIMA(2, 0, 0) models, respectively, were treated as the new Qmin and SY time series in the further analysis (Fig. 3b, d).
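The autocorrelation check of Eqs. (3)–(4) translates directly into code. An illustrative sketch (function names are ours):

```python
import math

def acf(x, k):
    """Lag-k autocorrelation coefficient, Eq. (3)."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / (n - k)
    den = sum((xt - mean) ** 2 for xt in x) / (n - 1)
    return num / den

def acf_bounds_95(n, k):
    """95% confidence bounds for r_k, Eq. (4), following Salas et al."""
    half = 1.96 * math.sqrt(n - k - 1)
    return (-1 - half) / (n - k), (-1 + half) / (n - k)
```

A smooth, strongly persistent series gives a lag-1 coefficient well outside the bounds, whereas white noise should stay within them.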
4. Results

Both the MK and SR tests show that Qmin has a significant increasing trend and SY has a significant decreasing trend; Qmax has a slightly decreasing trend and Qmean a slightly increasing trend, but these are statistically insignificant at the 0.10 significance level for both tests (Table 1). The Pettitt test for abrupt change detection indicates significant upward and downward shifts for Qmin and SY, respectively, at the 0.05 significance level (Table 2). For the Qmin series, the change point is detected around 1972, where the mean level of Qmin increased from 9.86 m³/s for the period 1959–1971 to 16.63 m³/s for the period 1972–2002 (Fig. 4a). For the SY series, the change point is detected around 1986,
Table 1. Results of the Mann–Kendall (MK) test and Spearman rho (SR) test for gradual trend detection.

                                            MK test                      SR test
Time series                                 tau      Z        p         rho      t        p
Annual maximum discharge, Qmax (m3/s)       −0.125   −1.183   0.238     −0.205   −1.355   0.210
Annual minimum discharge, Qmin (m3/s)*      0.180    1.709    0.087     0.255    1.707    0.095
Annual mean discharge, Qmean (m3/s)         0.097    0.920    0.358     0.127    0.831    0.397
Annual sediment yield, SY (t/km2/yr)*       −0.192   −1.831   0.067     −0.301   −2.043   0.047

* The variables have been pre-whitened to remove the serial correlation.

Table 2. Pettitt test results for step change detection.

Time series                                 KT     Shift   Year   ρ
Annual maximum discharge, Qmax (m3/s)       176    −       1986   0.118
Annual minimum discharge, Qmin (m3/s)*      210    +       1972   0.048
Annual mean discharge, Qmean (m3/s)         145    +       1969   0.235
Annual sediment yield, SY (t/km2/yr)*       228    −       1985   0.028

Note: −, downward shift; +, upward shift.
* The variables have been pre-whitened to remove the serial correlation.
where the mean level of SY decreased from 564.72 t/km2 /year for the period 1959–1985 to 289.85 t/km2 /year for the period 1986–2002 (Fig. 4b).
5. Discussions

5.1. The impacts of climate change

Based on the precipitation data at the four climate stations in the catchment, the averaged precipitation fluctuates randomly and shows no obvious increasing or decreasing trend during the study period 1959–2002 (Fig. 5). The limited influence of climate change on sediment yield can also be illustrated using the double mass plot of cumulative sediment load versus cumulative water discharge (Fig. 6a). Sediment load shows a sudden decrease relative to water discharge around 1988, close to the change point of 1986 detected for the SY series by the Pettitt test, which suggests that the decrease of sediment yield is mainly influenced by anthropogenic
Fig. 4. Change point detected by Pettitt's test: (a) annual minimum discharge, (b) sediment yield. Both series have been pre-whitened using an ARIMA model.
disturbances in the catchment, such as reservoir/dam construction, river diversions, and land use change, rather than by climate change. The decline of sediment concentration is the main cause of the decrease in sediment load, as illustrated by the relationship between sediment load and water discharge (Fig. 6b).
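A double mass analysis such as that of Fig. 6a amounts to cumulating both annual series and comparing fitted slopes on either side of a candidate break. A sketch with synthetic numbers, not the Guanliang data:

```python
import numpy as np

def double_mass_slopes(water, sediment, break_idx):
    """Cumulate both annual series and fit straight-line slopes before/after break_idx."""
    cw, cs = np.cumsum(water), np.cumsum(sediment)
    slope_before = np.polyfit(cw[:break_idx], cs[:break_idx], 1)[0]
    slope_after = np.polyfit(cw[break_idx:], cs[break_idx:], 1)[0]
    return slope_before, slope_after

# Synthetic example: sediment delivery per unit discharge halves after year 20.
water = np.ones(40)
sediment = np.where(np.arange(40) < 20, 0.6, 0.3)
before, after = double_mass_slopes(water, sediment, 20)
```

A clear drop in slope at the break, as from 0.6398 to 0.3226 in Fig. 6a, indicates a reduction in sediment delivery per unit water discharge.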
5.2. The impacts of reservoir construction

The history of reservoir/dam construction in the Guanliang Catchment can be divided into four periods (Fig. 7): (i) before 1950, almost no water
Fig. 5. Precipitation variation in the catchment. Fitted trend: y = 113.5 + 0.623x (R² = 0.001, p = 0.833).
Fig. 6. (a) Double mass plot of cumulative annual sediment load (10⁶ t) versus cumulative annual water discharge (10⁹ m³); fitted lines: 1959–1987, y = 0.6398x + 0.1805 (R² = 0.9991); 1988–2002, y = 0.3226x + 26.356 (R² = 0.9755). (b) The relationship between annual sediment load (10⁶ t) and annual water discharge (10⁹ m³); fitted lines: 1959–1987, y = 0.8319x − 0.5035 (R² = 0.8343); 1988–2002, y = 0.3397x − 0.0395 (R² = 0.3455).
Fig. 7. Cumulative storage capacity (10⁶ m³) of reservoirs/dams constructed in the catchment, divided into the four periods.

    Cw(z, t)|_{t=0} = Cs(z, 0)/Kd    (Kd > 0)    (8a)
    Cw(z, t)|_{t=0} = Cw(z, 0)       (Kd = 0)    (8b)
where Cs(z, 0) is the initial solid-phase concentration specified by the user. When the distribution coefficient (Kd = Koc · foc) is zero, the liquid-phase concentration must be entered as the initial concentration to avoid a run-time error (division by zero). The most common type of boundary condition applied at the top of the soil column is either the first-type (Dirichlet) or the third-type (Cauchy) condition, as shown below:

    Cw|_{z=0} = C0 exp(−γt)    (t ≤ t0)    (9a)
    Cw|_{z=0} = 0              (t > t0)    (9b)
or

    (−D ∂Cw/∂z + qw Cw)|_{z=0} = qw C0 exp(−γt)    (t ≤ t0)    (10a)
    (−D ∂Cw/∂z + qw Cw)|_{z=0} = 0                 (t > t0)    (10b)
where C0 is the liquid-phase solute concentration in the infiltration water, γ is the decay rate [1/year] of the solute source due to either degradation or flushing by the infiltration, and t0 is the duration of solute release (years) which can be selected to simulate either “slug” or continuous input.
Unsaturated-Zone Leaching and Saturated-Zone Mixing Model
At the bottom of the soil column, the second-type (Neumann) boundary condition is applied:

    ∂Cw/∂z = 0    (z = ∞)    (11)
In applying this boundary condition, Eq. (11) is actually implemented at a finite column length rather than at z = ∞. To reduce the finite-length effect, dummy cells are automatically added at the bottom of the soil column in the numerical calculation. After evaluation of Cw(z, t), the total contaminant mass (M) per unit volume of soil is calculated as:

    M(z, t) = Ma + Mw + Ms = [θa H + θw + ρb Kd] Cw = θ Cw    (12)
2.2. Saturated-zone mixing

After estimating the liquid-phase solute concentration (Cw) at the bottom of the soil column, the mixed concentration in the aquifer can be calculated using a mass-balance technique6,7:

    Cmix = (Caq qaq Aaq + Cw qw Asoil) / (qaq Aaq + qw Asoil)    (13)
where Caq is the concentration of the horizontal groundwater influx, qaq is the Darcy velocity in the aquifer, Aaq is the cross-sectional aquifer area perpendicular to the groundwater flow direction, and Asoil is the cross-sectional area perpendicular to the vertical infiltration in the soil column. The aquifer area (Aaq) is determined by multiplying the horizontal width of the soil column by the vertical solute penetration depth. The procedure for the mixing calculation differs depending on the type of soil column arrangement. In the case of the transverse (right-angle) arrangement, the mixing calculation is straightforward: simply apply Eq. (13) at each mixing element underneath the soil columns. For the parallel arrangement, however, the mixed concentration at the up-gradient cell is considered an influx concentration to the next cell; the mixed concentration at the next cell is estimated by reapplying Eq. (13) using the two inflow concentrations. The solute penetration depth is the mixing thickness of the contaminant in the aquifer beneath the soil column. An estimation of the plume thickness
S. S. Lee
in the aquifer can be made using the relationship8:

H_d = (2 α_v L)^{1/2} + B [1 − exp(−L q_w/(q_aq B))]   (14)
where H_d is the penetration depth (m), α_v is the transverse (vertical) dispersivity (m) of the aquifer, L is the horizontal length dimension of the waste (m), and B is the aquifer thickness (m). In Eq. (14) the first term represents the thickness of the plume due to vertical dispersion, and the second term represents the thickness due to displacement by infiltration water. When implementing this relationship, if the computed value of H_d is greater than B, the penetration thickness H_d is set equal to B.
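Equations (13) and (14) together define the mixing step, including the sequential reapplication of Eq. (13) for the parallel arrangement. A sketch under those definitions (function names and parameter values are illustrative, not from the original C++ code):

```python
import math

def penetration_depth(alpha_v, L, B, qw, qaq):
    """Plume mixing thickness, Eq. (14), capped at aquifer thickness B."""
    hd = math.sqrt(2.0 * alpha_v * L) + B * (1.0 - math.exp(-L * qw / (qaq * B)))
    return min(hd, B)

def mixed_concentration(c_aq, q_aq, a_aq, c_w, q_w, a_soil):
    """Mass-balance mixing of aquifer influx and leachate, Eq. (13)."""
    return (c_aq * q_aq * a_aq + c_w * q_w * a_soil) / (q_aq * a_aq + q_w * a_soil)

# Transverse arrangement: apply Eq. (13) once per mixing element.
# Parallel arrangement: feed each cell's mixed result to the next cell.
c_in = 0.0                        # clean upgradient groundwater
for c_leach in [10.0, 8.0, 5.0]:  # leachate from successive soil columns
    c_in = mixed_concentration(c_in, q_aq=0.1, a_aq=50.0,
                               c_w=c_leach, q_w=0.01, a_soil=100.0)
```

The loop mirrors the parallel-arrangement procedure in the text: each downgradient cell reuses the upgradient mixed concentration as its influx concentration.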
3. Numerical Implementation

3.1. Unsaturated-zone leaching

The governing solute transport equation, Eq. (7), is solved using the finite difference method. The differential equation in the liquid contaminant concentration C_w as a function of time and depth is converted into finite difference equations in the corresponding variable C_i^k, centered in time between two time steps:

C_w → (C_i^{k+1} + C_i^k)/2,   ∂C_w/∂t → (C_i^{k+1} − C_i^k)/∆t   (15)
where ∆t is the time increment, the subscript i refers to the discretized soil column cell, and the superscript k refers to the time level. The subscript w is dropped for simplicity. Converting the other terms into finite difference form, the governing equation can be written as:

(−M_i + M_i′ − N_i) C_{i−1}^{k+1} + (1 + 2M_i + N_i′ + L_i) C_i^{k+1} + (−M_i − M_i′ + N_i) C_{i+1}^{k+1}
  = (M_i − M_i′ + N_i) C_{i−1}^k + (1 − 2M_i − N_i′ − L_i) C_i^k + (M_i + M_i′ − N_i) C_{i+1}^k   (16)
where the dimensionless constants M_i, M_i′, N_i, N_i′, and L_i are:

M_i ≡ ∆t/(2(∆z)^2) · D_i/θ_i,   M_i′ ≡ ∆t/(2(∆z)^2) · (D_{i+1} − D_{i−1})/(4θ_i),
N_i ≡ ∆t/(4∆z) · q_i/θ_i,   N_i′ ≡ ∆t/(4∆z) · (q_{i+1} − q_{i−1})/θ_i,   L_i ≡ (∆t/2) µ   (17)
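Because the constants in Eq. (17) depend only on the nodal values of D, q, and θ, they can be precomputed for every interior cell before each time step. A sketch under those definitions (array names and test values are hypothetical, not the model's C++ code):

```python
def fd_coefficients(D, q, theta, dz, dt, mu, i):
    """Dimensionless constants of Eq. (17) for interior cell i.
    D: dispersion coefficients, q: Darcy fluxes, theta: water contents,
    dz/dt: space/time increments, mu: first-order decay constant."""
    Mi   = dt / (2.0 * dz ** 2) * D[i] / theta[i]
    Mi_p = dt / (2.0 * dz ** 2) * (D[i + 1] - D[i - 1]) / (4.0 * theta[i])
    Ni   = dt / (4.0 * dz) * q[i] / theta[i]
    Ni_p = dt / (4.0 * dz) * (q[i + 1] - q[i - 1]) / theta[i]
    Li   = dt * mu / 2.0
    return Mi, Mi_p, Ni, Ni_p, Li
```

For uniform D and q the primed constants vanish and Eq. (16) reduces to a classical Crank–Nicolson advection–dispersion stencil.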
Similarly, the finite difference form of the initial condition for the liquid-phase solute concentration is:

C_i^1 = (C_s^1)_i / K_d   (K_d > 0, 2 ≤ i ≤ n − 1)   (18a)

or

C_i^1 = (C_w^1)_i   (K_d = 0, 2 ≤ i ≤ n − 1)   (18b)
The finite difference forms of the top boundary conditions for the soil column are as follows.

First-type top boundary condition:

C_1^k = (C_s^k(z = 0)/K_d) exp{−γ(k − 1)∆t}   (k = 1, 2, …)   (19)

where C_s^k ≠ 0 when t ≤ t_0 and C_s^k = 0 when t > t_0.
Third-type top boundary condition:

C_1^{k+1} − (Ψ′/Φ′) C_2^{k+1} = −(Ω′/Φ′) C_1^k + (Ψ′/Φ′) C_2^k + (q_w C_0/Φ′) exp(−γt)   (20)

where

Φ′ = D(2M + L + 1)/(4(∆z)(M + N)) + q_w/2,   Ψ′ = DM/(4(∆z)(M + N)),
Ω′ = D(2M + L − 1)/(4(∆z)(M + N)) + q_w/2

and M, N, and L were defined in Eq. (17). The second-type bottom boundary condition is used in this model as follows:

(C_n^{k+1} − C_{n−1}^{k+1})/∆z = 0   (21)
The above finite difference equations form a simultaneous system that is coded in C++ and solved for the concentrations C_i^{k+1} by the Thomas algorithm.9
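Equation (16), together with the boundary rows (20) and (21), yields one tridiagonal system per time step. The original implementation is in C++; the Thomas algorithm it cites can be sketched generically as follows (a standard tridiagonal solver, not the author's code):

```python
def thomas_solve(a, b, c, d):
    """Solve the tridiagonal system a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]
    (a[0] and c[-1] unused) by forward elimination and back substitution."""
    n = len(b)
    cp = [0.0] * n          # modified upper-diagonal coefficients
    dp = [0.0] * n          # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):   # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

In the model's setting, a, b, and c would hold the left-hand coefficients of Eq. (16) and d the right-hand side evaluated at time level k.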
3.2. Numerical stability

Often, the efficiency of a numerical technique is limited by instability, oscillation, and mass-balance problems. Several methods have been proposed to determine the stability criteria of finite difference calculations (e.g. the Fourier expansion method, the matrix method, and others).10
The Fourier expansion method, developed by von Neumann, relies on a Fourier decomposition of the numerical solution in space, neglecting boundary conditions. It provides necessary conditions for the stability of constant-coefficient problems regardless of the type of boundary condition.11 The matrix method, in contrast, takes the eigenvectors of the space-discretization operator, including the boundary conditions, as a basis for representing the spatial behavior of the solution.10,12 Based on the von Neumann method, the stability criterion of the Crank–Nicolson finite difference scheme can be derived as: ∆z
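As an illustration of the von Neumann approach (not the specific criterion derived in the text), the amplification factor of the Crank–Nicolson scheme for the pure diffusion equation is G = (1 − s)/(1 + s) with s = 2r sin²(θ/2) and r = D∆t/(∆z)², which satisfies |G| ≤ 1 for every Fourier mode:

```python
import math

def cn_amplification(r, theta):
    """Crank-Nicolson amplification factor for the diffusion equation,
    for the Fourier mode with phase angle theta = kappa * dz."""
    s = 2.0 * r * math.sin(theta / 2.0) ** 2
    return (1.0 - s) / (1.0 + s)

# |G| <= 1 for any r > 0 and any mode: unconditional von Neumann stability
# of the diffusion part of the scheme.
worst = max(abs(cn_amplification(10.0, k * 0.01 * math.pi)) for k in range(1, 100))
```

Sweeping the modes numerically, as above, is the practical counterpart of the analytical derivation referenced in the text.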