
Solar Irradiance and Photovoltaic Power Forecasting

Forecasting plays an indispensable role in the grid integration of solar energy, which is an important pathway toward the grand goal of achieving planetary carbon neutrality. This rather specialized field of solar forecasting comprises both irradiance and photovoltaic power forecasting. Its dependence on atmospheric sciences and implications for power system operations and planning make the multi-disciplinary nature of solar forecasting immediately obvious. Advances in solar forecasting represent a quiet revolution, as the landscape of solar forecasting research and practice has dramatically advanced compared with just a decade ago.

Solar Irradiance and Photovoltaic Power Forecasting provides the reader with a holistic view of all major aspects of solar forecasting: the philosophy, statistical preliminaries, data and software, base forecasting methods, post-processing techniques, forecast verification tools, irradiance-to-power conversion sequences, and the hierarchical and firm forecasting frameworks. The book's scope and subject matter are designed to help anyone entering the field or wishing to stay current in understanding solar forecasting theory and applications. The text provides concrete and honest advice, methodological details and algorithms, and broader perspectives for solar forecasting.

Both authors are internationally recognized experts in the field, with notable accomplishments in both academia and industry. Each author has many years of experience serving as editor of top journals in solar energy meteorology. The authors, as forecasters, are concerned not merely with delivering the technical specifics through this book, but more so with the hope of steering future solar forecasting research in a direction that can truly expand the boundary of forecasting science.

Energy Analytics
Series Editor: Tao Hong

Solar Irradiance and Photovoltaic Power Forecasting
Dazhi Yang and Jan Kleissl

Analytics and Optimization for Renewable Energy Integration
Ning Zhang, Chongqing Kang, Ershun Du, and Yi Wang

Solar Irradiance and Photovoltaic Power Forecasting

Dazhi Yang and Jan Kleissl

First edition published 2024
by CRC Press
2385 NW Executive Center Drive, Suite 320, Boca Raton FL 33431

and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2024 Dazhi Yang and Jan Kleissl

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged, please write and let us know so we may rectify it in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC, please contact [email protected]

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

ISBN: 978-1-032-06812-1 (hbk)
ISBN: 978-1-032-06814-5 (pbk)
ISBN: 978-1-003-20397-1 (ebk)

DOI: 10.1201/9781003203971

Typeset in Nimbus Roman font by KnowledgeWorks Global Ltd.

Publisher's note: This book has been prepared from camera-ready copy provided by the authors.

Dedication

To our children

Contents

Authors
Preface
Acknowledgments

Chapter 1   Why We Do Solar Forecasting
    1.1   Five aspects of solar forecasting
          1.1.1   Base methods
          1.1.2   Post-processing
          1.1.3   Verification
          1.1.4   Irradiance-to-power conversion
          1.1.5   Grid integration
    1.2   Outline of the book
          1.2.1   The world of solar forecasting: Understanding before doing
          1.2.2   A forecasting framework for grid integration and firm power delivery: Toward dispatchable solar power

Chapter 2   Philosophical Thinking Tools
    2.1   Arguing with the anonymous
    2.2   Rapoport's rule
    2.3   Reduction: What is "goodness"?
    2.4   "I own my future"
    2.5   Occam's razor
    2.6   Occam's broom
    2.7   The "novel" operator
    2.8   The smoke grenade
    2.9   Using the lay audience as decoys
    2.10  Bricks and ladders

Chapter 3   Deterministic and Probabilistic Forecasts
    3.1   On the deterministic nature of the physical world
    3.2   Uncertainty and distributional forecasts
          3.2.1   Preliminary on probability distributions
          3.2.2   What is a predictive distribution?
          3.2.3   Ideal predictive distributions
          3.2.4   Types of predictive distributions
          3.2.5   Other representations of distributional forecast
    3.3   Ensemble forecasts
          3.3.1   Typology of ensemble weather forecasts
          3.3.2   Statistical ensemble forecasting
          3.3.3   Ensemble learning
    3.4   After thought

Chapter 4   Solar Forecasting: The New Member of the Band
    4.1   Maturity of domains of energy forecasting
    4.2   Load forecasting: Tiny differences matter
          4.2.1   A bird's eye view on load forecasting
          4.2.2   Gauging the goodness of load forecasts
          4.2.3   Examples of recent load forecasting innovations
    4.3   Wind forecasting: It is all about weather
          4.3.1   NWP and wind speed forecasting
          4.3.2   Wind power curve modeling
          4.3.3   Statistical and machine learning in wind forecasting
    4.4   Price forecasting: Causal relationship
          4.4.1   Marketplaces and forecasting horizons
          4.4.2   Short-term price forecasting models
          4.4.3   Modeling tricks for price forecasting
    4.5   Salient features of solar irradiance
          4.5.1   Clear-sky expectations
          4.5.2   Distribution of clear-sky index
          4.5.3   Physical limits of solar irradiance
          4.5.4   Spatio-temporal dependence
          4.5.5   Solar power curve
    4.6   Shared research frontiers in energy forecasting
          4.6.1   Advanced physics-based forecasting
          4.6.2   Machine learning
          4.6.3   Ensemble, combining, and probabilistic forecasts
          4.6.4   Hierarchical forecasting
    4.7   Common issues in and recommendations for energy forecasting research
          4.7.1   Common issues
          4.7.2   Recommendations
    4.8   Chapter summary

Chapter 5   A Guide to Good Housekeeping
    5.1   Picking the right software and tools
          5.1.1   Python and R
          5.1.2   Solar forecasting packages in Python and R
    5.2   Terminology of irradiance components and the k-indexes
          5.2.1   Irradiance components
          5.2.2   k-Indexes
    5.3   Performing quality control
          5.3.1   Automatic tests
          5.3.2   Visual screening
          5.3.3   Gap-filling techniques
    5.4   Choice of clear-sky models in solar forecasting
          5.4.1   REST2, McClear, and Ineichen–Perez
          5.4.2   Access and computation of clear-sky irradiance
          5.4.3   MSE scaling
          5.4.4   Stationarity test
          5.4.5   Comparing different clear-sky models
          5.4.6   Section summary
    5.5   Temporal alignment and averaging of data
          5.5.1   Data alignment and impacts on forecast verification
          5.5.2   Clear-sky irradiance averaging at sunrise and sunset hours
    5.6   Making forecasts operational
          5.6.1   Time parameters in operational forecasting
          5.6.2   CAISO's and CSG's operational forecasting requirements
          5.6.3   Verification of operational forecasts

Chapter 6   Data for Solar Forecasting
    6.1   Complementarity of solar data
          6.1.1   Ground-based instrumentation
          6.1.2   Remote sensing
          6.1.3   Dynamical weather model
    6.2   What is a representative dataset?
          6.2.1   Coverage
          6.2.2   Consistency
          6.2.3   Accessibility
    6.3   Choice of data presented
    6.4   Ground-based monitoring networks
          6.4.1   Baseline Surface Radiation Network
          6.4.2   SURFRAD and SOLRAD
          6.4.3   Oahu Solar Measurement Grid
          6.4.4   Oklahoma mesonet
          6.4.5   Other mesonets
          6.4.6   Measurement and Instrumentation Data Center
    6.5   Satellite-derived irradiance data
          6.5.1   Copernicus Atmosphere Monitoring Service radiation service
          6.5.2   National Solar Radiation Database
          6.5.3   Other satellite-derived irradiance databases
    6.6   NWP forecasts and reanalyses
          6.6.1   European Centre for Medium-Range Weather Forecasts
          6.6.2   National Centers for Environmental Prediction
          6.6.3   Fifth-generation ECMWF reanalysis
          6.6.4   Modern-Era Retrospective Analysis for Research and Applications, version 2

Chapter 7   Base Methods for Solar Forecast Generation
    7.1   Numerical weather prediction
          7.1.1   The core engine of NWP: The primitive equations
          7.1.2   Physical parametrizations
          7.1.3   From physical principles to NWP models
          7.1.4   Initializing an NWP model: The art of data assimilation
    7.2   Satellite-based forecasting
          7.2.1   Satellite-to-irradiance models
          7.2.2   Forecasting using satellite-derived irradiance
    7.3   Forecasting with ground-based data
          7.3.1   Data-driven forecasting with information from a single spatial location
          7.3.2   Spatio-temporal statistical methods

Chapter 8   Post-processing Solar Forecasts
    8.1   An overview of conversion between deterministic and probabilistic forecasts
    8.2   Notational convention
    8.3   D2D post-processing
          8.3.1   Regression
          8.3.2   Filtering
          8.3.3   Downscaling
    8.4   P2D post-processing
          8.4.1   Summarizing predictive distribution
          8.4.2   Combining deterministic forecasts
    8.5   D2P post-processing
          8.5.1   Analog ensemble
          8.5.2   Method of dressing
          8.5.3   Probabilistic regression
    8.6   P2P post-processing
          8.6.1   Calibrating ensemble forecasts
          8.6.2   Combining probabilistic forecasts
    8.7   Summary

Chapter 9   Deterministic Forecast Verification
    9.1   Making reference solar forecasts
          9.1.1   Skill score
          9.1.2   Requirements on the standard of reference
          9.1.3   Optimal convex combination of climatology and persistence
    9.2   Problem of the measure-oriented forecast verification framework
    9.3   Murphy–Winkler distribution-oriented forecast verification framework
          9.3.1   Calibration–refinement factorization
          9.3.2   Likelihood–base rate factorization
          9.3.3   Decomposition of mean square error
          9.3.4   Computation of verification statistics
    9.4   Goodness of forecasts: A case study on verification of operational day-ahead solar forecasts
          9.4.1   Case study setup
          9.4.2   Verifying raw NWP forecasts with Murphy–Winkler framework
          9.4.3   Verifying the post-processed NWP forecasts: The notion of consistency
          9.4.4   Assessing NWP forecasts through their value

Chapter 10  Probabilistic Forecast Verification
    10.1  Definitions of properties in probabilistic forecast verification
          10.1.1  Gneiting's definitions of calibration and sharpness
          10.1.2  Bröcker's definitions of reliability and resolution
    10.2  Scoring rules for probabilistic forecasts
          10.2.1  Brier score
          10.2.2  Continuous ranked probability score
          10.2.3  Ignorance score
          10.2.4  Quantile score
    10.3  Tools for visual assessment
          10.3.1  Assessing calibration
          10.3.2  Assessing sharpness
    10.4  Calibration and sharpness: A case study on verification of ensemble irradiance forecasts

Chapter 11  Irradiance-to-power Conversion with Physical Model Chain
    11.1  An overview of the three classes of techniques for irradiance-to-power conversion
    11.2  Irradiance-to-power conversion by regression
          11.2.1  Integrating clear-sky information during PV power forecasting
          11.2.2  Feature selection and engineering
          11.2.3  Other considerations for regression-based irradiance-to-power conversion
    11.3  Irradiance-to-power conversion with physical model chains
          11.3.1  Solar positioning
          11.3.2  Separation modeling
          11.3.3  Transposition modeling
          11.3.4  Reflection loss modeling
          11.3.5  Cell temperature modeling
          11.3.6  PV modeling
          11.3.7  Various loss factors
    11.4  Irradiance-to-power conversion via hybrid methods
          11.4.1  Constructing a hybrid model chain
          11.4.2  Optimizing a hybrid model chain
    11.5  Probabilistic irradiance-to-power conversion

Chapter 12  Hierarchical Forecasting and Firm Power Delivery
    12.1  Notation and general form of hierarchical forecasting
    12.2  Optimal forecast reconciliation using regression
          12.2.1  Condition for unbiased reconciled forecasts
          12.2.2  Generalized least squares reconciliation
          12.2.3  Minimum trace reconciliation
          12.2.4  Further simplifications to the MinT reconciliation
    12.3  Probabilistic forecast reconciliation
          12.3.1  Two equivalent definitions of probabilistic coherency
          12.3.2  Probabilistic forecast reconciliation in the Gaussian framework
          12.3.3  Probabilistic forecast reconciliation in the nonparametric framework
    12.4  A case study on forecast reconciliation
          12.4.1  Deterministic forecast reconciliation
          12.4.2  Probabilistic forecast reconciliation
    12.5  From hierarchical forecasting to firm power delivery
          12.5.1  Firm power enablers
          12.5.2  Optimizing the firm power premium
          12.5.3  Moving forward with firm power delivery

Epilogue

Appendix A  Statistical Derivations
    A.1   Expectations
    A.2   MSE decomposition in linear forecast combination
    A.3   Mean and variance of a traditional linear pool
    A.4   Retrieving mean and variance of Gaussian distribution from prediction intervals
    A.5   MSE decompositions for forecast verification
          A.5.1   Calibration–refinement decomposition
          A.5.2   Likelihood–base rate decomposition
          A.5.3   Bias–variance decomposition
    A.6   Kernel conditional density estimation
    A.7   Calculating CRPS from ensemble forecast

Appendix B  Detailed Derivation of the Perez Model
    B.1   Fundamentals, definitions, and units
    B.2   The three-part geometrical framework of the original Perez model
    B.3   The framework of the simplified Perez model
          B.3.1   Reparameterization of the model coefficients
          B.3.2   Allowance for negative coefficients
          B.3.3   Physical surrogate of the three-part geometrical framework
          B.3.4   Revised binning for differentiating the sky conditions
    B.4   The canonical Perez model

Appendix C  Derivation of the Momentum Equations

Acronyms
References
Index

Authors

Professor Dazhi Yang received a BEng, MSc, and PhD from the Department of Electrical Engineering, National University of Singapore, in 2009, 2012, and 2015, respectively. He is currently a professor with the School of Electrical Engineering and Automation, Harbin Institute of Technology, Harbin, China. Prof. Yang has authored more than 140 journal articles, many of which are ESI highly cited papers. He is the current subject editor for Solar Resources and Energy Meteorology for Solar Energy, the official journal of the International Solar Energy Society, where he has handled over 2000 submissions. He also serves as an associate editor for Advances in Atmospheric Sciences, and is an active participant in the International Energy Agency Photovoltaic Power Systems Programme, Task 16: Solar Resource for High Penetration and Large Scale Applications. For several consecutive years, Prof. Yang has been listed by Stanford University as one of the World's Top 2% Scientists (for both Single Year and Career), and he has likewise been consecutively listed among the World's Top 100,000 Scientists published by the Global Scholars Database. His research interests are diverse, including but not limited to forecasting, energy meteorology, grid integration, multi-energy systems, satellite remote sensing, spatio-temporal statistics, thermochemistry, battery thermal management and fault diagnosis, electromagnetic compatibility, and cultural heritage. Among these directions, he has achieved the most success in solar forecasting; at the moment, according to a search on the Web of Science, he has the most journal publications on solar forecasting in the world.

Professor Jan Kleissl researches the interaction of weather with engineering systems in buildings, solar power systems, and the electric power grid. He received an undergraduate degree from the University of Stuttgart and a PhD from Johns Hopkins University, both in environmental engineering with a focus on environmental fluid mechanics. He is the director of the UC San Diego Center for Energy Research and a professor in the Department of Mechanical and Aerospace Engineering at UC San Diego. Prof. Kleissl has published over 100 papers in top journals of solar power resources, forecasting, and integration. He and his students and postdocs developed one of the first and most successful PV variability models for large solar power plants; the model has been released as open source in Sandia National Lab's PV-Lib toolbox and is used by hundreds of researchers and practitioners globally. Prof. Kleissl also pioneered the field of sky imager forecasting and developed some of the most advanced physics-based modeling tools for sky imagery. In numerical weather prediction, his group specializes in stratocumulus clouds and their representation in simple and complex models of the atmosphere. He received the 2009 U.S. National Science Foundation CAREER Award and the 2021 Charles Greeley Abbot Award of the American Solar Energy Society for significant contributions to the advancement of solar energy through research and development (R&D). In 2012, Prof. Kleissl became associate editor, and in 2015, subject editor for Solar Resources and Energy Meteorology for Solar Energy, where he handled over 1200 articles and was recognized for efficiency, impartiality, and quality. He has been deputy editor of the AIP Journal of Renewable and Sustainable Energy since 2019.

Preface

A forecast is a prediction of a future event or quantity; forecasts can thus be considered a subset of predictions. Forecasting has three principal characteristics. First and foremost, it deals only with events or quantities that are partially predictable: it is concerned neither with those that can be known with absolute certainty, nor with those about which nothing better than a random guess can be done. The second characteristic is that a forecast must carry a notion of impact, such that its value can be materialized during decision-making. The third characteristic is that forecasts can only be verified at a future time, and thus are liable to the danger of circular reasoning, making concocted explanations unfalsifiable. These characteristics are briefly considered in the following three paragraphs.

People often think they know more than they actually do—this knowledge gap is called epistemic arrogance. Consequently, people who know little about forecasting are tempted to think in the extremes: either that forecasts are not worth pursuing since there is no certainty in them, or that science is mature enough for all kinds of forecasting. Arguments employed by people at the first extreme often come from a perspective guided by memorable disappointments from the past. Upon repeated conditioning by stories of how forecasts have failed—"the weather man is wrong again," "economists fail to forecast the recession," "the sports analyst's predicted score is far off from the actual one"—one may lose faith in activities of a similar kind and eventually become a skeptic. What the skeptics fail to notice, however, is how far weather forecasting has come in the past century, how complex and unpredictable economics is, and how the actual score was affected by an early injury of a star player. Skeptics tend to look for instances that do not meet their expectations (or that confirm their theory that forecasting does not work), yet ignore those that do. This could, in turn, lead to what is more generally known as a confirmation bias.

Equally hazardous as skepticism about forecasting is placing blind trust in it, which describes the second extreme. The reason for this danger is much simpler—if stock market forecast accuracy were exact, we would all be millionaires. In both extremes—skeptics and worshippers—what needs to be factored in is predictability: the degree to which a good forecast can be made.

"Good," as has been debated by philosophers for millennia, is an abstract notion. The goodness of a forecast, with no exception, is subject to interpretation. To assign any meaning to it, one must put it in context. One may interpret a forecast to be good if it corresponds to the forecaster's best judgment. How to issue a forecast truthfully and how to detect an untruthful forecast have been topics of heated debate among forecasters. In other circumstances, one may interpret a forecast to be good if it performs better than its alternatives under some specific scoring rules. This is known as forecast verification, in which forecasts generated by different forecasters are compared with respect to the matching observations. Having established these interpretations, there is still something missing: a forecast by itself possesses no intrinsic value, but can only acquire value through its ability to influence the decisions made by its user. It is rational to believe that a truthful forecast with high performance is more likely to result in higher value; the relationship among truthfulness, quality, and value is, however, almost surely nonlinear and may not even be monotonic. In any case, this leads to the conclusion that forecasting is only worth performing if it can be appreciated by the potential user of the forecasts.

The last characteristic of forecasting is tied to the philosophy of science, insofar as all scientific methods require induction from specific facts to general laws. Human beings are hardwired to learn specifics when in fact they should be concerned with generalities. As argued by Bertrand Russell, who is perhaps the greatest philosopher of the 20th century and the greatest logician since Aristotle, all inductive arguments in the last resort reduce to the form: "if this is true, that is true; now that is true, therefore this is true." This is a bare display of the circular reasoning fallacy. However, this seemingly foolish form of argument is not fundamentally different from those upon which all scientific laws are based. In forecasting, we often argue that since some specific facts obey a certain law, other facts of a similar sort should obey the same law. In other words, we fit data to derive a forecasting model and expect that model to explain the future. We may verify the law in a region of the future by waiting for that future to arrive, but the practical importance of forecasting always concerns those regions of the future where the law has not yet been verified. Hence, the doubt as to the validity of induction is an inherent limitation not just of forecasting, but of scientific methods in general. Be that as it may, because this limitation affects the whole of scientific knowledge, one has no choice but to assume pragmatically that the inductive procedure, with appropriate caveats, is admissible.

Given that solar forecasting is forecasting, it possesses the same three characteristics mentioned above. It follows that to understand solar forecasting as a subject, one must make inquiries in terms of predictability, goodness, and falsifiability. Unfortunately, in the present academic climate, any critical proposition on these topics might be hijacked by intellectually dishonest champions of obscurantism, whose sole interest is to survive under the "publish or perish" regime. Researchers in that category tend to favor narratives, eye-catching titles, and attention-grabbing punch lines over more substantive matters. Examples are those non-reproducible hybrid machine-learning solar forecasting papers, whose contribution to forecasting science has hitherto been no more than mere statements of "who did what" in the literature review sections of other irresponsible papers of the same ilk. Ironically, those researchers are also the ones with the lowest tolerance for criticism. All this is depressing, but gloom is a useless emotion. The only remedy for this strangely curious state of affairs is to raise among young scientists the awareness of the principles of forecasting, with absolute clarity, and without censoring criticism of the suboptimal practices in the literature.
In the following three paragraphs, a brief introduction to predictability, goodness of forecasts, and falsifiability is presented, and these ideas are revisited repeatedly throughout the book.

Predictability. The predictability of solar irradiance or solar power is known a priori to be a function of both space and time. Some view space through the lens of the Köppen–Geiger climate classification or simply geographic location, and think of time with reference to forecast horizon, sky condition, day of year, or time of day. It is then obvious that predictability has no meaning without specifying the forecasting situation under which it is quantified. This is a most challenging task that only a handful of forecasters have so far attempted to address. Moreover, most proposals for quantifying predictability are restricted to narrow setups that rely on statistics relevant to variability, such as assuming that a white noise series has no predictability and a Fourier-constructed series has full predictability. A purely statistical treatment of solar irradiance or power defies reality, since solar irradiance is by nature a physical process. In this regard, a more sensible way to make progress on predictability is to scrutinize how past works on solar forecasting have performed. For a particular setup, such as "day-ahead irradiance forecasting with hourly resolution in an arid location during clear-sky conditions," certain forecasts delivered by authoritative solar forecasters in the past might set a practical expectation in the minds of others who are about to issue forecasts under the same setup. It then seems logically attractive to know who used what method on which setup, and which of these can be regarded as authoritative. This book will deliver, on this account, a literature review.

Goodness. Consistency between judgment and forecast is a precondition for a forecast to be of use. It is not, however, a sufficient condition, because the judgment of a forecaster can be deficient. The quality of a forecast is quantified through verification. As evidenced by the diverse verification procedures found in the literature, a consensus has hitherto been lacking, though reasonable recommendations have been put forth very recently. Value, in a solar forecasting context, is acquired by two groups of forecast users, one public and one private, depending on whether or not the solar power plant is tied to the grid. In both cases, very little—if it can be quantified at all—has been invested in connecting forecast to value. To that end, this book seeks to bridge the aforementioned gaps by addressing the following three topics: (1) best practices for generating and issuing forecasts, (2) formal forecast verification, and (3) applications of solar forecasts.

Falsifiability. Also known as refutability, falsifiability is the capacity of a theory to be contradicted by evidence. This at first seems rhetorical, as we are taught in school that science is about adventuring into the unknown, so all new scientific ideas must be created under the condition that failure is a probable outcome. That might have been true a century ago. At present, if one inspects the literature carefully, it would not be surprising to conclude that all too many scientific publications serve only to confirm preconceived true beliefs. In solar forecasting, people have shown hybrid models outperforming stand-alone ones, physics-based approaches being more advantageous than purely statistical ones, or spatio-temporal methods dominating univariate ones. However, provided the premises of these experiments are proper, the probability of the converse being true is negligible. In other words, it would be nearly impossible to find situations where stand-alone, purely statistical, or univariate methods outperform hybrid, physics-based, or spatio-temporal ones. Under such scenarios, although one can verify the theory to be true—the theory is no doubt verifiable—it is nevertheless unfalsifiable. Proving common true beliefs is unnecessary, and beginning with what one tries to end with is circular. In this book, widely known facts about solar forecasting are enumerated as thoroughly as is within our capacity, with the hope that future research on this subject can truly expand the boundary of forecasting science.

Dazhi Yang
Lunar New Year, 2021
Singapore

Acknowledgments

We warmly thank the following friends for their invaluable discussions over the years, which helped shape our understanding of this subject.

Jamie M. Bright, UK Power Networks, London, UK
Carlos F. M. Coimbra, University of California, San Diego, San Diego, CA, USA
Amelie Driemel, Alfred Wegener Institute, Bremerhaven, Germany
Tilmann Gneiting, Karlsruhe Institute of Technology, Karlsruhe, Germany
Christian A. Gueymard, Solar Consulting Services, Colebrook, NH, USA
Tao Hong, University of North Carolina, Charlotte, Charlotte, NC, USA
Rob J. Hyndman, Monash University, Clayton, VIC, Australia
Martin János Mayer, Budapest University of Technology and Economics, Budapest, Hungary
Dennis van der Meer, Uppsala University, Uppsala, Sweden
Marc J. Perez, Clean Power Research, Napa, CA, USA
Richard Perez, University at Albany, SUNY, Albany, NY, USA
Ian Marius Peters, Forschungszentrum Jülich GmbH, Erlangen, Germany
Pierre Pinson, Imperial College London, London, UK
Gordon Reikard, U.S. Cellular, Chicago, IL, USA
Frank Vignola, University of Oregon, Eugene, OR, USA
Rafał Weron, Wrocław University of Science and Technology, Wrocław, Poland
Xiang'ao Xia, Chinese Academy of Sciences, Beijing, China
Gokhan Mert Yagli, National University of Singapore, Singapore

Chapter 1
Why We Do Solar Forecasting

"The essential novelty about scientific technique is the utilization of natural forces in ways that are not obvious to untrained observation, but have been discovered by deliberate research."
— Bertrand Russell

One of the most formidable problems of modern times is that brought by burning fossil fuels. Industry consumes, at an alarming rate, substances that were formed over millions of years from the burial of photosynthetic organisms and held in the earth's crust for geological time, and which are not being replaced at any sustainable rate or in any usable form. Moreover, the growth of photosynthetic organisms removes carbon dioxide from the atmosphere and the ocean, but their burial ceases the movement of that carbon through the natural carbon cycle. Burning fossil fuels releases that carbon at a pace several orders of magnitude quicker than when it was buried, and much faster than the current carbon cycle can remove it. The accumulation of greenhouse gases has hitherto been widely regarded as the primary cause of the recent uptrend in earth's average temperature, which is believed to have led to weather anomalies that are more severe and more frequent than those of the pre-industrial era. This crisis, which impacts mankind as a whole, is called climate change.

The pursuit of carbon neutrality, as a climate change mitigation goal, has dramatically intensified. Prior to the 2010s, carbon neutrality concerned only a minority of academics, who themselves confined its discussion to a minority of the questions upon which they had opinions. Since the Paris Agreement on climate change, the first agreement to use the concept of carbon neutrality on a global scale, carbon neutrality has rapidly entered the sphere of concern of the general public. Carbon neutrality is said to be reached if a balance between carbon emissions and the absorption of carbon from the atmosphere by carbon sinks is achieved. Stated differently, no more greenhouse gases should be emitted than absorbed. Although debates over how carbon emission and absorption should be measured and calculated are ongoing and likely to persist into the foreseeable future, the definition of carbon neutrality itself naturally and unequivocally divides the relevant scientific methods into two categories: one seeks to reduce emission, the other to dilate capture. Insofar as emission reduction is concerned, it can only be achieved with radical and rapid changes to the forms of the world's energy generation and consumption.

The background problem with which this book is concerned is the grid integration of renewable energy, although the discussions herein also apply to many other scenarios, such as off-grid energy systems, space and water heating and cooling, or transportation. Grid integration of renewable energy entails reimagining planning and operation for a stable, reliable, and cost-effective power system predominated by clean energy generators. Electricity generation from renewables has burgeoned over the past decade, yet it continues to function only on the periphery of conventional generation from fossil fuels in the global energy mix. The continuous growth of the renewable energy industries depends necessarily upon whether they can eventually compete against conventional generation, or at least be believed able to, and provide high-value energy on demand. Indeed, the characteristic difference between the two most abundant renewable energy sources, wind and solar, and fossil fuels is that the power generated by the former is intermittent and variable. To that end, technologies of various sorts have been conceptualized, invested in, and cultivated in attempts to tame the intermittency and variability of renewable generation. Regrettably, as of the present moment, no solitary technology is capable of providing a wholly satisfactory solution. Energy storage technologies, for instance, are able to modify the demand and supply profiles on different spatial and temporal scales, enabling the peaks and valleys in these profiles to be better matched, but cost reduction from the present level is imperative. Similarly, deliberate expansion of installed capacity (also known as overbuilding or implicit storage), which endeavors to elevate and diversify the supply profiles to an exaggeratedly higher level well beyond the demand profile, so that local intermittency and variability do not affect serving the demand as a whole, is also beset by economic challenges. A third sort of technology seeks to address the intermittency and variability indirectly: demand-side management and load shaping remold the electricity consumption pattern to better match the generation, but are confronted by the strain of persuading and maintaining consumers' long-term commitment to a lifestyle that differs from what they are accustomed to.

Because the costs of these technologies are colossal, any means to reduce them must be deemed welcome. If someone searches on the internet for the prices of lithium-ion battery or photovoltaic (PV) systems, some stimulating results would, before long, be found in full concert with our a priori supposition that prices are declining—the PV industry, for instance, has seen a roughly five-fold markdown of the world's average installation cost, from 4808 $/kW in 2010 to 857 $/kW in 2021. It is based on such statistics that many have concluded that PV has reached grid parity in many parts of the world. Similarly, lithium-ion batteries, which take a leading position in the maturity ranking of electric storage technologies, have seen a six-fold price drop, reaching 137 $/kWh in 2020, which appears low compared to 917 $/kWh in 2011. This price trend appears to be equally if not more encouraging than that of PV.
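As a quick check on the arithmetic behind these headline figures, the fold reductions and the implied compound annual rates of decline can be computed directly from the quoted endpoint prices. The short sketch below (plain Python, no dependencies) is purely illustrative; note that the "five-fold" and "six-fold" figures in the text are rounded.

```python
# Sanity-check the quoted cost declines: fold reduction and the
# compound annual rate of decline implied by two endpoint prices.

def annual_decline(p_start: float, p_end: float, years: int) -> float:
    """Compound annual rate of decline implied by two price points."""
    return 1 - (p_end / p_start) ** (1 / years)

# PV: world-average installation cost, 4808 $/kW (2010) -> 857 $/kW (2021)
pv_fold = 4808 / 857                       # ~5.6-fold reduction
pv_rate = annual_decline(4808, 857, 11)    # ~14.5% decline per year

# Lithium-ion batteries: 917 $/kWh (2011) -> 137 $/kWh (2020)
bat_fold = 917 / 137                       # ~6.7-fold reduction
bat_rate = annual_decline(917, 137, 9)     # ~19.0% decline per year

print(f"PV:      {pv_fold:.1f}-fold, {pv_rate:.1%} per year")
print(f"Battery: {bat_fold:.1f}-fold, {bat_rate:.1%} per year")
```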
Be that as it may, the very fact that battery, PV, and other renewable technologies have not replaced conventional generation renders incomplete any argument put forth by optimists who believe renewables are now entirely ready to take a leading role in supplying electricity for our daily life. There exist both intrinsic and extrinsic factors that influence the true costs of renewables and related technologies. Intrinsic factors pertain to those associated with the technologies themselves, such as hardware, materials, or the manufacturing process, which aggregate into what is commonly referred to as the capital expenditure (CapEx) component of any investment in renewables. In contrast, extrinsic factors are those related to the utilization of the technologies, such as software, operation, or control, which correspond to the operating expenditure (OpEx) component of the investment. It is important to note that the OpEx of a renewable energy system extends beyond routine maintenance and repairs; instead, it encompasses all possible expenses that could be incurred by anyone associated with the system, including the plant owner, the grid operator, and electricity consumers. For example, if the power from a cluster of uncontrolled distributed PV systems surges unexpectedly during a low-demand period and, in turn, causes reverse power flow and over-voltage, additional electrical components, such as a distribution static compensator (DSTATCOM; Yan et al., 2014) or battery storage (Liu et al., 2012), would be required to assist the on-load tap changer (OLTC) in mitigating such voltage fluctuations. Such technologies further imply DSTATCOM–OLTC coordination and scalability issues, or additional cost for storage (Li et al., 2018), which must be solved and borne by grid operators.¹ In another scenario, the rapid construction and expansion of wind and solar energy systems very often outpaces the infrastructural upgrades of the grid; as such, transmission congestion is a common reason for renewable energy curtailment (Gu and Xie, 2014), which not only leads to energy waste but also compels grid operators to utilize generators with higher marginal costs (Bird et al., 2016). Once these "hidden" costs are accounted for, renewables at present can hardly be regarded as substitutive for conventional fossil fuels.

¹ It is noted that modern grid codes allow and even require solar PV systems to resolve these voltage issues through reactive power support and/or curtailment, which would reduce or eliminate extrinsic costs.

The lowering of the CapEx of renewable energy technologies is usually accompanied by declines in raw material prices, growth in market size and innovation activities, or advancement in the technical proficiency of the technologies. These changes are fated to manifest in the future, and it is plausible that a day may arrive when the prices of renewable energy technologies become so low that they no longer instigate financial apprehension when implementing them. But can we bide our time waiting for this day to dawn? In light of the exigent necessity of attaining carbon neutrality by mid-century, just betting on a reduction in CapEx in the near future does not seem a prudent course of action. To rephrase the question in a more technical fashion: Can we achieve, in another 30 years, on-demand renewable power at a price that falls below grid parity? This ideal evokes the witty remark that unlimited nuclear fusion energy is always 30 years away. Since true grid parity, nuclear fusion, and other disruptive energy solutions depend for their success upon technical breakthroughs that do not come at regular time intervals, investors should diversify their investments, and policymakers and innovators should diversify their programs and research. Fortunately, judging from the multitude of serviceable policies devised and research priorities placed on lowering the OpEx, it is evident that many share this outlook.
Central to reducing the OpEx is the interplay of various strategies and techniques that mitigate the adverse effects and improve operational efficiency when intermittent and variable renewables are integrated into the power grid. The backbone of all strategies and techniques of that sort is information, which ought to be chiefly characterized by visibility and timeliness. The former is realized through internet of things (IoT) technology, which monitors the power system with sensors (such as smart meters or synchrophasors), processing ability, software, and other means that connect and exchange data within the system over internet-based communication platforms. The latter is, in the main, ensured by performing forecasting over a variety of spatial clusters and time horizons. Recalling the two aforementioned examples of regulating voltage fluctuations in distribution grids and minimizing curtailment of renewable generation, forecasting has been found to be value-adding in both cases (Li et al., 2018; Bird et al., 2016). In fact, it would be entirely reasonable to assert that forecasting is the quintessential initial stage in any electricity system decision-making process that has a lead time requirement—for instance, some large generators are too costly to be turned on or off frequently, necessitating the determination of their operational state at least one day in advance, which is referred to as unit commitment.

Forecasting in a power system setting may be regarded interchangeably with energy forecasting, in which there are four main subjects of interest: electric load demand, electricity price, and wind and solar power generation (Hong et al., 2020). There are several other forecasting tasks relevant to power system operations and control, such as oil and gas forecasting or outage forecasting, but those are beyond the purview of this book.

The ultimate goal of a power system is to ensure perpetual balance between generation and load. The complication of this task is illustrated in Fig. 1.1, in which the generation from different energy sources and the load demand over June, 2022, across Germany, are plotted in the upper panel, and the wholesale electricity price at the day-ahead auction in the lower panel; the intricate interplay between price, generation, and load is clearly manifested in the plot. In the day-ahead market (DAM), the matching between the forecast/scheduled generation and the forecast load has to be conducted before the operating day, subject to transmission, price, and generator constraints. Since the total generation comprises not just dispatchable fossil-fuel-based generation from coal, oil, and natural gas, but also various forms of clean and (variable) renewable generation, including solar, wind, nuclear, hydro, geothermal, and biomass (the latter four are grouped under "others" in the figure), the optimization as to which unit generates how much over which hour is by no stretch simple. Because forecasts are always wrong—we shall circle back to this idea numerous times throughout the book—the discrepancies between what was planned and what is actually happening have to be resolved in the intra-day market (IDM); if that does not solve the problem completely, which is almost surely the case, the remaining discrepancies are resolved in the real-time market (RTM).

For historical reasons, such as the deregulation and introduction of competitive markets in the early 1990s and the aggressive growth in solar penetration since the 2010s, the four energy forecasting domains are at different levels of maturity (Hong et al., 2016). Since solar power forecasting has the shortest history, it is understandable that those involved in this domain may seek the opinions of experts of, and draw upon methodologies from, the more mature domains of load, price, and wind forecasting during its initial stages of development.

[Figure 1.1 about here: upper panel shows generation by source (solar, wind, fossil, others) and load, in MW; lower panel shows the day-ahead price, in €/MWh; the x-axis spans June 2022.]

Figure 1.1 One month (June, 2022) of power generation from different sources (shaded areas), load demand (solid line), and day-ahead auction prices in Germany.

and draw upon methodologies from, the more mature domains of load, price, and wind forecasting during its initial stages of development. That said, although the approaches to forecast load, price, wind, and solar often share certain resemblances (for instance, they can all benefit from considering forecasts of weather variables), the sources of variability in solar power are profoundly different from the rest. More specifically, the three most influential weather variables on PV generation are, in order of importance, global horizontal irradiance (GHI), ambient temperature, and wind speed. Among these three variables, the latter two can already be forecast to a fairly satisfactory degree of accuracy; however, GHI remains an intractable weather variable to be forecast, for its variation, aside from the diurnal and yearly cycles, is due in large part to cloud dynamics, which is one of the few challenges that still confront meteorologists today (Bauer et al., 2015). To that end, whether solar power forecasts can reach the quality that is needed for grid integration depends necessarily on our ability to forecast irradiance.

It should be noted that there is another form of solar power aside from PV that can be used to provide electricity to the grid, namely, concentrating solar power (CSP). The generation of power through CSP is contingent upon the availability of beam normal irradiance (BNI), which, unlike GHI, falls to near-zero values when thick clouds obstruct the sun—BNI is generally perceived as more difficult to forecast than GHI. In compensation, however, CSP stores solar energy in the form of heat and therefore

has thermal inertia that makes its output less susceptible to sudden changes in irradiance. Because the grid penetration level of CSP is negligible to this day, and the irradiance-to-power conversion mechanisms of CSP and PV share no commonality, we should confine the content of this book to irradiance forecasting and PV power forecasting; in the landmark review by Yang et al. (2018a), the two topics are jointly defined to be solar forecasting.

1.1 FIVE ASPECTS OF SOLAR FORECASTING

The phrase “research frontier” has often been used to exaggerate the importance of those trending topics in a branch of learning. In most scientific endeavors, the trending stops when one of the following two things occurs. First, it may cease upon the advent of a disruptive technology, which can significantly alter the way in which the present research ecosystem is sustained. For example, if a new energy storage technology that costs 1 $/kWh is invented, the necessity for solar forecasting research would considerably decrease, if not entirely dissipate. Alternatively, it may come to saturation after all easily attainable goals have been accomplished, and any further progression towards success can only be achieved with great difficulty, such that those who are less capable or impatient for swift outcomes can no longer reap benefits from participating. The latter is precisely what has transpired and is presently unfolding in the field of solar forecasting.

Throughout the history of solar forecasting, the research frontier, unsurprisingly, has shifted multiple times. From the first revolutionary employment of sky cameras (Chow et al., 2011) to the grafting of machine learning onto solar forecasting (Pedro and Coimbra, 2012), to the recent hype of deep learning (Gensler et al., 2016), each time a new research frontier is put under the spotlight and starts to attract the rank and file of the field, it is almost surely followed by an explosion of publications. The solar forecasting literature has now been populated by an eclectic mix of methods, which all seek to establish how the future irradiance (or PV power in many cases) is linked to the current observations, through examining such relationships from historical data of various kinds. Irrespective of how complex a method is, be it a hybrid or cascade use of several sophisticated data-driven algorithms or a carefully selected suite of physics packages (i.e., parameterization schemes) that power a numerical weather prediction (NWP) model, inasmuch as the method derives forecasts from observations alone, we should refer to it as a base method. A sizable proportion of literature on solar forecasting to date has centered on proposing base methods. As the number of unexplored base methods gradually diminishes, the novelty of such approaches becomes harder to justify. As simpler methods became exhausted, researchers turned to hybrid methods, and when this strategy was no longer perceived as novel, greater procedural or structural complexity was introduced. Some have joked about the unnecessary multiplication of complexity that has crept into the current publishing regime: 1 + 1 = 2 is not very impressive, but

$$\sin^2 x + \cos^2 x = \sum_{n=0}^{\infty} \frac{\cosh(y)\sqrt{1-\tanh^2 y}}{2^n}\,\ln\!\left[\lim_{z\to\infty}\left(1+\frac{1}{z}\right)^{z/2}\right]$$

sure is.2 (In Chapter 2, we should discuss the philosophy behind such “smoke grenades,” and how we can counter them.) True experts in the domain of solar forecasting espouse the conviction that the field has transcended the mere framing of borrowed methods and algorithms onto the problem of concern, regardless of their level of intricacy or scientific veneer (Hong et al., 2020). For that reason, it is necessary to inquire what exists beyond base methods, and to determine the requisites for the progression of solar forecasting research. The development of new methods to produce base forecasts, while important, is not the sole concern of solar forecasting. Rather, the true value of forecasting lies in its eventual adoption during decision-making processes. From the earliest times, proposals of base methods have made greater claims, and exerted smaller impacts, than any other aspect of solar forecasting. To that end, this section should provide a high-level overview of solar forecasting. The discussion should expand from the mind map shown in Fig. 1.2, which depicts the five technical aspects of solar forecasting.

2 If you look closely, the second expression is no different from the first one.

1.1.1 BASE METHODS

Perhaps it is due to the seminal review of Inman et al. (2013) that the five-category classification of base methods has been widely recognized hitherto. These categories, which include sky camera, satellite, NWP, statistics, and machine learning, are not entirely unrelated, as one can use the outputs of sky cameras, satellites, and NWP models as exogenous inputs for statistical or machine-learning-based forecasting models, through typical procedures as documented in the work of Pedro et al. (2019). One should note that the output variables of NWP, which include irradiance, are themselves forecasts. Hence, according to our earlier definition of base forecasts, it is often more appropriate to view any irradiance forecast produced by methods that involve the NWP output as a post-processed forecast, of which the concept is elaborated in Section 1.1.2.

Owing to the differing spatial resolutions and time scales at which camera, satellite, and NWP operate, the data and information collected by these means were historically matched to the forecast horizon. In particular, sky cameras were deemed suitable for intra-hour horizons, satellites for intra-day horizons, and NWP for day-ahead horizons (Inman et al., 2013). After a decade, this view has been formally rejected by Yang et al. (2022b), who argued that, with the technological advances in remote sensing and meteorology, both satellite- and NWP-based solar forecasting can now yield satisfactory results on shorter horizons than the ones they were once seen fit for. To enumerate a few examples, one should be informed that hourly updated NWP (e.g., Benjamin et al., 2004), which revises day-ahead forecasts every hour and thus is apt for intra-day forecasting, and 5-min satellite-derived irradiance data (e.g., Yang, 2021b), which is well suited for intra-hour forecasting, are both in the public domain.

The core premise of statistical and machine-learning-based forecasting, or data-driven forecasting in short, is that the future will in some way resemble the past

[Figure 1.2 about here: a mind map with “solar forecasting” at the center, branching into base forecasting methods (statistics, machine learning, NWP, satellite, camera); post-processing (deterministic to deterministic, deterministic to probabilistic, probabilistic to deterministic, probabilistic to probabilistic); verification (deterministic, probabilistic); irradiance-to-power conversion (GHI to PV power conversion, BNI to CSP power conversion); and grid integration (load shaping, energy storage, geographical smoothing, overbuilding & curtailing).]

Figure 1.2 Five aspects of solar forecasting research.

in terms of patterns or distribution (Hong et al., 2020). As such, the ability to discover useful patterns or distributions from historical data is key to making good data-driven forecasts. As both statistics and machine learning predate solar forecasting by many decades, these methods are undeniably beneficial to the forming of the present ecosystem of solar forecasting. The basic principles of machine learning can be easily understood, and its implementation is facilitated through the myriad of open-source software packages and libraries, which both explain the popularity of this category of base methods. In recent years, the field of machine learning has experienced another wave of development that is owed largely to the advent of

powerful computing technologies. Numerous stimulating works based on deep learning, reinforcement learning, and transfer learning have thence appeared in solar forecasting. However, while the “black-box” nature of machine learning is often cited as a limitation, a more insidious problem lies in the misguided belief that omnipotent and omniscient algorithms can automatically discover all relevant information pertaining to issuing good forecasts. Although most solar forecasters may be well aware of this pitfall, a more dangerous and equally commonplace misbelief is that some new input variables, a few tweaks to the algorithm, some additional optimization constraints, or anything of the same ilk can answer questions in weather dynamics that have remained unsolved for decades. We are by no means discouraging the applications of machine learning in solar forecasting, as truly thought-provoking demonstrations of machine-learning-based weather forecasting outperforming traditional NWP are made every couple of weeks or so (e.g., Bi et al., 2023; Nguyen et al., 2023; Espeholt et al., 2022; Lam et al., 2022).3 What we wish to advocate is to have a solid understanding of the domain knowledge as a prerequisite for algorithmic and procedural innovations.

3 In all cases mentioned, the machine-learning-based weather forecasting models are trained with traditional NWP results, underscoring the continued importance of NWP in the field of forecasting.
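To make the data-driven premise concrete, the following minimal sketch fits a low-order autoregressive model to a clear-sky index series and issues a one-step-ahead forecast. The series is synthetic, and the model form, the order, and all numbers are illustrative assumptions rather than a recommended recipe.

```python
import numpy as np

# Synthetic clear-sky index series (dimensionless, cloud-driven); in
# practice it would be measured GHI divided by clear-sky GHI, so that
# the diurnal and yearly cycles are largely removed before modeling.
rng = np.random.default_rng(42)
kappa = np.clip(0.75 + 0.2 * np.sin(np.arange(500) / 30)
                + 0.1 * rng.standard_normal(500), 0.05, 1.2)

p = 3  # autoregressive order (an arbitrary choice for illustration)
# Lagged design matrix: predict kappa[t] from kappa[t-1], ..., kappa[t-p]
X = np.column_stack([kappa[p - i - 1:len(kappa) - i - 1] for i in range(p)])
y = kappa[p:]

# Ordinary least squares with an intercept
A = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# One-step-ahead forecast from the p most recent observations
x_new = np.concatenate(([1.0], kappa[-1:-p - 1:-1]))
kappa_hat = float(x_new @ coef)

# Persistence, the simplest of all base methods, for comparison
print(f"AR({p}) forecast: {kappa_hat:.3f}; persistence: {kappa[-1]:.3f}")
```

Multiplying the forecast clear-sky index by the clear-sky irradiance at the target time stamp recovers an irradiance forecast, a normalize–forecast–denormalize pattern whose rationale is revisited when clear-sky modeling is discussed in Chapter 5.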

1.1.2 POST-PROCESSING

In light of the aforementioned points, physics-based solar forecasting must be endorsed. However, much empirical evidence has suggested that the output of base methods using sky cameras, satellites, and NWP is frequently biased or inaccurately distributed, stemming from measurement uncertainties as well as the lack of precision and granularity in modeling. Such drawbacks are not unique to solar irradiance but general to all weather variables. Hence, post-processing, which seeks to correct the bias or remold the distribution so that the forecasts resemble the observed reality more closely, is a universally acknowledged practice of forecasting in a physical context. In contrast, post-processing is less emphasized elsewhere, for example, during statistical forecasting in a social context, for most rigorous statistical forecasting models are already mathematically proved to be unbiased and thus need not be tempered further.

Besides correcting and adjusting the forecasts, post-processing is needed in a broad range of other contexts. In compiling those, we should first make clear two concepts: deterministic forecasts and probabilistic forecasts. In short, a deterministic forecast is simply a number, which usually corresponds to the best-possible estimate of the future event or the future state of the forecast quantity. A probabilistic forecast, on the other hand, can be a collection of several deterministic forecasts (i.e., an ensemble), a set of quantiles, a prediction interval, or in the best case, a predictive distribution, which fully characterizes the probable outcomes or the probability law that governs the outcome of the future event or the future state of the forecast quantity. The discussion that we wish to make in regard to deterministic and probabilistic forecasts goes beyond these simple definitions, and therefore is arranged separately in Chapter 3. But now, we are ready to collect all post-processing techniques

under ten heads, in a mutually exclusive and collectively exhaustive fashion. In each instance, we shall illustrate one scenario in which it can be of service. Given some initial deterministic forecasts, one may take three types of actions:

(1) Regression: Regression establishes a relationship between initial forecasts and observations (a minimal code sketch is given at the end of this subsection). This is useful in removing the bias in the initial forecasts or making the initial forecasts statistically consistent with the verification measure.

(2) Filtering: Statistical filtering is a process in which signals or system states are estimated from noisy measurements. One of the purposes that filtering serves is stabilizing spiky forecasts, which may often result from camera-based forecasting under broken-cloud conditions.

(3) Resolution change: Initial forecasts can be downscaled or averaged, such that their new temporal resolution agrees with the rest of the planning and operation procedure. Resolution change is needed because some power systems, such as the Chinese and Hungarian grids, require solar power forecasts at a 15-min resolution to conduct day-ahead and intra-day dispatching, whereas NWP forecasts are generally provided only at an hourly resolution.

Since all three types of actions above post-process the initial deterministic forecasts into another set of deterministic forecasts, they are called deterministic-to-deterministic (D2D) post-processing (Yang and van der Meer, 2021). In other circumstances where the initial forecasts are probabilistic, one has two choices to transform them into a set of deterministic forecasts, depending on the form of the initial probabilistic forecasts; these actions fall under probabilistic-to-deterministic (P2D) post-processing. They are:

(4) Summarizing predictive distribution: This class of post-processing techniques is concerned with optimally eliciting deterministic forecasts from predictive distributions. It is particularly rewarding in cases where forecasts submitted to grid operators are to be penalized according to a predefined scoring rule.

(5) Combining forecasts: Forecast combination seeks to issue weights to the members of an ensemble of deterministic forecasts such that the combined forecasts enjoy the wisdom of the crowd. Combining forecasts has been found beneficial both when the members of an ensemble are equally likely (i.e., equal weight assignment) and when some members are superior to others (i.e., unequal weight assignment).

A probabilistic forecast carries information on the uncertainty associated with that forecast. In some situations, when an initial forecast is deterministic, the forecast user may want to quantify its uncertainty. There are three classes of techniques available for deterministic-to-probabilistic (D2P) post-processing:

(6) Analog ensemble: Analog ensemble identifies historical data patterns that resemble the current one, and whatever happened in the past may be regarded as probable realizations of the future. The analog ensemble is attractive when

the dynamical weather ensemble, which generates forecasts using the same weather model but with slightly perturbed initial conditions, is unavailable or inconvenient to use.

(7) Method of dressing: Method of dressing decorates the forecast issued for a certain known condition (e.g., at a particular time of day and/or under a particular solar zenith angle) with past errors observed for the same condition. This approach is appealing when the error behavior of a forecasting model is consistent.

(8) (Probabilistic) regression: Advanced regressions, such as the generalized additive model or quantile regression, often allow uncertainty quantification. These regressions are advantageous when the errors are believed to be non-homogeneous, e.g., errors under clear-sky conditions have variances smaller than those under cloudy-sky conditions.

Last but not least, when the initial forecasts are probabilistic, one may need to retain them as probabilistic but with better quality. The kind of post-processing that warrants such actions is called probabilistic-to-probabilistic (P2P) post-processing, which can be separated into two types:

(9) Calibrating ensemble forecasts: Calibration improves the resemblance between distributions of observations and forecasts. Owing to the similar modeling philosophy or insufficient perturbation of initial conditions, ensemble solar forecasts are almost always under-dispersed, and have to be calibrated for meaningful uncertainty quantification.

(10) Combining probabilistic forecasts: Combining as a strategy also applies when consolidating two or more predictive distributions or sets of ensemble forecasts into a final probabilistic forecast.

Careful readers may have noticed that we did not mention any specific technique of post-processing in the way such techniques would be enumerated in a typical review article. This is because post-processing should be goal-oriented rather than technique-oriented—if the goal is to turn a set of initial forecasts into another set with certain properties, one should first summarize the nature of the problem, and then discuss the class of techniques that is suitable for that problem, rather than starting from a specific technique. On this point, all ten classes of post-processing techniques are fully elaborated in Chapter 8, alongside the problems they aim to address.
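As promised under head (1), the following sketch illustrates regression-based D2D post-processing in its simplest form: a linear model trained on past forecast–observation pairs, akin in spirit to model output statistics. The data are synthetic and the linear form is an assumption; Chapter 8 treats this class of techniques properly.

```python
import numpy as np

# Synthetic forecast-observation archive, in W/m^2; the initial
# forecasts carry a multiplicative and an additive bias plus noise.
rng = np.random.default_rng(1)
obs = rng.uniform(100, 900, size=1000)
fcst = 0.9 * obs + 40 + 25 * rng.standard_normal(1000)

# Fit obs ~ a + b * fcst on a training split ...
n_train = 800
A = np.column_stack([np.ones(n_train), fcst[:n_train]])
(a, b), *_ = np.linalg.lstsq(A, obs[:n_train], rcond=None)

# ... and post-process the held-out forecasts with the fitted line
corrected = a + b * fcst[n_train:]

def mbe(f, o):
    """Mean bias error, the unconditional bias of the forecasts."""
    return np.mean(f - o)

print(f"MBE before: {mbe(fcst[n_train:], obs[n_train:]):+.1f} W/m^2; "
      f"after: {mbe(corrected, obs[n_train:]):+.1f} W/m^2")
```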

1.1.3 VERIFICATION

Much of the virtue of the science of forecasting is connected with verification. A forecast cannot be appreciated without characterization and quantification of its goodness, and forecast verification aims exactly at doing so. It is likely for someone

new to forecasting to equate verification with error calculation, for a vast majority of forecasting works indeed contain such a component. Following such a train of thought, one may soon derive the saying that forecasts that can acquire a higher score are better. But this statement can be easily refuted by the concept of hedging, which has been fairly well known to meteorologists for over 50 years (Murphy and Epstein, 1967). Stated informally, hedging can be understood as “gaming the score,” in that the forecaster may deliberately submit an inaccurate forecast yet receive a better score—Murphy (1973a) gave one such example and Pinson and Tastu (2014) provided another. Besides hedging, there are many other contexts in which the idea of “higher score means better forecasts” fails (e.g., Mayer and Yang, 2023a; Yang and Kleissl, 2023). The underlying statistical theory connecting these counterintuitive yet not uncommon phenomena is one of consistency and elicitability, which are abstract notions of correspondence between forecasts and forecaster’s judgment. Although consistency and elicitability may be unfamiliar to many, they should become increasingly comprehensible as we proceed to Chapters 8 and 9.

Verification of deterministic forecasts and verification of probabilistic forecasts are a priori distinct. One shared dimension, nevertheless, is that they both devise procedures to assess different aspects of quality through quantitative means. Some error metrics, such as mean absolute error (MAE) or root mean square error (RMSE), have hitherto been very popular in deterministic solar forecast verification. Whereas both MAE and RMSE are measures of accuracy, there exist many other aspects of forecast quality. Mean bias error (MBE) quantifies the unconditional bias in forecast; the correlation coefficient quantifies the association between forecast and observation; the expected difference between forecast and the conditional expectation of observation given that forecast quantifies the calibration; the expected difference between the conditional and unconditional expectations of forecast quantifies the discrimination; and skill score quantifies the forecast skill. These example aspects of quality may or may not sound familiar, but they must be collectively assessed in practice, for focusing on only a few of them could lead to prejudiced judgment. On the other hand, verification of probabilistic forecasts is less tedious—as long as proper scores are used, things usually would not go terribly wrong. Two aspects of quality are always sought during probabilistic forecast verification: one is calibration, and the other sharpness. As famously noted by Gneiting et al. (2007), the verification of probabilistic forecasts is best based upon the maximum-sharpness criterion, while maintaining calibration. On this point, some scoring rules such as the continuous ranked probability score or the pinball loss, which assess simultaneously the calibration and sharpness, are ubiquitously used to date in solar forecasting.

One underrepresented area of forecast verification is visualization. This is due in large part to the excessive reliance and trust we place in the use of quantifiable criteria for quality assessment. A very unpleasant consequence of overlooking the importance of visualization is that a few large errors may render all forecasts under verification unattractive.
For instance, a forecast irradiance of 10 W/m² against an observation of 1 W/m² gives a 900% percentage error, yet such a forecast would hardly trouble the actual operation of the power grid. Visualization constitutes a vital supplement to quantitative methods. For deterministic forecasts, the joint

distribution of forecast and observation contains all information relevant to verification, and it can be decomposed into conditional and marginal distributions, which contain, in themselves, revealing facts that facilitate the understanding of the forecasts. Joint distributions can be visualized in the form of scatter plots, contour plots, or heat maps; conditional distributions are suitable to be represented in ridge plots; and marginal distributions may be depicted as histograms or simply as probability density functions. As for probabilistic forecasts, their calibration can be visualized through the probability integral transform (PIT) histogram or the reliability diagram, whereas their sharpness can be depicted in the form of a sharpness diagram.

On top of consistency and quality, a third type of goodness of forecasts is related to the benefits realized or expenses incurred after using the forecasts, i.e., value. As all right-thinking forecasters strive to generate high-quality forecasts, and all forecasts seek to create value, one may be tempted to conclude that forecasts of higher quality should possess higher value. This is nevertheless not always the case in reality. Various descriptive and prescriptive methods are available to estimate forecast value. In a solar grid integration setting, forecasts bring value to two groups of forecast users. The first group of users are PV plant owners, who assign value to the weather forecasts through the penalty incurred after submitting the PV power forecasts to system operators. The second group of users are system operators, who may wish to gauge the value of PV power forecasts through the costs for absorbing and compensating the forecast errors through reserves. It is evident from both cases that the value of forecasts is materialized only through using the forecasts. In summary, verification is concerned with assessing the three types of goodness of forecasts: (1) consistency, (2) quality, and (3) value (Murphy, 1993).
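Before moving on, a minimal sketch of measure-oriented verification may help anchor the metrics named above; the helper function, its synthetic inputs, and the choice of reference are illustrative assumptions, and Chapters 9 and 10 explain why such measures alone do not suffice.

```python
import numpy as np

def verify_deterministic(fcst, obs, ref):
    """Compute a few common quality measures for deterministic forecasts.

    fcst, obs, ref: 1-D arrays of forecasts, observations, and reference
    forecasts (e.g., persistence), all in the same unit (here W/m^2).
    """
    err = fcst - obs
    rmse = np.sqrt(np.mean(err ** 2))
    rmse_ref = np.sqrt(np.mean((ref - obs) ** 2))
    return {
        "MBE": np.mean(err),                   # unconditional bias
        "MAE": np.mean(np.abs(err)),           # accuracy
        "RMSE": rmse,                          # accuracy, penalizing large errors
        "corr": np.corrcoef(fcst, obs)[0, 1],  # association
        "skill": 1.0 - rmse / rmse_ref,        # >0 means beating the reference
    }

# Toy demonstration with synthetic data
rng = np.random.default_rng(7)
obs = rng.uniform(0.0, 1000.0, 200)
fcst = obs + 50.0 * rng.standard_normal(200)
ref = obs + 120.0 * rng.standard_normal(200)  # a deliberately weaker reference
print(verify_deterministic(fcst, obs, ref))
```

As this chapter argues, a favorable table of such numbers is necessary but never sufficient evidence of good forecasts.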

1.1.4 IRRADIANCE-TO-POWER CONVERSION

The mapping from wind speed, along with certain auxiliary weather variables, to wind power is known to experts as the wind power curve. In a similar fashion, the mapping from GHI, in conjunction with other auxiliary weather variables, to PV power may reasonably be designated the solar power curve, though this term has yet to gain wide acceptance. Irradiance-to-power conversion is a compulsory step in the solar forecasting process, for methods that depend solely upon extrapolating PV power measurements into the future stand absolutely no chance when competing against physics-based methods. In arriving at PV power there are two approaches. One of those is the data-driven or direct approach, which establishes the required mapping through regressions. The other is the physical or indirect approach, which explicitly considers the physical principles and plant design that govern the transition through a series of energy meteorology and engineering models; hence, the approach is also known as the model chain.

Irradiance-to-power conversion via regression necessarily requires a training phase, and as such it is not suitable for any newly commissioned plant. Moreover, since the regression-based irradiance-to-power conversion considers just a minimal amount of domain knowledge pertaining to how photons strike the panel surface, transmit to the PV cells, and eventually turn to AC power, it is often perceived as an

inferior approach to the model-chain-based conversion. In comparison, conversion via model chain requires as input the plant information (e.g., location, surface type, or PV orientation) and design details (e.g., panel model, series–parallel connection information, or inverter size); therefore, it can hardly produce satisfactory results when those are unavailable. That said, irradiance-to-power conversion is a task handled by plant owners, who should possess all required plant information and design details, so the unavailability of plant information and design details should not be a concern in practice. When the inputs to the irradiance-to-power conversion models are irradiance measurements or satellite-derived irradiance, the models can be used for PV system simulation and performance evaluation—through comparing simulated and actual PV power, one obtains the performance ratio and performance index. When the inputs are irradiance forecasts, the models generate PV power forecasts.

Typically, besides the time and location information, model chains require four weather variables to operate, namely, GHI, ambient temperature, wind speed, and surface albedo. Because all four variables can be acquired from standard NWP models, the applicability of the model chain for grid integration must be deemed universal. The remaining task is only one of execution, which might not be as straightforward as it is often perceived. PV panels are installed on tilted surfaces to maximize the amount of perpendicular-incident radiation throughout the year, yet NWP issues only GHI. For this reason, transposition models are needed to convert GHI to global tilted irradiance (GTI). Depending on the PV module encapsulation design and its material, GTI would be refracted and attenuated differently, a process explained by a relative transmittance model. Cell temperature elevates with GTI and ambient temperature, and drops with convective heat removal by wind. Panels that are connected in series stack the voltage, whereas panels connected in parallel stack the current, and the connections, in turn, affect the final DC power output of the system. When DC power is received by an inverter, it is converted to AC power, which is then passed to a transformer to raise its voltage to that of the local grid. Each of the above processes can be modeled in numerous ways with different model complexities and accuracies; these classes of energy meteorology and engineering models have been a major focus of solar engineers over the past 70 years, so the literature is naturally bulky. To that end, the optimal selection of component models of a model chain and quantifying the uncertainty associated with it (i.e., probabilistic model chain) constitute the core of irradiance-to-power conversion in a grid integration context.
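To give the model chain a tangible shape, the sketch below strings together one possible choice of component models using the open-source pvlib library: an Erbs separation of GHI into beam and diffuse, an isotropic transposition to GTI, a Faiman cell-temperature model, and the PVWatts DC and inverter models. The plant (location, tilt, capacity) and the weather inputs are entirely hypothetical, and every step can be substituted with alternatives from the literature.

```python
import pandas as pd
import pvlib

# Hypothetical plant: 10 kW DC, fixed tilt, south-facing, in Germany
lat, lon, tilt, azimuth, pdc0 = 52.5, 13.4, 30.0, 180.0, 10_000.0

times = pd.date_range("2022-06-15 05:00", "2022-06-15 20:00",
                      freq="1h", tz="Europe/Berlin")
# Hypothetical hourly NWP-derived inputs
ghi = pd.Series([0, 50, 150, 300, 450, 600, 700, 750, 700,
                 600, 450, 300, 150, 50, 10, 0], index=times, dtype=float)
temp_air, wind_speed, albedo = 20.0, 3.0, 0.2

# Step 1: solar position, then split GHI into beam and diffuse components
solpos = pvlib.solarposition.get_solarposition(times, lat, lon)
erbs = pvlib.irradiance.erbs(ghi, solpos["zenith"], times)

# Step 2: transpose GHI to global tilted irradiance (isotropic sky model)
poa = pvlib.irradiance.get_total_irradiance(
    tilt, azimuth, solpos["apparent_zenith"], solpos["azimuth"],
    dni=erbs["dni"], ghi=ghi, dhi=erbs["dhi"], albedo=albedo)

# Step 3: cell temperature from GTI, air temperature, and wind speed
t_cell = pvlib.temperature.faiman(poa["poa_global"], temp_air, wind_speed)

# Steps 4 and 5: DC power, then inverter conversion/clipping to AC
dc = pvlib.pvsystem.pvwatts_dc(poa["poa_global"], t_cell,
                               pdc0=pdc0, gamma_pdc=-0.004)
ac = pvlib.inverter.pvwatts(dc, pdc0=pdc0 / 1.1)  # assumed DC/AC ratio 1.1

print(ac.round(0))
```

Feeding the same chain with forecast GHI yields PV power forecasts, whereas feeding it with satellite-derived irradiance yields the simulations used for performance evaluation, exactly as described above.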

1.1.5 GRID INTEGRATION

Without a practical use case, any exploration into the realm of forecasting lacks justification, for the mere generation of forecasts is insufficient to render them valuable unless they hold the potential to influence the decision-making process. Up to this stage, we have made clear that the overarching motivation of producing solar forecasts, insofar as this book is concerned, is serving various grid integration needs. It is true that such motivation has often been mentioned in the introductory lines

of research papers, but it is also true that an in-depth explanation of how forecasts can be actually useful is frequently absent from those papers. One possible reason for this phenomenon, we think, is the knowledge gap between solar forecasters (i.e., meteorologists and solar engineers) and end users of forecasts (i.e., power system operators), who possess disparate educational backgrounds and grapple with distinct sets of problems (Yang et al., 2022b). In this regard, if we are to proceed with discussing the techniques of solar forecasting, the knowledge gap must be bridged.

It must be clarified that grid-integrated solar power only becomes a challenge if its penetration exceeds a certain level. At low penetration levels, as was the case a decade ago, the grid is able to absorb the solar power, as voltage and frequency disturbances caused by variable solar power were negligible compared to variations in the load. As the penetration increases, the stress and strain placed on various constituents of the grid, which include the transmission and distribution infrastructure, load, conventional generation, as well as the operations and control workflow, escalate quickly. The stress and strain on the power grid can be categorized based on time scale. On a time scale of up to an hour, cloud-induced variability in distributed PV power generation may cause temporary over-voltages. On a daily time scale, numerous challenges find root in the (in)famous “duck curve,” which describes the increasingly large mismatch between the demand and solar generation during the course of a day at high solar penetration; Fig. 1.3 shows a typical duck curve, which would cause an acute bimodal daily transient in the conventional generation, as also seen in Fig. 1.1.4 On long time scales ranging from weeks to years, the multi-day (e.g., due to a hurricane) and seasonal variations in solar energy call for major attention in planning storage or other forms of back-up generation for prolonged low- and no-resource periods. Of course, it is not possible to fully expand all grid-side challenges brought by solar integration and how forecasting can mitigate those. We should therefore elaborate on just two: (1) load following and regulation, and (2) power flow, of which the first requires deterministic solar forecasts and the second probabilistic.

4 Note that the duck curve is not always true of all time periods and locations, e.g., places with large cooling demand tend to have much higher synergy between load and solar generation.

1.1.5.1 Load Following and Regulation

Ancillary services in the power market refer to a series of services required to maintain the safe and stable operation of the power system or restore system security, as well as to ensure the supply of electric energy and meet voltage quality, frequency quality, and other requirements. These services are formulated according to the results of load forecasting and market bidding. Moreover, load forecasting, generator scheduling, and economic dispatch are to be conducted on different time horizons. In a two-settlement market structure, such as that of the California Independent System Operator (CAISO), the bidding and scheduling of generators first take place in the DAM, whereas the discrepancies between the planned and the most-updated load conditions are cleared in the RTM via intra-day unit commitment and economic dispatch.

[Figure 1.3 about here: net load in MW versus time of day, with one curve per year from 2010 to 2025.]

Figure 1.3 An illustration of a typical “duck curve.” As the penetration level rises over the years, solar power generation gradually causes larger differences in peak and mid-day net load, which needs to be met with other forms of generation.
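The duck curve is easy to reproduce from first principles: net load is simply load minus solar generation, and the steep neck of the duck is the evening ramp that conventional generators must climb. The sketch below quantifies that ramp for two hypothetical penetration levels; all profiles are invented purely for illustration.

```python
import numpy as np

hours = np.arange(24)
# Invented profiles: an evening-peaking load and a midday solar bell, in MW
load = 1400 + 400 * np.exp(-((hours - 19) ** 2) / 18) \
            + 200 * np.exp(-((hours - 8) ** 2) / 20)
solar_shape = np.clip(np.cos((hours - 12) / 12 * np.pi), 0, None) ** 2

for capacity in (300.0, 1200.0):  # low vs. high solar penetration, in MW
    net_load = load - capacity * solar_shape
    ramp = np.diff(net_load)  # hour-to-hour change in MW
    print(f"solar {capacity:6.0f} MW: midday minimum {net_load.min():7.1f} MW, "
          f"max evening ramp {ramp.max():6.1f} MW/h")
```

Higher penetration deepens the midday belly and steepens the evening neck, which is precisely the pattern Fig. 1.3 traces over the years.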

In former days when solar penetration was modest, electric loads on different buses and nodes of a power system exhibited some regularity and therefore were quite predictable—transmission-level load forecasting could achieve a mean absolute percentage error (MAPE) of ≈ 3% (Hong and Fan, 2016). Although, due to the load forecast errors, the day-ahead schedules are to be revised in the RTM, the amounts of load following and regulation can be estimated to a fairly satisfactory level of accuracy, and are usually within the flexibility constraints of conventional generators. In current power systems, day-ahead scheduling depends no longer on just load forecasts but also renewable generation forecasts, which are far more difficult to perform and are associated with much larger errors. A necessary consequence of having highly uncertain renewable generation forecasts is a set of uncertain net load forecasts, upon which the day-ahead schedules have to be based. These inaccurate day-ahead schedules of conventional generators, in turn, place much of the burden of the eventual supply–demand balancing onto the RTM, in which load following and regulation take place.

The CAISO's RTM operation is summarized in Fig. 1.4. The black dashed line represents the day-ahead generation schedule. CAISO's day-ahead schedule is made for each hour of the operating day, and between two consecutive hours, there is some time (∼ 15 min) allocated for ramping, during which generators gradually shift their power output from the set point for one hour to that for the next. The day-ahead schedule needs to be revised based on the latest information, in a rolling fashion, via intra-day scheduling runs, of which the outcome is represented by the black solid line in Fig. 1.4. As in the case of the DAM, the selection of generators is based

[Figure 1.4 about here: schematic of the day-ahead schedule, intra-day schedule revisions, load following, and regulation over an operating hour.]

Figure 1.4 California Independent System Operator (CAISO) real-time regulation and load following. The dashed black line indicates the result of the day-ahead schedule, whereas the solid black line corresponds to the revised intra-day schedule. Based on the revised intra-day schedule, unit commitment and economic dispatch are conducted, arriving at the gray dot-dashed load-following curve. Finally, any remaining discrepancy between the gray dot-dashed curve and the solid gray curve (i.e., actual generation/load) is the amount of regulation needed, which is provided by flexible resources, such as spinning reserves or batteries. Figure inspired by Makarov et al. (2009).

on market participants' bids in the RTM. Real-time unit commitment (RTUC) then decides when and which generating units provide power at each system node. CAISO performs RTUC every 15 min, and within each RTUC cycle, multiple rounds of real-time economic dispatch (RTED) are required. The goal of RTED is to adjust the power output of conventional generators through the load-following automated dispatch system (ADS), such that the generators' power moves towards new set points in accord with the current demand. CAISO performs RTED every 5 min. In summary, load following is an RTED-instructed deviation from the scheduled output, which is illustrated as the gray dot-dashed curve in Fig. 1.4. Because the load-following curve may still differ from the actual load—these minute to sub-minute deviations are known as regulation—the last line of defense in system balancing is formed by spinning reserves and other forms of flexible resources, which are tasked to eliminate regulation every few seconds.

It is then implied that the decision on the amount of flexible resources is of utmost importance. Orthodox flexible resources mostly come in the form of spinning reserves and hydro power, whereas more recently, batteries and other new storage technologies have emerged as alternative forms of flexible resources (Sun et al., 2023), albeit their capacity has hitherto been limited by their high cost, which makes their

optimal allocation all the more vital. In any case, flexible services are planned months in advance, by analyzing the behavior of errors between the scheduled and actual net load over a period such as a few years. Clearly then, the quality of solar forecasts has a profound impact on the overall cost-effectiveness of the power system. Large solar power forecast errors often imply large net load forecast errors, which subsequently result in sub-optimal unit commitment and economic dispatch, which, in turn, translate to higher demand for flexible reserves and thus higher cost. Additionally, keeping a large amount of such flexible resources not only affects the cost-effectiveness of power generation but also reduces equipment utilization, which has hitherto been viewed as a major adverse component of energy economics. For a large power system, a small improvement in solar forecast quality can often lead to a substantial amount of economic benefit (Kaur et al., 2016; Notton et al., 2018).
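As a crude illustration of how historical error behavior might translate into a reserve requirement, the sketch below sizes upward and downward reserves as empirical quantiles of past net-load forecast errors. The 95% coverage target and the synthetic error archive are assumptions made purely for illustration; real reserve policies involve far more than a quantile rule.

```python
import numpy as np

# Synthetic archive of net-load forecast errors (scheduled minus actual,
# in MW) over three years of hourly operation.
rng = np.random.default_rng(3)
errors = 80.0 * rng.standard_normal(3 * 365 * 24) + 10.0

# Choose reserves so that, historically, 95% of hours would have been
# covered, splitting the exceedance evenly between the two directions.
down_reserve = np.quantile(errors, 0.975)   # absorbs over-scheduled generation
up_reserve = -np.quantile(errors, 0.025)    # covers generation shortfall

print(f"upward reserve: {up_reserve:.0f} MW, "
      f"downward reserve: {down_reserve:.0f} MW")
```

Sharper yet calibrated solar forecasts shrink the tails of such an error archive, and with them the standing reserve requirement, which is the economic argument made above.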

1.1.5.2 Power Flow

Power flow (also known as load flow) is a tool that is used for long-term planning and design of the power system and to investigate its reliability. It computes the magnitude and phase angle of the voltage at each bus as well as the real and reactive power in each transmission line, and thus describes the operating state of the power system under a certain wiring and operating mode. The wiring mode of a power system refers to the connection between each electric component over wide areas, and it rarely changes. In contrast, the operating mode of a power system changes with the power injection to the nodes. In traditional power systems, operators run power flow under just a few typical load scenarios, to acquire information in regard to whether the system can operate in the desired steady-state secure region. Through power flow, one learns about the weak links in the system and checks whether the power system components are congested. Power flow also ensures that all nodes of the power system are within the required voltage levels.

Power flow is historically performed in a deterministic fashion. With the high uncertainty introduced by solar generation, deterministic power flow can no longer support the analysis of modern grid systems, for the load scenarios are now more numerous, and operating modes are much harder to enumerate. To that end, running power flow just under a few scenarios, as was traditionally the case, is no longer reliable and may eventually lead to biased assessments. Since power system operators are known to be conservative rather than risk-takers, remedies for regaining that high-level visibility on operating status must be sought. To anticipate the possible changes in system operating status brought by renewable fluctuations, and to obtain the risk probability of the power system, probabilistic solar forecasting is needed to support power flow calculations (Prusty and Jena, 2017). In other words, it is the probabilistic power flow (PPF) that is of interest to modern power systems. PPF is not a new concept, as it first became available in 1974 (Borkowska, 1974), but it is becoming increasingly essential. The result of PPF offers not only the expectation but also a probabilistic judgment on the bus voltage and line power.

To account for the uncertainty associated with solar power injection when conducting PPF, different numerical or analytical approaches are available. Numerical

approaches proceed by sampling the possible states of the stochastic variables and parameters of the system model, via Monte Carlo and Latin hypercube sampling. Knowing solar generation is subject to spatial correlation, which interacts in complex ways with the power system layout, multivariate probabilistic solar forecasting becomes relevant, in which equiprobable irradiance or PV power samples describe the multivariate distribution over space and time. On the other hand, analytical PPF requires as input the probabilistic distributions of solar and load forecast errors. At present, such distributions are chosen based mainly on the operators' experience. For instance, the error of load forecasts is believed to be Gaussian (Wang et al., 2017b) and that of solar power forecasts is often assumed to follow a Beta distribution (Prusty and Jena, 2017). Nonetheless, the present literature seems to have shown only marginal interest in verifying these assumptions. A calibrated description of solar forecast errors can be obtained from advanced probabilistic solar forecasting methods, which in turn helps make the calculation of PPF resemble the actual system operating conditions, and thus increases the credibility of PPF.
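To hint at what such equiprobable samples look like, the sketch below draws spatially correlated PV power scenarios through a Gaussian copula, with Beta marginals of the kind the PPF literature often assumes. The three-plant layout, the correlation matrix, and the Beta parameters are all hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical setup: three PV plants with Beta-distributed normalized
# power and a spatial correlation structure (nearer plants correlate more).
corr = np.array([[1.0, 0.7, 0.4],
                 [0.7, 1.0, 0.6],
                 [0.4, 0.6, 1.0]])
marginals = [stats.beta(a, b) for a, b in [(5, 2), (4, 2.5), (6, 1.8)]]
capacity = np.array([50.0, 80.0, 30.0])  # plant capacities in MW

# Gaussian copula: correlated normals -> uniforms -> Beta marginals
rng = np.random.default_rng(11)
z = rng.multivariate_normal(np.zeros(3), corr, size=1000)
u = stats.norm.cdf(z)
power = np.column_stack([m.ppf(u[:, j]) for j, m in enumerate(marginals)])
scenarios = power * capacity  # MW injections, one row per scenario

# Each row is one equiprobable joint state of the three injections,
# ready to drive repeated power-flow solutions in a Monte Carlo PPF.
print(scenarios[:5].round(1))
```

A calibrated multivariate forecast would replace the assumed marginals and correlation with ones learned from data, which is exactly the improvement argued for above.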

1.2 OUTLINE OF THE BOOK

After the brief introduction to the five aspects of solar forecasting, we should now present the outline of the book. The content may be divided into two halves, one dealing with cognition and tools, and the other with techniques and methods. The motive of this division is two-fold. First, this book is addressed, not only to forecasters, but to a much larger audience who is interested in the subject of solar forecasting without having a basic understanding of the technical background required to appreciate forecasting science, or even without having a background in research at all. There are many research and review papers available on the subject, but it is unfortunate that most of those lack broad perspectives on solar forecasting that can be instantly appreciated by forecasting novices. Second, it must be admitted that there are many ideas and thoughts in regard to forecasting research, such as the flaws in the current publication regime or some mistakes that are often being committed but not told, that can only be learned through experience or be debated philosophically. Since the scientific literature is mainly accustomed to compliments and applause but rarely embraces criticisms and disapproval, the latter has to be provided with care, and probably with much effort and over many pages. Combining both reasons, the first six chapters of the book are devoted to explaining the philosophical and practical concerns related to solar forecasting, alongside delivering necessary preliminaries for understanding the subsequent content. In the next six chapters of the book, the aforementioned five aspects of solar forecasting, namely, base methods, post-processing, verification, irradiance-to-power conversion, and grid integration, are expanded fully. These five aspects jointly delineate the ideal forecasting framework for grid integration, which has become increasingly recognized by countries and organizations worldwide.

1.2.1 THE WORLD OF SOLAR FORECASTING: UNDERSTANDING BEFORE DOING

The methods of solar forecasting are technical in the same way as science is, but solar forecasting, we maintain, has an aspect of philosophy. In the present authors' own pursuit of knowledge of solar forecasting, the initial interests were surely and wholly placed on techniques, but a part of our attention has slowly shifted towards beliefs and perceptions that are often more philosophical than technical. To that end, we have tried to include philosophical perspectives in our recent research papers. While many have responded to those perspectives with great interest, encouraging remarks, and stimulating discussions, others have merely condemned the work as bombastic nonsense that is not worthy of publishing. In response to the latter viewpoint, we offer a compelling reason for including philosophy as a necessary component of this book: Philosophy can be regarded as a tool in a way similar to how mathematics, statistics, or physics has hitherto been recognized, and it is able to offer just as much help to solar forecasting as any other branch of learning does. On this point, we introduce in Chapter 2 numerous philosophical thinking tools, which, we believe, can assist the readers in forming their worldview on solar forecasting research and practices. These thinking tools include, for example, how to manage unreasonable peer-review outcomes (Section 2.1), how to develop constructive criticisms (Section 2.2), or how to maximize the utility of an idea through writing (Section 2.9), which, as we mentioned before, are every bit useful if we are to survive in academia.

Forecasts come in two types, one deterministic and the other probabilistic. However, some have argued that all judgment on a future event or a future value of a quantity is intrinsically probabilistic, so issuing the mean or median of a predictive distribution shares the same nature as issuing any other quantile, in that all forecasts are probabilistic (pers. comm. with Rafał Weron, Wrocław University of Science and Technology, 2019). Similarly, others may argue that the perpetual existence of uncertainties in data, models, and parameters necessarily denies the possibility of a deterministic forecasting procedure. In view of these differing opinions, we should then make clear in Chapter 3 where the idea of the deterministic forecast originates (Section 3.1). Supposing the explanation given for the dichotomous division of forecasts into deterministic and probabilistic is satisfactory, we should proceed to inquire about the forms of probabilistic forecasts in Section 3.2. Central to probability theory are random variables and distributions, whose concepts must be reviewed first. Then, the four types of predictive distributions, namely, parametric, semiparametric, nonparametric, and empirical, are defined and exemplified. It should be noted that a predictive distribution contains all information pertaining to the uncertainty of a forecast, but characterizing the predictive distribution can be difficult since each forecast only materializes once. In this regard, we introduce the idea of ensemble forecasting in Section 3.3. In particular, we explain how ensemble forecasts are classified and made in the fields of meteorology, statistics, and computer science. After Chapter 3, the reader should develop a concrete understanding of all possible forms of solar forecasts.

Solar forecasting becoming a domain of energy forecasting is quite recent, so it may be described analogously as a new member of the band, who ought not to act individually but to become a part coherent with the rest. Chapter 4 offers a brief history of energy forecasting and provides an account of the current maturity of each of the sub-domains (Section 4.1), which, besides solar forecasting, include mainly electric load, wind, and electricity price forecasting. Load forecasting has the longest history and smallest errors. However, it is still by far the most engaged domain of energy forecasting research in terms of the number of publications, which could be due to both the emergence of new problems with increasing renewable penetration and the large economic benefits associated with even marginal accuracy improvements. Wind power forecasting has had the most success in probabilistic forecasting owing to its dependence on probabilistic wind speed forecasting, which has long been a subject of study by weather forecasters. The weather behavior and business activity interact in complex ways in price forecasting. Although the former can be captured by weather forecasting effectively, the latter results in a large group of possibly confounding variables which may all influence the price. Since the problems concerning these three domains all share some resemblance with those concerning solar forecasting, Sections 4.2–4.4 review the state-of-the-art of load, wind, and price forecasting, respectively. On the other hand, since irradiance is the primary influencing factor of solar power, which is not shared by any other domain of energy forecasting, it is quite necessary to inspect its salient features, which are no doubt the most vital distinctions that separate solar forecasting from the others (Section 4.5). In the remaining parts of Chapter 4, the four domains are analyzed jointly, in that Section 4.6 lists some shared research frontiers, whereas Section 4.7 discusses some common issues and provides some general recommendations for energy forecasting research.

Ever since the popularization of solar forecasting, there has been what we regard as a mistaken tendency among solar forecasters to place excessive attention on mathematical abstraction and procedural sophistication. There are certain things that are often deemed unimportant but are in fact what set the good forecasters apart from the mediocre ones. In Chapter 5 we present a guide to good housekeeping, in which various often overlooked or abbreviated, yet fundamental, practices of solar forecasting are emphasized. We open the chapter by discussing software and tools that can boost productivity during forecasting research, for which Python and R are highly recommended as everyday companions of solar forecasters (Section 5.1). Next, the terminology and symbols of irradiance components and the k-indexes, which are often confused or used incorrectly, are unified (Section 5.2). Data quality control, as a preprocessing step ensuring the validity of all subsequent analyses, is formalized in Section 5.3 according to the latest recommendations put forward by a panel of experts from the International Energy Agency.
Because solar irradiance and thus PV power exhibit a double-seasonal transient (i.e., the yearly and diurnal cycles), one must isolate the seasonal component from the irradiance (or PV power) time series prior to forecasting, which can be achieved with a clear-sky model that explains all significant factors influencing the atmospheric transmittance except clouds. The essentiality of clear-sky modeling in solar forecasting is unparalleled, and hence quite a few pages are devoted to this topic (Section 5.4). It is also due to the double-seasonal

patterns of irradiance and PV power that time alignment, in scenarios where data from multiple sources are to cooperate, becomes an issue, for two misaligned time series tend to exaggerate the forecast errors (Section 5.5). (Although time alignment and data averaging appear to be elementary tasks, even the most experienced can fall victim; we in fact think very few people know all potential pitfalls in these respects.) The last section of Chapter 5 depicts various issues pertaining to operational forecasting. Since solar forecasts are intended to serve grid integration, the operational aspects of forecasting must be highlighted; and the reader must be warned at this stage that a vast majority of existing works fail to acknowledge the operational aspects, so much so that those works cannot be considered contributive to real-world practices.

To scientific common sense, it is plain that the virtue of forecasting can only be demonstrated when data is available. Empirical evidence is an essential component of any proposal of techniques. Open research has been recognized as monumental to scientific progress by all major publishers, who have now established platforms that encourage data sharing. In the last chapter of this first part of the book, data that can be used for solar forecasting is thoroughly reviewed. Data for solar forecasting can be characterized by two properties, one being complementarity (Section 6.1) and the other representativeness (Section 6.2). Complementarity suggests as probable a synergy created by employing data acquired by different means, which include (1) ground-based measurement, (2) space-borne remote sensing, and (3) numerical weather modeling. Representativeness is concerned with the generalizability and applicability of data, which can be collected under three heads: (1) spatio-temporal coverage, (2) consistency in meteorological scale, and (3) accessibility. There have been previous efforts in consolidating data sources—through tables and links—but most works of this kind are superficial in the sense that they omitted to offer any concrete advice on accessibility or any step-by-step guide for acquiring the data (see Section 6.3). To that end, the last three sections of Chapter 6 present a list of ground-based (Section 6.4), satellite-derived irradiance (Section 6.5), and NWP/reanalysis (Section 6.6) datasets, based on our own experience in working with those datasets. Given the level of detail we wish to provide for each dataset, Chapter 6 is one of the bulkiest, yet profoundly informative, chapters of the book.

1.2.2 A FORECASTING FRAMEWORK FOR GRID INTEGRATION AND FIRM POWER DELIVERY: TOWARD DISPATCHABLE SOLAR POWER

The second part of the book proceeds with summarizing the fundamental knowledge that is required to generate base forecasts. Chapter 7 is separated into three sections, each dealing with one class of base forecast generation philosophy, namely, NWP, image-based forecasting, and forecasting using ground-based sensor network data. We begin by presenting a very brief history of NWP in Section 7.1. The following pages of that section then explain in detail the main components of an NWP model, which include the primitive equations that describe the dynamics of the atmosphere, the parameterizations that deal with the physics, as well as the initialization and execution aspects. In Section 7.2, satellite-based forecasting is introduced in two halves:

The first focuses on satellite-to-irradiance models, which are needed to arrive at an irradiance field, and the second elucidates several ways with which one can project the estimated irradiance field forward. The satellite-based forecasting methods, as typified by optical flow, cloud motion vectors, and deep learning, could all be applied to images from sky imagers. As such, sky-imager-based forecasting is not explicitly discussed in this section. One should also note that forecasting using sky imagers is in general less attractive in a grid integration context, owing to its limited spatio-temporal extent and a computational complexity that cannot be swiftly integrated into existing PV plant management software. The last section of Chapter 7 is devoted to those statistical methods that are suitable for forecasting with spatio-temporal data. Based on the extant literature on this topic, spatio-temporal statistical models can be classified in the main into descriptive models and dynamical models. Whereas these models have deep statistical roots, other more ad hoc (or non-geostatistical) models are also lightly touched on.

We have already had occasion to speak of post-processing, wherein all actions of post-processing are exhaustively divided into four kinds depending on the initial and final forms of forecasts, both of which can be either deterministic or probabilistic (Section 8.1). Chapter 8 differs from its preceding chapters in terms of mathematical complexity. The literature on solar forecast post-processing is vast, and owing to the nature of post-processing, which is, in the main, statistical, most works cannot be exempted from employing a sizable collection of symbols and equations. Consequently, consolidating and systematizing post-processing techniques must be regarded as difficult tasks, for the clash of symbols is almost surely unavoidable if mathematical clarity and rigor are to be fully respected. As we do not want the readers to be carried away too much by ambiguities that could stem from notations, we make our best attempt to avoid confounding symbols, and explain the notational conventions in Section 8.2. Then, in the subsequent four sections, the D2D (Section 8.3), P2D (Section 8.4), D2P (Section 8.5), and P2P (Section 8.6) post-processing typology is developed, each section containing several classes of techniques, among which each seeks to achieve a certain post-processing goal.

To proceed with verification, one must first establish some form of reference to which the forecasts of interest can be compared. Making reference forecasts should not be treated as a casual or trivial task, for verifications made based on improper reference forecasts are unable to reflect the true skill of a forecaster. On this point, requirements on the standard of reference are debated and advanced in Section 9.1, which crystallize into a reference method for deterministic forecast verification that has been endorsed by 33 international solar forecasting experts, namely, the convex combination of climatology and persistence. Moving on from making reference forecasts, the remaining task is to devise a set of qualitative rules and quantitative methods that can be used to assess the forecasts of interest with respect to the reference forecasts. The kind of forecast verification methods familiar to solar forecasters can be gathered under the measure-oriented framework, which derives the outcome by comparing performance measures.
We argue that the measure-oriented verification framework has a series of disadvantages which may mislead any conclusion thereby drawn (Section 9.2). Fortunately, a strategy that can amend those disadvantages is available—the Murphy–Winkler distribution-oriented verification framework is introduced in Section 9.3. The Murphy–Winkler framework is concerned chiefly with assessing the quality of forecasts, which is but one aspect of the goodness of forecasts. In this regard, the two other aspects of goodness, namely, consistency and value, are described through a case study, which provides a “coast-to-coast” example of how verification of deterministic solar forecasts should be performed.

Whereas the quality of deterministic forecasts can be gauged by measures of accuracy, association, calibration, discrimination, among others, the quality of probabilistic forecasts is characterized by calibration (i.e., reliability),5 sharpness, and resolution. However, because these aspects of quality are often defined based on abstract notions, different versions of their interpretations propagate in the literature, which can sometimes cause major confusion. To clear potential confusion, the definitions as provided by two world-renowned forecasters are examined closely (Section 10.1). Next, in Section 10.2, the development of some of the most celebrated scoring rules, namely, the Brier score and the continuous ranked probability score, is reviewed. Unlike most documents in the literature of solar forecasting, in which the computation of the scoring rules is simply listed, we include in Section 10.2 step-by-step derivations of those scoring rules and show how different scoring rules are interconnected; the level of detail provided therein is unprecedented in the field of solar forecasting. Moving beyond the quantitative assessment of probabilistic forecasts, Section 10.3 offers four visualization tools, namely, the PIT histogram, the rank histogram, the reliability diagram, and the sharpness diagram, that can help forecasters quickly gain insights into the goodness of probabilistic forecasts. To close the chapter, a case study on the verification of ensemble irradiance forecasts is presented in Section 10.4. Besides verifying the raw forecasts, which are acquired from the state-of-the-art ensemble NWP model of the European Centre for Medium-Range Weather Forecasts, different post-processed versions of the forecasts are included, to depict how forecast calibration affects the verification results.

5 Calibration refers to the same property in deterministic and probabilistic forecast verification, but the ways of quantifying it under the two scenarios differ.

Irradiance-to-power conversion has two purposes: One serves resource assessment, in which (long-term) satellite-derived irradiance is needed for siting, sizing, simulation, and performance evaluation of PV plants, and the other serves forecasting, in which irradiance forecasts are converted to PV power forecasts; the latter is dealt with in Chapter 11. Section 1.1.4 has introduced the two approaches of conversion, namely, direct conversion based on regressions (Section 11.2) and indirect conversion based on physical model chains (Section 11.3). Although regression is a general tool for mapping certain input variable(s) to the output variable, one must nevertheless take into account the physical nature of solar irradiance. For instance, the clear-sky irradiance, which is needed universally in constructing irradiance forecasting models, is also essential during irradiance-to-power conversion, for it allows one to estimate the clear-sky PV power, which gives the seasonal (or cyclic) components of the PV power time series.
Aside from GHI and clear-sky irradiance, other meteorological variables, such as temperature or wind speed, also affect PV power output. For that reason, variable selection, transformation, and feature engineering are relevant whenever multivariate regression is of concern. As for the model chain, each of its stages is studied in a subsection. However, given the sheer number of available models for each stage, it is not possible to provide a full enumeration in just a chapter. Therefore, instead of exhaustively expressing model formulations, the chapter focuses on the principles and modeling philosophy behind each stage of conversion. Sections 11.4 and 11.5 set out to explore emerging topics of irradiance-to-power conversion. The former examines hybrid conversion, which uses the model chain up to a certain stage and leaves the remainder to regression, and the latter discusses the probabilistic extension of the model chain.

Up to Chapter 11 of the book, most materials are dedicated to making solar forecasts for point locations. However, grid integration is an areal affair, where the outputs of multiple PV plants that are geographically distributed can be, and need to be, aggregated according to certain hierarchies, e.g., based on load zones or based on substations. On this account, Chapter 12 outlines the so-called “hierarchical forecasting framework,” which optimally aggregates forecasts from all PV systems within a power system; in that, it essentially performs spatial PV power forecasting. Hierarchical forecasting can generate either deterministic (Section 12.2) or probabilistic (Section 12.3) forecasts. Owing to the inherent spatial dependence among the power outputs of PV plants in a power system, hierarchical forecasting can exploit such dependence and thus yield additional and quantifiable benefits on top of the original plant-level forecasts, as also demonstrated in a case study (Section 12.4). In real-world applications, hierarchical forecasting is a task that has to be handled by power system operators, for it requires as inputs all plant-level forecasts, which can only be collected if grid codes mandate forecast submission. Since such grid codes are already enacted and put into practice by countries worldwide, hierarchical forecasting is no longer as conceptual as it was just a few years ago.

Moving on from hierarchical forecasting, the last section of the book (Section 12.5) presents a new and attractive concept known as firm forecasting, and how we can transform hierarchical forecasts into firm forecasts. The word “firm” denotes the kind of solar power generation that can match the forecast value with 100% certainty, which implies forecasts with no error. The way to ensure firm forecasts is to absorb the mismatch between forecast and actual generation, through a combination of technologies, including electric storage, geographical smoothing, demand response, and smart capacity expansion (i.e., overbuilding & proactive curtailment). Overbuilding & curtailment may appear counter-intuitive to anyone who encounters it for the first time, for it suggests inevitable energy waste; it is nevertheless an important piece of the puzzle towards producing forecasts with no error at the lowest cost. In this regard, we shall keep the trick a secret till the last section of the book.

2 Philosophical Thinking Tools

“You can’t do much carpentry with your bare hands and you can’t do much thinking with your bare brain.”
— Bo Dahlbom

Thinking tools are tools for thinking. No different from carpenters needing hammers and saws, or surgeons needing lancets and forceps, thinking tools are needed by philosophers to reliably and gracefully approach what confronts them. One of the best-known philosophical thinking tools among the general public is Occam’s razor, or the principle of parsimony, which prefers a simpler theory or explanation to its more complicated counterparts. There are many other similar thinking tools that are generally applicable to a broad range of problems, whereas equally many if not more are available for specific topics. Recent compendia of thinking tools include Fosl and Baggini (2020) and Dennett (2013), which serve as good further reading should one decide to equip oneself more in this respect.

In this chapter of our solar forecasting book, we are going to explore philosophical thinking tools, a concept that might seem at first unrelated to the topic at hand, but will soon be found useful. At this stage, we anticipate those queries that one may have on the necessity of including thinking tools in an engineering book, as well as those doubts that one may have as to our capacity to make such discussions. Indeed, we are far from being “licensed” philosophers, at least for what our formal training would suggest. But why should discussing philosophy be taken in any different way from discussing statistics, meteorology, or physics? Neither of the two authors has a degree in statistics, meteorology, or physics, but we shall be discussing a lot of those throughout the book anyway, and no logical man could deny the catalytic effects of the knowledge in these fields of study on forecasting. Thinking tools are indispensable to reasoning, understanding arguments, problem solving, and decision making, and it is on this account that they must be placed among the most useful domains of knowledge to advance forecasting research. We share our interpretations of some philosophical thinking tools, in relation to some of the empirical observations we have made in the past about forecasting research. It is certain that these interpretations will require modifications later on as the result of new ideas formed through new discoveries. This is, however, not incompatible with recording and sharing our current ideas.

The thinking tools outlined herein mostly belong to a toolbox labeled intuition pumps. Initially emerging from the labor of Daniel Dennett, intuition pumps can be thought of as images, stories, analogies, and thought experiments that can furnish us with something tangible and lucid to grasp and comprehend otherwise abstract and obscure concepts. Intuition pumps are persuaders, which are essential in any exploratory process of getting at the truth. It must be underscored that intuition pumps are not philosophical arguments, which are employed strictly to demonstrate that something is true. Instead, they are little devices that make a “Ding!” sound to guide us logically, to caution us about potential dangers, and to nudge us along, so that we may eventually arrive at the truth.

2.1 ARGUING WITH THE ANONYMOUS

In our academic pursuits, we are occasionally subjected to discouraging and, at times, even disparaging evaluations of our manuscripts. The cloak of anonymity worn by reviewers renders them impervious to the kind of forceful counterarguments that might otherwise arise in a face-to-face dialog. When confronted with such unwelcome assessments, we are left with a binary decision: either to acquiesce to the authority of the reviewers and undertake tortuous revisions, or to withdraw the manuscript and vow never again to entrust it to the same journal. The former option is often motivated by a recognition of the folly of challenging those who wield considerable power, while the latter stems from feelings of frustration and disappointment. If such an experience ever triggers you to question the justice in the peer review process, we can boldly confirm that: You are not alone.

For all its important benefits, peer review is a flawed process with many easily identifiable drawbacks; as argued by Smith (2006), it can be costly, highly subjective, prone to bias, and easily gamed. It is far from uncommon to receive inconsistent and conflicting opinions from different anonymous reviewers, which then urge any responsible editor to invite more reviewers. The paper Energy forecasting: A review and outlook (Hong et al., 2020) was persecuted with eight review reports holding vastly diverging views of what should or should not be included, what can or cannot be written, or even which reference should or should not be cited. One of our friends, Dipti Srinivasan, who serves as the Associate Editor for four IEEE Transactions journals, complained about being forced to invite 25 reviewers (with some declining to review, of course) for one paper, because each new reviewer was saying something different; if that sounds laborious, think about how the authors must have felt having to address such a mess. With all these resources and efforts pouring into organizing peer reviews, it is alarming to note that there is scant empirical evidence to support the efficacy of the process (Jefferson et al., 2002). In fact, some philosophers have gone so far as to suggest that pre-publication peer review should be eliminated altogether (Heesen and Bright, 2020). Their rationale is straightforward: If the highest authority comes from the people, if the customers are always right, and if the voters know best, then why not allow the market to determine what research is valuable and what is not—just let the invisible hand do its job?

The pillar by which peer review is supported is the belief that such a process could filter for or select the best contributions to be published in a journal, based on overriding opinions if not a consensus. This, unfortunately, seems to be a necessarily difficult task for Homo sapiens—disagreement is not within the control of individuals, but rather depends on the characters of others and the entire trajectories of their lives, which bring them to their present worldviews. Any small disturbance to a trajectory can deviate it from another, even if the two may start off from the same origin. (Just think about politicians with opposite views: Their shared initial ideal of making the country a better place soon evolves and bifurcates into throwing shoes, pulling hair, condemning the opposition, or worse, being indifferent.) It is, therefore, worthwhile to delve into the philosophy of disagreement.

At the extremes of the “spectrum of reviewers” lie two polarized figures: the academic hooligans who dissent from everything except for that which they have previously endorsed, and the benevolent grandparents who acquiesce to anything as long as no insufferable errors are present. In both cases, there is little reason or need to engage in argumentation. Only when a reviewer occupies a position somewhere in between these two extremes does one have the motivation to acknowledge the disagreement, and therefore proceed with settling it. The focus of this section, then, is centered on the fundamental epistemological issues tied to the recognition of disagreement.

We take ourselves to know about something with reasons. For example, Dazhi knows his birthday because his mom told him the date; Jan knows Dazhi knows his birthday because he knows Dazhi is an organized person and thus should remember his birthday; a forecaster named Joe knows that it requires at least five members to make an ensemble forecast, because he read about that in J. Scott Armstrong’s book called Principles of Forecasting. Obviously, for the first two cases, there is probably nothing worthy of discussion. However, when the case does matter and the reasons are less than absolutely conclusive, there would be disagreements. For instance, a hypothetical reviewer of Joe’s hypothetical paper could believe that three members are sufficient to make an ensemble forecast. The central question is thus: What should we make of such a disagreement? Should we hold fast to our views anyway; should we concede and move to the reviewer’s side; or should we simply moderate our conviction?

On this point, three strategies seem logically available. For one, we can remain steadfast. Joe refuses to reduce the member forecasts from five to three, if Joe knows the reviewer has far less experience than him in forecasting, and thus is Joe’s epistemic inferior. Secondly, we can concede. Joe decides to reduce the member forecasts to three, if Joe knows that the reviewer is Armstrong himself who just revised his book, and thus is Joe’s epistemic superior. Lastly, we can moderate our conviction, by either suspending judgment or at least lowering our credence in it. Joe decides to test out another ensemble forecast consisting of three members and compare results, if he knows the reviewer is as reliable as him on matters related to forecasting, and thus is Joe’s epistemic peer. The rational response that Joe would make depends on the relative epistemic positions of Joe and the reviewer.

Epistemic peers are intellectually equal, but they could both be in a bad epistemic position. In other words, two fools could be peers. In that case, both should probably suspend judgment, but there is not much interest in pursuing such cases further, since forecasters are not fools. When epistemic peers pass some threshold of competence, things get trickier. First of all, neither person should take concession as a response, because each of the peers is, and thinks of himself as, as good as the other, i.e., neither thinks he is relatively incompetent.
Secondly, each of the peers acknowledges the usual rigor and skills of the other, but thinks that, this time, the other has made an unfortunate error—both peers take steadfastness as a response. This type of dismissive, or intellectually arrogant, mentality makes the disagreement completely insoluble. Nevertheless, this is contradictory to the fact that they are intellectual equals, because not considering the peer’s reasons is, in itself, disrespectful.1

1 In reality, if editors have to make decisions under the presence of irresolvable differences between author and reviewer, they tend to favor the reviewer as the more precious resource, which is quite foolish, since turning down credible authors is equally, if not more, harmful to the journal. Based on our experiences, very few editors seem to care enough to evaluate such situations.

The only remaining option, now, is conciliation. Suppose one peer takes an intellectually humble position, and believes there is no better reason to trust his own judgment than to trust the other peer’s judgment. Subsequently, he rethinks the issue on the basis of the disagreement. This may seem to be a more logical response as compared to concession or steadfastness; notwithstanding, backing off from the initial level of conviction in one’s judgment lowers the level of credence in his conclusion. It then leads to the problem that the one who backs off can no longer reach the threshold of competence to be considered an epistemic peer of the other. In other words, the very action of backing off might make the intellectually humble person seem cowardly, because his initial level of conviction seems rather weak after confrontation and could be given up easily. The issue has now been diverted to the trade-off: What can the conciliatory person get in return?

If epistemic peers have the same evidence, the same background beliefs, and the same reasoning ability, it might appear that the conciliatory person would not get any additional knowledge. That is not quite the case. The peer who carefully examines a rational disagreement could realize the different dimensions on which disagreement could materialize. For instance, back to the ensemble forecasting example, both Joe and the reviewer are obviously aware of the fact that it requires more than one member to form an ensemble. By thinking about why the reviewer needs fewer members—the number of members often correlates with the spread, and thus coverage, of the ensemble forecast—Joe may come to the conclusion that computing a coverage probability of the final predictive distributions could immediately resolve the argument (a toy numerical check of this idea is sketched at the end of this section). In other words, adopting a humble approach to disagreement enhances the understanding of the topic. Additionally, the disagreement itself reveals possible areas that call for future studies; for instance, in our example, the general correspondence between ensemble forecast accuracy and the number of members might be worth investigating. Though an actual case could be more complicated than this, the idea can be readily transferred across.

Finally, we list some fundamental thinking tools for peer review:

• Accept that peer review is flawed but with important benefits;
• Do not consider an all-positive review as indisputable proof of paper quality, as it may come from benevolent grandparents;
• Do not get discouraged by an all-negative review, as it may come from academic hooligans;
• Identify the best ways to respond to the reviewer’s disagreements through inductive or abductive reasoning on the relative epistemic position between you and the reviewer;
• For communicating disagreements on major issues, use Rapoport’s rule (this applies to both authors and reviewers, see next section).

We will make point-form summaries for the following sections as well. So if you do not have time to go through the details, just look for the summaries.
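Here is the toy numerical check promised above, a minimal sketch with synthetic Gaussian data (the setup and names are ours, purely for illustration). For m exchangeable members drawn from the same distribution as the observation, the chance that the observation falls within the ensemble range is (m − 1)/(m + 1), so more members widen the coverage, which is precisely the dimension on which Joe and his reviewer disagree.

```python
import numpy as np

rng = np.random.default_rng(42)

def empirical_coverage(members, obs):
    """Fraction of cases in which the observation falls inside the
    ensemble range, a crude central prediction interval."""
    return np.mean((obs >= members.min(axis=1)) & (obs <= members.max(axis=1)))

n = 100_000
obs = rng.normal(size=n)
for m in (3, 5, 20):
    # Members drawn from the true distribution, i.e., a calibrated ensemble.
    members = rng.normal(size=(n, m))
    # Expected coverage for exchangeable members is (m - 1) / (m + 1).
    print(f"m = {m:2d}: coverage = {empirical_coverage(members, obs):.3f}, "
          f"theory = {(m - 1) / (m + 1):.3f}")
```

With three members the range covers the truth only about half the time, whereas five members reach about two-thirds; whether either suffices depends on the application, and that is the kind of quantitative footing which dissolves an otherwise verbal disagreement.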

2.2 RAPOPORT’S RULE

The two Rapoport’s rules, one in ecology and the other in philosophy, do not share anything in common but the name itself. In ecology, Rapoport’s rule, attributed to the Argentinian ecologist Eduardo H. Rapoport, is an ecogeographical empirical relationship that describes decreasing latitudinal ranges of species towards the equator. In philosophy, Rapoport’s rule, attributed to the American mathematical psychologist Anatol Rapoport, sets an outline of the steps needed to criticize opponents effectively. It is the latter Rapoport’s rule that we are referring to in this section.

Agreement in science, as we have seen earlier, is rendered difficult by the fact that it involves at least two minds, with the result that confusion between opposing points of view constantly slows the progress of its formation. If any agreement were to be reached, one of those who are involved needs to initiate sensible communication. In the case of the peer review process, reviewers and authors take turns to express their views related to the submitted manuscript, until a final decision is made by the handling editor of the manuscript based on the collective information gathered throughout the process. Particularly due to the brevity in writing throughout the entire peer review process—both the reviewers’ reports and the response letters would be much shorter than the manuscript itself—effective communication may not always be guaranteed, in which case frustration would build, misunderstanding would accumulate, disagreements would amplify, and undesirable outcomes would eventually follow. In other cases, one needs to be even more charitable when criticizing a published work. Rapoport’s rule, as reformulated by Dennett (2013), states:

Rapoport’s rule
1. You should attempt to re-express your target’s position so clearly, vividly, and fairly that your target says, “Thanks, I wish I’d thought of putting it that way.”
2. You should list any points of agreement (especially if they are not matters of general or widespread agreement).
3. You should mention anything you have learned from your target.
4. Only then are you permitted to say so much as a word of rebuttal or criticism.
— Daniel C. Dennett, Intuition Pumps and Other Tools for Thinking

Echoing the earlier discussion on relative epistemic position, not every disagreement (or opponent) deserves the treatment of invoking Rapoport’s rule. For instance, if the contradiction in the opponent’s view is bare, one should simply point out the contradiction, forcefully. If the disagreement originates from a difference in preference rather than from differing scientific views, it is typically irrelevant to the real issues; in such a case, one can also save the trouble of using Rapoport’s rule. However, if the contradiction or mistake is subtle or hidden, following Rapoport’s rule is likely to result in a receptive audience for the criticism.

A good demonstration of Rapoport’s rule in a forecasting context can be found in the short communication by Pinson and Tastu (2014), in which the authors criticized the coverage-width-based criterion (CWC). CWC was proposed and used by Abbas Khosravi and colleagues in several works of their own (e.g., Khosravi et al., 2011, 2013); it is a score for evaluating interval forecasts. In order for any scoring rule to be useful during forecast verification, it needs to be proper, that is, a forecaster maximizes the expected score only by issuing forecasts according to her true beliefs. However, that is not the case for CWC, because it permits forecasters to hedge that score, by finding tricks to attain a better score without issuing meaningful forecasts (for contrast, a standard proper score for interval forecasts is sketched at the end of this section). In just one and a half pages, Pinson and Tastu (2014) rephrased the reasons and motivation for proposing CWC, acknowledged the difficulty in evaluating interval forecasts, provided the background information on CWC including its formulation, and finally, disproved CWC by contradiction.

Pinson did not know about Rapoport’s rule at the time he wrote the technical note. However, in a personal communication between the current authors and Pinson, he highlighted the necessity of constructive criticism in the scientific community, which was the primary concern he had in mind while writing that technical note. In that sense, Rapoport’s rule appears to be a priori. In other words, one does not need to be aware of Rapoport’s rule to use it, as long as the communication is premised on goodwill, in contrast to a communication whose sole goal is to humiliate the opponent. Unfortunately, even goodwill or strict alignment with Rapoport’s rule does not always lead to a settlement. According to Pinson, that technical note was rejected twice by the IEEE Transactions on Neural Networks and Learning Systems, and a few times by the IEEE Transactions on Sustainable Energy. Central to these rejections is the fear of the unknown or the reluctance to admit mistakes, which is to say that neither were the editors familiar with the statistical theory related to proper scoring rules, nor were they willing to publicly acknowledge the (repeated) oversights that they had committed in judging the scientific rigor of those works proposing the CWC. In short, because Pinson and the editors can hardly be considered epistemic peers—rather, Pinson, as the Editor in Chief of the International Journal of Forecasting, is the epistemic superior to the two IEEE journal editors in terms of forecasting—their disagreements were particularly painful to proceed with.

Rapoport’s rule is the natural enemy of denialism. Denialism in general refers to the choice of denying reality as a means to circumvent a psychologically uncomfortable truth.
One particular case of denialism can be found easily among the supporters of pseudo-science, e.g., flat earthers, climate change skeptics, or to some extent, the creationists, who are driven by a desire to overturn the mainstream scientific theory or branch of science, and to propose alternative theories of their own (Hansson, 2017). A common trait of science denialism is to take parts of the scientific theory into isolation (also see Occam’s broom below), and use very specific evidence that is perceived to be true to argue against the isolated parts, so as to dismiss the entirety. Because Rapoport’s rule emphasizes complete, balanced views on the opponent’s arguments, it works in direct opposition to science denialism.

• No one likes to be pointed out wrong, but constructive criticism is essential;
• Not all situations require Rapoport’s rule;
• Rapoport’s rule does not guarantee that the opponent will accept your arguments, but it is your best chance;
• Rapoport’s rule counters denialism.
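For the contrast promised above, the interval score (often attributed to Winkler, and shown to be proper by Gneiting and Raftery) is a standard proper choice for a central $(1-\alpha)\times 100\%$ prediction interval $[l, u]$ verified against an observation $y$. The following is our own summary, not part of Pinson and Tastu’s note:
$$\mathrm{IS}_\alpha(l, u; y) = (u - l) + \frac{2}{\alpha}(l - y)\,\mathbf{1}\{y < l\} + \frac{2}{\alpha}(y - u)\,\mathbf{1}\{y > u\}.$$
The score is negatively oriented (smaller is better): the first term rewards sharpness, the two penalty terms charge for observations falling outside the interval, and the expected score is minimized only by reporting the quantiles of one’s true predictive distribution, which is exactly the hedging-proofness that CWC lacks.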

2.3 REDUCTION: WHAT IS “GOODNESS”?

Reductionism (not to be associated with “reductionist”) is the process of explaining one sort of phenomenon or object in terms of simpler and more fundamental ones that underlie that phenomenon or object. It is the process of understanding things by taking them apart into smaller things. Reductionism is enormously important for scientific understanding. In that, it is anchored on the hypothesis that the properties of the constituents of a system determine how the system works. Whereas the concept of reductionism is quite abstract, some simple examples would help clear up the idea. To understand why water boils at 100 °C, one needs to take water apart into molecules. As water is heated, its temperature increases, and the molecular motion increases, until a point where motion becomes so intense that some molecules break free from the attractions between them and become gas. It should be noted that although it is not necessarily easier to understand the concept of molecular motion than to understand boiling itself, this approach of reductionism describes what happens to the many small parts of a system (water molecules) that make up the more complex overall process (i.e., the bubbling water).

That said, reductionism is not always helpful. In the case of biology, trying to understand how an organism works from its molecules is not a good strategy, due to the difficulty in connecting the theory for the constituents (molecular biology) and the theory for the whole system (microbiology). In physics, we have yet to be able to fully reconcile the theory of quantum mechanics for the very small and the theory of relativity for the very big. In this regard, reductionism does not seem to be useful at all in terms of understanding gravity and the things it dominates: planetary motion, colliding galaxies, and the expanding universe. So, when does reductionism work? The model answer to this and every other similar question is “It depends.”

The idea of reductionism can be used to break apart more abstract notions such as “goodness” or “badness” (Fosl and Baggini, 2020). We all have some idea of what goodness is, and what badness is. The concepts are, however, not quite as self-explanatory as they may seem. In the case of forecasting, when one says “this is a good forecast,” what exactly is the forecaster referring to? Does it mean that the forecast is accurate? Or could it be that the forecast is able to lead to monetary savings for its user? If such ambiguity is not resolved, there seems to be room for disagreement among forecasters when the word “good” is used. This also prevents one from making meaningful comparisons: the word “better” has no foundation when “good” is not well understood.

This philosophical technicality also seems to have bothered Allan H. Murphy, the late meteorologist who pioneered forecast verification. In his 1993 essay, Murphy (1993) raised the question—“What is a good forecast?”—and took goodness apart into consistency, quality, and value. Clearly, these constituents are more fundamental; they give a precision of meaning that “being good” does not. What is more useful is that each of the constituents can be further broken apart to enhance precision. For instance, quality can be jointly described by bias, association, accuracy, skill, reliability, resolution, sharpness, discrimination, among other factors (one compact identity of this sort is given after the summary below). We will circle back to Murphy’s ideology of forecast verification and his belief on what constitutes a good forecast in a later chapter of the book.

• When a concept is abstract or ambiguous, one can understand it by taking it apart;
• The goodness of a forecast comprises three parts: consistency, quality, and value;
• All metric- or measure-oriented forecast verification approaches only deal with quantifying forecast quality, but there are other aspects of goodness to a forecast.
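To make this style of reduction concrete, consider the mean square error (MSE) between forecasts $x$ and observations $y$. A standard identity, in the spirit of Murphy’s own decompositions (the notation here is ours), takes this single accuracy measure apart into more fundamental constituents:
$$\mathrm{MSE} = (\bar{x} - \bar{y})^2 + (\sigma_x - \sigma_y)^2 + 2\sigma_x\sigma_y(1 - \rho_{xy}),$$
where $\bar{x}$ and $\bar{y}$ are the means, $\sigma_x$ and $\sigma_y$ the standard deviations, and $\rho_{xy}$ the correlation between forecasts and observations; the three terms quantify bias, variability mismatch, and (lack of) association, respectively. Two forecasts with identical MSE can thus fail in very different ways, which is precisely the kind of distinction that the bare word “good” conceals.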

2.4 “I OWN MY FUTURE”

Murphy’s reductionism describes how forecasts should be evaluated, by taking the task apart into smaller pieces. But before one can verify forecasts, one needs to first make them. In contrast to operational situations, where forecasters actually do not know how their forecasts perform until the observations materialize, in research settings, forecasts are verified based on hold-out samples, which refer to those known observations that are set aside (or “hidden away”) during model building, but are subsequently brought back during performance evaluation. In a not-at-all exaggerated fashion, all academic forecasters can confidently announce, “I own my future,” since the future in this particular case refers to those hold-out samples, chosen carefully, or even deliberately, by none other than the forecasters themselves. There are, consequently, some potential dangers that need to be warned against.

To start the discussion, we study the two types of scientific achievement, which the 20th-century philosopher of science, Karl Popper, refers to as science and pseudoscience, by examining the distinction between the achievements of Sigmund Freud, the founding father of psychoanalysis, a method for treating mental illness, and those of Albert Einstein, to whom the theory of general relativity belongs (Thornton, 2018). Emerging roughly at the same point in history, both Freud and Einstein made predictions that helped to advance people’s views about the world. Freud’s prediction was concerned with the individual mind, and how childhood experiences have a heavy bearing on psychoneuroses. Einstein’s concern was, of course, related to the validity of his general theory of relativity. What was characteristically different between Freud’s and Einstein’s predictions is that Freud was able to make just about any data point work in service of his theory, e.g., how a kid who was not hugged enough during his youth can develop into a serial killer. Einstein, on the other hand, had to wait for a solar eclipse to prove (or to disprove) his theory about how the sun’s gravity would bend the path of light from distant stars. It is noted that the distinction we are trying to make here is not on neurology versus physics, but rather on the nature of these two predictions: Freud’s prediction is ex post (i.e., after the event), and Einstein’s ex ante (before the event).

Because the serial killer was a kid who lacked hugs, the prediction made by psychoanalysis was verified as true. During the solar eclipse, since light from a distant star was indeed observed to have been bent, the prediction made by the general theory of relativity was also verified as true. Notwithstanding, there does not seem to be any uncertainty associated with Freud’s prediction—whatever connection he made between childhood experiences and psychoneuroses must be true, because the events had already happened. In contrast, had the observation during the solar eclipse contradicted Einstein’s prediction, his whole theory would have been conclusively disproven. Stated differently, although both theories are verifiable, Einstein’s theory is inherently “riskier” than Freud’s. To that effect, Popper (2014) rejected verifiability as a criterion for a theory to be considered scientific. Instead, he proposed as the criterion falsifiability, or refutability, that is, the capacity for a theory to be contradicted by evidence. Popper believes that scientific methods like Freud’s only serve to confirm preconceived beliefs, and this type of method should be classified as pseudoscience; pseudoscience can be used to confirm anything. For example, one can easily find evidence about Santa Claus, e.g., from Macy’s Santaland or Christmas postcards, but what can that really prove? It is only by seeking evidence to disprove Santa’s existence that one can disconfirm the theory. That said, Popper’s demarcation criterion on science and pseudoscience has the possibility of rejecting legitimate science, or giving pseudoscience the status of being scientific (Hansson, 2017). In one case or another, knowing this demarcation criterion seems to offer more benefits than drawbacks when it comes to judging the legitimacy of the design of a forecasting experiment.

People in academia believe that it is the notion of having incremental contributions that expands the boundary of science. What this implies is the obligatory need for benchmarking, either by experiments or at least through logical arguments. Only after a new idea is verified to be both true and better than an old one does it acquire value.
This compulsion of “being successful” has disbarred much good research from publication; in conjunction, it has been the incubator of not just forecasting science, but also forecasting pseudoscience. Echoing Popper’s theory, convincing results can be deceiving, because much attention, when assessing a forecasting work, is placed on verifiability rather than falsifiability. Stated differently, given the mere outcome that one forecast is better than the other, it is difficult to judge whether or not the advantage of that better forecast is due to a pseudoscientific setup of the forecasting experiment.

Let us look at one example. When we think of verification, one common misapprehension is that: As long as the hold-out samples are not used during model fitting, the forecasts are legitimate. One course of action to game the system is by repeatedly “peeking at the future” in ways other than directly including hold-out samples during model fitting. Suppose a forecaster trains a machine-learning model. Upon verification, he finds out that the results are unsatisfactory. Then he goes on to tune the model, repeatedly, until the results appear to be satisfactory. Throughout the process, although the forecaster did not directly feed hold-out samples into the model itself, the repeated verification did act as a feedback loop, for the forecaster to progressively change his initial theory or belief according to the new evidence he observed. This is a form of cheating, which is, unfortunately, almost impossible for others to detect, since the action itself is internal to the forecaster. Nonetheless, because forecasting models trained using this sort of stratagem are biased towards the particular set of hold-out samples, they often perform poorly against other datasets. (For those of you who just learned a new cheat for making accurate forecasts, we hope you will never use it, because if you do so, in the end, you are only fooling yourself.) A minimal illustration of an honest hold-out protocol is sketched at the end of this section.

In forecasting, it is actually best “not to own the future.” That is why forecasting competitions are so valuable in gauging the skill of forecasters, because no one will be given the verification dataset beforehand. For that, winners of forecasting competitions deserve a big round of applause.

• Not all scientific achievements are created equal; what is more, some can even be considered pseudoscience;
• Real forecasts are always ex ante;
• As a forecaster, you should be true to yourself;
• A good verification outcome does not imply good forecasts; it may simply be a result of circular reasoning, a careful manipulation of data, or cheating;
• Forecasting competition winners are good forecasters.
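The sketch below illustrates the honest protocol with synthetic data and a plain ridge regression (all names and numbers are ours, purely illustrative): every tuning decision is made against a validation block, and the test block is consulted exactly once, after all decisions are frozen.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=1000)

# Chronological three-way split; the last block plays the hold-out "future".
X_tr, X_va, X_te = X[:600], X[600:800], X[800:]
y_tr, y_va, y_te = y[:600], y[600:800], y[800:]

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def fit_ridge(X, y, lam):
    # Closed-form ridge solution via the normal equations.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Tune the penalty on the validation block only, never on the test block.
best_lam, best_w = min(
    ((lam, fit_ridge(X_tr, y_tr, lam)) for lam in (0.01, 0.1, 1.0, 10.0)),
    key=lambda lw: rmse(X_va @ lw[1], y_va),
)
# The test block enters once, after all modeling decisions are frozen.
print(f"lambda = {best_lam}, test RMSE = {rmse(X_te @ best_w, y_te):.3f}")
```

Re-running the tuning loop with the test block in place of the validation block would, on average, flatter the reported error; that is the very feedback loop described above.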

2.5 OCCAM’S RAZOR

How could we discuss thinking tools without mentioning Occam’s razor? Sharpened by the 14th-century philosopher, William of Occam,2 the razor is a problem-solving principle stating that “Don’t multiply entities beyond necessity” (Schaffer, 2015). It should be noted, however, that Occam’s razor was never meant to be used as a tool or support to deny putative entities. Rather, it allows us to abstain from postulating those entities in the absence of known compelling reasons for doing so. In other words, complex and intricate theories do exist; we just need to look out for those extravagant theories that lack convincing justifications. As an easy example, if the forming of clouds can be explained through water vapor turning into liquid water droplets, there is no need to hypothesize invisible alien weather machines floating in our atmosphere and making pseudo-random clouds based on a program running on the supercomputer on the mothership.

2 Evidence suggests that the term “razor” did not actually appear in Occam’s texts (Schaffer, 2015; Spade and Panaccio, 2019), but the sentiment towards this principle is certainly shared by Occam and many other philosophers in those and current ages. One should note that, in philosophy, there are other razors besides Occam’s, such as Hitchens’ razor or Alder’s razor; in that, the term “razor” generally refers to a rule of thumb that allows one to reject (or “shave off”) unlikely explanations or theories.

The above example may sound hilarious, but when the theory and the area of study are less commonly known, it can be difficult to discern for sure what is and what is not “necessity.” The Covid-19 pandemic has given conspiracy theorists a stage to perform their evil acts. Many people seem to have fallen for the theory that the pandemic was deliberately planned by some billionaires to achieve their personal goals, be it getting richer by developing vaccines, performing a purge to reduce population, or even taking over the world. The truth is that the theory behind the pandemic spread is complex and interdisciplinary; it would take some effort for anyone without a background in biology and immunology to just understand what a virus is, how a virus infiltrates the human body and replicates within the body, how the coronavirus is different from influenza, and how the psychological and social behavior of people can change the trajectory of the spread. It may be tempting, somehow, to believe in the “easier” theory put forth by the conspiracy theorists. In this particular case, instead of performing useful shaving with it, one gets cut by the razor. It is absolutely key to understand that Occam’s razor is a rule of thumb, not a metaphysical principle.

That said, Occam’s razor is a frequently useful suggestion. Rob Hyndman, the immediate past Editor in Chief of the International Journal of Forecasting, operates a blog called Hyndsight, in which he regularly updates his thoughts on research, forecasting, and statistics, among other things. (We recommend every forecaster to subscribe to that blog.) In a blog entry dated March 12, 2015, Hyndman gave several common reasons for rejection, and one of those is “outrageous claims.” From time to time, Hyndman receives some submissions claiming how shockingly good the proposed forecasting algorithm is. Indeed, when such claims appear, two possibilities present themselves: (1) the authors have, against all odds, discovered the holy grail of forecasting that can overturn the collective knowledge that forecasters over the past century have accumulated, and (2) the forecasts are evaluated in a grossly misleading way, or they are simply wrong. The choice is rhetorical.

Certainly, much early evidence has suggested that, in many cases, the best forecasts are made from models that leverage a small number of stable and efficient predictors (Reiss, 1951), and more often than not, it is just the trend line that forecasters are interested in (Dorn, 1950). Why bother with a long list of input factors that have obscure relationships with the quantity being forecast? It is true that adding complexity to a model may result in a better fit of the experimental data, but it often comes at a cost of overfitting, which leads to inferior performance when the scenario is extrapolated. Moving into more recent years, after machine-learning models have flourished, solar forecasters have, too, realized that too complex a neural network can lead to overfitting (Voyant et al., 2017). While regularization or bootstrap aggregating can prevent overfitting, it is also the forecaster’s obligation to investigate the causal relationship between the input and output of a forecasting model.

Occam’s razor comes in handy in predictor selection. Take wind speed, for instance: If it is the power output from a photovoltaic system that is being forecast, the motivation for using wind speed is straightforward: Wind speed across modules plays a part in convective heat transfer, lowering module temperature, and thus correlates positively with power generation (a minimal numerical illustration follows at the end of this section). As for irradiance forecasting, particularly when the data comes from a single location, there does not seem to be a simple explanation for why wind speed would affect the outcome of radiative transfer at that location. The “omnipotent” and “omniscient” neural network seems to have tricked all too many amateur forecasters into thinking that the reason for bad forecasts is having too few variables. That is just absurd. A more plausible explanation of the bad forecasts is perhaps that the variables entering the neural network are simply garbage.3

3 This is a concern that we have repeatedly voiced in the past, with the most recent attempt being Hong et al. (2020), in which we argued that the utilization of exogenous data is not the simple inclusion of unprocessed weather variables into machine-learning models; rather, one ought to dive deeper into the intrinsic properties of the explanatory variables with respect to the forecast variable.

• Occam’s razor is a useful suggestion, not a metaphysical principle;
• Over-reliance on Occam’s razor gets you cut, because sometimes good forecasting is in fact complex;
• Be suspicious of outrageous claims in forecasting papers;
• Machine-learning forecasting is no different from any other use of machine learning—garbage in, garbage out;
• Seek reasoning and justification when choosing explanatory variables;
• Simple models are not necessarily weak; on the contrary, they are often preferred by expert forecasters.
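The minimal numerical illustration promised above follows, a sketch assuming a generic empirical module temperature model of the Sandia type; the coefficients a and b are typical values quoted for open-rack glass/cell/polymer-sheet modules in that model family, and the function name is ours.

```python
from math import exp

def module_temperature(poa, t_air, wind, a=-3.56, b=-0.075):
    """Module temperature [deg C] from plane-of-array irradiance [W/m^2],
    ambient air temperature [deg C], and wind speed [m/s], using an
    empirical exponential model: T_m = G * exp(a + b*v) + T_a."""
    return poa * exp(a + b * wind) + t_air

# Stronger wind -> more convective cooling -> cooler modules -> higher efficiency.
for wind in (0.5, 2.0, 5.0, 10.0):
    print(f"wind = {wind:4.1f} m/s -> T_module = "
          f"{module_temperature(800, 25, wind):.1f} deg C")
```

The same few lines of physics offer no analogous causal path from local wind speed to local irradiance, which is precisely why the razor admits wind speed for PV power forecasting but questions it for single-site irradiance forecasting.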

2.6 OCCAM’S BROOM

The term Occam’s broom was coined by the molecular biologist, Sydney Brenner, to describe the process in which inconvenient facts are whisked under the rug by intellectually dishonest champions of one theory or another (Dennett, 2013). The motivation for wielding Occam’s broom is to make a message or a conclusion sound, despite the wielder of the broom quite possibly knowing that it is not. Unlike the other tools described in this chapter, Occam’s broom is an anti-thinking tool, preventing us from spotting the pitfalls of an otherwise perfectly useful statement.

A classic example of Occam’s broom comes from the Business Insider: “They are myopic in the extreme. If you hear a man has $100 million in debt, perhaps you worry about his financial situation. Then when you find out he is Bill Gates, you stop worrying.” The inconvenient fact that the wielder of the broom tries to hide is that the balance sheet contains not only a liability side, but also an asset side. Although $100 million in debt for an average Joe would perfectly justify any concern that one may have, such concern is completely dismissed if the conditional is placed on Bill Gates’ $100 billion net worth. However, without the missing piece of the puzzle, it is indeed quite difficult for anyone to detect the flaws in the statement.

The concept of Occam’s broom is further elaborated using a pair of examples from the solar forecasting literature. The first example comes from the paper by Sperati et al. (2016), in which a method called the ensemble model output statistics (EMOS) was used to post-process the 50-member ensemble forecasts issued by the European Centre for Medium-Range Weather Forecasts (ECMWF). The deterministic-style post-processed forecast for time $t$ from EMOS is given by a weighted sum of the member forecasts:
$$\hat{y}_t = \hat{w}_0 + \hat{w}_1 x_{t,1} + \cdots + \hat{w}_m x_{t,m},$$
where $x_{t,1}, \ldots, x_{t,m}$ are the $m$ member forecasts at time $t$; $\hat{w}_1, \ldots, \hat{w}_m$ are weights estimated via a regression; and $\hat{w}_0$ is an intercept term. Like all linear regressions, the implication of EMOS is that a better member forecast will be awarded a heavier weight, and thus have more contribution to the final forecast. Notwithstanding, the hidden-away inconvenient fact is that ECMWF generates ensemble members that are meant to be equally likely, by carefully perturbing the initial conditions. Why should equally likely forecasts be associated with different weights? Obviously, some contradiction gets in our way (a toy demonstration of this point is sketched at the end of this section).

Another example is drawn from the paper by Diagne et al. (2014), where a Kalman filter was used to post-process the irradiance forecasts from the Weather Research and Forecasting (WRF) model. The Kalman filter, a major algorithm in statistics, economics, and engineering, is able to remove systematic bias from a signal, in an iterative manner. On the other hand, WRF issues day-ahead forecasts, which often suffer from model-led bias. The results shown in the paper, with little surprise, supported the authors’ main claim: The filtered forecasts have smaller errors than the raw forecasts; hence, the Kalman filter is a useful post-processing tool. However, the hidden-away inconvenient fact is that the Kalman filter requires the most recent observation to operate, i.e., to filter the forecast made for time $t$, the measurement at $t-1$ needs to be known. But the value of day-ahead forecasting resides in its ability to issue all forecasts over the next few days; employing the Kalman filter turns the day-ahead forecasts into hour-ahead forecasts, and thus defeats the entire purpose.

Both solar forecasting papers mentioned above come from reputable groups of researchers, with whom we have worked closely. More importantly, both facts, namely, that the ECMWF member forecasts are equally likely, and that Kalman filtering changes the forecast horizon, were known by the authors; these facts were stated in inconspicuous sentences in their papers. Although we assuredly believe that we are not the only ones who would be able to spot these problems, there seems to be a large number of victims of Occam’s broom out there, e.g., there are quite a few subsequent papers that wrongfully interpret the Kalman-filtered WRF forecasts as day-ahead forecasts. One may ask: How does one spot something invisible? Unfortunately, you cannot, unless you are a master of solar forecasting.

• Try to work with everything you know, not just the premises set by others, because those can miss out facts that are important to your judgment;
• You need to do a lot of forecasting to detect the invisible facts swept away by the wielders of Occam’s broom.
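Below is the toy demonstration promised above, a minimal sketch with synthetic, exchangeable members standing in for a real ensemble (the sizes and noise levels are ours, purely illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "ensemble": m exchangeable members scattered around the truth,
# mimicking equally likely members generated from perturbed initial conditions.
n, m = 500, 5
truth = rng.normal(10, 3, size=n)
members = truth[:, None] + rng.normal(0, 1, size=(n, m))

# Deterministic-style EMOS: y_hat = w0 + w1*x1 + ... + wm*xm, with the
# weights estimated by ordinary least squares on a design matrix [1, X].
A = np.column_stack([np.ones(n), members])
w, *_ = np.linalg.lstsq(A, truth, rcond=None)
print(np.round(w, 2))  # weights come out near-equal, roughly 1/m each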

2.7 THE “NOVEL” OPERATOR

Perhaps the most central component of any solar forecasting work, or any scientific publication for that matter, is novelty. Nevertheless, it is not as tangible as we would like it to be. The notion of novelty is built on the quality of being new, being original, and being unusual—but relative to what? It has to be new, be original, and be unusual relative to the entire body of literature on the subject. That is surely a hard thing to verify, unless one has already skimmed through all published works, a task that has become nearly impossible to accomplish today without the help of advanced language robots and access to all historical publications.

There are too many self-proclaimed novel methods in solar forecasting; some of them simply lay down the word “novel” in the title: A novel method based on similarity for hourly solar irradiance forecasting (Akarslan and Hocaoglu, 2017), Novel short term solar irradiance forecasting models (Akarslan et al., 2018), A novel soft computing framework for solar radiation forecasting (Ghofrani et al., 2016), A novel composite neural network based method for wind and solar power forecasting in microgrids (Heydari et al., 2019), just to give a few examples. Let us take a closer look at the abstract of the first title: “[...]This methodology considers the past records of data to predict solar irradiance value for the desired hours. To predict next hour’s solar irradiance data, a day similar to the prediction day is sought in the history.[...]” It suggests that the forecasting method of interest leverages the similarity between the current and historical time series patterns, and uses whatever happened in the past as the prediction for the future. Wait a minute! Is that not just the analog-based method in weather forecasting, a method that has been known since 1963, after Edward Lorenz coined the term? In the load forecasting literature, the method is known as the similar-day method; in machine learning, the nearest neighbor—different names, but the same flavor. In any case, it would take a substantial amount of work to claim novelty on this topic, especially after six decades of research. The choice between two competing theories explaining why the authors use the word “novel”—(1) the authors are geniuses who expanded the boundary of science, and (2) the authors simply did not know the relevant literature and reinvented the wheel—is most obvious if one flips to the reference section of that work.

Because of the intangibility of novelty, the use of the word “novel” sometimes marks the very edge of what the authors believe to be novel, in the hope that the readers will side with them without digging deeper into the literature. It often turns out to be a bald assertion, made after the authors realize the actual amount of work that needs to be done to proclaim novelty. What is worse is that the self-proclaimed novelty turns into “verified” novelty after the peer review process, during which the reviewers’ attention could be placed on a lot of things but nitpicking the choice of words (or, in some cases, the reviewers are epistemic inferiors who lack the knowledge to judge novelty). Consequently, the authors get their way. We call this trick the “novel” operator.

The Chinese proverb “no 300 taels of silver buried here” describes a story of a man who was worried about someone stealing his silver, so he put up a sign saying “no silver here” above the place where he buried the silver. Of course, the silver was then stolen in no time, but another sign was put up saying “not stolen by your neighbor.” Funny as the proverb may sound, a clumsy denial results in self-exposure, and a very poor lie reveals the truth. Real novelty goes without saying, especially in the scientific community. The bare display of the word “novel” gives enough to arouse suspicion. To seek evidence supporting our argument, we scanned the most recent issue of the International Journal of Forecasting at the time of writing, namely, the October–December 2020 issue; the word “novel” appeared a total of five times in 3 out of 26 articles, and none of these instances was used to describe the proposed method itself, but rather to describe works in the literature.

The “novel” operator takes many different forms, including but not limited to “new,” “original,” “effective,” “efficient,” “innovative,” “intelligent,” “smart,” “advanced,” and most of all, “state-of-the-art.” All of them have the same trait, that is, they only make sense when there is a comparison or relativeness attached to the context. When these operators are used without context, an alarm bell should ring. “Ding!” Of course, the “novel” operator is not absolutely conclusive, as there could be many occasions where the authors are innocent and novelty actually exists. This is, however, not incompatible with developing a sensitivity to the “novel” operator.

• True geniuses are few, so not every paper is novel;
• Do not, or at least try not to, self-confirm your contributions using unnecessary eye-catchers such as “novel,” “original,” “state-of-the-art” or that sort;
• Real novelty is crowned by others; if others wrote about the novelty in your paper, cite them;
• Be cautious of those papers that directly put fancy descriptors in their titles.

2.8 THE SMOKE GRENADE

The use of smoke grenades (another anti-thinking tool, right next to the “novel” operator) in various fields of study, including solar forecasting, refers to the deliberate creation of confusion or obfuscation to distract from weaknesses or flaws in a particular approach or method. It can be a tactic used by those who wish to push an agenda or claim without proper scrutiny or critical evaluation. “If you cannot convince them, confuse them,” as Harry S. Truman, the 33rd president of the United States, is said to have said. No doubt this quote has given rise to both positive and negative interpretations; reader discretion is advised. We use this quote to motivate the following discussions on unintelligible communication in scientific research. The key issues to be addressed are: (1) why do some researchers throw smoke grenades, (2) how do researchers form and justify their views on an unintelligible paper, and (3) how can we see through the smoke?

There is a whole range of factors that may influence the audience’s judgment on whether or not a communication was effective. Among the obvious are the audience’s background on the topic, the attentiveness to the communication, and most importantly, the credibility of the speaker. Dr. Myron L. Fox, a fake authoritative figure on mathematical human behavior (whatever that means) played by a professional actor, gave a lecture on “Mathematical game theory as applied to physician education” to a group of highly trained psychologists, psychiatrists, and social-worker educators. Whereas the actor was coached by a real expert to conduct the lecture using double talk, deepities, neologisms, and contradictory statements, alongside irrelevant humor and incoherent references, the end-of-lecture survey contradictorily showed that most people in the audience found the lecture to be clear, well-organized, and stimulating (Naftulin et al., 1973).

In his paper, Unintelligible management research and academic prestige, J. Scott Armstrong gave a compelling hypothesis: “An unintelligible communication from a legitimate source in the [audience]’s area of expertise will increase the [audience]’s rating of the [speaker]’s competence” (Armstrong, 1980). Because the audience, particularly those who perceive themselves as experts on a topic, tends to justify the time spent on attending a talk or reading a paper, and to rationalize the necessity of having to learn something, even if the instantaneous response suggests otherwise, that instantaneous thinking may quickly be overridden. Stated more plainly, it is psychologically comfortable for an expert to think that “if I cannot understand the materials, it must be a high-level talk/paper,” because the alternative may seem rather demeaning to the expert’s competency. To test the hypothesis, Armstrong conducted an experiment with passages from 10 management science journals and evaluated their Gunning fog indexes—a readability test in linguistics (a toy fog-index calculator is sketched at the end of this section). The results revealed a 0.7 correlation between a journal’s prestige and its fog index.

If Armstrong’s hypothesis is generally valid for academia, a logical action would then be to write everything in a less intelligible manner and throw as many smoke grenades as possible, so as to confuse the journal and impress the audience. And what better place to launch an attack than the methodology section? Solar forecasters spend a great deal of time describing the forecasting methods. This is particularly motivated by the status quo where the scientific novelty of a solar forecasting work is often wrongfully associated with the rarity of a machine-learning algorithm, with the (superficial) complexity of the algorithm, with the number of weird-looking symbols and operators appearing in the equations, and with the text length of the methodology section, making the paper all the more unintelligible.
Worsening the situation is that once unintelligible papers are published, they gain legitimacy—this is also the case for the “novel” operator we discussed earlier. There ought to be a way to see through the smoke. The basic solution is rather unimaginative—just blur out the parts of the text that cause confusion and distraction (e.g., the unnecessary adverbs and adjectives), and assess the quality based on whatever remains. To elaborate this procedure, part of the abstract from Jiang and Dong (2016) is used, with the attractive, elaborative, and non-essential parts of speech struck out:

“[...] Eclat data mining algorithm is first presented to discover association rules between solar radiation and several meteorological factors laying a theoretical foundation for these correlative factors as input vectors. An effective and innovative intelligent optimization model based on nonlinear support vector machine and hard penalty function is proposed to forecast solar radiation by converting support vector machine into a regularization problem with ridge penalty, adding a hard penalty function to select the number of radial basis functions, and using glowworm swarm optimization algorithm to determine the optimal parameters of the model. [...]”

In other words, the paper performed univariate solar forecasting using regularized regression, with meteorological variables as inputs. Surely, the contribution now seems bland, because we know that such a task can be done in countless other ways. What really matters is how the choice of each component of the forecasting model is made. For example, what is the difference between using a support vector machine and using a multilayer perceptron? Would replacing the ridge penalty with the lasso penalty improve (or deteriorate) the result? Would not another swarm-intelligence algorithm, e.g., bee colony, cuckoo search, or particle swarm optimization, suffice for the parameter estimation? Unfortunately, none of these questions was even remotely answered in the original text of Jiang and Dong (2016); it simply made a big “Boom!” and lots of smoke, that’s all.

More formally, to counter these smoke grenades, one must adhere unwaveringly to scientific rigor and transparency. It behooves researchers to stay vigilant to any incongruities or vulnerabilities in their work, to remain receptive to constructive feedback, and to collaborate with other professionals in their field. Equally vital is upholding lucidity and perspicuity in conveying research outcomes, to facilitate full comprehension and validation by the scientific community and the public. In sum, a commitment to scientific integrity and transparent inquiry is essential for counteracting the pernicious influence of smoke grenades in the realm of research and discourse.

• Write intelligibly;
• Do not hesitate to dismiss those who can’t present their ideas clearly;
• Learn to identify the skeleton of a research paper and to extract the core idea;
• Novelty is not equivalent to the complexity of a forecasting algorithm, nor is it tied to the number of buzzwords;

• Over-displaying equations and prolonged discussion on algorithms are likely to be just smoke grenades.

2.9 USING THE LAY AUDIENCE AS DECOYS

On most occasions, smoke grenades are used deliberately, but on others, they could be unintentional. To avoid these inadvertent events, we need a thinking tool to guide our way forward. Dennett’s proposal is to use the lay audience as decoys (Dennett, 2013). The proposal is straightforward: Imagine the audience is a group of first-year graduate students, who have the basic interest and skill to follow instructions and make analyses, but not quite the ability to make inferences and generalizations. The messenger would then need to deliver the matter with considerable delicacy. To give perspective, think about the typical cookbooks available in your local bookstore. It is almost certain that the step-by-step guide in a cookbook could give you something for dinner so that you do not starve, but that probably would not be quite the same as the version being served in a good restaurant—otherwise, who would spend three to five times the money, or even more, for the same food? The instruction, “poach the fish in a suitable wine until it is 80% cooked,” contains ambiguities as to what counts as a suitable wine and how 80% cooked is defined. A professional chef would have an exact preference for the wine, whereas a home cook would need to guess the meaning of “a suitable wine.” More generally, different interpretations of the same recipe can lead to dishes with distinct tastes.

This exact issue with interpretation also arises when a forecasting algorithm, or more generally, a scientific idea, is described in research papers. Although by design, algorithms, like good recipes, should be substrate neutral (i.e., independent of operating system and software), have underlying mindlessness (i.e., be easy to follow with no room for making suppositions), and guarantee results (i.e., be foolproof), all too many forecasters pay little to no attention to presentation. “Let’s focus on the science,” they say, which is perhaps the most ludicrous excuse one can come up with for one’s poor instructional skills. What seems crystal clear to the authors might not be as clear in reality—and reviewers are not always helpful, because very few would attempt to actually implement the algorithms. This harms the reproducibility of forecasting works. A vast majority of solar forecasting papers published today are non-reproducible; to support this claim, one simply needs to count how many head-to-head direct comparisons there are in the literature (i.e., comparisons using the same data and the same forecasting setup)—virtually none—and if people are not making direct comparisons, it must be that the benchmarks are overly time-consuming if not impossible to implement. Reproducibility in forecasting research by Boylan et al. (2015) is a piece dedicated to discussing the risks and limitations of non-reproducible forecasting research. Similar discussions have been conducted in a solar forecasting context (Yang, 2019a). With forecasting algorithms becoming increasingly complex, the only way to ensure total reproducibility is to release data and code.

Open research in solar forecasting is far from being the standard. Indeed, at the time of writing, Solar Energy is possibly the journal that publishes the most solar forecasting papers with data and code, thanks to its subject editor’s obsession with open research. While we certainly hope the situation will change in time, it is essential to scrutinize the underlying causes of the reluctance to release data and code. There are three main reasons for non-reproducible forecasting papers: the first consists in researchers’ tendency to under-explain, particularly given the assumption that the audiences are also experts on the same subject; the second, in the excuse of data propriety and the upkeep of a research edge; and the third, in the fear of making mistakes. All three reasons are invalid, as will be argued in detail below.

There seems to be an invisible threshold of competence that one must pass in order to be considered a qualified reader of scientific papers. The authors do not know why IEEE’s strict policy allows no more than eight/ten pages per paper upon submission, but this fact serves as evidence of our desire for conciseness when presenting science. Science is elegant and therefore must be delivered as such. Over-explaining things points at the direct opposite of elegance—“Do I have to spell it out for you?” In the extreme, one may even think that if a piece of research is easily understood, it is probably not novel—there should not be that many “low-hanging fruits” given the gigantic body of scientific literature, and everything we do now must be complex, because the “easy” ones have long been exhausted. To uphold the notion of specialization in some branch of learning, people tend to employ jargon, to omit steps that are perceived to be trivial, to complicate things unnecessarily, and to hide away anything that could possibly make the paper look “too easy”; all of which not only damages reproducibility but also harms the uptake of the authors’ works.

Researchers often need to work on proprietary data, in particular in the fields of medicine, biology, chemistry, and others that involve sensitive information and trade secrets. This reason does not hold for solar forecasting. As we shall see in a later chapter, the most valuable data for solar forecasting research, with few exceptions, are all in the public domain, such as the Baseline Surface Radiation Network (BSRN), the National Solar Radiation Database, the Modern-Era Retrospective Analysis for Research and Applications, version 2, or the High-Resolution Rapid Refresh. On that account, solar forecasters have more than sufficient publicly available data to showcase their forecasting skills, nor is there any major motivation for using proprietary data, which are of inferior quality compared to the open ones anyway (e.g., there are very few irradiance measurement stations that can come close to the standard of BSRN in terms of instrument calibration, period of record, maintenance, and data availability). Eliminating the constraint of data propriety, code becomes the only remaining factor that affects reproducibility, and that depends solely on the willingness of individuals. In our experience, when forecasters are confident about their algorithms, they are willing to share their code for the common good. In contrast, the credibility of those who beat around the bush without releasing the code, even upon repeated requests, is subject to questioning. Fabricated and falsified research, though believed to be a minor fraction, is not uncommon (Fanelli, 2009). Perhaps the most important driver for non-reproducibility in research is the fear of making mistakes.
Scientists are a peculiar species: on one hand, they want to tackle the hardest challenges that no one has succeeded in solving; on the other hand, they feel deeply embarrassed when their propositions are called out as wrong. There are many pitfalls on the journey to truth. For forecasting in particular, as pointed out by Boylan et al. (2015), mistakes in the final thesis can originate from faulty data entry, arithmetic errors, data transcription errors, bugs in computer code, assumptions that are too strong, and misuse of software, among others. While admitting to mistakes is burdensome, it is equally if not more difficult to point out mistakes. The action of questioning the correctness of a published work puts a long list of people in doubt, ranging from the authors, to the reviewers, to the editors, and to others who have cited the research. All too many of us, thus, prefer silence, which can only aggravate the situation. It is thought useful for scientists to cultivate the habit of making mistakes—not just any mistakes, but good ones. As Dennett has argued, the main requirement for making good mistakes is not to hide them—especially not from yourself (Dennett, 2013). Making research reproducible, which gives others the opportunity to identify and make a case for your mistakes, is therefore the foremost condition for progress.

Circling back to where we started, that is, the unintentional smoke grenades, ambiguous recipes or algorithms, and non-reproducible research in general, we find it useful to picture ourselves as instructors in a university. It is common to hear statements such as “for those who have not tried [...],” “for those who do not have prior experience with [...],” or “allow me to emphasize [...] one more time” in lectures. If the targeted audiences are undergraduate students, or even laypersons, no expert would blame you for over-explaining things. For that, we find sentences such as “to ensure that our proposal is self-contained,” “for those who do not have a background in [...],” or “for background information” to be good starters for explaining the “trivial” parts of our scientific ideas in papers. As for promoting open forecasting research with data and code, there are some initiatives underway. The market, with its almighty invisible hand, will steer the attention of audiences to those works that are easily reproducible, which would surely benefit those who own the reproducible research—more citations, more collaborations, and more progress.

• Always imagine the audiences of your research paper are graduate students if not laypeople;
• Be generous about explaining small details;
• Make your research reproducible by submitting data and code;
• Ask authors for help when you are unclear;
• Repeated reader queries to authors with no response suggest ignorance and possibly incompetence on the part of the authors (unless they are dead, which is the case, sometimes);
• Use publicly available data unless there is a pretty darn good reason for not doing so.

2.10 BRICKS AND LADDERS

In this last bit of the chapter, we touch briefly on the topic of the creation and propagation of science. Science ought to be objective, but there might not be as much freedom in making objective assessments of scientific ideas as we wish there to be. No rational person would be bold enough to deny the integral progress in science that humankind has achieved, in particular over the past 150 years. But how much exposure to science can we really get in a lifetime? The simple, undeniable truth that no one is able to read everything about science—that is, every paper published, every presentation made, every news item reported, every report written, every disagreement documented, and every piece of scientific information made available on the internet—prompts us to question how much we really need to know to create objective science, and how the science we create today will affect the science of the future.

In his essay Chaos in the brickyard, published in the journal Science in 1963, Forscher (1963) used bricks as a metaphor for the islands of scientific contributions, pieces of scientific facts, and the outcomes of scientific research. The edifice of science, constructed with these bricks, is however susceptible to danger, because of faulty bricks or bad assembly. Making the situation worse is the fact that we are in desperate need of bricks but do not have enough manpower, and master builders have no choice but to train more brick-makers, in great numbers, and in many batches. The master builders have no doubt succeeded, and bricks can now be produced efficiently in the factory of scientific knowledge, with well-tested routines, with a made-to-stock mentality, and even with certainty. But for a while, everyone seems to have forgotten why we need those bricks—it is the edifice, not the bricks themselves, that counts. Forscher’s criticism is more relevant than ever in today’s scientific regime. As the land is now flooded with bricks, assembly becomes a real challenge. In fact, the bricks are available in bulk, so much so that anyone could randomly pick one up and make another one just like it, of course subject to his own interpretation of the procedure for brick-making. However bad a brick may be, it will eventually end up on one of those storage platforms, which we call scientific journals, prepared to be used in the future. Nevertheless, one can never know whether these bricks will eventually be used, since future brick-makers also have the impulse and compulsion to produce their own bricks. This is a real problem. And it deserves to be treated formally.

Social scientists have spent a great deal of time investigating the cultural and cognitive reasons for people to do some things while avoiding others. Ironically, authors of scientific papers do not have the final say on what exactly the intended message is. Readers interpret and expand on the conclusions by filtering the statements through their own worldview. This is what social scientists call motivated reasoning (Kunda, 1990), in which people relate to various ideas based on prior ideological preferences, personal experiences, and the available capacity of thinking. We search for information and evidence about highly complex scientific problems in a way that will lead us back to our preexisting beliefs. It would not be exaggerating to regard the entire process as incestuous and circular. Processing information through cognitive filters is inevitable, bound to happen, and compulsory.

The formation of cognitive filters depends on cultural identity. As birds of a feather flock together, we tend to develop perspectives that resemble the environment we are in and that are consistent with the values of others within that environment with which we self-identify. Dan Kahan, a professor of psychology at Yale, refers to this as cultural cognition—the influence of group values on perceptions and beliefs. This explains, at least in part, the disconnect among various scientific fields. Scientists belonging to the same domain are likely to place heavier weights on the works produced within the domain than on those produced outside. It might even be possible for scientists to be isolated completely, and often unknowingly, within the domain. After 10 years of international collaboration carried out by experts from the International Energy Agency’s (IEA’s) Solar Heating and Cooling Task 46: Solar Resource Assessment and Forecasting (now called IEA Photovoltaic Power Systems Programme Task 16: Solar Resource for High Penetration and Large Scale Applications), the Best practices handbook for the collection and use of solar resource data for solar energy applications (Sengupta et al., 2017) was finally published by the National Renewable Energy Laboratory, one of the most authoritative research institutes on renewable energy, with chapters contributed by a long list of renowned scientists from research institutes worldwide. The handbook is said to be a summary of the fundamentals of solar resources and the state-of-the-art practices for the solar industry. There was, with little surprise, a chapter on solar forecasting in the handbook. What is surprising is that the concept of probabilistic forecasting was absent from it. As of 2017, the year during which the handbook was written, probabilistic forecasting had been known for a long time and had been widely accepted, as evidenced by the large number of review articles published elsewhere. On this point, the handbook’s polarized view on forecasting is likely to have been caused by no other reason than cultural cognition, which impaired the authors’ ability to make good science. Had anyone followed the handbook alone, it is unlikely that the follower would develop any capability to make probabilistic forecasts, which is the only lighthouse to guide forecasters in the ocean of uncertainties. In other words, in the world of the followers of the handbook, a forecast would be absolutely deterministic.

Escaping from motivated reasoning is no trivial task. It would be like telling someone to jump out of the box, which is rhetorical by nature—if we knew how to do that, we would not need to be told. Had the authors of the handbook known the gravity of probabilistic forecasting, as they do now (see the new version of the handbook, Sengupta et al., 2021), they would not have left it out. Indeed, to escape from the inescapable, we turn to philosophy and study Bertrand Russell, one of the greatest (if not the greatest) philosophers of the 20th century. His book, The Scientific Outlook (Russell, 2017), has plenty to offer, and seems capable of helping unravel our bewilderment.

Contrasting Forscher’s bottom-up view of science, in which bricks need to be made before the edifice is assembled, Russell argues a top-down view, in that science is acquired from nature, and the general body of scientific knowledge contains all facts and all hypotheses men have ever made. The significance of a particular fact, or a particular part of the general body of scientific knowledge, must then be evaluated relative to the body itself. Science, though it often starts from making observations about the specifics, is concerned with generalization. Galileo’s law of falling bodies and Copernicus’ law of heliocentric planetary motions are specifics—specific to the set of observations they made, and to the part of nature their laws can describe; their laws fail when new observations deviate from those specifics. Newton took those observations and generalized them into his famous three laws of classical mechanics, and therefore his laws are considered to be, and in fact are, more significant. When Newton’s laws fail, Einstein’s theory of general relativity comes in and takes control, and is thus more significant than Newton’s. Perhaps, one day, when some law unites general relativity and the laws of quantum mechanics, that law will be the most significant. The various laws we just described, in Russell’s view, form a hierarchy, in which the laws at the higher level supersede those at the lower level.

To move about the hierarchy, one can use ladders. Traveling up the ladder proceeds by induction, and down by deduction. Suppose A, B, C, and D, four laws describing the specifics, suggest the possible existence of a general law—this is the procedure of induction; if the general law is true, then all four specifics must be true—this is the procedure of deduction. There would be many such groups of laws which allow reasoning from the specifics to the general. There would also be another higher-order general law that supersedes the law which supersedes A, B, C, and D. Then, from the highest-order law, one can deductively move down the hierarchy, with absolute certainty, until reaching the level of specifics that is required. Russell suggested that in textbooks the deductive order should be adopted, but in the laboratory, the inductive order.

Russell’s theory goes beyond what we have just summarized, but it seems that we already have what we need. Forscher’s edifice and Russell’s hierarchy reconcile if we, as forecasters, are concerned not merely with making bricks, but also with making ladders. Solar forecasters have proven to be excellent brick-makers. In just a few years, the literature on solar forecasting has flourished like never before. What we ought to do next is to start making ladders and to induct general laws from our specifics, for instance, by sending our works to journals of other domains—International Journal of Forecasting (all areas of forecasting), Wind Engineering (wind forecasting), Energy Economics (price forecasting), IEEE Transactions on Power Systems (load forecasting), just to name a few—and to test whether our A, B, C, and D could work for general-purpose forecasting. If they work, that suggests the probable existence of a ladder that can send us to the next level. At this stage, one may still worry about the distance between us and the top of the edifice of science—after all, there have not been many Newtons and Einsteins throughout history. That is a legitimate concern. We, the authors of this book, have been in academia long enough to realize that we are not master builders of bricks and ladders, but simply regular builders, and there are many like us. However, we are all part of the team.
About bricks and ladders of forecasting, we have built some, and are going to build more. Along the way, we also need to discard some obviously bad bricks (e.g., using the thinking tools prescribed in this chapter), to learn from the mistakes, and to discover methods for making better ones. But you, dear reader—the very fact that you are reading this book qualifies you as a builder. You have the potential to become not just a master builder, but a chief builder, and one day, perhaps, to build a ladder so tall that it can induct us to a level of the edifice that no forecaster has ever reached before.

• Be receptive not just to within-domain knowledge, but also to knowledge accumulated in related domains;
• Generalization is the key—a good forecasting idea is likely to be useful elsewhere;
• Deduction from a general forecasting concept to solar forecasting specifics is nevertheless valuable;
• Have faith, keep publishing, on the condition of good ethics and morality.

3 Deterministic and Probabilistic Forecasts

“A common desire of all humankind is to make predictions for an uncertain future. Clearly then, forecasts should be probabilistic.”
— Tilmann Gneiting

It is true: No matter how good forecasts are, owing to the uncertainties involved in the forecasting process, they are always wrong. This belief seems, at first sight, susceptible to questioning. For instance, through some calculations, one can “forecast” the sunrise time for tomorrow to the exact minute. Similarly, if someone “forecasts” the outcome of a coin toss to be heads, that someone may turn out to be exactly right, with a 50% chance. Both events provide evidence against the earlier belief—surely, forecasts are not always wrong. This contradiction cannot be resolved, and is meaningless to argue further, if one does not first define what a forecastable quantity is. The sunrise time is certain in terms of geometry, and the uncertainty in its calculation is small enough to be neglected for practical purposes; on the other hand, the correct guess of heads was a mere coincidence, and the state of being right is not repeatable, since the outcome of a coin toss is random. From a forecaster’s viewpoint, the predictions made on the quantities in both examples are not technically forecasts, for the reason that the former has near-total predictability and the latter has absolutely no predictability. Rather, it is those events and quantities that lie between these two extremes that forecasters are concerned with. Conditional on that, the opening sentence simply means that forecasts themselves are uncertain.

Knowing that a forecast is always wrong partly explains why forecasting is often frowned upon by the public. Historically, forecasting has also been banned for the same reason—the Met Office had to stop releasing forecasts to the public due to their inaccuracy. In spite of that, a wrong forecast does not necessarily mean that it is not useful. The trick to forecasting is to understand how wrong the forecasts can be and what the probabilities are for different future states. In this regard, the best forecasts, to statisticians, are necessarily probabilistic, which implies that forecasters not only need to issue the best guess of the forecast quantity at a future time but must also describe the uncertainty associated with that guess. However, because not everyone is a statistician, issuing and using “non-probabilistic” forecasts is still the norm in many scientific domains and industries. In fact, in solar forecasting research, there are far more works on “non-probabilistic” forecasting than on probabilistic forecasting, as is also the case for power systems, where forecast submission guidelines rarely demand probabilistic forecasts. By “non-probabilistic” we mean deterministic.


Most statisticians would object at this very moment, because the word “deterministic” gives a false sense of certainty—the phrase “point forecast” is strictly preferred by statisticians. Indeed, a deterministic forecast may be referred to in a variety of ways, including a point forecast, a single-valued forecast, a best-guess forecast, or, in a more technical way, a conditional mean (or conditional median) forecast. What these names are essentially describing is that the “non-probabilistic” forecast is just a single number, making the most adequate summary of the quantity of interest at a future time. That said, one must also be aware that statisticians are not the only people who do forecasting. Physicists, meteorologists, economists, and engineers, among many others, are all confronted by the need to predict the future. For historical reasons, many solar forecasters, it seems, have accepted “deterministic forecasting” as the standard phrase. In this chapter, we first explain why point forecasts are called deterministic forecasts in many domains. We then move on from there to discuss what constitutes a probabilistic forecast. The most dominant form of probabilistic forecast—the ensemble forecast—is subsequently introduced, alongside the general strategies for obtaining it.

3.1 ON THE DETERMINISTIC NATURE OF THE PHYSICAL WORLD

The word “deterministic” is used to describe processes that are fixed, certain, and non-random, or variables whose past completely determines their future. The best way to understand the deterministic nature of things is by considering the classical laws of physics. Classical physics includes Newton’s equations of motion, the Maxwell–Faraday theory of the electromagnetic field, and Einstein’s general theory of relativity. Leonard Susskind, a theoretical physicist at Stanford University, noted that the job of classical mechanics is to predict the future (Susskind and Hrabovsky, 2014). More specifically, if both the current state of a system and the physical laws governing the system are known, the future can be deterministically predicted. This principle was famously described by the 18th-century physicist Pierre-Simon Laplace:

“We may regard the present state of the universe as the effect of its past and the cause of its future. An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect were also vast enough to submit these data to analysis, it would embrace in a single formula the movements of the greatest bodies of the universe and those of the tiniest atom; for such an intellect nothing would be uncertain and the future just like the past would be present before its eyes.”
— Pierre-Simon Laplace, A Philosophical Essay on Probabilities

A system that changes with time is known as a dynamical system. Not only does a dynamical system consist of a state space, but it is also described by dynamical laws of motion. Simply put, a dynamical law tells exactly what the next state would be given the current state. A simple illustration is shown in Fig. 3.1 (a), where the arrows represent the dynamical laws governing the system. In the three-state system, if the current state is known to be 1, then the next state must be 2; if the current state is 2, then the next state must be 3; the sequence 1, 2, 3, 1, 2, 3, ... would repeat forever with complete certainty. Nothing is unpredictable. In Fig. 3.1 (b), however, something is unusual. Suppose the current state is 2; it is impossible to know whether the previous state was 1 or 3. This type of system is said to be irreversible. Similarly, in Fig. 3.1 (c), state 2 could evolve into either state 1 or 3, and this type of system is indeterministic into the future. Both cases (b) and (c) are not allowed in classical mechanics. This deterministic and reversible nature of dynamical systems is central to Laplace’s worldview.

Figure 3.1 (a) A deterministic three-state system. (b) A system that is irreversible. (c) A system that is indeterministic into the future.
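To make the notion of a dynamical law concrete, here is a minimal sketch that encodes the system of Fig. 3.1 (a) as a lookup table; the Python dictionary representation is our own illustrative choice, not anything prescribed by classical mechanics.

    # Dynamical law of Fig. 3.1 (a): every state has exactly one successor,
    # so the future of the system is completely determined by its present.
    law = {1: 2, 2: 3, 3: 1}

    state = 1
    trajectory = [state]
    for _ in range(8):           # evolve the system for eight time steps
        state = law[state]       # the law gives the next state exactly
        trajectory.append(state)

    print(trajectory)            # [1, 2, 3, 1, 2, 3, 1, 2, 3] -- certain forever

Because this law is also one-to-one, inverting the dictionary recovers the past from the present, reflecting the reversibility that classical mechanics demands.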

As grand as the theory may sound, Laplace might have overestimated how predictable the world actually is. Suppose all dynamical laws governing the world are known, and everything’s position and motion at one moment are also known—this refers to Laplace’s “vast intellect who submits data to analysis,” or Laplace’s Demon—there is still one practical issue, which is precision. Recall that one of the two prerequisites of making deterministic predictions is perfect knowledge of the current state of the system, i.e., the initial conditions based on which predictions are made. It is, then, obvious that one could never possibly predict the future states of the world in its entirety. Unlike the system shown in Fig. 3.1 (a), the dynamical system describing the real world can be thought of as having an infinite number of states. Take the weather system, for instance: any physical quantity describing the initial condition of the system, e.g., temperature, humidity, wind speed, or solar irradiance, is continuously infinite. Stated differently, the states of an atmospheric variable are represented by real numbers, which are infinitely dense. The degree to which two neighboring states can be distinguished is known as the resolving power of an experiment or an observation. In principle, no experiment or observation is able to determine the initial state to infinite precision. The tiniest differences in initial conditions can lead to vast differences in outcomes. If a dynamical system exhibits such a characteristic, as is the case for weather, the system is said to be chaotic. A chaotic system suggests that however strong the resolving power may be, the predictability of the system is limited to a time horizon. This implies that perfect weather prediction is not achievable, because the resolving power is always limited (Susskind and Hrabovsky, 2014).

A common misconception is that a chaotic system is random, because the word “random” is often tied to things that are disorganized, haphazard, or lacking in patterns in our daily life. But that is furthest away from being true—a chaotic system really is deterministic. To demonstrate this, Fig. 3.2 shows three time series, which all display somewhat erratic-looking temporal transients. However, these time series are simply the outcome of the same deterministic system, x_{t+1} = 1.91 − x_t² (notice that there is no random component in this expression), but with three different initial conditions, x_0 = 0.99999, 1, and 1.00001. It can be seen that before t = 15, the three time series are almost indistinguishable, i.e., given a good-enough initial condition, the system is predictable up to t = 15. Beyond that, at the current resolving power, which is 0.00001, there is no predictability.

Figure 3.2 Time series from a chaotic system, x_{t+1} = 1.91 − x_t², with three different initial conditions, x_0 = 0.99999, 1, and 1.00001.
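The experiment behind Fig. 3.2 is easy to repeat. The following sketch iterates the map x_{t+1} = 1.91 − x_t² from two of the three initial conditions and reports the first time step at which the trajectories differ visibly; the 0.1 divergence tolerance is an arbitrary choice for illustration.

    def iterate(x0, n=31):
        # Iterate the deterministic map x_{t+1} = 1.91 - x_t**2;
        # note there is no random component anywhere in this function.
        xs = [x0]
        for _ in range(n - 1):
            xs.append(1.91 - xs[-1] ** 2)
        return xs

    run_a = iterate(0.99999)
    run_b = iterate(1.00001)

    for t, (a, b) in enumerate(zip(run_a, run_b)):
        if abs(a - b) > 0.1:     # arbitrary threshold for "visibly different"
            print("trajectories diverge at t =", t)
            break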

At this stage, a simple conclusion can be made through deductive reasoning.

• All chaotic systems are deterministic. (First premise)
• Weather is a chaotic system. (Second premise)
• Therefore, weather is deterministic. (Conclusion)

This is perhaps why weather forecasters, at least in the early days, tended to think of weather predictions as deterministic. However, calling the predictions deterministic does not mean that they are perfectly accurate. To weather forecasters, the forecast error can be attributed to two causes: (1) incomplete knowledge of the initial conditions, and (2) incomplete knowledge of the governing dynamical laws.

In clear contrast to weather forecasters, statisticians focus on uncertainty. They describe the possible outcomes of an experiment through a sample space, Ω, and call the points ω in Ω sample outcomes. With that, a random variable can be defined as the mapping X : Ω → R, which assigns a real number X(ω) to each outcome ω. It is, then, the probability laws that govern X that are of interest. For instance, if X is the number of calls arriving at a call center in a fixed period of time, it is likely to be governed by a probability law described by the Poisson distribution. If X is the aerosol optical depth over a geographical area, it is likely to have a probability law described by a lognormal distribution. If X is the solar irradiance at some location at a future time, well, we are not quite sure what probability law it follows. But that is precisely what probabilistic forecasting tries to find out.
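The probability laws named above are easiest to appreciate by sampling from them. Below is a small sketch using NumPy; the rate and log-scale parameters are invented purely for illustration.

    import numpy as np

    rng = np.random.default_rng(42)

    # X: number of calls arriving at a call center in a fixed period;
    # a Poisson law with an (illustrative) mean rate of 12 calls.
    calls = rng.poisson(lam=12, size=5)

    # X: aerosol optical depth over a geographical area, often described
    # by a lognormal law; again, the parameters are illustrative only.
    aod = rng.lognormal(mean=-1.5, sigma=0.5, size=5)

    print(calls)   # non-negative integers scattered around 12
    print(aod)     # positive real numbers, right-skewed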

3.2 UNCERTAINTY AND DISTRIBUTIONAL FORECASTS

Any forecast that is not a deterministic forecast is a probabilistic forecast. Probabilistic forecasts can be issued for binary events, in which the random variables can take only two values, generally represented by “yes” or “no.” Probabilistic precipitation forecasts, expressed in percentages of probability, date back at least to 1920 (Hallenbeck, 1920). In those forecasts, the probability of rain was issued on a scale of 100, in that a forecast of 100% represents absolute certainty of rain, and a forecast of 0%, absolute certainty of fair weather. In whatever lies between 100 and 0 lie the beauty and value of probabilistic precipitation forecasts. Forecasts such as “precipitation occurs within 36 hours in eastern New Mexico with a chance of 75%” are appreciated, for example, in irrigation, where farmers can choose to suspend or postpone irrigation after learning the forecasts. The distinctive benefit of probabilistic precipitation forecasts over a deterministic “rain” or “no rain” statement is that the former allows a more refined judgment during decision-making. The decision that the farmers make after seeing a 75% rain forecast may be rather different if the forecast says a 5% chance of rain. Similar to the case of binary events, probabilistic forecasts for categorical events (e.g., marginal risk, slight risk, enhanced risk, moderate risk, or high risk of storm) are also represented in terms of percentages of probability, with the percentages summing up to 100. In such scenarios, a categorical forecast can influence its user in an even more complex manner than a binary forecast would. That said, in solar forecasting and countless other domains, the random variables of interest are real-valued. Therefore, instead of representing forecasts as percentages of probability, probabilistic forecasts for real-valued random variables come in the form of predictive distributions. The concept of distribution is found so ubiquitously and extensively in statistics that it may easily raise confusion among non-statisticians. In our experience, sentences such as “the parameters of a distribution of a parameter themselves are distributions” often baffle solar engineers. (This sentence is not gibberish at all; see Section 3.2.2.) In this regard, it is necessary to introduce some basic definitions, as well as their mathematical representations, before we proceed further.

3.2.1 PRELIMINARY ON PROBABILITY DISTRIBUTIONS

As mentioned earlier, a random variable is a mapping which assigns a real number to each outcome of an experiment. When there are continuously infinite outcomes, the random variable is a continuous random variable. (A more formal definition is given shortly after.) Most commonly, a random variable is represented using an upper-case letter, whereas a particular value that the random variable can take is represented using a lower-case letter. Given a random variable X, the cumulative distribution function (CDF) is defined as a function which maps the real line to the [0, 1] interval, i.e., F_X : R → [0, 1], where

    F_X(x) = P(X ≤ x),    (3.1)

with P denoting probability, and F_X(x) denoting the value of the CDF of X at some specific x. In other words, the CDF is a measure of how likely the random variable is to be less than or equal to some arbitrary value x. It is noted that the CDF is a function, which, like any other mathematical function, requires an argument. Here, the argument x is known as the threshold. To give perspective, let us assume for now that the solar irradiance reaching the earth’s surface can vary between 0 and the solar constant, which takes the value of 1361.1 W/m² according to the latest research (Gueymard, 2018). That is our random variable X. For an arbitrary x ≤ 0, say, x = −100 W/m², what would be the probability that X is smaller than that? Because irradiance can never be negative, F_X(−100 W/m²) = P(X ≤ −100 W/m²) = 0. Similarly, for some x ≥ 1361.1 W/m², say, x = 1500 W/m², what would be the probability that X is smaller than that? Because irradiance can never exceed 1361.1 W/m², F_X(1500 W/m²) = P(X ≤ 1500 W/m²) = 1. (Over short time intervals, such as a few minutes or less, the surface-level irradiance can in fact exceed the solar constant due to cloud-enhancement events; these events diminish over longer intervals, such as 5 min. For the sake of discussion, we disregard such complications temporarily.) If we then proceed to find the value of F_X(x) for all −∞ ≤ x ≤ ∞, we can plot F_X(x) against x, which would look something like Fig. 3.3.

Figure 3.3 The cumulative distribution function of 1-min global horizontal irradiance at Desert Rock, Nevada, in 2019.
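A curve like the one in Fig. 3.3 can be traced from any irradiance record in a few lines. The sketch below estimates the empirical CDF from a synthetic GHI sample; the placeholder array stands in for actual 1-min measurements, which the reader would load from a station archive.

    import numpy as np

    # Placeholder for 1-min GHI measurements in W/m^2; in practice these
    # would be loaded from a radiometric station record.
    ghi = np.array([0.0, 0.0, 152.3, 430.8, 705.1, 880.4, 910.2, 515.6])

    def ecdf(sample, x):
        # Empirical CDF: the fraction of the sample that is <= x.
        return np.mean(sample <= x)

    print(ecdf(ghi, -100.0))   # 0.0, since irradiance is never negative
    print(ecdf(ghi, 500.0))    # estimate of P(X <= 500 W/m^2)
    print(ecdf(ghi, 1500.0))   # 1.0, beyond the solar constant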

An immediately adjacent concept to the CDF is the probability density function (PDF), which is usually denoted by f_X(x). It is related to the CDF through:

    F_X(x) = \int_{-\infty}^{x} f_X(t) \, dt,    (3.2)

and f_X(x) = F′_X(x) at all points x at which F_X is differentiable, where the prime in F′_X denotes the derivative. It is worth noting that in the integral of the above equation, the argument of f_X is written as t to avoid conflict with the x in the integral’s upper limit—this is the type of explanation statisticians often omit, which often leaves non-statisticians bewildered and perplexed; this book tries to clarify these “trivial” issues as much as possible. From here, it is clear that a CDF is the integral of the corresponding PDF from negative infinity to x, or equivalently, a PDF is the derivative of the corresponding CDF. With that, it can be formally defined that a random variable X is continuous if there exists a function f_X such that f_X(x) ≥ 0 for all x, \int_{-\infty}^{\infty} f_X(x) \, dx = 1, and, for every a ≤ b,

    P(a ≤ X ≤ b) = \int_{a}^{b} f_X(x) \, dx.    (3.3)

What the above math really says is just that a PDF is non-negative; the area under a PDF integrates to 1; and the definite integral of a PDF over the range [a, b] gives the probability of X residing in [a, b]. Two facts should nevertheless be highlighted. First, a continuous random variable has P(X = x) = 0 for all x; probabilities are obtained through integration, not by reading off values of the PDF. Second, the values of a PDF can exceed 1, which is different from the case of probabilities associated with a discrete random variable, such as 70% “rain” and 30% “no rain,” in which the probabilities can never exceed 1. That is pretty much all we need to know in order to proceed to understand predictive distributions.
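The two highlighted facts, as well as Eqs. (3.2) and (3.3), can be checked numerically. The sketch below uses a narrow Gaussian merely as a stand-in for some f_X; note how the density value exceeds 1 while every probability remains within [0, 1].

    import numpy as np
    from scipy.stats import norm
    from scipy.integrate import quad

    # A narrow Gaussian stands in for f_X; with scale 0.1, the peak
    # density exceeds 1, yet all probabilities remain in [0, 1].
    dist = norm(loc=0.0, scale=0.1)

    print(dist.pdf(0.0))                        # ~3.99: a density, not a probability
    print(quad(dist.pdf, -np.inf, np.inf)[0])   # ~1.0: area under the PDF

    # Eq. (3.3): P(a <= X <= b) as the definite integral of the PDF,
    # which must agree with the CDF difference implied by Eq. (3.2).
    a, b = -0.1, 0.1
    print(quad(dist.pdf, a, b)[0], dist.cdf(b) - dist.cdf(a))  # both ~0.683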

3.2.2 WHAT IS A PREDICTIVE DISTRIBUTION?

Although the answer to this question may seem trivial to statisticians, we take this opportunity to clarify it for those who are not familiar. (Using lay audiences as decoys; see Chapter 2.) Because statistics is the science of uncertainty, virtually every quantity in the domain has a distribution—if a parameter of a distribution, such as the mean, needs to be estimated, it is uncertain, and hence should be described with a distribution. It is thus useful to think about two questions whenever the subject to which a distribution is associated is clouded: What exactly is the quantity that is thought to be random, and what is the corresponding experiment whose outcome is mapped by the random variable? For instance, in Fig. 3.3 earlier, the random variable, X, is the 1-min global horizontal irradiance (GHI) at Desert Rock, Nevada, in 2019. We call X a variable because the measurement records over the period of 2019 take different values. We think of X as random because even if the past records are known exactly, the next measurement record still cannot be inferred with absolute certainty. In other words, the experiment, in this case, is “making a 1-min radiometric measurement of GHI at Desert Rock, Nevada, at some point in 2019.” Throughout the year, if the equipment were perfectly reliable, the experiment would be repeated 525,600 times, and we would end up with exactly 365 × 24 × 60 = 525,600 outcomes. And each outcome is a realization of X. To that effect, the CDF shown in Fig. 3.3 is the probability law governing how X would have materialized if the experiment were to be repeated yet another time.

In a similar manner, we consider another random variable, Y, representing the deterministic forecast of 1-min GHI at Desert Rock, Nevada, in 2019. We call Y a variable because the forecasts made over the course of 2019 take different values. We think of Y as random because the sequence of forecasts cannot be known with absolute certainty. (There are very specific cases where forecasts can be fixed, such as a climatology forecast, which issues the same forecast regardless of the situation; we are referring to the more general cases here.) The experiment in this case is “generating a 1-min GHI forecast for Desert Rock, Nevada, for some time instance in 2019.” Throughout the year, if the forecasting model has been trained and the input to the model is always available, we would obtain, again, exactly 525,600 outcomes. Subsequently, we can obtain the CDF of Y in the same fashion as we obtained the CDF of X. The CDF of Y is, then, the probability law governing how Y would have materialized if the experiment were to be repeated yet another time.

Under both circumstances, the time order of the outcomes of the experiment does not seem to matter, because each x or each y can be considered a random sample drawn from the CDF of X or Y, namely, F_X(x) or F_Y(y). This idea is central to the Murphy–Winkler distribution-oriented forecast verification framework, in which only the forecast–observation pairs matter, not the time order of those pairs (Murphy and Winkler, 1987). Additionally, that framework is only applicable to deterministic forecasts. Indeed, the notion of distribution is not unique to probabilistic forecasting; it can equally be used to study deterministic forecasting. That is why thinking about what exactly the random variable and the experiment describe is useful in clearing up ambiguities related to the word “distribution” when it is used in a forecasting context. Such an exercise is tedious, but when employed, one would not be led astray.

Now, in probabilistic forecasting, the random variable of interest is the forecast for a specific time, location, and/or subject, depending on the problem at hand. One example experiment one can think about is “generating a GHI forecast for Desert Rock, Nevada, for 12:00 on June 3, 2019.” In order to conduct such an experiment, a forecaster needs to gather data, build a model, and estimate the model parameters. Each of the three stages introduces some uncertainty. More specifically, forecasts generated using different initial conditions are likely to be different; forecasts generated using different physics schemes (or statistical and machine-learning models) are likely to be different; and forecasts generated using the same model but with different parameters (or weights) are also likely to be different. Nonetheless, all these forecasts are meant for the same quantity, that is, the GHI value for Desert Rock, Nevada, at 12:00 on June 3, 2019. It follows that the GHI forecast for that quantity should be treated as a random variable. And whatever distribution characterizes that random variable is its predictive distribution. A predictive distribution refers to the CDF of a probabilistic forecast, whereas its PDF is called the predictive density—the difference is trivial, so people generally do not make a strict distinction between CDF and PDF when describing predictive distributions. It should also be clear now that each predictive distribution characterizes only a specific forecast, which implies that the predictive distributions of two adjacent forecasts can look quite distinct, as seen in Fig. 3.4, which shows the predictive densities of hourly forecasts at Desert Rock, Nevada, on June 3, 2019.


Figure 3.4 Predictive densities of some hypothetical hourly forecasts at Desert Rock, Nevada, on June 3, 2019. Dots indicate the corresponding observations.

In fact, forecasters rarely study the predictive distribution of a single forecast, because there would (usually) be only one observation available to verify that forecast, nor is there much that can be done with it. Instead, it is the collection of forecasts, indexed by i = 1, 2, ..., n, that is of interest. These indexes may correspond to instances over a time window, locations over a geographical area, or both.

3.2.3 IDEAL PREDICTIVE DISTRIBUTIONS

Recall that the CDF for a random variable X is written as F_X(x); in practice, due to the need for indexing, it is often easier to simply write F instead. For instance, the CDF for the ith forecast can be denoted by F_i. The next relevant question is: How do we gauge the goodness of F_i? The answer to this question is not straightforward, in the sense that the forecast is a distribution, while the observations (also known as the verifications, or verifying observations, in a forecasting context—it is indeed confusing that the word “verification” can mean both the observation itself and the action of evaluating forecasts) are real-valued. In this regard, Gneiting and co-workers laid down the bedrock of probabilistic forecast verification in their seminal paper Probabilistic forecasts, calibration and sharpness (Gneiting et al., 2007), which is widely recognized as one of the most stellar achievements in the history of probabilistic forecasting.

Gneiting imagines an omniscient intellect called Nature. For each i = 1, 2, ..., n, Nature chooses a distribution G_i, which can be thought of as the true probability law that generates the data. The observation a forecaster makes for index i, denoted by x_i, is but one random sample drawn from the distribution G_i. Given that Nature knows everything, the information accessible to the forecaster is at most that of Nature. It follows that if

    F_i = G_i,  for all i,    (3.4)

the forecaster is ideal. The forecasts generated by the ideal forecaster are therefore also ideal. It is anchored on this very framework that the overarching strategy of making probabilistic forecasts is proposed: maximizing the sharpness of the predictive distributions subject to calibration (Gneiting et al., 2007). Stated differently, the two goals, namely, making ideal probabilistic forecasts and maximizing sharpness subject to calibration, are equivalent. Of course, the meanings of sharpness and calibration need to be clarified in a later chapter of this book. For now, one can think of calibration as the statistical consistency between the predictive distributions and the observations, and of sharpness as the concentration of the predictive distributions. A practical issue with Gneiting’s theoretical framework is that the true distribution G_i remains hypothetical and unknown. Forecasters are therefore tasked with issuing F_i based on the particular forecasting situation and their best judgment. Subsequently, various scoring rules and diagnostic tools are employed during verification to check whether or not the issued F_i are satisfactory. We circle back to these scoring rules and diagnostic tools in Chapter 10, but the next part first takes a look at what kind of F_i can be issued.
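Gneiting’s framework can be mimicked in a toy simulation: let Nature draw each observation from G_i, and let an ideal forecaster issue F_i = G_i. One classic consequence, previewing the diagnostic tools of Chapter 10, is that the values F_i(x_i) of an ideal forecaster are uniformly distributed. The Gaussian choice of G_i below, and the naive competitor, are invented purely for illustration.

    import numpy as np
    from scipy.stats import norm, kstest

    rng = np.random.default_rng(7)
    n = 10000

    # Nature: each G_i is Gaussian with its own (here randomly drawn) mean.
    mu = rng.uniform(-2.0, 2.0, size=n)
    x = rng.normal(loc=mu, scale=1.0)        # one observation per index i

    # Ideal forecaster: F_i = G_i for all i, cf. Eq. (3.4).
    pit_ideal = norm.cdf(x, loc=mu, scale=1.0)

    # A non-ideal forecaster that ignores the conditioning information.
    pit_naive = norm.cdf(x, loc=0.0, scale=1.0)

    print(kstest(pit_ideal, "uniform").pvalue)   # large: consistent with uniformity
    print(kstest(pit_naive, "uniform").pvalue)   # tiny: miscalibrated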

3.2.4 TYPES OF PREDICTIVE DISTRIBUTIONS

Being probability distributions, predictive distributions can be parametric, semiparametric, nonparametric, or empirical. The choice invoked depends solely on the forecaster’s belief. Philosophically, as we have seen in the last chapter, disagreements in the beliefs of different forecasters are inevitable. Moreover, for different forecasting situations, even the belief of the same forecaster may vary. On this account, neither is there such a thing as a “best choice.” However, when everything else is fixed, a logical action to settle disagreements is to compare the predictive performance of forecasts issued based on different types of predictive distributions. Again, that falls within the scope of forecast verification, which is dealt with later in this book.

3.2.4.1 Parametric Predictive Distributions

It does not take a statistician to understand what a parametric distribution is. (The concept of the Gaussian distribution is understood by anyone with a high-school degree; but things can be complicated and convoluted if the distribution is not Gaussian, which would be the case for most quantities in solar forecasting.) A parametric distribution is a distribution that can be characterized by a fixed set of parameters. For instance, the Gaussian (or normal) distribution is parameterized by μ and σ, which are the mean and standard deviation of the distribution, respectively. Mathematically, one has:

    f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2\sigma^2}(x-\mu)^2\right],    (3.5)


which is the PDF of Gaussian random variables. Similarly, a logistic distribution is parameterized by μ, the mean, and σ, the scale parameter (not to be confused with the standard deviation; the standard deviation of a logistic distribution is σπ/√3). It has a PDF of:

    f(x) = \frac{\exp\left(-\frac{x-\mu}{\sigma}\right)}{\sigma\left[1+\exp\left(-\frac{x-\mu}{\sigma}\right)\right]^2}.    (3.6)

Both the Gaussian and logistic distributions belong to the so-called “location–scale family” of distributions. In fact, almost all parametric distributions with range (−∞, ∞) belong to the location–scale family, with a notable exception being the exponential Gaussian distribution (Rigby et al., 2019). It should be highlighted that, despite the name, members of the location–scale family can take up to four parameters to describe. It is customary in statistics to collect all parameters into a vector, i.e., θ = (μ, σ, ν, τ)⊤, where the four parameters denote location, scale, skewness, and kurtosis, respectively, and the symbol ⊤ denotes the transpose of a matrix/vector. The latter two parameters control the shape of the distribution. A useful property of the location–scale family is that a random variable under one of these distributions can be converted to an “unscaled and unshifted” version by subtracting μ and dividing by σ. To give perspective, consider a random variable that has a normal distribution, denoted mathematically as X ∼ N(μ, σ); one can write: Y = (X − μ)/σ ∼ N(0, 1), where N(0, 1) denotes the standard normal distribution with mean 0 and variance 1. More generally, for any random variable with a distribution D belonging to the location–scale family, X ∼ D(μ, σ, ν, τ), it is true that Y = (X − μ)/σ ∼ D(0, 1, ν, τ). The advantage of this property will become evident during parametric post-processing of forecasts (see Chapter 8). One issue with the location–scale family of distributions is that their ranges are (−∞, ∞), which contradicts the physical nature of some quantities; e.g., solar irradiance and wind speed are left-bounded at 0. One may argue to concentrate the distributions on the positive half of the real line as much as possible, but however small the portion on the negative half is, it still integrates to some probability, which does not correspond to any outcome of the experiment. The alternative is to employ truncated predictive distributions with a cutoff at 0, such as a truncated normal distribution, N⁺(μ, σ) (again, σ in a truncated normal is the scale parameter, which should not be confused with the standard deviation). The downside of using such truncated predictive distributions is that many statistics of theirs—such as the mean, the standard deviation, or the continuous ranked probability score—are tedious to derive. Furthermore, some empirical evidence based on solar applications suggests that the hassle involved in using a truncated predictive distribution might not always be justified, since the contribution from a truncated predictive distribution to the model’s predictive power could be small (Yang, 2020d).
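The standardization property is easy to verify empirically. The sketch below draws samples from a Gaussian and a logistic distribution with the same (arbitrarily chosen) location and scale, and confirms that (X − μ)/σ behaves as the “unscaled and unshifted” member of each family.

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma = 300.0, 80.0      # arbitrary location and scale parameters

    # X ~ N(mu, sigma): standardizing recovers N(0, 1).
    x = rng.normal(mu, sigma, size=100_000)
    y = (x - mu) / sigma
    print(y.mean(), y.std())     # ~0 and ~1

    # The logistic member of the location-scale family behaves alike;
    # note its standard deviation is sigma * pi / sqrt(3), not sigma.
    x = rng.logistic(mu, sigma, size=100_000)
    y = (x - mu) / sigma
    print(y.std(), np.pi / np.sqrt(3))   # both ~1.81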

3.2.4.2 Semiparametric Predictive Distributions

Parametric distributions are often found insufficient in describing random variables that contain some “hidden” categorical features. A classic example is the distribution of human height, which, owing to the mixing of two separate groups, males and females, has two peaks instead of one. Distributions with two peaks are called bimodal distributions. The distribution of solar irradiance is often found to be bimodal, originating from the cloudy and clear states of the sky. In these scenarios, it is often beneficial to consider a so-called “mixture distribution” during forecasting, which can be considered a semiparametric predictive distribution. A mixture distribution is the weighted sum of two or more parametric distributions. More specifically, the CDF (and PDF, if it exists) of a random variable with a mixture distribution can be written as a convex combination of the component CDFs (and PDFs). For instance, given two Gaussian PDFs with parameters θ₁ = (μ₁, σ₁)⊤ and θ₂ = (μ₂, σ₂)⊤, namely, f(x; μ₁, σ₁) and f(x; μ₂, σ₂), their mixture distribution has the PDF:

  g(x; θ₁, θ₂, w) = w f(x; μ₁, σ₁) + (1 − w) f(x; μ₂, σ₂).    (3.7)

In other scenarios, more complex (and sometimes cute) patterns can be expected, such as the one shown in Fig. 3.5, which is referred to as “the claw” or the Bart Simpson distribution (Wasserman, 2006). Its PDF is the weighted sum of the PDFs of six Gaussian distributions:

  g(x) = (1/2) f(x; 0, 1) + (1/10) ∑_{j=0}^{4} f(x; j/2 − 1, 1/10).    (3.8)


Figure 3.5 “The claw” or the Bart Simpson distribution.

Solar engineers can be charged guilty of their obsession with the Gaussian distribution. A mixture distribution with Gaussian components is the most common choice for modeling the clear-sky index⁶ distribution. Depending on the time scale, i.e., whether the data is of hourly or minute resolution, the literature has proclaimed a two- or three-component Gaussian mixture distribution to be reasonably adequate (Yang et al., 2017a). Nonetheless, this does not rule out the fact that there are other, more appropriate choices out there. What seems to be lacking is the scrutiny of the tail behavior and higher-order moments, leaving a bounty of opportunities for future studies.

⁶A clear-sky index is the ratio between irradiance (or PV power) and its clear-sky expectation. A clear-sky expectation is the irradiance (or PV power) under a cloud-free atmosphere. The emphasis is on “cloud-free,” which is not the same as a “no atmosphere” case. This is discussed further in the next chapter.

Similar to the case of parametric predictive distributions, when mixture distributions are used as predictive distributions, the choice also depends highly on the forecaster’s belief. Generally speaking, if the forecasting model is conditioned sufficiently (e.g., we know confidently whether a forecast is issued for a clear or cloudy condition), there is little reason to utilize a mixture distribution as a predictive distribution. However, since identifying the sky condition is an exceptionally challenging task, the mixture predictive distribution might have an edge, as is the case with other weather variables such as wind speed or precipitation accumulation (Baran and Lerch, 2018).

3.2.4.3 Nonparametric Predictive Distributions

When there are insufficient grounds for assuming any particular modality in the predictive distribution, one can opt for a nonparametric predictive distribution. A nonparametric distribution is one whose complexity, that is, the number of parameters, grows with the size of the data. Different from the cases of parametric and mixture predictive distributions, nonparametric predictive distributions do not assume any shape based on prior beliefs. Instead, the shape of the predictive distribution is determined from (assumed) independent and identically distributed (iid) samples. Suppose there are m anchor points, x₁, . . . , x_m, which are iid samples drawn from an unknown predictive distribution F; then, the goal of nonparametric density estimation is to estimate the PDF, f, with as few assumptions about f as possible.

It is pointless to discuss nonparametric density estimation without introducing the concept of a kernel. To non-statisticians, the word “kernel” may sound strangely familiar. This is because it carries several distinct meanings in different branches of statistics, and the one we are interested in concerns nonparametric statistics. A kernel is a function, denoted by K(x), such that (Wasserman, 2006):

  K(x) ≥ 0,  ∫K(x)dx = 1,  ∫xK(x)dx = 0,  and  ∫x²K(x)dx > 0.    (3.9)

The four parts of Eq. (3.9) correspond to: (1) the kernel must be non-negative; (2) the kernel integrates to 1; (3) the kernel is centered at 0; and (4) the kernel has a positive spread. To give perspective, Fig. 3.6 shows four example kernels, namely, the boxcar, the Gaussian, the Epanechnikov, and the tricube; their formulas are omitted for brevity. With a particular choice of kernel K, and a positive number h known as the bandwidth, the kernel density estimator of f is given by:

  f̂_m(x) = (1/m) ∑_{i=1}^{m} (1/h) K((x − x_i)/h).    (3.10)


Figure 3.6 Examples of kernels: the boxcar, the Gaussian, the Epanechnikov, and the tricube.

The bandwidth controls the width of the kernel, in the same fashion as the scale parameter controls the spread of a parametric distribution. The expression (x − x_i) suggests a shift of the kernel’s center to x_i. The hat above f indicates that this density function is an estimate of the actual unknown density, whereas the subscript m suggests that the estimate is computed from m samples.
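To make Eq. (3.10) concrete, here is a minimal Python sketch of the kernel density estimator with a Gaussian kernel; the anchor points are synthetic and the bandwidth value is illustrative rather than optimized:

```python
import numpy as np

def gaussian_kernel(u):
    """Gaussian kernel: non-negative, integrates to 1, centered at 0."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(x, anchors, h):
    """Kernel density estimate of Eq. (3.10) evaluated at query points x."""
    u = (np.asarray(x)[:, None] - np.asarray(anchors)[None, :]) / h
    return gaussian_kernel(u).mean(axis=1) / h

# m = 500 synthetic anchor points drawn from a bimodal distribution
rng = np.random.default_rng(42)
anchors = np.concatenate([rng.normal(0.45, 0.15, 250), rng.normal(1.0, 0.05, 250)])
x_grid = np.linspace(0.0, 1.3, 131)
f_hat = kde(x_grid, anchors, h=0.05)  # the bandwidth h controls smoothness
```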

3.2.4.4 Empirical Predictive Distributions

A nonparametric predictive distribution, despite its name, still requires us to pick a kernel, as well as to estimate its bandwidth. Is there any predictive distribution that is completely free of parameters? The answer is affirmative. An empirical cumulative distribution function (ECDF) is a CDF that places mass 1/m at each of the m data points, x₁, . . . , x_m. Mathematically,

  F̂_m(x) = (1/m) ∑_{i=1}^{m} 𝟙{x_i ≤ x},  where 𝟙{x_i ≤ x} = 1 if x_i ≤ x, and 0 if x_i > x.    (3.11)

The symbol 𝟙{condition} is often used to denote an indicator function, which takes the value of 1 if the condition as expressed in the curly brackets is satisfied, and the value of 0 if that condition is not satisfied. These indicator functions may have different appearances elsewhere, e.g., I_condition, I{condition}, 1_condition, and 1{condition}.

The ECDF can be easily understood graphically. Starting from 0, as the x value increases, F̂_m(x) increases in steps, forming a “staircase” with as many steps as there are data points. For example, Fig. 3.7 depicts F̂₆(x) estimated from 6 data points, with x₆ < x₂ < x₃ < x₁ < x₄ < x₅. (This is to show that the order of the data indexes does not matter when computing the ECDF.) For x < x₆, all 𝟙{x_i ≤ x} take the value of 0, thus F̂₆(x) = 0. For x₆ ≤ x < x₂, 𝟙{x₆ ≤ x} = 1 while the other 𝟙{·}’s are 0, thus


F̂₆(x) = 1/6. So on and so forth, until finally, for x ≥ x₅, all 𝟙{x_i ≤ x} take the value of 1, thus F̂₆(x) = 1. The entire ECDF has now been traced out.


Figure 3.7 An empirical cumulative distribution function estimated through six data points, x₁, . . . , x₆. Each step has a size of 1/6, or more generally, 1/m if there are m data points.

The step functions used in ECDF construction might be inconvenient at times when the value of F̂_m(x) at an arbitrary x is desired. Some forecasters, therefore, adopt an interpolated version of the CDF, in which adjacent data points are connected by a linear segment (Lauret et al., 2019). Fig. 3.8 shows an interpolated CDF with the same data as Fig. 3.7. Owing to the need to bound the CDF between 0 and 1, two additional x’s—we annotate them as x_max and x_min—are needed; they may be derived from the forecaster’s knowledge about climatology, which deals with the long-term behavior of the weather. For the current purpose, it simply means the maximum and minimum values that the forecast can take, conditional on the forecasting situation.


Figure 3.8 An interpolated version of the CDF estimated through six data points, x1 , . . . , x6 . The first and last steps have a size of 1/12, or more generally, 1/(2m) if there are m data points, whereas the other steps in the middle have a size of 1/6, or 1/m more generally.
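The ECDF of Eq. (3.11) and its interpolated variant can both be coded in a few lines; a sketch follows, in which the six anchor values and the climatological bounds x_min and x_max are invented for illustration:

```python
import numpy as np

def ecdf(anchors):
    """Step-function ECDF of Eq. (3.11): mass 1/m at each of the m data points."""
    xs = np.sort(np.asarray(anchors))
    return lambda x: np.searchsorted(xs, x, side="right") / len(xs)

def interpolated_cdf(anchors, x_min, x_max):
    """Piecewise-linear CDF: first/last steps of 1/(2m), middle steps of 1/m."""
    xs = np.sort(np.asarray(anchors))
    m = len(xs)
    grid = np.concatenate([[x_min], xs, [x_max]])
    levels = np.concatenate([[0.0], (np.arange(m) + 0.5) / m, [1.0]])
    return lambda x: np.interp(x, grid, levels)

F6 = ecdf([0.9, 0.3, 0.5, 1.0, 1.1, 0.2])        # order of indexes is irrelevant
print(F6(0.6))                                    # 0.5: three of six points <= 0.6
G6 = interpolated_cdf([0.9, 0.3, 0.5, 1.0, 1.1, 0.2], x_min=0.0, x_max=1.3)
print(G6(0.2))                                    # 1/12 at the smallest data point
```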

Although the interpolated version of the CDF has its advocates, there are also criticisms against it. The ECDF is the CDF of the population constituted by the data


points themselves, and it is but one estimator of the true CDF. The ECDF is perfectly general, in that it is consistent and converges quickly. Furthermore, since the ECDF is a discrete distribution, any kind of interpolation would render the resultant CDF difficult to interpret. If interpolation must be used, for whatever reason, a better approach would be to first estimate the PDF through kernel density estimation, and then integrate it to obtain the CDF. In any case, we do not intend to further debate the alternatives, as the practical difference is thought to be marginal in forecasting applications.

3.2.5 OTHER REPRESENTATIONS OF DISTRIBUTIONAL FORECAST

Issuing a predictive distribution is the paradigm case of probabilistic forecasting, which means that a predictive distribution contains all relevant information about the forecast. Through a predictive distribution, two other representations of probabilistic forecast, namely, the quantile forecast and the interval forecast, can be obtained at once. We start by understanding quantiles. Formally, for a random variable X with CDF F, the inverse CDF or the quantile function is defined by:

  F⁻¹(τ) = inf{x : F(x) > τ},  for τ ∈ [0, 1].    (3.12)

The square bracket notation [0, 1] denotes a range with both endpoints included. The symbol inf is the infimum, a concept in set theory—whereas its formal definition requires much mathematics that is beyond the scope of this book, replacing it with “the minimum” is generally adequate. Equation (3.12) is essentially saying that the τth quantile is the minimum value of x that satisfies F(x) > τ. For most parts of solar forecasting, F is strictly increasing and continuous; F⁻¹(τ) is then the unique real number x such that F(x) = τ.

To further elaborate on the concept of a quantile, recall the CDF plotted in Fig. 3.3, which has been reprinted in Fig. 3.9 but with some annotation. It is noted that the x-axis is the domain of the random variable X which, in this case, is irradiance in W/m²; the y-axis is the range of the CDF F, which is from 0 to 1. For a given irradiance value of x = 300 W/m², one can read off the value of the corresponding τ—by evaluating F(x = 300 W/m²)—which is about 0.39; so we say that 300 W/m² marks the 0.39th quantile. For a given value of τ = 0.75, one can read off the value of x—by evaluating F⁻¹(τ = 0.75)—which is 728 W/m²; so we say that the 0.75th quantile is 728 W/m². Although the distinction between τ (a particular value of F) and a quantile (a particular value of X) seems entirely obvious, in practice, especially when X also has the range [0, 1], a careless mix-up of the two leads to deadly errors that ought to be avoided at all cost. Do not get confused!

Specific groups of quantiles take special names, particularly, the quartiles, deciles, and percentiles, which correspond to τ ∈ {0.25, 0.5, 0.75}, τ ∈ {0.1, 0.2, . . . , 0.9}, and τ ∈ {0.01, 0.02, . . . , 0.99}, respectively. It should again be emphasized that when forecasters converse about quantiles, the quantiles themselves are not numbers such as {0.25, 0.5, 0.75} (these are τ’s), but rather a set of specific values of the random variable. Hence, whenever quantile forecasts are desired, one must always specify which quantiles are being issued; the word “which” is



Figure 3.9 The CDF as it appeared in Fig. 3.3. The solid line reads: an irradiance value of 300 W/m² corresponds to the 0.39th quantile. The dashed line reads: the 0.75th quantile is 728 W/m².

basically asking what values τ take. A typical sentence in a forecasting paper would read: “We issue 0.01, 0.02, . . . , 0.99 quantiles, which are represented by qτ.” In it, the first half of the sentence describes “which τ,” and the second half assigns a symbol to the quantiles themselves.

One crucial special case of the quantile forecast is the interval forecast, where forecasters issue the central prediction interval with a predefined nominal coverage probability. The word “central” needs to be highlighted, as there are infinitely many prediction intervals that share the same nominal coverage probability. For instance, if the coverage is set to 95%, besides the default case as defined by the 0.025th and 0.975th quantiles, there are infinitely many other choices, such as {0.01, 0.96}, {0.02, 0.97}, or {0.026, 0.976}; you get the idea. To avoid such ambiguities, choices other than the central one are not considered—whenever a prediction interval is mentioned, “central” is implied. This is equivalent to saying that a (1 − α) × 100% prediction interval has lower and upper endpoints given by the α/2 and (1 − α/2) quantiles (Gneiting and Raftery, 2007).

We have been discussing, thus far, how to retrieve predictive quantiles from a predictive distribution. In other circumstances, some forecasting methods generate predictive quantiles directly, as typified by quantile regression and its variants, which include penalized quantile regression, the quantile regression neural network, and the quantile regression forest. There are also neural network structures that designate two output neurons to cater to the lower and upper bounds of a prediction interval, and those that designate multiple output neurons to represent a set of predefined quantiles. But there is one option for obtaining quantiles that seems particularly attractive to weather forecasters (Lauret et al., 2019)—getting quantiles from an ensemble forecast—which we shall spend the next few pages exploring.
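Retrieving quantiles and a central prediction interval from a finite ensemble is a one-liner in most numerical libraries; the following sketch uses synthetic members for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
members = rng.normal(500.0, 80.0, size=51)  # synthetic 51-member ensemble, W/m2

taus = np.arange(0.1, 1.0, 0.1)             # which quantiles: the deciles
deciles = np.quantile(members, taus)        # the quantiles themselves, in W/m2

# Central 80% prediction interval: alpha = 0.2, endpoints at tau = 0.1 and 0.9
lower, upper = np.quantile(members, [0.1, 0.9])
```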

3.3 ENSEMBLE FORECASTS

Gneiting and Katzfuss (2014) argued that perhaps the most mature and successful demonstration of probabilistic forecasting takes place in the field of weather


prediction. According to the terminology of weather forecasters for describing forecast horizon, numerical weather prediction (NWP) works mainly in the medium range, which corresponds to a horizon on the order of days. For forecasting weather quantities at shorter horizons of up to a few hours, statistical and machine-learning extrapolation has traditionally been viewed as the preferred approach, albeit this preference has become less justifiable today, when NWP forecasts are updated hourly and sub-hourly. For horizons longer than a few days, on the other hand, chaos theory has put a hard limit on how well one can predict the weather. Therefore, a more appropriate term to describe such forecasts is climate projection, which can reach as far as a century ahead, although the validity of such projections can lead to heated debate.⁷

⁷An obvious difficulty resides in the verification of such projections.

Regardless of whether it is short-, medium-, or long-range weather/climate forecasting, ensemble forecasts have been the dominant form of probabilistic forecasts. The chief principle of ensemble forecasting, or ensemble prediction in general, is to leverage the wisdom of crowds to mitigate the judgmental bias in individual predictions. One well-known recorded use of ensemble predictions happened in 1907, when Sir Francis Galton, cousin of Charles Darwin, visited the Fat Stock and Poultry Exhibition in the port city of Plymouth (Spiegelhalter, 2019). Galton saw a group of contestants paying a tanner to guess the weight of a large ox. Smart as he was, Galton gathered the tickets filled out by 787 contestants and chose the median number, which was 547 kg. In other words, he chose the median as the summary statistic to describe the data samples. It turned out that the actual weight was 538 kg, a number fairly close to Galton’s guess. His guess would probably have been even closer had there been more contestants. Whereas the scientific principles of ensemble forecasting are to be explored in depth in later chapters, a basic typology of ensemble weather forecasts is given next.

3.3.1 TYPOLOGY OF ENSEMBLE WEATHER FORECASTS

Stated simply, an ensemble forecast contains multiple component forecasts (or member forecasts) that are generated using different information sets, underlying assumptions, and forecasting principles. Because individual forecasters are likely to suffer from information shortage, loose assumptions, and violation of forecasting principles, betting on collective judgment is usually safer. For the vast majority of the time, those component forecasts are deterministic, but there is no reason why they could not, or should not, be probabilistic in nature. Typically, the total number of component forecasts in an ensemble forecast ranges from a few to over a hundred. For instance, the European Centre for Medium-Range Weather Forecasts (ECMWF) Ensemble Prediction System (EPS) issues 50 members (i.e., perturbed forecasts), alongside a best-guess forecast (i.e., the control forecast). (We can call that either a 50-member ensemble or a 51-member ensemble; people get the idea.) The ECMWF ensemble forecasts are generated by perturbing the initial diagnosis of the current state of the atmosphere.



Figure 3.10 A schematic illustration of the trajectory of a dynamical ensemble. Climatology represents a “no skill” forecast, i.e., the marginal density of the variable of interest.

Since the evolution of the dynamical model of the atmosphere depends sensitively on the initial conditions (also known as the analysis), ensemble prediction systems such as the ECMWF EPS aim at capturing the uncertainties through ensemble data assimilation.⁸ Ensembles of this sort are known as dynamical ensembles; see Fig. 3.10. In weather forecasting, the dynamical ensemble is the default type of ensemble, which consists of a collection of forecasts with perturbed initial conditions.

⁸In meteorology, the goal of data assimilation is to obtain the best possible estimate of the states of the atmosphere through both observations and short-range model forecasts (also known as the background), as well as to quantify the uncertainty of that estimate. Data assimilation is sequential. In each round, the previous model forecasts are compared to the fresh batch of observations, the model states are then updated, and forecasting is subsequently performed.

Recall that weather forecasters think of two reasons when it comes to forecast errors: Whereas the incomplete knowledge about the initial conditions is handled by perturbing them, the incomplete knowledge about the model representation of the governing dynamical laws is addressed by considering multiple models. That is, even if the same set of initial conditions is used, different models would still output different forecasts. This type of ensemble is known as the poor man’s ensemble, or multi-modeling. Indeed, this is the methodologically easiest and computationally cheapest way to construct an ensemble weather forecast, hence the analogy of “a poor man.”

Besides perturbing initial conditions and using different dynamical models, another tactic for issuing ensemble weather forecasts is to use stochastic parameterizations. A dynamical weather model is based on a spatial discretization (or grid) of the differential equations of motion.


In contrast to the dynamical core—the part of the model that describes the resolved scales of motion—physical parameterizations provide estimates of the grid-scale effect of atmospheric processes that cannot be resolved.⁹ To address the uncertainty in parameterization, instead of using a fixed scheme, many operational weather centers routinely use stochastic parameterization schemes to generate ensemble forecasts (Berner et al., 2017). We call this type of ensemble the stochastic parameterization ensemble.

⁹NWP models are unable to resolve features and processes of weather that take place within a grid box, such as radiation or rain. For wind, for example, the effect of local flows and turbulent eddies, due to subgrid surface structures like buildings or trees, must be known in order to have a single-number grid-scale representation that goes into the surface friction term required by the wind forecast equation. In Monin–Obukhov similarity theory, for instance, the representative height of surface structures is parameterized by a roughness length. Similarly, the effects of other subgrid atmospheric processes, such as the radiative or microphysical effects of subgrid clouds, also need to be known for other equations of an NWP model. The way to take these within-grid-box effects into consideration without actually simulating them is called a parameterization or a scheme. To that end, a parameterization models the effect of a process, rather than the process itself.

At this stage, any reader with a decent understanding of statistics would have already figured out the remarkable correspondence between the types of ensemble weather forecasts and the several sources of uncertainty that are omnipresent in statistics. Cressie and Wikle (2015) put forth the hierarchical statistical modeling framework, in which the hierarchy consists of a data model at the top, a process model in the middle, and a parameter model at the bottom. Cressie and Wikle contended that the era of building (marginal) probability models directly on data is coming to an end. What is more attractive is to build conditional probability models on data that depend on the underlying process, which subsequently depends on the process parameters that are again probabilistic. Be that as it may, we ought to agree that data, process, and parameter are the three sources of uncertainty that concern statisticians. To that end, the current practice of ensemble weather forecasting aligns well with the statistical theory of uncertainty. The dynamical, poor man’s, and stochastic parameterization ensembles capture, respectively, the uncertainties in data, processes, and parameters.

With the three basic forms of ensemble forecasts, one can perform further operations on them, through the so-called “method of dressing.” If we accept the fact that the best judgment of a forecaster is necessarily probabilistic, each deterministic component forecast should in principle be accompanied by a description of its uncertainty. One natural way of quantifying such uncertainty is by looking at past error statistics. It is also reasonable to assume that the future errors of a “frozen” model¹⁰ would resemble the past errors. On this point, one can create more forecasts by adding past errors, from instances with weather conditions similar to the current one, onto the present component forecast, hence the word “dressing.” These added forecasts are known as the daughter ensemble (Roulston and Smith, 2003). Daughter ensembles together with the mother ensemble—which could be either a dynamical ensemble, a poor man’s ensemble, or a stochastic parameterization ensemble—are called the hybrid ensemble. One caveat with the method of dressing is the double-counting of uncertainties. Although the distinction among sources of uncertainty facilitates interpretation, the model errors are, almost by definition, inseparable from initial condition errors.

¹⁰The operational version of a dynamical model is constantly being updated. A “frozen” model refers to a particular fixed setup. A “frozen” model is usually used to produce reanalysis data; see Chapter 6.


Similar to dressing the past errors, one can directly dress distributions onto mother ensembles, which would immediately result in a nonparametric predictive distribution, as seen in Subsection 3.2.4.3. The dressed distributions need not come from the same parametric family, nor need they carry the same weight in the final mix, unless they are dressed on equiprobable dynamical ensemble members. Irrespective of which type of ensemble or which strategy of dressing is used, ensemble forecasts could benefit from calibration—an idea that was discussed in Section 3.2.3.
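A minimal sketch of the method of dressing follows, with a synthetic mother ensemble and an invented archive of past errors; in practice, the errors would be sampled from historical instances with weather conditions similar to the current one:

```python
import numpy as np

rng = np.random.default_rng(3)

# Mother ensemble: e.g., 10 perturbed-initial-condition members (W/m2)
mother = 520.0 + rng.normal(0.0, 40.0, size=10)

# Archive of past forecast errors under similar weather conditions (invented)
past_errors = rng.normal(5.0, 30.0, size=200)

# Dressing: each mother member spawns daughters by adding resampled past errors
daughters = np.concatenate([m + rng.choice(past_errors, size=20) for m in mother])

# Mother plus daughters form the hybrid ensemble
hybrid_ensemble = np.concatenate([mother, daughters])
```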

3.3.2 STATISTICAL ENSEMBLE FORECASTING

The commanding statement from George E. P. Box, a much-celebrated British statistician—“All models are wrong, but some are useful.”—famously highlighted the compulsion of accepting the inevitable uncertainty in (statistical) models. Despite the much-divided opinions on what constitutes a good statistical model, using ensembles is one of the few things that have reached widespread consensus among statistical forecasters. Rather than calling them ensemble forecasts, as weather forecasters do, statisticians refer to them as combining forecasts, or composite forecasts (Armstrong, 2001). Forecast combination works on deterministic forecasts, interval forecasts, quantile forecasts, probability forecasts, and distributional forecasts (Winkler et al., 2019).

Though in many cases the component forecasts of various forms are generated through statistical models, judgmental forecasts also occupy a significant market share in the world of forecasting. Skimming through the literature on combining forecasts, any forecaster from the physical sciences would be genuinely amazed by the amount of work there is on judgmental forecasting.¹¹ These wild guesses and speculations thrown out by experts might strike one as contradicting the core activities of physical scientists and engineers, i.e., science—we watch the news and hear economists sharing their views on the housing market, but that’s that; after the news, we go back to our notebooks and computers loaded with equations for real forecasting. The reality is, however, contrary: Any perception of judgmental forecasting as being less than science would be quite wrong. In fact, the research issue outlined in one of the earliest works on combining distributional forecasts is described as the opinion pool, which narrates a situation in which individuals hold disparate views on the probabilities of some relevant states of affairs (Stone, 1961). Because the process of making a judgmental forecast is internal to a forecaster, the procedure for combining solicited judgmental forecasts arguably requires more thinking and reasoning than that for combining forecasts output by models. After all, human behavior can be much harder to comprehend than mathematical equations. Fortunately, in solar forecasting, we do not need to deal with such situations, or do we? It is hard to tell whether or not, in the future, when solar forecast providers become an integral part of the energy market, the game-theoretic type of interactions among forecasters, moderators,¹² grid operators, and forecast users would eventually lead us back to the literature on judgmental forecasting. Indeed, judgmental forecasting has been an integral part of operational weather forecasting at the ECMWF (Inness and Dorling, 2012). With all that said, this book does not consider judgmental forecasting further than this.

¹¹Judgmental forecasting is widely used in business, financial, economic, marketing, political, product, sports, demographic, crime, and technology forecasting, among other domains, in which a forecaster issues forecasts based on heterogeneous information collected over a lifetime and a (not artificial) neural network “model” in the human brain.

¹²Sometimes, forecasts submitted by system owners may be aggregated by an intermediate party, who takes charge of several systems in a region. More generally, any party between forecasters and grid operators can be regarded as moderators.

Statistical ensemble forecasting employs the same strategy as ensemble weather forecasting; that is, a forecaster can explore the forecasts generated using different (sub)sets of data, different models, and/or different parameters. Since statistical forecasting models exist in massive quantities and forms, it would be impractical to enumerate all possible ways of producing ensembles. Hence, in what follows, we only offer a very brief overview, and discuss some major tricks and knacks available in the literature.

3.3.2.1 Data Ensemble

The technique that underpins univariate statistical forecasting is time series modeling. A time series can be thought of as a vector of numbers recorded sequentially. Though not necessary, the values are often regularly spaced in time. As history tells of the future, time series methods seek to extrapolate the historical patterns. How much history one should use, then, becomes an immediately relevant question. Whereas one may argue for using as much data as possible, it would not be unreasonable to use only recent data if the forecaster believes in exploiting the underlying regimes of the forecasting situations, such that one might not want to involve observations made during winter in explaining irradiance in summer. The first way of generating ensembles is, therefore, using time series data with different lengths of history; we term this time series length perturbation.

A common nonparametric technique for the calculation of standard errors and confidence intervals is bootstrapping. In the evaluation of prediction intervals, bootstrapping is no different from the method of dressing—by assuming a consistent error behavior, the past errors are (conditionally) sampled and dressed onto the current forecast. Beyond that, bootstrapping is also capable of constructing new time series that statistically resemble the original one. Owing to the fact that time series may exhibit autocorrelation,¹³ instead of re-drawing samples from a time series one by one, block bootstrapping is more adequate, in that the local time series features within the re-drawn contiguous sections are preserved. It follows that forecasts can be generated on each bootstrapped time series, and thus form an ensemble; this procedure is called bootstrap aggregating, or simply, bagging. A curious feature of bagging is that the final aggregated forecasts are often more accurate than those produced with the original time series directly, although the total information set, from which the forecasts are derived, stays unchanged.

¹³Autocorrelation is the correlation between a time series and a lagged version of itself.

Bootstrapping constructs new time series from the original one. In many domains, such as financial, economic, or marketing forecasting, time series data of


interest are only available in a single version—a GDP time series is a GDP time series; for any country and time period, why would there be two different sets of GDP records? In atmospheric science, more often than not, there are multiple versions of time series of the same atmospheric variable. On top of in-situ measurements, atmospheric variables are also observed through remote sensing, or produced through reanalysis.¹⁴ We shall discuss the complementarity of these three data sources in Chapter 6. But for now, it is obvious that if forecasts are generated using each of these different versions of time series, an ensemble naturally follows.

¹⁴Reanalysis is the output of a dynamical weather model that is run in “hindcast mode” with the goal of generating a temporally consistent (i.e., a single model run, not multiple runs initialized at different times and with different initial conditions), best-available description of the state of the atmosphere produced by the model. Similar to weather forecasting, reanalysis also performs data assimilation, but in reanalysis “future” data can be included in the data assimilation. Reanalysis does not issue operational forecasts, only real-time data. One can think of reanalysis as a re-run of a “frozen” NWP model over an extensive historical period, in which the model is continually anchored to the observations, whereas in weather forecasting, data only informs the initial state. In this way, reanalysis data suffers less from uncertainty due to initial conditions and the chaotic nature of the atmosphere, while the sources of parameter and process uncertainty remain the same as in the original weather forecasts.

Remote-sensed and reanalysis data are gridded, which is to say that their regions of coverage are discretized into pixels, and a time series is available at each of these pixels. This gives forecasters opportunities to leverage not just the time series at the focal location but also the time series of the neighboring pixels. Some studies have shown that by including forecasts from neighboring pixels in the ensemble, the accuracy of forecasts at the focal location improves (Yagli et al., 2022; Schwartz et al., 2010). Nevertheless, the true value of these spatial data lies in their support for spatio-temporal forecasting. Of course, one can extend the aforementioned ensemble-generating techniques for univariate time series, such as time series length perturbation or bagging, to a spatio-temporal setting. To give perspective, consider a popular spatio-temporal forecasting model, vector autoregression, which examines the auto- and cross-correlations among time series in space. The word “vector” suggests that the quantity of interest is no longer a single random variable, but a vector of random variables. For that reason, instead of block bootstrapping one time series, forecasters can bootstrap matrices containing several time series. Forecasting of vector quantities belongs to the domain of multivariate statistics.

3.3.2.2 Process Ensemble

Most economists and some statisticians would agree to the existence of data-generating processes; they attribute the cause that led to the observed data to such processes. Some argue that such an idea is merely a Platonic ideal, because in the real world, those processes are hidden from the observer. One thing is, however, certain: We can build models to test our hypotheses empirically. If the data from the hypothesized process is reasonably similar to the observations, one has good reason to believe the process will continue generating data of the same fashion in the future. From this standpoint, forecasters may make more than one hypothesis, and a process ensemble really just consists of the forecasts from a group of competing models, each reflecting a unique belief about the underlying data-generating process.


Whenever statistical forecasting is discussed, there ought to be a place reserved for the iconic autoregressive integrated moving average (ARIMA) family of models. It is called the ARIMA family of models because it is not a single model, but a whole bunch of them. Depending on how many past values of the forecast variable (the order of “AR,” or p), how many times differencing is applied (the order of “I,” or d), and how many past forecast errors (the order of “MA,” or q) a forecaster wishes to consider, the resultant ARIMA(p, d, q) model would be unique to a dataset. Changing p, d, and/or q would lead to another model. Statisticians refer to the procedure for selecting the most appropriate choice of p, d, and q as model identification or model selection. Once the “best” choice is identified, e.g., using some information criterion, we have an ARIMA model for the time series at hand.

Almost as popular as ARIMA is the exponential smoothing (ETS) family of models. The time series under the ETS framework is considered to be the result of a fusion of three components, namely, the error, the trend, and the seasonal, which are also what the abbreviation ETS stands for. Depending on how these three components interact with each other, ETS takes different forms. For instance, one can either assume the error component to be additive (A) or multiplicative (M) to the other two components. For trend, there are five options: no trend (N), additive (A), additive damped (Ad), multiplicative (M), and multiplicative damped (Md), whereas for seasonal, there are three: no seasonal (N), additive (A), and multiplicative (M). These components (or states) would change over time; hence, ETS models are state space models. For example, an ETS(A, N, N) model, also known as simple exponential smoothing, has additive error, no trend, and no seasonal component. Similar to ARIMA, once the “best” choice is identified for a given time series, we have an ETS model.

There are more ways to decompose a time series into components. A non-exhaustive but representative list includes additive decomposition, multiplicative decomposition, X11 decomposition, SEATS decomposition, and STL decomposition.¹⁵ To some extent, one may also count Fourier and wavelet decompositions in. With that, an easy way to acquire more forecasts for the ensemble is to consider each of these decompositions. Indeed, the decomposition–forecast–reconstruction workflow is not novel, but a well-known strategy.

¹⁵SEATS is the acronym of Seasonal Extraction in ARIMA Time Series, and STL stands for Seasonal and Trend decomposition using Loess.

Time series models can get fairly complex. For instance, the TBATS family of models stands for Trigonometric representation via Fourier series, Box–Cox transform, ARMA errors, Trend, and Seasonal components—whatever that means. The main message that we wish to deliver is this: It is not so difficult to collect a fairly large number of time series models for ensemble forecasting. The model pool becomes even larger if one considers regression models (see Section 3.3.3).
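As a sketch of a process ensemble, the following Python snippet (using the statsmodels library, on a synthetic series) gathers one-step-ahead forecasts from a few competing ARIMA specifications and one seasonal exponential smoothing model; the particular orders and the data are illustrative only:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(0)
t = np.arange(300)
y = 10 + np.sin(2 * np.pi * t / 24) + 0.3 * rng.standard_normal(300)  # synthetic

members = []
# Competing ARIMA(p, d, q) hypotheses about the data-generating process
for order in [(1, 0, 0), (2, 0, 1), (1, 1, 1)]:
    members.append(ARIMA(y, order=order).fit().forecast(steps=1)[0])

# A seasonal exponential smoothing alternative (additive seasonality)
ets_fit = ExponentialSmoothing(y, seasonal="add", seasonal_periods=24).fit()
members.append(ets_fit.forecast(1)[0])

process_ensemble_mean = np.mean(members)  # a simple combination of the members
```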

3.3.2.3 Parameter Ensemble

One perspective from which to view the orders of ARIMA or the states of ETS is to treat them as parameters. Just as the parameters of a distribution control its shape, the parameters of a time series family control its functional form. Consequently, when the orders of ARIMA or


the states of ETS are adjusted away from the identified ones, new forecasts result. When several such forecasts are gathered, one arrives at a parameter ensemble. Some statisticians may voice concern about this strategy, since it defies the purpose of time series model identification. If one takes a look at the classic book by Box et al. (2015), it would not be too long before realizing how much effort, in the past, forecasters had put in just to arrive at the formalism of model identification. It follows that one should not revert to the start, and should not consider any parameterization other than the optimal one. While acknowledging the scientific rigor in the literature on model identification, we provide some justification for having a parameter ensemble.

Through pure reasoning, it is possible to rationalize the parameter-ensemble strategy with respect to that used in ensemble weather forecasting. Since members of a dynamical ensemble are not the best guess, but can nevertheless be considered useful for uncertainty quantification, one can apply the same rationale to believe that those non-optimal parameterizations in time series forecasting would also act in favor of uncertainty quantification. The model identification formalism targets the optimal parameterization, or analogously, a best-guess forecast; that does not conflict with having the non-optimal parameterizations as ensemble members. Moreover, what is considered an “optimal” parameterization is also open to interpretation. For example, there are several information criteria available, such as Akaike’s information criterion (AIC), the corrected AIC, or the Bayesian information criterion, each of which would possibly identify a different parameterization.

That said, there are some potential pitfalls in choosing a parameter ensemble. For one, the choice of the adjusted parameters must be reasonably close to the optimal one. It would not make any sense to fit a nonseasonal parameterization of ETS to seasonal data, and vice versa. Next, if the optimal parameterization does not fit the data well, for whatever reason, the forecaster should break off from the forecasting process and consider alternative models. One example is heteroscedasticity in the fitting errors, which suggests the need for a model different from the one that assumes homoscedasticity.

3.3.3 ENSEMBLE LEARNING

We must confess early that splitting out another section on ensemble learning from statistical ensemble forecasting is rather unnatural. To understand why, we start by examining some definitions of ensemble learning:

“Ensemble learning is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. Ensemble learning is primarily used to improve the (classification, prediction, function approximation, etc.) performance of a model, or reduce the likelihood of an unfortunate selection of a poor one.”

Robi Polikar, Scholarpedia


“An ensemble contains a number of learners which are usually called base learners. The generalization ability of an ensemble is usually much stronger than that of base learners. Actually, ensemble learning is appealing because it is able to boost weak learners which are slightly better than random guess to strong learners which can make very accurate predictions.”

Zhi-Hua Zhou, Encyclopedia of Biometrics

“Ensemble Learning refers to the procedures employed to train multiple learning machines and combine their outputs, treating them as a ‘committee’ of decision makers. The principle is that the committee decision, with individual predictions combined appropriately, should have better overall accuracy, on average, than any individual committee member.”

Gavin Brown, Encyclopedia of Machine Learning

All three versions of the definition of ensemble learning cover two aspects: First, it consists of gathering and training multiple models, and subsequently, combining their outputs; and second, a reduction in model-selection risk and some improvement in accuracy can be expected. Both aspects are, in principle, no different from the procedure and goal of statistical ensemble forecasting. Stated differently, the essential characteristics that differentiate “forecasts by ensemble learning” from “combining forecasts,” or “ensemble forecasts” for that matter, are somewhat obscure. This lack of characteristic distinction between the two sets of terminologies is disconcerting, because not knowing the alternative jargon puts one at risk of overlooking a large portion of the literature. (Separate systems of terminologies, which researchers in different fields reinvented to convey the same idea, have led to great misfortune.) And that is precisely why we have decided to devise a section on ensemble learning.

Even so, we cannot escape from a related but more general question concerning and troubling many people: What is statistics, what is machine learning, and how does one distinguish between them? Unfortunately, this is a question without a simple answer. Over and above that, those who attempt to provide quick answers to the question often fall victim to the illusion of explanatory depth—the incorrectly held belief in understanding something deeper than one actually does. Think about Bayesian inference, for instance: most would not hesitate to place it nearer to the category of “statistics” than to “machine learning”; similarly, the artificial neural network is evidently more “machine learning” than “statistics.” Whereas the cases of Bayesian inference and neural networks still can be agreed upon, one ought to realize that the boundary of statistics and machine learning is by definition fuzzy—for example, is regularized regression statistics or machine learning? How about just calling everything statistical learning (Hastie et al., 2009)? We cannot pretend to know how the line ought to be drawn so that the classification can be accepted by most. We can, however, attempt to develop a


parallel interpretation of ensemble forecasting, departing from a machine-learning perspective.

Machine learning performs two overarching types of tasks, classification being one and regression being the other; forecasting of categorical quantities is concerned with the former, and forecasting of continuous variables with the latter. There are myriad algorithms and methods in both classification and regression. And thanks to the cumulative effort of generations of scientists over the past 50 years or so, open-source implementations of those algorithms are now available in bulk. The most direct strategy for generating machine-learning ensemble forecasts, in this regard, is to feed the training data to a collection of regressions, such as support vector regression, adaptive-network-based fuzzy inference system, bagged regression tree, boosted smoothing spline, extreme gradient boosting, extreme learning machine, random forest, ridge regression, multilayer perceptron, stacked autoencoder deep neural network, least angle regression, lasso regression, or elastic net, just to name a few. There are about 150 regression methods available in the R caret package, and about as many in the Python scikit-learn library. Once the features (predictors, regressors, or independent variables) along with the target (predictand, regressand, or dependent variable) are arranged, and the training–test split is designed, all that is left to be done is to execute a few function calls and wait for the results. (At present, good machine-learning toolboxes handle automatic parameter tuning fairly well.) This strategy of issuing the machine-learning ensemble and that of producing the poor man’s ensemble, as seen earlier, are analogous.

That being said, this strategy is recommended only with reservation, for it can lead to undesirable consequences. For instance, anyone could pick five models from the list of 150 and make an ensemble forecast, which clearly does not preclude somebody else from picking another five, and yet another five by somebody else, ad infinitum. The permutation–combination game is never-ending, wearing a veil over comparability, and contributing to the field’s progress nothing but a drag force, amplified by the “novel” operator and the smoke grenades (see Chapter 2). What needs to be done, instead, is appreciating the salient features of solar irradiance (see Chapter 4) and the theory behind ensemble learning. Regarding the latter, there are three main algorithms, or rather, strategies, that have been proven useful in countless situations: bagging, boosting, and stacking. A brief account of each is given in what follows.

3.3.3.1 Bagging

Bootstrap aggregating, or bagging, is fairly straightforward in its concept. As introduced in Subsection 3.3.2.1, bagging works by creating bootstrapped copies of the original time series (or feature matrix, in a machine-learning context), applying the forecasting method to each of the new series, and thus resulting in multiple forecasts for the same quantity. The forecasting models fitted or trained with the new series are commonly referred to as base learners or weak learners. It is known a priori that


the performance of a bagged forecasting method relies on the property and behavior of the base learner. A good base learner should possess a low bias and a low variance, or be able to acquire them through optimization.

But the problem, as in many other situations concerning optimization, is one of balance, or what is formally known as the bias–variance trade-off. Bias is the part of the error stemming from inadequate assumptions in the learning algorithm, which would cause the algorithm to miss some degree of correspondence between its features and target. Simply put, having high bias hints at underfitting. On the other hand, variance is the part of the error stemming from sensitivity to small variations in the data, and an overly sensitive algorithm does not discriminate noise from signal, i.e., it is not robust. That is, having high variance hints at overfitting. Since the two objectives—having small bias and having small variance—are conflicting, balancing them is one of the most fundamental concerns in machine learning.

Bagging averages forecasts made by different base learners. Because averaging smooths out the variability in the forecasts, assuming there is enough diversity in the ensemble, bagging reduces variance. This is no different from the “wisdom of the crowd” thinking that underpins the framework of ensemble forecasting. Clearly, for base learners that are prone to high variance, bagging usually works well. However, if the base learners are fairly stable with high bias, the advantage of bagging is less obvious. Though it is often the case, there is no guarantee that bagging can outperform its base learners. In fact, most ensemble forecasts face the same issue: It can be shown that, when the base learners are weighted equally, the mean square error (MSE, or equivalently, sum-of-squares error, or root mean square error) of the ensemble forecasts is guaranteed to be smaller than that of an average learner, but not that of the best learner. What forecasters can do, however, is to improve the diversity of the base learners, which is more likely to drive down the MSE of the bagged model.

In machine-learning settings, that is, when the data is represented by a feature matrix rather than a time series, one has the option of resampling fewer data points than the original size of the data. Consequently, this allows an internal evaluation of predictive performance through what is known as the out-of-bag samples, i.e., the data points that are not drawn. However, the more base learners a forecaster employs, the slower the computation is, which can be taxing for complex forecasting methods and large training datasets. Fortunately, computing time can be reduced by parallelization, which is embarrassingly easy for bagging, since the base learners do not interact with each other throughout the process.
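Here is a sketch of bagging with a block bootstrap and a deliberately weak base learner (an AR(1) coefficient fitted by least squares); the series, block length, and ensemble size are all illustrative choices of ours:

```python
import numpy as np

def block_bootstrap(series, block_len, rng):
    """Resample contiguous blocks so that local autocorrelation is preserved."""
    n = len(series)
    blocks = []
    while sum(len(b) for b in blocks) < n:
        start = rng.integers(0, n - block_len + 1)
        blocks.append(series[start:start + block_len])
    return np.concatenate(blocks)[:n]

def fit_ar1(series):
    """Weak base learner: least-squares AR(1) coefficient."""
    x, z = series[:-1], series[1:]
    return np.dot(x, z) / np.dot(x, x)

rng = np.random.default_rng(1)
y = np.sin(np.linspace(0, 20, 300)) + 0.3 * rng.standard_normal(300)

# Bagging: fit the base learner on B bootstrapped copies, issue each member
# forecast from the actual last observation, and average the members
phis = [fit_ar1(block_bootstrap(y, block_len=24, rng=rng)) for _ in range(50)]
bagged_forecast = np.mean([phi * y[-1] for phi in phis])
```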

3.3.3.2 Boosting

Boosting shares the same motive as bagging, in that it seeks to combine the outputs of many base learners to produce a more powerful prediction. However, the mechanism of combining used by boosting is fundamentally different from that used by bagging. Originally proposed for classification, boosting starts with one weak classifier and, in stark contrast to the parallelized nature of bagging, sequentially piles more classifiers onto the ensemble by leveraging the evidence of the errors made by the preceding classifier. Whereas boosting can be extended to regression settings, the


literature on it is highly imbalanced towards classification. For this reason, we shall start by reviewing the most popular boosting algorithm—AdaBoost.M1—for binary classification, and later discuss its extension for regression.


Figure 3.11 Schematic diagram of AdaBoost.M1.

Figure 3.11 depicts the schematic diagram of the AdaBoost.M1 algorithm. Consider a binary classification problem, in which one is tasked to predict the label Y ∈ {−1, 1} through a vector of predictors X, and there are n training samples.¹⁶ For some classifier C, the error rate on the training sample is defined to be:

  η = (1/n) ∑_{i=1}^{n} 𝟙{C(x_i) ≠ y_i},    (3.13)

which means that, for the ith sample, x_i, if the label output by the classifier, C(x_i), is not the same as the correct label, y_i, we add one to the sum, by means of an indicator function; η is, hence, the fraction of wrong predictions over n total instances. C is what is known as a weak classifier, whose error rate is only slightly better than that of random guessing. But a weak learner is all we need. Although C itself is weak, the error rate allows us to identify those instances where C has failed, thereby modifying our understanding, and building another classifier to handle those incorrectly classified instances, through weighting.

¹⁶For clarity, think of the feature matrix as being n × k, where k is the length of the vector X.


When these steps are repeated, boosting progressively adds new classifiers, until the error rate converges. With Eq. (3.13) rewritten as:

  η_j = [∑_{i=1}^{n} w_{ij} 𝟙{C_j(x_i) ≠ y_i}] / [∑_{i=1}^{n} w_{ij}],    (3.14)

where j indicates the jth iteration, AdaBoost.M1 goes as follows:

1. Assign w_{i1} = 1/n, for i = 1, . . . , n.
2. For j = 1, . . . , m:
   (a) Build weak classifier C_j(x);
   (b) Compute η_j using Eq. (3.14);
   (c) Calculate α_j = log[(1 − η_j)/η_j];
   (d) Calculate w_{i,j+1} = w_{ij} exp(α_j 𝟙{C_j(x_i) ≠ y_i});
   (e) Increment j.
3. Evaluate the test set with the boosted classifier, which is:

  C(x) = sign( ∑_{j=1}^{m} α_j C_j(x) ).

The key step of AdaBoost.M1 is 2(d), for it allows the weight of an incorrectly classified observation to be scaled by a factor of exp(α_j), increasing its importance in inducing the succeeding classifier; the weight of a correctly classified observation stays unchanged.

Extending boosting to regression settings demands modifications of the loss function. A loss function is one that tells us how far a prediction is from the observation. Through some algebraic manipulation, one can show that AdaBoost.M1 is equivalent to a forward stagewise additive model with exponential loss (Hastie et al., 2009). It is not critical at this stage to understand what exactly a forward stagewise additive model is; instead, one should note that by replacing the exponential loss with other loss functions, such as the squared loss, the absolute loss, or the Huber loss, boosting can be applied to regression. It is also noted that some loss functions are more robust (less affected by outliers) than others, albeit the particular choice of loss function used in boosting is often motivated by algorithmic simplicity rather than robustness.
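The algorithm above translates almost line by line into code. The following sketch implements AdaBoost.M1 with decision stumps as the weak classifiers; all names are ours, and the stump search is kept deliberately naive:

```python
import numpy as np

def fit_stump(X, y, w):
    """Weak classifier: the best single-feature threshold under weights w."""
    best_err, best = np.inf, None
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for s in (1, -1):                        # polarity of the split
                pred = np.where(s * (X[:, f] - t) > 0, 1, -1)
                err = w[pred != y].sum() / w.sum()   # weighted error, Eq. (3.14)
                if err < best_err:
                    best_err, best = err, (f, t, s)
    return best, best_err

def stump_predict(X, stump):
    f, t, s = stump
    return np.where(s * (X[:, f] - t) > 0, 1, -1)

def adaboost_m1(X, y, m=25):
    """AdaBoost.M1 for labels y in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                          # step 1: uniform weights
    stumps, alphas = [], []
    for _ in range(m):                               # step 2
        stump, err = fit_stump(X, y, w)              # 2(a)-(b)
        err = np.clip(err, 1e-10, 1 - 1e-10)         # guard against log(0)
        alpha = np.log((1 - err) / err)              # 2(c)
        w = w * np.exp(alpha * (stump_predict(X, stump) != y))  # 2(d)
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def boosted_classify(X, stumps, alphas):             # step 3
    agg = sum(a * stump_predict(X, s) for s, a in zip(stumps, alphas))
    return np.sign(agg)
```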

3.3.3.3 Stacking

Stacked generalization, or stacking, builds up from a data or model ensemble, and adds (or “stacks”) a meta-learning algorithm, or super learner, which learns an optimal combination of the base-learner forecasts. From one perspective, stacking aligns with the general idea of post-processing, in which initial forecasts are adjusted and combined according to the observed relationship between various component


forecasts and the target. Post-processing is, by nature, a two-stage process. Two-stage processes require forecasters to set aside part of the verification data for the purpose of training the post-processing model. Suppose the forecaster has four years of data: once the base learners (or component models) are trained using two years of data, initial raw forecasts are produced for the remaining two years; subsequently, a portion of those raw forecasts, such as a period of one year, is taken out to train a post-processing model, so that the model can post-process the leftover portion of the raw forecasts into the post-processed forecasts; see Fig. 3.12 for an example of this data-splitting strategy. Stacking proceeds slightly differently from the procedure of post-processing. Rather than using a two-stage process, stacking trains the super learner through cross-validation (CV)—stacking requires only a training set, and leaves the test set (hold-out samples) untouched.


Figure 3.12 Typical train–test split, with post-processing, in ensemble forecasting.

The super learner algorithm is shown in Fig. 3.13. It starts with the usual ensemble forecasting procedure by training several base learners using the training set. (These trained base learners are saved separately, until the testing phase.) For each base learner, its in-sample forecasts are generated through k-fold CV:

1. (optional) Shuffle the initial training set.
2. Split the initial training set into k equal-sized groups.
3. For each group:
   (a) Treat the group itself as the new test set;
   (b) Treat the remaining groups as the new training set;
   (c) Train the model using the new-training-set features and new-training-set targets;
   (d) Predict the new-test-set targets through the new-test-set features and the trained model;
   (e) Save the predictions and discard the trained model.
4. Evaluate the CV results. (This step is not needed for the super learner algorithm.)


Figure 3.13 Super learner algorithm for stacking.

After that, a super learner, which is often in the form of a regularized regression, is trained on the in-sample cross-validated forecasts and the original training-set targets. When it comes to out-of-sample forecasting, the trained base learners first issue ensemble forecasts, and these forecasts are then combined through the trained super learner.
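As a minimal end-to-end sketch of this procedure—again our own, with arbitrary base learners and a ridge regression standing in for the regularized super learner—consider:

```python
# Stacking with a super learner; all model choices are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, 300)

base_learners = [LinearRegression(),
                 KNeighborsRegressor(n_neighbors=10),
                 DecisionTreeRegressor(max_depth=4, random_state=1)]

# Column j holds the in-sample CV forecasts of base learner j (cf. Fig. 3.13).
Z = np.column_stack([cross_val_predict(bl, X, y, cv=5) for bl in base_learners])

super_learner = RidgeCV().fit(Z, y)              # learn the combination weights
fitted = [bl.fit(X, y) for bl in base_learners]  # refit on the full training set

def stacked_forecast(X_new):
    # Out-of-sample: base learners forecast first, the super learner combines.
    Z_new = np.column_stack([bl.predict(X_new) for bl in fitted])
    return super_learner.predict(Z_new)
```

Note that the test set plays no role anywhere above: both the combination weights and the base learners are obtained from the training set alone, which is precisely what distinguishes stacking from two-stage post-processing.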

3.4 AFTER THOUGHT

Perhaps the key characteristic separating statistical ensemble forecasting from ensemble learning is the access to the component forecasts. The combining mechanism used in statistical ensemble forecasting is often external, whereas it is usually internal to ensemble learning algorithms. Packaged ensemble learning toolboxes usually do not output component forecasts, but only the final combined one. This can be inconvenient, and would require modifying the code, should one wish to access not just the combined forecast but also the component ones. Statistical ensembles, in this respect, are more flexible out of the box.


Forecast users’ desires are of various kinds. Some pursue simplicity and favor a single-valued best guess; some have deep anxieties about uncertainty, for whom a deterministic forecast is often considered insufficient and a predictive distribution may be demanded; some seek a balance between simplicity and uncertainty representation, and thus prefer quantile or interval forecasts; for the remaining ones who have faith in the wisdom of the crowd, a good ensemble forecast provides comfort. To ensure the synergy between forecasters and forecast users is not broken by the style in which forecasts are presented, conversion of forecasts from one form to another is necessary. Such conversion takes skill, and is thought best discussed separately in a later chapter (see Chapter 8). For now, we simply note that the conversion is not as straightforward as it may seem. The inherent difficulty of the conversion is counter-intuitive and even philosophical, especially for probabilistic-to-deterministic conversion, where one may naïvely think that summarizing the predictive distribution or ensemble through the mean or median is all there is to it. However, without scrutinizing the situation, one can easily miss a whole series of important questions. Why should one extract the mean as the summary statistic representing the predictive distribution? Why not the median? If the mean of a predictive distribution is used as the deterministic forecast, how does that affect the choice of accuracy measure that is used to evaluate the forecasts? If one accuracy measure is found to be not in the forecaster’s favor, should it be discarded or hidden? Should one decide on the choice of accuracy measure prior to forecasting? In our experience, these questions are rarely asked by solar forecasters. Only a countable few have attempted to tackle them formally, and even then, most of the time, their arguments are epistemically circular (Jolliffe, 2008). We are by no means claiming to know the holistic answers. But what we can do is expose the readers, if they did not already know, to the complex issues at hand. There is an undercurrent of resistance in the solar forecasting community to formal forecast conversion and verification procedures. We hope this can change in time. We shall return to this topic soon.

4 Solar Forecasting: The New Member of the Band

“We were four guys. I met Paul, I said ‘Do you wanna join me band?’, you know. Then George joined, and then Ringo joined. We were just a band who made it very, very big, that’s all.”
— John Lennon

Solar forecasting as a domain of energy forecasting is exceedingly recent. Other domains of energy forecasting, primarily electric load forecasting, wind forecasting, and electricity price forecasting,1 were already fairly well developed by the early 2010s, the time when modern solar forecasting first began. Load forecasting is the oldest: By the mid-1980s, high-quality reviews, which can be considered impressive even from today’s viewpoint, were already available (Willis and Northcote-Green, 1983; Gross and Galiana, 1987). The emergence of wind forecasting is not far behind: The first major papers appeared in the 1980s,2 followed by a warm-up in the 1990s, and since 2000 the field began to mature (Costa et al., 2008). As for price forecasting, there was not much development prior to 2000, except for a few isolated works, but the amount of literature has been steadily accumulating since, largely owing to the deregulation of the power sectors and the introduction of competitive markets (Weron, 2014).

Scientific methods, by nature, are developed to address general problems. That gives us absolutely no reason not to scrutinize the existing literature before attempting to make our own. Forecasting is no exception. The history of forecasting is in large measure the collective thinking of some very smart people, who along the way made some very tempting mistakes. If solar forecasters ignore those or pretend to look away, they are doomed to make the same mistakes. The first goal of this chapter, then, is to take a look at the major lessons learned by other energy forecasters in the past and correlate those with the evidence solar forecasters have gathered.

Scientific methods, also by nature, are specialized. Specialization is the sign of academic rigor and the manifestation of creativity. That compels us to think beyond the current literature and move into unexplored territory, even if the steps taken are small as they may appear to those who are outside of the domain. Forecasting is no exception. By identifying the limitations of existing energy forecasting methods when they are applied to solar forecasting, new opportunities present themselves.

1 For simplicity, electric load forecasting and electricity price forecasting are referred to as load and price forecasting in what follows.

2 Wind is a major meteorological variable, and thus it had been included in the early effort to “foretell the weather” at a much earlier date. Nevertheless, given the content of this section, which primarily deals with energy forecasting, it is wind power forecasting that we are concerned with. In other words, the term “wind forecasting,” just like the case of solar, denotes both wind speed and wind power forecasting.


However small those opportunities may seem at first, and however inconsequential their impacts may be perceived to be, they could eventually lead to significant advances. The second goal of this chapter is thus to discern the dissimilarities between solar forecasting and other energy forecasting domains, so that we can make targeted efforts towards advancing the domain. It is true that the current authors prefer to think of solar forecasting as a new constituent of the energy forecasting community. However, this is not incompatible with other typologies or classifications, such as placing it as a subsidiary of weather forecasting. After all, the greater part of wind forecasting is about weather and meteorology, and wind forecasting has always been a major player in energy forecasting. Some may regard this present classification of solar forecasting as an easy way out, of which the conception may develop from the argument that solar forecasting can simply follow the lead of other energy forecasting domains, wind in particular, without needing to develop sophisticated theory on its own. That is incorrect; a more adequate question we ought to ask is: What does it take for this “new member of the band” to blend in and get along with the rest, to let his character shine, and eventually, to flourish together with the rest?

4.1 MATURITY OF DOMAINS OF ENERGY FORECASTING

The Global Energy Forecasting Competition (GEFCom) is the largest forecasting competition series, conducted by a dynamic team led by Tao Hong. The first GEFCom, in 2012, set out two tracks—a load forecasting track and a wind forecasting track—for contestants around the world to showcase their forecasting skills (Hong et al., 2014). Two years after that, the second GEFCom took place. On top of the load and wind tracks, GEFCom2014 additionally introduced a price forecasting track and a solar forecasting track (Hong et al., 2016). The holistic design of GEFCom2014 attracted 581 participants from 61 countries, which led to the first-ever large-scale interactions among forecasters working in the previously autonomous energy forecasting domains. Since then, the umbrella name “energy forecasting” started to be gradually accepted by more and more people in both research and industry. In consolidation of the outcome of GEFCom2014, a review article was written, in which Hong et al. (2016) presented an outlook of energy forecasting. The main message therein consists in a summary of the perceptions, at that time, of the maturity of the various energy forecasting domains, as shown in Fig. 4.1. The relative positioning of the different domains, on a two-dimensional space of maturity in deterministic and probabilistic forecasting, is remarkably consistent with the length of history of each domain, with load and wind having the longest histories and being the most mature, and solar being the least mature in both deterministic and probabilistic forecasting. The rationale behind such a quadrant assignment is rather complex.


[Figure 4.1 schematic: a quadrant plot with deterministic forecasting maturity on the horizontal axis and probabilistic forecasting maturity on the vertical axis; wind and LTLF rank high on the probabilistic axis, STLF is the most mature deterministically, price is more mature deterministically than probabilistically, and solar is immature on both axes.]

Figure 4.1 Maturity quadrant of the energy forecasting subdomains (LTLF: long-term load forecasting; STLF: short-term load forecasting).

Short-term load forecasting. Firstly, the reason for separating load forecasting into long-term (> 2 weeks) and short-term (≤ 2 weeks) is that a primary influencing factor of load forecast accuracy, namely, temperature, has limited predictability beyond two weeks. In this regard, short-term load forecasts benefit tremendously from accurate medium-range temperature forecasts—by medium-range, meteorologists mean a horizon of less than two weeks. Because the short-term load forecasting problem can be framed as a time series forecasting problem, there was hardly any method of time series forecasting that had not been tried as of 2014. To that end, Hong et al. (2016) placed short-term load forecasting as the most mature domain in terms of deterministic forecasting.

Long-term load forecasting. Long-term load forecasting is about power system planning, which can go as far as 10 years ahead. It is motivated by the long lead time of constructing transmission lines and large power plants. In parallel, it is important for transmission lines and power plants to adapt to electricity grids that constantly undergo geographical or temporal changes in terms of load and generation. For example, the short 188 km Sunrise Powerlink transmission line was proposed in 2005 and approved by state regulators in 2008, but was not energized until 2012. The Sunrise Powerlink now transports solar energy from the southern California desert to San Diego on the coast. Although any attempt to issue an accurate deterministic load forecast for a particular hour 10 years ahead can be considered nothing more than a wild guess, a good estimate of the overall uncertainty in the long term might be attainable, provided that the miscellaneous influencing factors, such as inter-annual weather variability, population growth, demographic change, or geographical expansion of the service territory, are taken into consideration. In fact, as early as 1983, this kind of long-term load forecasting was already thought of as a multiple-scenario projection (Willis and Northcote-Green, 1983), which bears a high resemblance to the climate projection that is often spoken about in the atmospheric science community. As the word “multiple-scenario” suggests the notion of an ensemble, which has been studied by load forecasters since the 1980s, long-term load forecasting was thought to be the second most mature in terms of probabilistic forecasting in Hong et al. (2016), right next to wind forecasting, but not so much in terms of deterministic forecasting, due to the aforementioned rationale.

Wind forecasting. Wind took the winning place in probabilistic forecasting; this is no surprise whatsoever. The exact reason given by Hong et al. (2016) was this: Wind forecasting is part of weather forecasting, which has been perhaps the most successful demonstration of ensemble forecasting; we have seen that statement from Tilmann Gneiting in the last chapter also. As a matter of fact, Gneiting himself has made many widely recognized and often major contributions to wind forecasting. The emphasis on probabilistic forecasting in the wind literature has caused a relative shortage of work on deterministic forecasting. At least when compared to short-term load forecasting, there do not seem to be as many deterministic wind forecasting papers as there are in load forecasting. Therefore, Hong et al. (2016) considered deterministic wind forecasting to be “less mature” than load, as shown in the quadrant. This placement, however, appears to be imperfect, owing to the fact that any probabilistic forecast can be summarized into a deterministic one in a variety of ways. That said, the formalism of summarizing predictive distributions itself was not developed until the early 2010s (Gneiting, 2011); this, in turn, gives some justification to why, historically, deterministic forecasting and probabilistic forecasting were treated separately.

Price forecasting. Price forecasting was said to be more mature in deterministic than in probabilistic forecasting. Examining the academic affiliations of price forecasters shows that many of them in fact come from business schools or schools of economics. As anything that deals with market behavior is complex, the factors influencing the electricity price are many, such as transmission congestion, game-theoretic interactions, or the psychology of market participants. Combining these two premises leads to the conclusion that price forecasters have been focusing on data-driven approaches in which econometric-type models dominate. Since most econometric models, such as those based on multiple linear regression and time series models, have placed much of their emphasis on estimating the mean or median of a quantity, price forecasting no doubt has inherited that. Supporting this argument is the 52-page überreview conducted by Weron (2014), in which only the evaluation of deterministic forecasts was discussed, hinting at the price forecasters’ reluctance, at that time, to make probabilistic forecasts. That mentality has shifted now, as evidenced by another review by Nowotarski and Weron (2018), four years after Weron’s 2014 review.

Solar forecasting. Solar forecasting ranked last in both deterministic and probabilistic forecasting for two reasons. The obvious one is that, as of 2014, the amount of literature on solar forecasting was substantially less than that of the other energy forecasting domains. Further to that, Hong et al. (2016) explained that the thin literature on solar forecasting might be due to the fact that the penetration of solar power in grids had not been significant enough at that time. Whereas their point is accurate, a somewhat hidden reason which had led to the low ranking of solar forecasting is the disconnect between the solar forecasting community and the rest—Yang (2019a) used the analogy of “living in a bubble” to describe the situation. Indeed, the first major energy forecasting review that had a solar forecaster in its author list was published only ever so recently (Hong et al., 2020).


Anyhow, the conception of the maturity quadrant was rooted in the perception of some of the best energy forecasters in 2016. Was there any general consensus on such a division among other energy forecasters? What has changed that can influence our judgment on the relative position of the energy forecasting domains today? How much development has occurred since? These are questions that do not lend themselves to simple answers. But one thing we are quite sure of is that the position of solar forecasting has definitely moved towards the first quadrant, that is, both deterministic and probabilistic solar forecasting have advanced substantially since 2014, in part due to the fastest rate of publication growth among all four domains (Hong et al., 2020).

In the remaining part of the chapter, we investigate the following topics. We examine first what the domain experts perceive as the most important factors to be considered during forecasting of load, wind, and price, in Sections 4.2, 4.3, and 4.4, respectively, and then, in Section 4.5, the salient features of solar irradiance, which serve as the foundation to differentiate solar forecasting from the rest of the energy forecasting domains. We examine next those research frontiers that are shared across all four domains, very briefly, in Section 4.6, such that the later chapters of this book can expand the discussion in more technical depth. Lastly, in Section 4.7, we revisit some commonly encountered issues in energy forecasting research, as seen in Chapter 2, based on our experience as editors of elite journals that publish energy forecasting papers, so as to offer a list of recommendations that can facilitate a more coherent scientific environment for energy forecasting research.

4.2 LOAD FORECASTING: TINY DIFFERENCES MATTER

Load forecasting is needed for just one grand purpose, namely, to assist in the balancing between generation and load. This goal, however, can only be achieved with consistent and effective coordination among a series of intricate power system planning and operation procedures on drastically diverse time scales, ranging from formulating energy policy decades ahead to performing frequency regulation at intervals of a few minutes. Since the availability of electricity is closely tied to the quality of life, and in many situations, life itself, load forecasting has hitherto been assigned a cardinal importance in power system engineering, because it is the prerequisite of, and thus its quality directly affects, all subsequent operations. Considering the fact that real-life power system disruptions in mature power grids occur exceptionally rarely (the so-called black swan events), one has reason to assume that the present technologies for load forecasting are acceptable, at least to a very large extent (or that the power system is operated so conservatively that it is robust to load forecast errors). However, since those highly improbable black swan events, according to the terminology of Taleb (2007), are almost always associated with enormous impacts that can hardly be anticipated ex ante, even the smallest improvement in the ability to foretell such events would be highly rewarding. On another note, it is worth mentioning that the error difference between load forecasts prepared by a novice and a very skillful forecaster may be only 1%, whereas good and bad solar or wind forecasts may differ by tens of percent. For both reasons, the title of this section—tiny differences matter—sufficiently narrates the status quo of load forecasting.

Figure 4.2 presents an overview of load forecasting applications and classification. As mentioned earlier, short-term load forecasting (STLF) and long-term load forecasting (LTLF) serve fundamentally different purposes and are underpinned by substantially different forecasting philosophies; any discussion on methodology and status quo in regard to load forecasting should therefore make such a distinction. Insofar as the goal of this chapter is concerned, it is STLF that is of primary interest. As such, we refer the readers to Lindberg et al. (2019); Carvallo et al. (2018); Willis (2002); Willis and Northcote-Green (1983), among other references, for LTLF. (As evidenced by the journals in which these LTLF papers were published, the topic is more relevant to utility and energy policy rather than the kind of energy forecasting we are accustomed to.)

[Figure 4.2 schematic: a logarithmic time axis from seconds to decades, aligning power system operations (demand response, hour-ahead scheduling, day-ahead scheduling, unit commitment, energy trading, system planning, and energy policy) with the load forecasting horizons—VSTLF up to roughly 1 day, STLF up to 2 weeks, MTLF up to roughly 3 years, and LTLF beyond.]

Figure 4.2 The correspondence among power system operations, load forecasting time scales, and terminology. VSTLF, STLF, MTLF, and LTLF stand for very-short-term, short-term, medium-term, and long-term load forecasting.

4.2.1 A BIRD'S EYE VIEW ON LOAD FORECASTING

Owing to the large amount of load forecasting works in the literature, which is constantly expanding at a high rate, it is almost surely not possible for anyone to examine in detail all past publications or to follow all newly appearing ones. The most efficient action for quickly grasping the idea of load forecasting is to read review papers. Reviews on load forecasting, or on any scientific domain of study in general, can be divided into two kinds: one conceptual and the other empirical. Whereas the former focuses more on essence, which requires high-level overviews and advice and thus might be subjective, the latter pays attention to form, which is more superficial but might be more objective. More specifically, in a typical conceptual review, one should expect not just a typology of existing works (i.e., “putting papers into different categories”) but also the authors’ viewpoints on the pros and cons of each class of methods. Additionally, good conceptual reviews should contain certain advisory elements, including listing the good research practices for that domain, summarizing the common pitfalls or mistakes propagating in the literature, and providing future research directions. In a typical empirical review, one may expect not just a list of methods and their algorithmic details but also quantitative results of comparing several methods. However, limiting all empirical reviews is their non-exhaustive nature, in that the comparison is, more often than not, restricted to a countable few methods. In the überreview conducted by Hong and Fan (2016), a good collection of conceptual and empirical reviews published prior to 2016 has been offered, and we shall not repeat the effort here. Although there are other reviews on load forecasting published post-2016 (e.g., Kuster et al., 2017), other than enumerating a few newly published techniques, these new reviews do not go beyond the previous ones in terms of crystallizing the core philosophy and theory of load forecasting.

Load forecasting techniques are many, but most do not fall outside the classification of statistics versus machine learning, which is not mutually exclusive, as we have explained in the previous chapter. In essence, regardless of which class a method belongs to, load forecasting is perpetually a regression problem, in which an eclectic mix of predictors is used to explain the load at a future instance. In setting up such a regression, one may opt for black-box models (e.g., Ben Taieb and Hyndman, 2014; Hippert et al., 2001) or non-black-box models (e.g., Goude et al., 2014; Ramanathan et al., 1997), univariate models (e.g., Taylor and McSharry, 2017, 2007) or multivariate models (e.g., Fan and Hyndman, 2012; Charlton and Singleton, 2014), or a combination of them. The best modeling trick, as advocated by Hong and Fan (2016), is to separate the regression into two steps: In the first, a non-black-box and multivariate model could be employed to capture the salient features (see below) that explain the large-scale variations in a load time series, and in the second, a black-box and univariate model could be employed to explain the residual series with small-scale variations; combining the two forecasts often yields better results than single-stage forecasting.

Speaking of the salient features of electric load, the first and foremost ought to be those days with similar patterns in a load time series, which are direct results of the resemblance in electricity consumption behaviors. Such resemblance could be conditioned on day-of-the-week, calendar events, or similar weather regimes; e.g., the residential electricity consumption during weekdays is very different from that during weekends.
Clearly then, if the influencing factors of load patterns can be known, which are in fact known most of the time, one may search for similar conditions in the historical time series, and use the materialized load patterns as forecasts—this is known as the similar-day method, which is often taken as a benchmarking model in load forecasting (Chen et al., 2010; Mandal et al., 2006); a minimal sketch of it is given at the end of this subsection. Moving beyond the similar-day method, the fact that not all influencing factors have the same explanatory power for the materialized load patterns should be respected; the problem here is one of variable selection. Since variable selection is a general issue concerning all regression problems in which predictors are numerous (see Section 4.4 also), the classic book reference by Hastie et al. (2009) constitutes a good compendium of general-purpose methods and tricks for variable selection. Third, electric load has by nature a hierarchical structure, where the lower-level (e.g., at individual households or on distribution feeders) load profiles sum up to the higher-level (i.e., within transmission zones or on the whole of the interconnections) load profiles. Hierarchical load forecasting, which explicitly takes into consideration the effects of load aggregation, has been a key research frontier (Hong et al., 2019). Given the cardinal importance of hierarchical forecasting, it is to be further elaborated in Section 4.6.4 and in Chapter 12. Last but not least, load transients are related to weather, and soliciting weather forecasts to assist load forecasting is commonplace. Whereas ambient temperature is well known to be the most influential weather factor affecting electricity consumption (Xie and Hong, 2018a)—due to the need for heating and cooling via electrical appliances—relative humidity (Xie et al., 2018a) and wind speed (Xie and Hong, 2017) also have non-negligible effects.

Given the maturity of load forecasting, advancing the field further has become a daunting task for most. Hong and Fan (2016) consolidated five aspects which signify novelty: (1) new problems, (2) new methodologies, (3) new techniques, (4) new datasets, and (5) new findings. Although these aspects are clearly the pursuit of most forecasters, or of most of academia in general, what is new and what is not is to a very large extent debatable, and any judgment made in regard to that question is related to one’s depth of understanding of the subject. Therefore, such statements do not seem to have an indisputably clear advisory effect. However, one thing is quite certain: If a load forecasting problem is driven by real industry needs, then it is definitely worth pursuing. This is perhaps a guideline that solar forecasters should follow closely.
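The similar-day benchmark mentioned above admits a very short implementation; the following sketch—with invented data and a four-week lookback chosen arbitrarily—forecasts each hour of a target day by averaging the load at the same hour on the same weekday over the preceding weeks.

```python
# A toy similar-day benchmark; the data and lookback window are invented.
import numpy as np
import pandas as pd

idx = pd.date_range("2021-01-01", "2022-12-31 23:00", freq="h")
load = pd.Series(np.random.default_rng(1).gamma(9.0, 100.0, len(idx)), index=idx)

target_day = pd.Timestamp("2023-01-02")          # the day to be forecast
hours = pd.date_range(target_day, periods=24, freq="h")

# Average the load at the same hour, same day-of-week, over 4 previous weeks.
forecast = np.mean(
    [load.loc[hours - pd.Timedelta(weeks=w)].to_numpy() for w in range(1, 5)],
    axis=0,
)
```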

4.2.2 GAUGING THE GOODNESS OF LOAD FORECASTS

One of the major considerations when gauging the goodness of forecasts is quality assessment. Among the many aspects of quality, accuracy is often regarded as the chief one. Accuracy measures, such as the root mean square error (RMSE) or mean absolute error (MAE), compare forecasts to observations, and the smaller their differences are, the better the forecasts are said to be. In the domain of load forecasting, the mean absolute percentage error (MAPE) is certainly amongst the most popular choices. Given forecasts $x_t$ and observations $y_t$, with $t = 1, \dots, n$ indexing the verification samples, MAPE is calculated as:

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{y_t - x_t}{y_t} \right|. \quad (4.1)$$

As mentioned by Hong and Fan (2016), the typical day-ahead load forecasting error is a mere 3% for a medium-sized US utility with an annual peak of 1–10 GW. This error must appear satisfactory to those who have acquaintance with solar or wind forecast accuracies. Nonetheless, since 1% MAPE can translate into several hundreds of thousands of dollars per GW peak (Hong and Fan, 2016), any small improvement in MAPE should be deemed rewarding.

That said, insofar as any difference in forecast accuracy is to be interpreted, one ought to be aware of two things. One is that the measurements themselves are uncertain, owing to the inevitable instrument and logging imprecision. Although no data acquisition system can be regarded as perfect, the standard practice in forecasting is to ignore measurement errors during verification and treat the measurements as perfect. Therefore, largely driven by this pragmatism, we shall not discuss, for now, ways to account for measurement uncertainty. The other thing about interpreting forecast accuracy is sampling uncertainty, which is also often neglected. A specific set of verification data can be seen as just one possible sample drawn from a population characterized by some fixed attributes. Consequently, the different accuracy measures serve as finite-sample approximations of the “true” values within the population, making them susceptible to sampling uncertainty. In load forecast verification studies, it has been atypical to evaluate this sampling uncertainty. However, without such an assessment, it is impossible to ascertain whether observed differences in accuracy are genuine or merely caused by random fluctuations (Jolliffe and Stephenson, 2012). This problem is all the more important when the differences in load forecast accuracies are more often than not tiny. To give perspective on how tiny these differences can be, we turn to the work of Xie and Hong (2018b), in which the authors reported the MAPE of a series of models of the form:

$$g(h, d) = \beta_0 + \beta_1 \mathrm{Trend}_t + \beta_2 H_t \times W_t + \sum_h f\left(T_{t-h}\right) + \sum_d f\left(\tilde{T}_{t,d}\right), \quad (4.2)$$

where $\mathrm{Trend}_t$ is a chronological trend, $H_t$ is the hour of day, $W_t$ is the day of week, $T_{t-h}$ is the lagged temperature of the $h$th previous hour, $\tilde{T}_{t,d}$ is the 24-h moving-average temperature of the $d$th previous day, and $f(T)$ is a polynomial function of its argument $T$ and some calendar predictors. With Eq. (4.2), Xie and Hong (2018b) tabulated the MAPEs of 200 different models, each with a specific combination of $h \in \{0, \dots, 24\}$ and $d \in \{0, \dots, 7\}$ values, applied to the GEFCom2014 dataset. Whereas the $h = d = 0$ case yielded a MAPE of 6.481%, which is the worst performance due to the complete ignorance of lagged temperature variables, all other models with $h > 2$ and $d > 2$ perform similarly, with MAPEs ranging from 5.232–5.352%, and a smallest difference of 0.002%. More interestingly, when the accuracy measure is changed from MAPE to MAE, the best-performing $h$–$d$ pair also changes, indicating some instability in the model performance.3

There are at least two methods that can deal with sampling uncertainty during verification. The first is bootstrapping, which is a method for estimating the variance and the distribution of a statistic. By drawing $n$ samples from the verification set with replacement, the MAPE (or any other accuracy measure for that matter) can be evaluated numerous times, each evaluation differing from the others by some small margin due to the different sample sets. As such, the cumulative distribution function (CDF) of MAPE can be estimated, giving some idea of how variable the MAPE could be. The second method is hypothesis testing, which is epitomized by the Diebold–Mariano test, which tests the null hypothesis of no difference in the accuracy of two competing sets of forecasts (Diebold and Mariano, 1995).

3 It is common for models to rank differently under different accuracy measures, but the situation here is believed to be due to the aforementioned sampling uncertainty.
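Both ideas fit in a few lines; the sketch below—using synthetic forecasts and the simplest one-step form of the Diebold–Mariano statistic, without autocorrelation correction—is our own and is only meant to fix ideas.

```python
# Bootstrapping the MAPE and a simple Diebold-Mariano test; data synthetic.
import numpy as np

rng = np.random.default_rng(7)
y = rng.gamma(9.0, 100.0, 8760)                  # hourly observations
f1 = y * (1 + rng.normal(0, 0.050, y.size))      # forecaster A
f2 = y * (1 + rng.normal(0, 0.052, y.size))      # forecaster B, slightly worse

def mape(obs, fcst):
    return 100 * np.mean(np.abs((obs - fcst) / obs))

# 1) Bootstrap: resample the verification set with replacement to estimate
#    the sampling distribution (and hence the CDF) of MAPE.
boot = np.array([mape(y[i], f1[i])
                 for i in (rng.integers(0, y.size, y.size) for _ in range(1000))])
print(f"MAPE = {mape(y, f1):.3f}%, 90% bootstrap interval = "
      f"[{np.percentile(boot, 5):.3f}, {np.percentile(boot, 95):.3f}]%")

# 2) Diebold-Mariano test on the absolute-loss differential; under the null
#    of equal accuracy, the statistic is approximately standard normal.
d = np.abs(y - f1) - np.abs(y - f2)
dm = d.mean() / np.sqrt(d.var(ddof=1) / d.size)
print(f"DM statistic = {dm:.2f}")
```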

4.2.3 EXAMPLES OF RECENT LOAD FORECASTING INNOVATIONS

Although the development of load forecasting methods and techniques already has a long history compared to other energy forecasting domains, it still receives persistent enthusiasm. Examining the recent works on load forecasting, several trending aspects can be consolidated. Machine learning undoubtedly occupies a large fraction of the recent load forecasting literature. However, bare applications of machine-learning methods previously unseen by load forecasters can no longer muster sufficient novelty to lead to publication. Instead, the questions that ought to be answered are why and how a particular machine-learning tool could be useful for some specific load forecasting scenario. For instance, Wang et al. (2021) applied in sequence the temporal convolutional network and the light gradient boosting machine, in order to address the high volatility in individual industrial loads, which is otherwise not seen in loads at the distribution-feeder or transmission-zone level. Indeed, with the higher visibility and granularity of load data permitted by internet-of-things technology, we are now able to monitor and thus forecast load in a more delicate fashion. The main challenge lies in the optimal utilization of such big data, which has become a characteristic feature of modern load forecasting. Works that exploit individual load measurements (i.e., via smart meters) and exogenous weather information (i.e., via weather stations) are not rare (e.g., Goehry et al., 2020; Fekri et al., 2022), and methods that deal with data anomaly detection (Sobhani et al., 2020) and weather station selection (Moreno-Carbonell et al., 2020) are available in decent numbers. Irrespective of the methods, the overarching aim of load forecasting with big data is to enable hierarchical and probabilistic modeling of load, which is thought to be able to best serve the power system operations depicted in Fig. 4.2.

Moving beyond the more general perspective, there are works that deal with some very specific issues concerning load forecasting. For instance, rather than focusing on forecasting the entire load profile over the horizon of interest, Amara-Ouali et al. (2023) presented a methodology for forecasting just the peak load, for that has the greatest impact on power system generation–load balancing. In forecasting the peak load, one common strategy is to decompose the variation in a load time series into several scales, such that the one that corresponds to the peak load can be scrutinized more closely in terms of modeling (Huang et al., 2022; Xu et al., 2019). Another example is that Jiao et al. (2022) considered the resilience to cyberattacks (e.g., altering or disabling the flow of load data) in a load forecasting context. More generally, data integrity attacks can be argued to be a major threat in the information era, and thus their prevention and mitigation have attracted actions that seek to enhance robustness during model construction (e.g., Luo et al., 2023, 2019; Gu et al., 2014).

4.3 WIND FORECASTING: IT IS ALL ABOUT WEATHER

The desire for accurate wind forecasting comes in sync with the rise of wind power penetration worldwide. Since wind speed is a decisive driving factor of wind power, much effort has been poured into the sensing and forecasting of wind speed. The variability and volatility of wind introduce uncertainty and intermittency, which challenge our ability to tame wind power on two time scales. One of those ranges from milliseconds to seconds, which is relevant to the control of the turbine itself (Giebel and Kariniotakis, 2017). Turbulence at this time scale is usually sensed by a lidar in the nose of the turbine, and subsequently, the sensed wind field is advected forward in time a few seconds (i.e., forecasting), such that the control mechanism of the turbine is able to respond to it, e.g., by adjusting the yaw. The second time scale of interest ranges from minutes to weeks, and forecasts on that scale primarily concern two groups of forecast users, namely, energy market participants and power system operators, who are both involved in the grid integration of wind energy (Sweeney et al., 2020). In the literature, wind forecasting almost always refers to the second time scale, because forecasting on the first is qualitatively different from the rest of the energy forecasting methods described in this book.

In a nutshell, wind forecasting can be categorized simply based on whether or not numerical weather prediction (NWP) is involved (Giebel and Kariniotakis, 2017). If the forecast horizon is short, i.e., < 3 h, statistical and machine-learning methods come in handy, and forecasters, in this case, pay much attention to the salient features of wind speed, which primarily include: (1) alternating atmospheric regimes, (2) spatio-temporal correlation, (3) diurnal and seasonal nonstationarity, (4) conditional heteroscedasticity, and (5) non-Gaussianity, as summarized by Gneiting et al. (2006). For horizons longer than 3 h, NWP has to play a part, since statistical or machine-learning-based extrapolation of data loses accuracy after a few hours (Sweeney et al., 2020). In the former case, one has the option to operate on wind power directly, but considering spatio-temporal correlation, which often demands wind speed measurements, is a strategy that is more commonly seen. In the latter case, wind power forecasts have to be converted from NWP-based wind speed forecasts, e.g., via the wind power curve (see Wang et al., 2019c, for a review). Figure 4.3 shows a typical wind power curve, from which the non-injective mapping from speed to power is evident.

4.3.1 NWP AND WIND SPEED FORECASTING

NWP constitutes the primary way, and possibly the only sensible way, to forecast weather, and it can issue forecasts for almost all weather variables and events that are of interest to mankind. Producing NWP forecasts takes two steps. In the first, omni-channel weather observations are assimilated to come up with the best possible characterization of the current state of the atmosphere. Then, in the second step, the initial state is integrated forward in time according to the best-known intrinsic laws of the atmosphere. Whereas Chapter 7 gives a much more detailed explanation of this procedure, it is useful to distinguish at this stage two kinds of variables issued by an NWP model: one prognostic and the other diagnostic.


Figure 4.3 A typical wind power curve, which describes the relationship between 100-m hub-height wind speed and wind power. Darker colors mean more points in the vicinity.

Generally, the distinction can be made by knowing the calculation method of the variable: If the calculation involves integrating an equation forward in time, the resultant variable is prognostic; if the calculation does not include a time derivative, the resultant variable is diagnostic. Based on this definition, wind speed is more often than not viewed as a prognostic variable, because it can be forecast using the momentum equations, which describe how the time derivatives of the three-component wind are related to the various forces acting on an air parcel, such as the pressure gradient force, gravity, friction, or the Coriolis force; see Eqs. (7.1)–(7.3). That said, one should be aware that NWP only produces forecasts on certain vertical levels. Wind variables that are not on those levels, such as the 10-m wind that is often relevant to wind power forecasting, need to be diagnosed, e.g., by interpolation between the lowest model level and the surface, assuming the same gradient functions as in the surface-process scheme.

In the current wind power forecasting literature, NWP wind forecasts are often used “as is” or “as available,” and very few wind forecasters would inquire about the choice and origin of those NWP wind forecasts. Indeed, NWP wind forecasts are true to the physical representation and parameterization of the atmosphere as described by the NWP model. Notwithstanding, they are dependent upon the quality of input data, and thus can deviate from reality through the amplification of errors on different scales as the forecast progresses. As mentioned in Chapter 3, ensemble forecasting is one way to reduce the effect of NWP error growth, and this has been repeatedly confirmed in wind forecasting (e.g., Siuta and Stull, 2018; Bossavy et al., 2013; Pinson et al., 2009), with notable exceptions (e.g., Alessandrini et al., 2013), which are likely due to the fact that ensemble NWP forecasts are often produced on a coarser grid than deterministic NWP forecasts. Besides ensemble modeling, considering a more detailed structure of the lower-layer atmosphere is also important, as surface winds in the boundary layer are backed and decreased from the geostrophic values through the retarding action of surface friction. In NWP models, this information is provided by the land surface–atmosphere parameterization schemes, which describe how surface features interact with the evolution of the planetary boundary layer. Because different NWP models have disparate settings (e.g., grid spacing and parameterization choices), wind speed forecasts produced by different models may be quite dissimilar in terms of performance. In short, when multiple NWP choices are present, which is almost always the case, understanding the physical considerations and assumptions made by each NWP model is beneficial to interpreting model output and thus maximizing the utility of those NWP forecasts. This should not be regarded as a trivial task.

One obvious strategy for selecting appropriate NWP wind forecasts for wind power forecasting is through verification. One of the basic ideas of forecast verification is to compare forecasts with observations using measures of quality. That said, in verifying NWP wind forecasts one should be cautious, and in particular should be aware of several facts. First, local surface winds depend strongly upon local exposure, which has to do with the surrounding environment of the measuring site. When the verification sites are not well exposed, winds in some directions may not be measured in a reliable manner, therefore contaminating the validity of the “ground truth.” Next, it should be noted that surface winds vary on small spatio-temporal scales, whereas NWP wind forecasts represent only a grid average and a time integral at the forecast resolution. The underlying issue here is scale mismatch, which again distorts our perception of the quality of NWP forecasts, which may be better or worse than they are verified to be. Last but not least, near-surface wind forecasts are usually poor in mountainous areas, owing to the difficulties in parameterizing the complex land surface and highly varying subgrid orography.4 Given that, it is not wise to reject an NWP model based on just poor apparent verified quality; instead, the difficulty of the forecasting situation, i.e., the predictability, must be factored in. Since local solar irradiance, just like the local surface wind, is a subgrid process, all of the above considerations apply, which is an important lesson that must be learned by solar forecasters.

4 Orography is a branch of physical geography that deals with mountains.

4.3.2 WIND POWER CURVE MODELING

The role of a wind power curve is to create a mapping from wind speed and other auxiliary variables to wind power, such that when a new set of input variables arrives, the corresponding wind power can be predicted. From this description it is clear that the application of the wind power curve is not limited to forecasting but also resides in wind resource assessment, wind power plant design, siting, sizing, and performance evaluation. A theoretical wind power curve consists of four regions. When the wind speed is less than the cut-in speed (i.e., region 1), the wind turbine does not generate any power. Once the speed exceeds the cut-in speed (i.e., region 2), the power generated by the turbine follows:

$$P = \frac{1}{2} C_p \rho \pi R^2 W^3, \quad (4.3)$$

where $C_p$, $\rho$, and $R$ represent respectively the wind turbine power coefficient, air density, and wind rotor radius. Equation (4.3) suggests a monotonically increasing wind power with wind speed. Nevertheless, once $P$ reaches the rated power of the turbine (i.e., region 3), it stays at that level even if wind speed continues to rise. Eventually, when the wind speed exceeds the cut-off speed (i.e., region 4), the turbine is shut down to avoid damage, and the power output is again zero. Needless to say, this theoretical wind power curve that portrays a one-to-one relationship between speed and power deviates from reality. More sophisticated wind power curve modeling is thus motivated. The key inquiry is two-fold: Which variables are relevant, and how are they linked to wind power? The modeling of wind power curves has been thoroughly reviewed by Wang et al. (2019c), to whom the readers are referred.

Wind power curve modeling is fundamentally a curve (or hyperplane) fitting problem, in which empiricism cannot be circumvented. More specifically, there are two broad classes of power curve modeling techniques: one presupposes a function form and the other uses regression to learn a function form. In the former case, linearized segmented models (Lu et al., 2002; Torres et al., 2003), polynomial power curves (Carrillo et al., 2013; Lydia et al., 2014), the sigmoid-function-based power curve (Osadciw et al., 2010), and the Gaussian-CDF-based power curve (Osadciw et al., 2010), among many other curves, are possible. A common trait of these power curve models is that they all employ S-shaped functions, of which the motivation is easily understood from the wind speed–power relationship exemplified in Fig. 4.3. In the case of the latter, many statistical and machine-learning regressions can be readily applied; as the idea is direct and the variants are many, it is thought not necessary to enumerate examples on this matter.

One commendable feature of the review by Wang et al. (2019c) is that it uses six real-world datasets to evaluate the performances of numerous power curve models, even though the so-claimed finding could be expected a priori—no wind power curve modeling strategy may be deemed universally optimal in terms of performance. Another key feature of that review is the emphasis on handling data outliers, which result from unexpected operating conditions, such as wind curtailment or blade damage. This stimulates data preprocessing before modeling. The last key ingredient of that review is a discussion on the error distribution of power curve modeling, which is found to be asymmetric. To address that, the review proposes several strategies that focus on designing asymmetric loss functions and developing robust regression models with asymmetric error distributions, although those proposals are largely conceptual and without empirical evidence supporting their validity.
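For concreteness, the four regions can be coded directly from Eq. (4.3); in the sketch below—our own, with invented cut-in, rated, and cut-off speeds and turbine parameters—the rated power caps region 2, and regions 1 and 4 return zero.

```python
# The theoretical four-region power curve of Eq. (4.3); parameters invented.
import numpy as np

def theoretical_power(w, cp=0.45, rho=1.225, radius=50.0,
                      cut_in=3.0, rated=12.0, cut_off=25.0):
    """Wind power [W] as a function of wind speed w [m/s]."""
    w = np.asarray(w, dtype=float)
    k = 0.5 * cp * rho * np.pi * radius**2
    p = k * w**3                                # region 2, Eq. (4.3)
    p = np.where(w >= rated, k * rated**3, p)   # region 3: hold at rated power
    p = np.where(w < cut_in, 0.0, p)            # region 1: below cut-in
    p = np.where(w >= cut_off, 0.0, p)          # region 4: turbine shut down
    return p

print(theoretical_power([2.0, 8.0, 15.0, 30.0]))
```

An empirical S-shaped curve, e.g., a sigmoid fitted by least squares to measured speed–power pairs, would smooth the sharp corners of this idealized model, which is precisely the motivation behind the sigmoid- and Gaussian-CDF-based curves cited above.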

4.3.3 STATISTICAL AND MACHINE LEARNING IN WIND FORECASTING

Statistical and machine learning, in wind forecasting, serve three differing purposes: (1) post-processing of NWP forecasts, (2) wind power curve modeling, and (3) directly generating wind power forecasts. Regardless of which purpose these techniques serve, the underlying mathematical problem is one of regression. Additionally, the overarching guideline should be uncontroversial: Modern wind forecasting has evolved to a stage where the simple application of a new regression tool is neither attractive nor efficacious; instead, the data-driven regression tools must come together with physics and the salient features of wind speed. (It is almost completely the same in solar forecasting.)


To crystallize how statistical and machine learning can play a part, we should wish to proceed with and highlight four challenges that were outlined by Pinson (2013) concerning wind forecasting. The first challenge perceived by Pinson (2013) is in regard to information utilization and extraction. It has been argued that NWP models moving forward are bound to improve in terms of both modeling and resolution, which could lead to leaps in data diversity and volume. On the other hand, statisticians should make full use of such big data and even combine it with information from other sources. The focal technology should therefore be the handling of high-dimensional spatio-temporal data. Data aggregation and correlation analysis are no new concepts, but their efficacy could be enhanced by better tools. Though not mentioned by Pinson (2013) at the time of writing, it is now well recognized that deep-learning methods, convolutional neural networks in particular, are suitable for automatic feature extraction and aggregation owing to their network structure, which consists of convolutional layers and pooling layers. Evidently, that proposition made back then has turned out to be useful, and many works along this track emerged after deep learning was popularized in the mid-2010s (Wu et al., 2021; Chen et al., 2019b; Zhu et al., 2018). That said, most existing works using deep learning remain restricted to using ground-based sensor network data rather than NWP, which is one aspect that demands continuous attention.

The second challenge identified by Pinson (2013) is on the operational side of things. More specifically, as the population of wind forecast practitioners expands, algorithmic sophistication and procedural complexity are not always justified. More important is how forecasts can be integrated with existing operational decision-making systems. This proposition aligns well with that of Yang (2019a), who advocated abandoning forecasting exercises that contradict the grid-integration requirements. One notable recommendation of Pinson (2013) is to morph the current norm of producing single-location forecasts towards multivariate probabilistic forecasts. On this point, Tastu et al. (2015) echoed and proposed a copula-based method to forecast wind power generation, which can issue space–time trajectories consisting of paths sampled from high-dimensional joint predictive densities. Although there are works using different approaches to issue multivariate probabilistic wind forecasts (Bessac et al., 2018), the acceptance of such practice is still low, perhaps owing to its difficulty. There is a review on this topic that might help with peripheral understanding and serve as a reference list, but the discussion and outlook presented therein are not conducive to gaining an in-depth perspective on the remaining technical difficulty (Sørensen et al., 2023).

The last two challenges listed by Pinson (2013) both pertain to forecast verification. One of those is verifying probabilistic forecasts under big dimensionality, and the other is making the correspondence between forecast quality and value. Verification is one topic that is of major concern to this book. At present, the verification procedures used by wind forecasters (and solar forecasters for that matter) are based in large part upon a countable few seminal works, such as those by Murphy (1993); Murphy and Winkler (1987); Gneiting (2011); Gneiting et al. (2007); Gneiting and Raftery (2007). But regardless, all of these works exclusively focus on the verification of univariate forecasts. Moving forward to verifying multivariate forecasts, Gneiting et al. (2008) extended the existing approaches and showed, in the new context, the computation of skill scores and the use of diagnostic tools. This has been adopted by Pinson and Girard (2012), but the general uptake is still scant. As for the correspondence between forecast quality and value, the concept is not hard to grasp, yet few have demonstrated actual scenarios. Insofar as grid integration is of concern, the value of wind (and solar) forecasts perpetually lies in how different players in a power system could benefit from using a certain forecast over other alternative forecasts. Be it day-ahead scheduling or another power system operation as depicted in Fig. 4.2, the quality of forecasts has to be translated into dollar value or environmental reward. Clearly then, the value acquired in one scenario can rarely be transferred as an estimate of that in another, which is perhaps why the materialization of value has hitherto remained opaque.

4.4 PRICE FORECASTING: CAUSAL RELATIONSHIP

The purpose of electricity price forecasting consists of predicting the spot and forward prices in wholesale markets. Since electricity cannot be stored economically, and demand and supply are weather- and business-activity-dependent, the resulting spot price dynamics exhibit seasonality at the daily, weekly, and annual levels, as well as abrupt, short-lived, and generally unanticipated upward—and recently also downward—price spikes. The forward prices are less volatile, due to averaging over weekly, monthly, quarterly, or even annual delivery periods. For a comprehensive review of price forecasting until 2014, the reader is referred to Weron (2014), and to Lago et al. (2021); Nowotarski and Weron (2018); Ziel and Steinert (2018); Amjady (2017) for updates and advances since.

4.4.1 MARKETPLACES AND FORECASTING HORIZONS

Given the diversity of trading regulations across the globe, price forecasting must always be tailored to the specific market. For instance, the main mechanism of European short-term power trading takes place in the day-ahead market (DAM), in which a uniform-price auction is conducted once a day. There, every morning, wholesale sellers and buyers submit their bids and offers to the power exchange for the delivery of electricity during each hour, or each block of hours, of the next day; see Fig. 4.4. On the other hand, Australian and Canadian power markets operate as power pools, where only suppliers submit their bids, but do so in a continuous, real-time manner, and the published prices are averaged across 30-min intervals (Mayer and Trück, 2018). One should also note that the term “spot” takes different meanings in different parts of the world. In Europe, it typically refers to the day-ahead price, whereas the real-time market for transactions within the day of delivery is called intra-day. However, in Australia and North America, it is typically reserved for the real-time market (RTM), while the day-ahead price is called the forward price (Weron, 2014).5

5 The forward electricity market allows electricity buyers to purchase a future contract, thereby protecting them from volatile spot prices by fixing electricity prices over a specific period of time.


Figure 4.4 An illustration of the day-ahead auction market. Before gate closure on day d − 1, agents submit their bids for the delivery of electricity during each hour of day d. From a trading perspective, the timeline of events starts a few years before delivery. Initially, annual contracts (bilateral or exchange traded) are used to construct a portfolio roughly matching the expected future exposition. As the delivery approaches, these transactions are split into shorter-term forward-type contracts with quarterly, monthly, weekly, or (in some markets) even daily delivery periods. With lead times ranging from weeks to years, medium-term price forecasting is relevant for maintenance scheduling, resource allocation, derivatives valuation, risk management, and budgeting. It covers horizons for which reliable meteorological predictions are not available and generally ignores the impact of political and technological uncertainty (Ziel and Steinert, 2018). Eventually, with only one day left to delivery, the power portfolio is adjusted in the DAM (or in the RTM if the day-ahead one does not exist). Although the volume of day-ahead transactions is typically much lower than that of the forward contracts, day-ahead prices play a central role in price setting, particularly in Europe. Market indexes are built on day-ahead prices, they are also used as reference points for bilateral and exchange traded derivatives (Weron, 2014). Their importance is reflected in the literature, with over 90% of publications concerning short-term price forecasting, particularly DA price forecasting. Since high-quality forecasts of meteorological variables, such as temperature, wind speed, or cloud cover, are available for such horizons, intricate models with hundreds of inputs can be built for day-ahead prices (Weron and Ziel, 2019). As the day-ahead auction closes, market participants enter the intra-day/real-time markets, which run until a few minutes before delivery. Their main purpose is to balance deviations resulting from positions in day-ahead contracts and unexpected changes in demand or generation (see Gianfreda et al., 2016). This situation is identical to that of unit commitment in the power system, where generators are scheduled at least twice, one for day-ahead and the other for real-time market, so as to absorb the uncertainty in earlier decisions. In most European countries, intra-day markets operate in continuous time (e.g., Germany, France, Poland, or the United Kingdom), but in some, they are organized in the form of multiple consecutive auctions (e.g., Italy or Spain). The literature on forecasting of intra-day/real-time prices is scarce but rapidly growing (e.g., Narajewski and Ziel, 2020; Uniejewski et al., 2019; Kath and Ziel, 2018). At the high-frequency end of the trading timeline, all generation or demand that is not sold or bought in the intra-day/real-time market is left for the balancing market


(Kath et al., 2020), a technical market operated by the power system operators to guarantee system stability. The literature on predicting balancing prices is even more scarce than that on the intra-day case (Janczura and Michalak, 2020; Maciejowska et al., 2019; Klæboe et al., 2015). This is likely due to two factors present in many markets: (1) extreme volatility and (2) the delay in publishing the settlement prices, which makes (very) short-term forecasting problematic.

At the low-frequency end of the timeline, beyond the horizons considered in trading, there is long-term planning. In this context, risks of regulatory, technological, economic, social, and political nature come into play. They are extremely hard to predict, which makes long-term price forecasting almost impossible (Ziel and Steinert, 2018). Although there are studies that consider long-term scenarios or projections (Lund et al., 2018), typically they do not associate probabilities with these paths, hence they can hardly be regarded as meaningful forecasts.

4.4.2 SHORT-TERM PRICE FORECASTING MODELS

Short-term price forecasting is dominated by data-driven methods, which can be either statistical or based on machine learning. Once again it should be highlighted that there is really no clear distinction between the two, and forcing a classification may cause ambiguity. For instance, as mentioned in Section 3.3.3, regularization techniques such as the lasso or elastic nets (Hastie et al., 2019) are regarded by some as machine learning, and by others as statistical learning or, simply, statistical techniques. As has been argued in the load forecasting section, inasmuch as forecasting is concerned, the underlying scientific technique for both statistical and machine-learning forecasting methods is regression. A more relevant question to ask is rather what causal relationship one can identify in order to improve the quality of forecasts. For the sake of surveying the literature, this section only briefly introduces the modeling concept of price forecasting.

Most models in the literature rely on linear regression and represent the dependent (or predictand, regressand, response, explained) variable, that is, the price for day d and hour h (denoted with $Y_{d,h}$), by a linear combination of independent (or predictor, regressor, feature, explanatory) variables:

$$Y_{d,h} = \boldsymbol{x}_{d,h}^{\top} \boldsymbol{\beta}_h + \varepsilon_{d,h}, \tag{4.4}$$

where $\boldsymbol{\beta}_h = \big(\beta_h^{(1)}, \ldots, \beta_h^{(m)}\big)^{\top}$ is an hour-specific length-$m$ column vector of regression weights, $\boldsymbol{x}_{d,h} = \big(x_{d,h}^{(1)}, \ldots, x_{d,h}^{(m)}\big)^{\top}$ is an $m$-dimensional input vector, and $\varepsilon_{d,h}$ is an error term. Usually, one of the inputs should be a constant 1, which allows the regression to have an intercept; this intercept term can be omitted if the data is demeaned beforehand. It is noted that the notation used here is common in day-ahead forecasting, which emphasizes the vector structure of these price series. Alternatively, one could use single indexing: $Y_t$ with $t = 24d + h$; see Ziel and Weron (2018) for the pros and cons of using uni- and multivariate frameworks in price forecasting. A basic way of estimating the parameters of the regression model as described by Eq. (4.4) is to perform ordinary least squares, which minimizes the in-sample


residual sum of squares (RSS), i.e., the squared differences between the fitted and actual values. However, if the regressors are numerous, say more than a dozen or two, it may be advisable to utilize regularization techniques, like the lasso (Uniejewski et al., 2019; Ziel and Weron, 2018; Ziel, 2016) or the elastic net (Uniejewski et al., 2016). Regularized regressions jointly minimize the RSS and a penalty term, namely, $\lambda \sum_{j=1}^{m} |\beta_h^{(j)}|$ for the lasso, and $\lambda_1 \sum_{j=1}^{m} |\beta_h^{(j)}| + \lambda_2 \sum_{j=1}^{m} \big(\beta_h^{(j)}\big)^2$ for the elastic net, where the $\lambda$'s denote penalty strengths. As a result, they not only shrink the $\beta$'s towards zero but also set some of them to exactly zero, and thus act as a selector to exclude redundant regressors from the model. Whereas the classical lasso and elastic net regressions deal with the mean, they can be extended to probabilistic regression by replacing the dependent variable with its $\tau$th quantile (Koenker, 2005). In this case, the $\beta$'s can be estimated by minimizing an asymmetric piecewise linear scoring function, called the pinball loss, which is a strictly consistent scoring rule that is repeatedly mentioned throughout the book. The popularity of regularized regression in price forecasting is largely due to the need to shrink and select predictors from a large pool of potentially relevant variables.

When the variables are many and confounding, artificial intelligence becomes tempting, for its perceived ability to make predictions through “black-box” functions. As in load forecasting, the artificial neural network (ANN) is one of the pillars of the machine-learning-based price forecasting literature (e.g., Dudek, 2016; Abedinia et al., 2017; Marcjasz et al., 2019). Quite often ANNs are used as elements of hybrid models, which also include decomposition, feature selection, and/or clustering techniques (Amjady, 2017; Weron, 2014). Support vector machines (SVMs), fuzzy systems, and evolutionary computation (genetic algorithms, evolutionary programming, swarm intelligence) are among the most commonly used alternatives (Weron and Ziel, 2019). Regularized regression models and ANNs often have the same input and output, but the hidden layers (in multilayer perceptrons, MLPs) and the directed loops to the same or other layers (in recurrent neural networks, RNNs) make a difference. A feed-forward architecture is already able to represent nonlinear phenomena, but recurrent architectures, e.g., those composed of long short-term memory (LSTM) units or the simpler gated recurrent units (GRUs), offer much more, such as the ability to model temporal dynamic behavior.

Regardless of which method is used, one must ensure the chosen model has the capacity to issue multivariate forecasts, which can be either a vector of 24 hourly prices for the next day, or a set of percentiles of the predictive distribution for a given time point, depending on the use case. In a regularized regression setting, instead of independently estimating the 24 models defined in Eq. (4.4), all 24 can be represented by one matrix and jointly estimated via multivariate least squares, multivariate Yule–Walker equations, or maximum likelihood (Lütkepohl, 2005); the resulting model is called the vector autoregressive (VAR) model. In the case of ANNs, the output layer may simply consist of 24 neurons for day-ahead price forecasts, and likely requires more hidden neurons to adequately represent the dependencies. Although multivariate models usually provide a better in-sample fit, their predictive performance does not have to be better. This, indeed, was observed for VAR models in Ziel and Weron (2018).
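To ground Eq. (4.4) and the regularization discussion, below is a minimal sketch that fits one lasso model per delivery hour and, for one hour, a pinball-loss (quantile) variant. The synthetic arrays and the use of scikit-learn are illustrative assumptions only; the sketch does not reproduce any of the studies cited above.

```python
# A minimal sketch of Eq. (4.4) with lasso shrinkage, assuming hourly
# day-ahead data arranged as X[day, hour, feature] and y[day, hour];
# the data are synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import Lasso, QuantileRegressor

rng = np.random.default_rng(42)
n_days, n_hours, n_feat = 730, 24, 8            # two years of hypothetical data
X = rng.normal(size=(n_days, n_hours, n_feat))  # lagged prices, load forecasts, ...
y = (X @ rng.normal(size=n_feat)) + rng.normal(scale=0.5, size=(n_days, n_hours))

models = []
for h in range(n_hours):                # one model per delivery hour, as in Eq. (4.4)
    m = Lasso(alpha=0.05)               # alpha plays the role of the penalty lambda
    m.fit(X[:-1, h, :], y[:-1, h])      # hold out the last day for "forecasting"
    models.append(m)

day_ahead = np.array([m.predict(X[-1:, h, :])[0] for h, m in enumerate(models)])
print("point forecasts for 24 h:", day_ahead.round(2))

# Pinball-loss (quantile) variant for a probabilistic forecast of hour 12
q90 = QuantileRegressor(quantile=0.9, alpha=0.05)
q90.fit(X[:-1, 12, :], y[:-1, 12])
print("90th percentile, hour 12:", q90.predict(X[-1:, 12, :])[0].round(2))
```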

4.4.3 MODELING TRICKS FOR PRICE FORECASTING

The forecast quality of electricity price largely depends on the availability of a large set of fundamental drivers that can affect the market mechanism, which include, but are not limited to, electric load; weather (such as wind, temperature, precipitation, or solar irradiance); fuel prices for oil and natural gas, and to a smaller extent coal; reserve margin; and the scheduled maintenance or forced outages of grid components. However, as the exact causal relationships between drivers and price are often indirect, e.g., solar irradiance affects solar power generation and then price, the limitation of regression-based forecasting is apparent, since it is unlikely that a single numerical coefficient in front of solar irradiance is able to represent all of its effects. Practically, however, one may have no other choice but to assume useful those coefficients that are statistically significant. Given the numerous drivers, one can construct a combinatorially large number of models by permuting the input space. Nonetheless, several modeling tricks can be claimed as general. More specifically, they are: (1) variance stabilization, (2) seasonal decomposition, and (3) averaging across calibration windows, which are discussed in the next three paragraphs.

The approaches discussed in Section 4.4.2, particularly those employing linear modeling, are sensitive to outliers in the training set. Given that price spikes are present in most electricity markets, an appropriate forecasting framework would require a dedicated treatment, e.g., via robust estimation algorithms (Grossi and Nan, 2019). This treatment, however, has received limited uptake in practice, likely due to its complex mathematical machinery. But there are two workaround solutions. One uses filters to remove outliers from the training set (Janczura et al., 2013); the obvious drawback is that models that are not calibrated to spiky data cannot predict spikes. The other applies variance stabilizing transformations (VSTs) to reduce the impact of the extreme observations, without eliminating them completely (Uniejewski et al., 2018). For instance, in typical commodity markets, the logarithmic function is commonly used. However, in electricity markets with prices very close to zero (e.g., the Spanish market) or even negative (e.g., the German market), the logarithmic transform is not feasible. As suggested by Uniejewski et al. (2018), before applying the VST, each variable should be standardized by subtracting the sample median and dividing by the sample median absolute deviation. Then, a well-performing VST, like the area hyperbolic sine or the probability integral transform, can be applied, the forecasts computed for the VST-transformed series, and the price forecasts obtained by applying a back-transformation; see Narajewski and Ziel (2020) for a discussion of the latter. A minimal sketch of this workflow is given below.
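As a small illustration of the VST workflow just described, the sketch below standardizes a synthetic spiky price series by its median and median absolute deviation, applies the area hyperbolic sine, and back-transforms a forecast; the data are invented for demonstration.

```python
# Sketch of the VST workflow of Uniejewski et al. (2018): standardize by the
# sample median and median absolute deviation (MAD), apply asinh, and invert.
# The synthetic spiky price series is purely illustrative.
import numpy as np

rng = np.random.default_rng(1)
prices = rng.normal(40, 10, size=1000)
prices[rng.choice(1000, size=20, replace=False)] += rng.normal(300, 50, size=20)  # spikes
prices[rng.choice(1000, size=10, replace=False)] = -15.0  # negative prices: log infeasible

med = np.median(prices)
mad = np.median(np.abs(prices - med))

z = (prices - med) / mad          # robust standardization
y = np.arcsinh(z)                 # area hyperbolic sine VST; tames the spikes

# ... fit any model on y and produce a forecast y_hat; a naive mean stands in here.
# Note: naively back-transforming a point forecast is biased; see Narajewski
# and Ziel (2020) for proper back-transformation of VST-ed forecasts.
y_hat = y.mean()

price_hat = np.sinh(y_hat) * mad + med   # back-transformation to the price scale
print(f"back-transformed forecast: {price_hat:.2f}")
```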
The electricity price, like many time series arising in a social setting, exhibits seasonality at the daily, weekly, and, to some extent, annual levels. Most authors treat the daily and weekly seasonalities as inherent features of any price forecasting model, but ignore the long-term seasonal component (also known as the trend component), which captures the large-scale variation in the time series. However, as shown by Nowotarski and Weron (2016), decomposing a series of spot prices into a long-term seasonal component and a stochastic component, e.g., using the wavelet transform or the Hodrick–Prescott filter, modeling them independently, and then combining their forecasts yields more accurate point predictions than fitting the same regression model to the prices themselves. This is also true for ANNs and probabilistic forecasts.

All data-driven price forecasting models require a training window, which is also called a calibration window in the statistical price forecasting literature. However, very few studies try to find the optimal length of this window or even consider windows of different lengths. Instead, the typical approach has hitherto been to select, ad hoc, a “long enough” window. As a result, windows as short as two weeks and as long as five years have been considered. The averaging across calibration windows concept introduced by Hubicka et al. (2019) tackles this issue more systematically. The authors argued that combining predictions obtained from a regression or an ANN calibrated to windows of different lengths outperforms selecting ex ante only one “optimal” window length. This finding can be partially explained by the philosophy of data ensembles, as discussed in Section 3.3.2.1; the sketch below illustrates the idea.
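To make the averaging-across-calibration-windows idea concrete, the toy sketch below fits the same AR(1)-type regression on windows of several lengths and averages the one-step forecasts; the window set and model are arbitrary illustrative choices, not those of Hubicka et al. (2019).

```python
# Toy illustration of averaging across calibration windows: fit one simple
# AR(1)-style regression per window length and average the one-step forecasts.
import numpy as np

rng = np.random.default_rng(7)
n = 1100
price = 40 + 0.1 * np.cumsum(rng.normal(0, 1, n)) + rng.normal(0, 3, n)

def ar1_forecast(window):
    """One-step forecast from an AR(1) fitted by least squares on `window`."""
    x, y = window[:-1], window[1:]
    A = np.column_stack([np.ones_like(x), x])
    intercept, slope = np.linalg.lstsq(A, y, rcond=None)[0]
    return intercept + slope * window[-1]

window_lengths = [56, 112, 364, 728]          # days; from ~2 months to ~2 years
forecasts = [ar1_forecast(price[-w:]) for w in window_lengths]
print("individual forecasts:", np.round(forecasts, 2))
print("calibration-window average:", round(float(np.mean(forecasts)), 2))
```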

4.5 SALIENT FEATURES OF SOLAR IRRADIANCE

Having looked at the technical considerations that have been given to other energy forecasting domains, we turn back to solar and examine the salient features of solar irradiance and the defining characteristics of solar forecasting. The content of this section serves as a technical overview of the rest of the book, and many concepts are discussed separately in the subsequent chapters.

4.5.1 CLEAR-SKY EXPECTATIONS

Heliocentrism, a model of the sun-centered universe and of the earth and other planets revolving around the sun, is commonly credited to Nicolaus Copernicus, a Renaissance-era mathematician and astronomer. According to Russell (2017) and many others, however, it was Aristarchus of Samos, an ancient Greek astronomer, who first described heliocentrism. Regardless, the diurnal rotation of the earth and its annual revolution about the sun have become something that any ordinary man would accept on authority without any hesitation. Astronomers today can compute accurately the relative position of the sun in the sky, as observed from an arbitrary location on earth, with an uncertainty of ±0.0003° in both zenith and azimuth angles, for the years from 2000 B.C. to 6000 A.D. (Reda and Andreas, 2004). There are simply not many natural phenomena that can be apprehended to such precision. Applying such knowledge is then common sense, particularly when the subject of this book concerns the energy from the sun. The benefit of leveraging solar position algorithms in solar forecasting is beyond dispute, and leaves little if any room for skeptics to advance their objections.

The immediate results of solar positioning are the extraterrestrial radiation and the clear-sky radiation. Extraterrestrial radiation⁶ (ETR, or E0n) is the solar radiation received just outside of the earth's atmosphere; ETR is thus also known as the top-of-the-atmosphere radiation. Its calculation depends only on the solar constant (Esc), which is estimated to be 1361.1 W/m² (Gueymard, 2018), the average sun–earth distance over the course of one revolution (Ravg), and the current sun–earth distance (R):

$$E_{0n} = E_{\mathrm{sc}} \times \left(\frac{R_{\mathrm{avg}}}{R}\right)^{2}. \tag{4.5}$$

As we shall demonstrate below, it would be barely exaggerating to think of E0n as the “initial singularity” from which everything in the realm of solar radiation modeling originates.

⁶ETR refers to the extraterrestrial radiation received by a surface perpendicular to the incident ray. From basic trigonometry, it is clear that for any surface that is not perpendicular to the incident ray, the component of the light captured by the receiving surface must be obtained by multiplying the power of the incoming ray with the cosine of the angle between the ray itself and the normal of the receiving surface; for a horizontal surface, this angle is the solar zenith angle at the observer's location.

Clear-sky radiation, on the other hand, refers to the amount of radiation received on the earth's surface under a cloud-free atmosphere. A common misconception is that the clear-sky radiation is equivalent to extraterrestrial radiation. However tempting, one should never mix up “no cloud” with “no atmosphere,” and here is why. When radiation travels through the earth's atmosphere, it undergoes three types of interaction with the matter in the atmosphere: scattering, absorption, and emission. Since atmospheric emission in the solar spectrum is negligible, only scattering and absorption are relevant processes, and their combined effect on weakening solar radiation is called extinction. The governing equations of these types of interaction are the subject matter of radiative transfer, a domain of atmospheric sciences and engineering heat transfer. In short, radiative transfer describes how a pencil of radiation traversing a medium would be weakened (and strengthened) by the effects of atmospheric physics and chemistry. As is the case with many equations in physics, the radiative transfer equations, too, are elegant and embarrassingly simple. It is, however, the approaches to arrive at the solutions that have spawned an enormous amount of literature. We would not attempt to summarize that, but refer the readers to Liou (2002); Zdunkowski et al. (2007).

The significance of the solutions to the radiative transfer equations to solar energy, together with the computational burden that would result from large-scale applications with varying meteorological conditions, has motivated solar engineers to use the Beer–Bouguer–Lambert law through parameterization. Joining, and later on leading, the investigations of this sort is Chris Gueymard. By assuming the beam radiation⁷ entering the atmosphere encounters but a limited number of extinction processes, such as ozone absorption, molecular scattering, or aerosol extinction, Gueymard developed a transmittance approach to simplify the radiative transfer. More specifically, these extinction processes are assumed to occur in separate layers of the atmosphere, leading to a hypothetical atmosphere in which the overall transmittance for beam radiation can be obtained as a product of layer transmittances. Denoting the surface-level beam component by B, with a subscript nc representing “clear-sky normal-incident,” Gueymard's equation reads:

$$B_{nc} = E_{0n}\, T_R\, T_g\, T_o\, T_n\, T_w\, T_a, \tag{4.6}$$

where the subscripts of T (for transmittance) denote Rayleigh scattering, uniformly mixed gas absorption, ozone absorption, nitrogen dioxide absorption, water vapor absorption, and aerosol extinction, respectively (Gueymard, 2008). Each transmittance can be interpreted as the fractional part of the incident beam radiation that is weakened by the medium due to a particular extinction process. Indeed, Bnc in Eq. (4.6) is better known as the clear-sky beam normal irradiance, or clear-sky BNI, which is ubiquitously used during the forecasting of BNI, a principal factor influencing the power generation of a concentrating solar power (CSP) plant.

⁷The word “beam” can be used interchangeably with “direct” in the context of radiation modeling. Despite the possible confusion that has stemmed from the terminology issue, and has unfortunately propagated through the literature, the reader should note: “beam normal irradiance” is “direct normal irradiance.”

Aside from beam transmission—not to be confused with the type of transmission that happens after saying “Beam me up, Scotty”—there is diffuse transmission. The term “diffuse” is attributed to scattering, and is thus distinct from how beam radiation reaches the earth's surface. The multiple scattering processes assumed for the modeling of diffuse irradiance at the surface are in contrast to the Rayleigh scattering per Eq. (4.6), which is considered a single-scattering process, in which scattering happens only once. Some photons, however, need to go through an immeasurable number of scattering events, either due to molecules and aerosols in the atmosphere or due to the earth's surface itself, before finally reaching the ground or the eye of an observer. Therefore, the radiative transfer theory for diffuse radiation is inherently more difficult than that for beam radiation, and this inherent difficulty carries forward to the transmittance approach used by Gueymard. Gueymard's modeling of diffuse radiation is divided into two additive parts: a diffuse component due to multiple scattering in the atmosphere alone (i.e., assuming a perfectly absorbing ground), and a diffuse component due to multiple scattering between the ground and the atmosphere (Gueymard, 2008). Denoting the former by Dhcp and the latter by Dhcb, where the first two letters in the subscripts represent “clear-sky horizontal,” and the last letters represent “perfectly absorbing” and “backscattered,” respectively, we have

$$D_{hcp} = T_g T_o T_n T_w \left[ F_R \left(1 - T_R\right) T_a^{0.25} + F_a F_c T_R \left(1 - T_{as}^{0.25}\right) \right] E_{0n} \cos Z, \tag{4.7}$$

where E0n, Tg, To, TR, and Ta are as described in Eq. (4.6); Tn and Tw here are effective transmittances (under an effective air mass of 1.66) due to nitrogen dioxide and water vapor absorption; FR and Fa are the forward scattering fraction for Rayleigh extinction and the aerosol forward scatterance factor, respectively; Tas is the transmittance due to aerosol scattering; and Fc is a correction factor to accommodate the remaining effect of multiple scattering and other shortcomings of the model. On the other hand,

$$D_{hcb} = \rho_g \rho_s \left(B_{nc} \cos Z + D_{hcp}\right) / \left(1 - \rho_g \rho_s\right), \tag{4.8}$$

where ρg and ρs are the ground albedo and sky albedo, respectively, and Z is the solar zenith angle.


Summing up Eqs. (4.7) and (4.8) gives Dhc = Dhcp + Dhcb, the clear-sky diffuse horizontal irradiance, or clear-sky DHI. Finally, through the closure equation, one arrives at

$$G_{hc} = B_{nc} \cos Z + D_{hc}, \tag{4.9}$$

that is, the clear-sky global horizontal irradiance, or clear-sky GHI. One can note that the extraterrestrial irradiance E0n appears throughout Eqs. (4.6)–(4.9), echoing the earlier analogy of the “initial singularity.” This completes the main formulation of what is known as the REST2 clear-sky model, an acronym for Reference Evaluation of Solar Transmittance, 2 bands. The phrase “2 bands” indicates that the model is fitted separately for two spectral bands, with band 1 covering the ultraviolet and visible, from 0.29 to 0.70 μm, and band 2 covering the near-infrared, from 0.7 to 4 μm. This two-band modeling aims at distinguishing the different attenuation mechanisms in these spectral bands—ozone absorption and aerosol scattering are strong in band 1, whereas strong absorption by water vapor, carbon dioxide, and other gases characterizes band 2. Before REST2, Gueymard (2003) introduced its predecessor, the REST model, and before REST, there was CPCR2, or Code for Physical Computation of Radiation, 2 bands (Gueymard, 1989). In fact, besides these, one can assuredly find a long list of clear-sky models, though some are more elaborate than others. The level of modeling ranges from using a single empirically fitted equation to the intricacy of REST2. Most generally, clear-sky models can be categorized based on whether or not radiative transfer is used during the modeling process, and there is a general positive correspondence between model complexity and performance. Sun et al. (2021a, 2019) presented the most comprehensive reviews by far on clear-sky models, in which 75 models for GHI and 95 models for BNI and DHI are contrasted with worldwide research-quality data. A research version of REST2 (REST2v9.1) has been deemed to be of the highest quality, followed by a public version (REST2v5). A sketch of how Eqs. (4.5)–(4.9) compose into clear-sky irradiance is given at the end of this subsection.

In conclusion, we note that the skillfulness in describing the cyclic behavior of solar irradiance, using a decent clear-sky model, separates solar forecasting from everything else. This is far from being an overstatement. In the solar track of GEFCom2014, the only team of participants who employed a clear-sky model—in fact, a fairly bad one, due to the lack of atmospheric information during the competition—was also the winning team (Hong et al., 2016), whose forecasts made everybody else's pale in comparison. Besides removing the seasonality in irradiance, which is the most critical trick in solar forecasting, clear-sky models are also used in a broad range of applications, such as deriving irradiance from remote-sensing retrievals, quality control of solar irradiance, or irradiance-to-power conversion. To that end, the reader should expect the importance of clear-sky models and related concepts to be repeatedly emphasized throughout this book.
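For concreteness, the sketch below wires Eqs. (4.5)–(4.9) together. The transmittance and scattering-factor values are placeholders only; in REST2 proper, they are parameterized from atmospheric inputs (aerosol optical depth, precipitable water, ozone, etc.) separately for the two bands.

```python
# A minimal sketch of the clear-sky composition in Eqs. (4.5)-(4.9). The
# transmittances and scattering factors below are illustrative placeholders;
# REST2 derives them from atmospheric inputs, separately for two spectral bands.
import numpy as np

E_SC = 1361.1  # solar constant [W/m^2] (Gueymard, 2018)

def extraterrestrial(R_ratio):
    """Eq. (4.5): E0n from the mean-to-actual sun-earth distance ratio Ravg/R."""
    return E_SC * R_ratio**2

def clear_sky(E0n, Z_deg, T, F, rho_g=0.2, rho_s=0.1):
    """Compose clear-sky BNI, DHI, and GHI per Eqs. (4.6)-(4.9)."""
    cosZ = np.cos(np.radians(Z_deg))
    # Eq. (4.6): beam transmittance as a product of layer transmittances
    Bnc = E0n * T["R"] * T["g"] * T["o"] * T["n"] * T["w"] * T["a"]
    # Eq. (4.7): diffuse part assuming a perfectly absorbing ground; n_eff and
    # w_eff stand for the effective (air mass 1.66) transmittances in the text
    Dhcp = (T["g"] * T["o"] * T["n_eff"] * T["w_eff"]
            * (F["R"] * (1 - T["R"]) * T["a"]**0.25
               + F["a"] * F["c"] * T["R"] * (1 - T["as"]**0.25))
            * E0n * cosZ)
    # Eq. (4.8): backscattered diffuse part from ground-atmosphere reflections
    Dhcb = rho_g * rho_s * (Bnc * cosZ + Dhcp) / (1 - rho_g * rho_s)
    Dhc = Dhcp + Dhcb                      # clear-sky DHI
    Ghc = Bnc * cosZ + Dhc                 # Eq. (4.9): closure -> clear-sky GHI
    return Bnc, Dhc, Ghc

# Placeholder values for a clean, dry atmosphere at Z = 30 degrees
T = {"R": 0.91, "g": 0.98, "o": 0.97, "n": 0.99, "w": 0.94, "a": 0.90,
     "n_eff": 0.99, "w_eff": 0.95, "as": 0.95}
F = {"R": 0.5, "a": 0.85, "c": 0.9}
Bnc, Dhc, Ghc = clear_sky(extraterrestrial(1.0), 30.0, T, F)
print(f"BNI={Bnc:.0f}, DHI={Dhc:.0f}, GHI={Ghc:.0f} W/m2")
```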

4.5.2 DISTRIBUTION OF CLEAR-SKY INDEX

Due to the apparent motion of the sun, solar radiation reaching the earth's surface traces out a time series transient with two seasonal components, one yearly and the other diurnal. This double-seasonal pattern can also be observed, for obvious reasons, in time series of solar power generation. Building on such knowledge, one can decouple the seasonal components from the irradiance or PV power time series, so as to obtain a deseasonalized quantity, which is strictly preferred during statistical analysis. To proceed, two strategies logically present themselves. One is to assume an additive relationship between the deseasonalized quantity and the seasonal component; alternatively, it is possible to consider a multiplicative relationship. Clearly, one of these strategies is more useful than the other, and the reason can be demonstrated by simply tracing out the deseasonalized quantities in both cases, as shown in Fig. 4.5. In Fig. 4.5 (a), the GHI time series on August 3, 2010, measured in Oahu, Hawaii, is drawn together with its clear-sky expectation computed using the McClear model (Gschwind et al., 2019; Lefèvre et al., 2013), which is a physical clear-sky model that can be accessed conveniently via a web service. Then, in Fig. 4.5 (b) and (c), respectively, the deseasonalized quantities using additive and multiplicative decomposition are depicted, denoted by the symbols κ′ and κ. One can observe that the amplitude of oscillation of the deseasonalized quantity in Fig. 4.5 (b) varies according to the time of day, with mid-day ramps larger than morning and afternoon ramps. Since the aim of deseasonalization is to remove the diurnal dependence in irradiance, the additive strategy has failed quite miserably, which, in turn, makes the multiplicative strategy the winning strategy. Indeed, as seen in Fig. 4.5 (c), the amplitude of oscillation of the deseasonalized quantity is contained between 0 and 1.5. Solar engineers refer to this deseasonalized quantity as the clear-sky index, κ, which is absolutely vital to solar forecasting. Analogously, those solar forecasters who construct their models directly on irradiance or PV power are just like those computer scientists who build their machine learning models with unnormalized features. Since no well-trained computer scientist would be foolish enough to commit such a mistake, solar forecasters must be acquainted with various approaches to retrieve the clear-sky index, not only from irradiance but also from PV power (a minimal numerical sketch is given after this discussion).

Given the cardinal importance of the clear-sky index in solar forecasting, it is only to be expected that one should wish to examine its statistical properties, among which the distribution is a fundamental one. The distribution of κ has been found useful for at least three aspects of solar forecasting. The first aspect is data acquisition, for the distribution can be used to estimate global irradiance from clear-sky irradiance and (fractional) cloud cover, in situations where ground-based measurements are absent (Smith et al., 2017). Nonetheless, as gridded irradiance, attained by either remote sensing or reanalysis, becomes increasingly available, the need to estimate irradiance from cloud cover has diminished. The second aspect is that the unconditional distribution offers some insight into how the conditional distribution would behave. For instance, if a three-component Gaussian mixture is able to describe the unconditional distribution, one can convert forecasting into a classification problem, in that, once the mixture component to which the forecast belongs is identified, the predictive distribution naturally follows.
The third source of attraction is that the distribution can be used for synthetic generation of irradiance, which can be viewed as a (probabilistic) forecast downscaling method (Yang and van der Meer, 2021).
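As a toy illustration of the additive versus multiplicative deseasonalization contrasted in Fig. 4.5, the following sketch computes κ′ and κ from a synthetic daylight GHI trace; a real application would use measured GHI and a clear-sky model such as McClear.

```python
# Toy comparison of additive vs. multiplicative deseasonalization; a real
# application would use measured GHI and a clear-sky model such as McClear.
import numpy as np

hours = np.arange(6.0, 18.5, 0.5)                       # daylight time stamps
ghi_clear = 1000 * np.sin(np.pi * (hours - 6) / 12)     # stand-in clear-sky GHI
rng = np.random.default_rng(0)
cloud_mod = np.clip(rng.normal(0.8, 0.25, hours.size), 0.1, 1.3)
ghi = ghi_clear * cloud_mod                             # synthetic measured GHI

kappa_add = ghi - ghi_clear                # additive: amplitude scales with time of day
kappa = ghi / np.maximum(ghi_clear, 1e-6)  # multiplicative: the clear-sky index

print("additive range:", kappa_add.min().round(0), "to", kappa_add.max().round(0))
print("clear-sky index range:", kappa.min().round(2), "to", kappa.max().round(2))
```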

Figure 4.5 (a) GHI transient on August 3, 2010, Oahu, Hawaii, and its clear-sky GHI. (b) Deseasonalized variable based on the additive relationship. (c) Deseasonalized variable based on the multiplicative relationship.

The unconditional distribution of κ is known to be both time-resolution- and location-dependent. It also depends on cloudiness (e.g., cloud cover in oktas), solar zenith angle (Smith et al., 2017), and likely many other variables. Despite that, empirical evidence has shown that the clear-sky index distribution is bimodal in most places in the world (Bright, 2019a), owing to the clear and cloudy skies. Figure 4.6 shows the unconditional κ distributions at three locations in the United States. Whereas Goodwin Creek (GWN) and Penn State University (PSU) show bimodality, Desert Rock (DRA) has a dry desert climate with predominant clear-sky conditions, showing a unimodal concentration at κ = 1.

Figure 4.6 Hourly clear-sky index (κ) distribution at three locations in the United States, over 2015–2018.

At this stage, readers need to be alerted to a commonly encountered suboptimal practice in solar forecasting. In the literature, especially when dealing with solar power forecasting, some forecasters do not seem to be aware of the fact that a clear-sky index can be calculated for solar power (e.g., see Engerer and Mills, 2014, for one example approach). Instead, when a distributional assumption needs to be made, those forecasters often argue that GHI and solar power follow a (mixed) beta distribution (e.g., Fatemi et al., 2018; von Loeper et al., 2020). This is inappropriate for two reasons. The first is that the assumption of a beta distribution is often made based on the misbelief that irradiance and solar power are bounded between zero and the clear-sky expectation, which seems to correspond well with the [0, 1] interval of the beta distribution (see Fatemi et al., 2018, for an example argument of this sort). As we have seen in Fig. 4.6, and shall detail further in Section 4.5.3, the clear-sky expectation is by no means the upper bound of irradiance or solar power. The second reason is that working directly on GHI and solar power, so as to assess the spatio-temporal dependence structure (see von Loeper et al., 2020, for an example), implies for certain that the correlation between two time series will be exaggerated. In other words, whatever correlation is computed based on GHI or solar power time series contains not only the effect of moving clouds but also the effects of seasonality. In fact, GHI time series at any two distant sites with similar latitudes—say, Los Angeles and Shanghai—would have a high correlation, but no one would absurdly think one series explains the other.

4.5.3 PHYSICAL LIMITS OF SOLAR IRRADIANCE

The relevance of understanding the physical limits of solar irradiance rests on the two ends of the forecasting process. In the beginning, any responsible forecaster would perform quality control to prevent spurious data from entering the modeling process. In the end, one must ensure the forecast value is physically possible. Though fundamental, these two principal considerations have been grossly neglected by many. Inspecting the literature would immediately result in a long list of articles that place the Gaussian assumption on either the clear-sky index distribution or the predictive distribution. More interestingly, there is an evident propagative effect, in that later works simply cite earlier ones, as if the citations were able to justify the choice of the Gaussian assumption. It is true that Gaussian variables are highly amenable to operate with, but one must be diverted neither by mathematical convenience nor by the supposition that this common practice is scientifically acceptable. Indeed, irradiance and solar power are physically bounded, and the Gaussian distribution, which extends to infinity in both directions, is inadequate.

The lower bound of irradiance or solar power is, theoretically, zero. Nevertheless, at extremely low-sun or no-sun conditions, it is possible to have negative irradiance records due to thermopile pyranometer nighttime offsets (Michalsky et al., 2017). Since the low-sun conditions are only of marginal importance to the utilization and control of solar power, one can safely accept zero as the unconditional lower bound of irradiance without raising much practical concern. For the conditional lower bound under clear-sky conditions, one may set it to 0.95 times the clear-sky irradiance (or, in terms of clear-sky index, just 0.95), for the typical measurement uncertainty of radiometers is < 5%. It is also reasonable to reduce the lower bound further, in consideration of the uncertainty in the clear-sky model, e.g., with regard to aerosol effects in areas with dust, air pollution, or forest fires. For other sky conditions, the lower bounds can be identified from the data.

The upper bound of irradiance or solar power calls for more attention. It should be clarified, once again, that the clear-sky irradiance, or the power generation under clear skies, is not the maximum value that the random variable can take. Irradiance reaching a horizontal surface is composed of the beam and diffuse components. In cases of intermittent clouds, there is a chance for the sun to be completely exposed to the observer while the clouds in its vicinity scatter additional diffuse radiation onto the observer—this is known as a cloud-enhancement event, or over-irradiance. In addition to cloud enhancement, an observer on a horizontal surface sometimes receives another irradiance component due to ground–cloud reflection—this is known as an albedo-enhancement event. Gueymard (2017a,b) provided a pair of reviews on this subject, which serve as good background reading.

For bounds of PV power, one needs to distinguish between the DC power from the PV panels and the AC power of the inverter. Nominal DC power (i.e., the panel's rated power) is measured under the so-called standard test condition (STC), which refers to an in-plane irradiance of 1000 W/m², a cell temperature of 25 °C, and an air mass 1.5 (AM1.5) spectrum. Consequently, when in-plane irradiance exceeds 1000 W/m² (at a temperature of 25 °C or less), the power from a PV module exceeds its nominal DC power. Irradiance regularly exceeds 1000 W/m² under clear skies at small solar zenith angles, and especially at higher altitudes. In addition to exceedances during clear skies, the cloud-enhancement events discussed in the last paragraph also translate to the case of PV power. Notwithstanding, compared to PV power exceeding its clear-sky expectation due to cloud enhancement, which can occur at any time of the day, the probability of PV power exceeding the nominal DC power is far lower, since such events only take place around summertime solar noons (except for PV plants at low latitudes or high elevations), when the clear-sky irradiance is highest. That said, during grid integration, DC power needs to pass through an inverter to reach the power grid. On this point, the AC power rating of the inverter represents a hard upper limit on what a solar power plant can feed into the grid. This limit is especially convenient for solar forecasters in the case of overbuilt plants, where plant owners expand the DC capacity to two or more times the AC capacity. Such plants operate at a higher capacity factor and provide more stable power output, as any irradiance beyond a few hundred W/m² (well below 1000 W/m²) already results in maximum AC power. For solar forecasters, this means that during several hours of the day, thin clouds and aerosol fluctuations become irrelevant, as the irradiance would still be large enough for the AC power to reach the upper limit. While AC power output may be limited during a certain fraction of the day as described above, PV power, just like irradiance, generally exhibits conditional heteroscedasticity (e.g., when conditioning on time of day), and it is not at all meaningful to use a single value (i.e., the nominal power) as the upper bound for all conditions.

The lower and upper bounds of irradiance or PV power are also functions of temporal and spatial scales. More particularly, cloud and albedo enhancements occur only at high frequency. Gueymard (2017a) reported that the observed GHI can reach ≈ 1900 W/m² and ≈ 1600 W/m², at 1-s and 1-min resolution, respectively. For any time period longer than 5 min, these enhancement events tend to average out. Geographically, the diameter of a cloud-enhancement event is on the order of tens of meters, which is comparable to the estimated side length of a 100-kW PV plant (Järvelä et al., 2020). What this implies is that for PV systems smaller than 100 kW, there would not be much smoothing effect, and as a result, the systems would be affected by cloud enhancement. Based on the analysis of an actual 100-kW rooftop PV plant at Tampere University in Finland, the observed peak value of plant-average irradiance was close to 1400 W/m², which is over 1.5 times the corresponding clear-sky irradiance. When the plant size increases to about 1 MW, the value drops to 1.4 times, but it is nonetheless still significant.

The main message contained in the above paragraphs is this: The upper bound of irradiance or DC solar power should be about 1.5 times the clear-sky expectation, and one should adjust this multiplier based on the temporal and/or spatial scale of the data under analysis. On this point, some authors use 1.2 as the upper bound (Pedro et al., 2019; Yang et al., 2020b), which can be deemed fine with 30-min or hourly data, but the bound may be too tight for data of higher frequency. A small quality-control sketch based on these bounds follows.
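The bounds discussed above translate directly into a quality-control rule. The sketch below flags implausible GHI samples against a clear-sky reference; the cap of 1.5 (or 1.2 for lower-frequency data) follows the discussion above, while the function and its inputs are otherwise illustrative.

```python
# Sketch of a physical-limits quality-control step: flag irradiance samples
# outside [0, cap * clear-sky GHI]. A cap of 1.5 suits high-frequency data;
# 1.2 may suit 30-min or hourly averages, per the discussion in the text.
import numpy as np

def qc_flags(ghi, ghi_clear, cap=1.5, clear_floor=0.95, is_clear=None):
    """Return a boolean mask marking physically implausible GHI samples."""
    bad = (ghi < 0) | (ghi > cap * ghi_clear)
    if is_clear is not None:  # tighter conditional lower bound under clear skies
        bad |= is_clear & (ghi < clear_floor * ghi_clear)
    return bad

ghi = np.array([-5.0, 480.0, 1350.0, 820.0])
ghi_clear = np.array([300.0, 500.0, 800.0, 800.0])
print(qc_flags(ghi, ghi_clear))   # -> [ True False  True False]
```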

4.5.4 SPATIO-TEMPORAL DEPENDENCE

Solar irradiance is a spatio-temporal process that is continuous in both space and time. Owing to the need for sampling, the solar irradiance data acquired in real life are all discretized versions of the continuous process. For instance, images captured by a sky camera—an upward-facing fisheye camera with a field of view covering the hemispheric local sky—or by radiometers onboard weather satellites can be considered time series of lattice processes, for the images are regularly spaced in time, and each contains spatial information that can be mapped onto a (regularly spaced) lattice. As introduced in Section 3.3.1, NWP models discretize the three-dimensional space into a horizontal grid and vertical layers when estimating the atmospheric states using differential equations of motion. Shared across these three sources of data is the potential to explain the dynamics of solar irradiance. On this point, another salient feature of solar irradiance is its spatio-temporal dependence.


The intrinsic correlation of one irradiance time series with another—after the seasonality is removed by means of a clear-sky model—is, in principle, solely due to moving clouds. The speed at which clouds move, in turn, is primarily determined by the wind speed at the cloud height. Hence, in the case of camera and satellite data, obtaining the wind field, either by comparing two or more consecutive images or from another data source, is the foremost step in forecasting. Subsequently, by advecting the cloud or irradiance field with respect to the wind field, forecasts can be made (see Miller et al., 2018, for a review); a minimal advection sketch is given at the end of this subsection. Notwithstanding, advection-based forecasting assumes a “frozen” cloud field, which is reasonable over small time windows: a few minutes for camera data, and a few hours for satellite data. Beyond that, the rate at which clouds form, morph, and dissipate renders the assumption invalid. Clearly then, such time windows put a limit on the predictability using camera or satellite data; NWP is strictly indispensable for solar forecasting over any horizon beyond a few hours.

Besides physical methods to capture the spatio-temporal dependence of irradiance, statistics plays a relatively minor role. All purely statistical methods to capture the cloud dynamics must start with a sensor network, of which the sampling frequency, extent, and spacing jointly determine the network's ability to make good forecasts. Dense networks with a sub-kilometer spacing and a high sampling frequency have been shown to be able to produce exceptionally good forecasts, under the condition of prevailing trade winds (Yang et al., 2022e, 2015b; van der Meer et al., 2020). But empirical evidence shows that any network larger than a few kilometers is unable to produce satisfactory forecasts (Yang et al., 2014a, 2013a) as compared to the physics-based methods, even though spatio-temporal statistical methods such as kriging or VAR are well established. Another grave limitation of sensor-network-based forecasting is scalability, for it is economically infeasible to collocate a sensor network with every solar power plant. Some forecasters have advocated for spatio-temporal solar forecasting by treating rooftop PV systems as sensors (Lonij et al., 2013), which can relieve the need to install pyranometers. But the forecast horizon over which a network of rooftop PV systems is able to provide forecast skill—the longest horizon reached by Lonij et al. (2013) is 90 min—falls short of the grid-side requirements for both DAM and RTM operations.

In conclusion, if one of the most important aspects of solar forecasting is to serve intra-day and day-ahead grid integration of solar energy, as has been argued in Chapter 1, then we can no longer pretend that NWP-based solar forecasting is but one of the many options; it is possibly the only option. This conclusion can be reached from a simple argument with a pair of premises:

1. Grid operators typically require solar forecast horizons of > 4 h for RTMs, and of > 36 h for DAMs (see, e.g., Yang et al., 2021a, 2019; Kaur et al., 2016; Luoma et al., 2014).

2. All other forecasting methods, which include those based on cameras (with an effective forecast horizon of < 30 min, see Kazantzidis et al., 2017; Rodríguez-Benítez et al., 2021), satellites (with an effective forecast horizon of < 4 h, see Miller et al., 2018; Nielsen et al., 2021), sensor networks (limited by the network scale and availability), or data-driven techniques (unable to capture the spatio-temporal dynamics of clouds), are unable to produce satisfactory results.

Be that as it may, aside from intra-day and day-ahead grid integration, other applications of solar forecasting do exist and require forecasts at shorter horizons; e.g., secondary control in power systems (Yang et al., 2022e; Ganger et al., 2018) or optimal control of plant-level ramping events with on-site energy storage (Notton et al., 2018) requires sub-minute solar forecasting. In any case, the remaining chapters of the book shall review these solar forecasting techniques in depth.
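To illustrate the frozen-cloud assumption behind camera- and satellite-based methods, the toy sketch below advects a gridded clear-sky-index field along a prescribed wind vector; operational methods would instead estimate the wind from consecutive images (e.g., by cross-correlation or optical flow), and the field here is synthetic.

```python
# Frozen-field advection in its simplest form: shift a gridded clear-sky-index
# field by (wind speed x lead time) in grid units. The field and wind vector
# are synthetic; real systems derive the wind from consecutive images.
import numpy as np

rng = np.random.default_rng(3)
kappa_field = np.clip(rng.normal(0.8, 0.2, size=(50, 50)), 0.05, 1.3)

def advect(field, u, v, dt, dx):
    """Shift `field` by wind (u, v) [m/s] over lead time dt [s], pixel size dx [m]."""
    shift_x = int(round(u * dt / dx))   # eastward displacement in pixels
    shift_y = int(round(v * dt / dx))   # northward displacement in pixels
    # np.roll wraps around the domain edges; a real system would instead ingest
    # a larger field so that upwind data cover the whole forecast domain.
    return np.roll(np.roll(field, shift_y, axis=0), shift_x, axis=1)

forecast_15min = advect(kappa_field, u=8.0, v=-2.0, dt=900, dx=1000)
print("forecast kappa at domain center:", forecast_15min[25, 25].round(2))
```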

4.5.5 SOLAR POWER CURVE


The concept of the solar power curve is highly analogous to that of wind. However, unlike how wind speed is translated to wind power, which is usually a one-step procedure, converting irradiance to solar power requires a sequence of steps, each encompassing an energy meteorology model that serves a specific purpose. To that end, the physical solar power curve is also known as the model chain. Figure 4.7 depicts a typical relationship between GHI and PV power, and the relationship is clearly noninjective. In principle, it is possible to use curve-fitting or AI-based methods (i.e., regression), just like those used to describe the wind power curve, to address the noninjective mapping. Notwithstanding, since the elements of the model chain represent the purest crystallization of ideas conceived by many generations of solar engineers, the ability to take advantage of the model chain separates skillful solar forecasters from everyone else. This view is to be revisited and strengthened in Chapter 11 in greater detail.


Figure 4.7 A typical solar power curve, which describes the relationship between global horizontal irradiance and PV power. Darker colors mean more points in the vicinity.

The meteorological inputs required by a model chain, at the very minimum, should consist of GHI (Gh) and ambient temperature (Tamb), and the output is PV power. One can certainly include more inputs, such as wind speed, surface albedo (ρg), or precipitation, to refine the modeling, but compared to these inputs, PV system specifications are far more vital. System specifications, including the location of the site, the geometric orientation of the panels, the panel temperature coefficient (γ), the nominal operating cell temperature (NOCT), and the nominal DC power output (Pdc,mpp,ref), have to be known to create a basic working model chain. More specifically, all model chains first calculate the solar position, which requires the geographical location and the panel orientation, so as to compute the zenith angle (Z), the incidence angle (θ), and the extraterrestrial GHI (E0), for the time stamp(s) of interest. Subsequently, a separation model is needed to estimate the beam and diffuse irradiance components from GHI, via the clearness index (kt = Gh/E0). With the estimated beam normal irradiance (Bn) and diffuse horizontal irradiance (Dh), as well as GHI, the global irradiance on the collector surface, or global tilted irradiance (GTI, Gc), can be arrived at with the transposition equation:

$$G_c = B_n \cos\theta + R_d D_h + \rho_g R_r G_h, \tag{4.10}$$

where Rd and Rr are the diffuse transposition factor and the transposition factor due to the ground's reflection, respectively, which can be estimated using the panel orientation and irradiance components. Next, Tcell can be estimated from Tamb and NOCT through the following equation:

$$T_{\mathrm{cell}} = T_{\mathrm{amb}} + \frac{\mathrm{NOCT} - 20\,^{\circ}\mathrm{C}}{800~\mathrm{W/m^2}}\, G_c. \tag{4.11}$$

As the last step, the PV power output (P) is computed using

$$P = P_{\mathrm{dc,mpp,ref}}\, \frac{G_c}{1000~\mathrm{W/m^2}} \left[1 + \gamma\left(T_{\mathrm{cell}} - 25\,^{\circ}\mathrm{C}\right)\right]. \tag{4.12}$$

Unsurprisingly, the above equations constitute the simplest possible model chain one can construct. In reality, PV power generation is also impacted by soiling loss, mismatch loss, AC and DC cable losses, reflection loss, and inverter and transformer losses, all of which require separate models. Aside from the losses, another practical concern is the choice of models. As mentioned in Section 4.5.1, there are close to 100 clear-sky models available in the literature, and the situation is similar with other radiation models. For instance, there are hundreds of separation models and tens of transposition models, with varying complexity and accuracy. The problematic aspect is not the sheer number of models, as one can always identify the best or quasi-universal models among the crowd; rather, it is the combination of models, in which the best individual models might not lead to the best model chain, considering the intricate error propagation mechanisms (Schinke-Nendza et al., 2021; Mayer and Gróf, 2021; Holmgren et al., 2018). A sketch of the minimal model chain of Eqs. (4.10)–(4.12) is given below. In short, the solar power curve is a subject that deserves full attention from solar forecasters, as the quality of the model chain and that of the irradiance forecasts jointly determine the quality of solar power forecasts. To that end, irradiance-to-power conversion techniques shall be revisited in much greater detail in Chapter 11.
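The sketch below implements the minimal model chain of Eqs. (4.10)–(4.12) directly. The irradiance components, transposition factors, and system parameters are placeholder numbers; a complete chain would obtain them from solar-position, separation, and transposition models (e.g., those collected in pvlib).

```python
# Minimal model chain per Eqs. (4.10)-(4.12). The irradiance components and
# transposition factors are placeholder inputs; a complete chain would derive
# them from solar-position, separation, and transposition models.
def model_chain(Bn, Dh, Gh, cos_theta, Rd, Rr, rho_g,
                T_amb, noct=45.0, gamma=-0.004, P_dc_ref=5000.0):
    """Return DC power [W] of a PV system from irradiance components."""
    # Eq. (4.10): transposition to global tilted irradiance
    Gc = Bn * cos_theta + Rd * Dh + rho_g * Rr * Gh
    # Eq. (4.11): cell temperature from ambient temperature and NOCT
    T_cell = T_amb + (noct - 20.0) / 800.0 * Gc
    # Eq. (4.12): power at the maximum power point, with temperature correction
    return P_dc_ref * Gc / 1000.0 * (1.0 + gamma * (T_cell - 25.0))

# Placeholder mid-day values: Bn, Dh, Gh in W/m2; cos(theta) for a tilted plane
P = model_chain(Bn=700.0, Dh=150.0, Gh=650.0, cos_theta=0.95,
                Rd=0.9, Rr=0.45, rho_g=0.2, T_amb=28.0)
print(f"estimated DC power: {P:.0f} W")
```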

4.6 SHARED RESEARCH FRONTIERS IN ENERGY FORECASTING

Energy forecasting, it is true, has evolved far beyond the stage that allows novelty claims by simply fitting standard implementations of existing methods and algorithms, or hybrids of them, onto “new” problems. There is an excess of articles of this kind, which does not advance but greatly inflates the literature. It follows that, for anyone entering the field, there is much background information to filter and absorb before one can truly innovate and contribute to the field. Based on a search conducted in 2020, it is estimated that more than 11,000 energy forecasting papers were published in various journals over the 10-year period from 2010 to 2019, and there must be more now; one can easily be inundated with the sheer amount of information. It would be unwise to think an unguided literature survey based on a few hundred or even tens of papers can narrate the state of the art. To avoid wasting time on published works that do not offer much value in theory and/or practice, several of the most important research frontiers in energy forecasting are outlined next.

4.6.1 ADVANCED PHYSICS-BASED FORECASTING

All four domains of energy forecasting can benefit tremendously from high-quality weather forecasts. Although the mappings from weather forecasts to load, wind power, electricity price, and solar power forecasts are far from trivial, it can be known by simple induction that the quality of input weather forecasts is closely linked to the level of success of energy forecasts. Thus, in order to improve the accuracy of energy forecasts, professional weather forecasting models that leverage latest-generation remote sensing and/or NWP with advanced configurations, augmentation, and parameterization are to be developed. The topics of remote sensing and NWP belong to the field of atmospheric sciences, whereas many energy forecasters have a statistics or engineering background. This mismatch of knowledge bases is thought to be the major reason preventing energy forecasters from conducting more physics-based investigations. Two pieces of evidence can be used to support this claim. First, the inclusion of remote-sensing data in energy forecasting is often restricted only to the available products and databases. Second, a supermajority of existing physics-based ensemble forecasting uses the poor man's ensemble instead of the dynamical ensemble, since the latter would demand the perturbation of initial conditions, which is a highly specialized skill that can only be acquired after years of training.

Logically speaking, there is every confidence that atmospheric scientists will push the boundary of knowledge further in time. However, given the fact that energy forecasting is but one application of remote sensing or weather forecasting, one should not expect the latest advancements in those areas to be solely relevant to energy forecasting. The main goal of NWP research and development is still largely the improvement of conventional high-risk weather forecasts, such as heat waves, rain and flooding, hurricanes, and severe winds. Insofar as the scientific method of NWP and remote sensing is susceptible to almost indefinite refinement, energy forecasters can always identify niche areas in which weather forecasters have little motivation or limited knowledge to fine-tune their models. For instance, although the European Centre for Medium-Range Weather Forecasts (ECMWF) has the world's best global NWP models, its operational analysis can be used as initial and boundary conditions for running Weather Research and Forecasting (WRF) microscale simulations (hectoscale and large eddy) of wind events, which typically results in higher-value wind forecasts. Similarly, analyses from the National Centers for Environmental Prediction's operational Rapid Refresh model can provide initial and boundary conditions every 3 h for WRF-Solar (Jimenez et al., 2016). Both cases depict how global- and continental-scale NWP models can be used as input for running other special-purpose regional meteorological models. For solar forecasting, it is relatively easy to improve clear-sky irradiance forecasts with custom NWP models, e.g., through improved aerosol optical depth input, but higher-resolution NWP models do not necessarily result in improved cloud forecasts (Mayer et al., 2023).

4.6.2 MACHINE LEARNING

Leaving the theories of physics-based forecasting aside, a fundamental assumption of data-driven forecasting is that the future resembles the past in some way, and discovering the patterns contained in historical data and/or the relationships connecting past events to the present is key to high-quality forecasting. On this point, machine learning, as a general strategy of pattern discovery and relationship mapping, can certainly benefit the advancement of energy forecasting. In fact, machine learning is not as new as commonly comprehended, but has been adopted for energy forecasting for over three decades (Weron and Ziel, 2019; Chen et al., 2004; Kariniotakis et al., 1996). A simple but general typology of machine learning is shown in Fig. 4.8. Since information about these types of machine learning can be found elsewhere easily, it is not elaborated here.

Figure 4.8 A simple typology of machine learning, comprising unsupervised learning (dimension reduction and clustering), supervised learning (classification and regression), and reinforcement learning.

Most works that integrate the techniques of machine learning with energy forecasting fall under supervised learning, or more specifically, regression. Very often, owing to various challenges such as big data (e.g., remote-sensing data), heterogeneous data transients (e.g., datasets with PV plants with fixed and single-axis-tracking panels), or changing regimes (e.g., clear, partially cloudy, and overcast skies), regression techniques are paired with dimension reduction, clustering, or classification in creating hybrid models. As these hybrid models became commonplace in recent years, the field turned to more advanced machine-learning techniques, such as deep learning (Shi et al., 2018b; Wang et al., 2017a), reinforcement learning (Feng et al., 2020), or transfer learning (Cai et al., 2020), which is the new norm. A downside of these advanced machine-learning techniques is that their training process is much more complex and time-consuming than machine learning with “shallow” networks. This is not only due to the sheer number of parameters (weights) to estimate but also due to the difficulties involved in the optimization of the hyper-parameters (i.e., network structure, activation functions, stopping conditions, regularization, among others) (Lago et al., 2018). Clearly, advanced machine-learning techniques can only be expected to be high-quality forecast tools when an enormous amount of training data is available. Stated differently, a few years
of hourly data from a single location most likely does not warrant the use of deep learning. At the moment, the literature seems to have been focusing more on the form than on the essence of deep learning, in that most papers spend pages discussing the theory with complex but general equations, while only limited attention is paid to application-specific settings, adaptations, and augmentations. Nonetheless, with ever-increasing computing power and collected data, one has reason to believe that the applications of deep-learning techniques will eventually be fully justified.

It should be clear at this stage that machine-learning-based methods for load, wind, price, and solar forecasting can all benefit from including the physical characteristics of the processes involved, both for modeling and for variable selection. This, however, does not mean that one should lump all exogenous data and weather variables in their raw and unprocessed forms into the machine-learning models, and hope the algorithm can crystallize the useful information on its own. Instead, one ought to factor in all known physical properties and salient features of those exogenous data, such as those mentioned in Section 4.5. (Believers of deep learning would disagree with these statements. However, there is not much evidence at the moment that can close such a debate in an absolutely indisputable way. We have to keep in view both schools of thought.)

4.6.3 ENSEMBLE, COMBINING, AND PROBABILISTIC FORECASTS

Probably the most important mentality shift in the recent history of energy forecasting is marked by the increased valuation of probabilistic forecasting. Wind power forecasting certainly took the lead during this transition, which may largely be attributed to the early collaborations between wind power forecasters and meteorologists (Hong et al., 2020). Over the past five years, the number of works on probabilistic solar forecasting has also increased dramatically (Yang and van der Meer, 2021). On this point, researchers have become increasingly concerned with probabilistic forecast verification, to ensure that model comparisons can be carried out as objectively and fairly as possible. Whereas a pair of reviews on probabilistic wind and solar forecast verification is now available (Messner et al., 2020; Lauret et al., 2019), another important aspect is benchmarking, which offers a leveled platform for forecasters to showcase their forecasting skills—for solar forecasting alone, four recent articles have proposed various benchmarks for probabilistic forecasting (van der Meer, 2021; Le Gal La Salle et al., 2021; Doubleday et al., 2020; Yang, 2019h). Compliance with these proposed verification and benchmarking guidelines is thought to be a necessary prerequisite for making progress in probabilistic energy forecasting.

The typology of ensemble forecasting has been laid out in Section 3.3.1: the forms most relevant to energy forecasting are the poor man’s ensemble and the dynamical ensemble. Whereas the former proceeds from a collection of data groups, models, and/or parameter sets, the latter is exclusively generated by NWP models with perturbed initial conditions. The poor man’s ensemble is conceptually no different from combining forecasts, which is the more popular phrase in the statistical forecasting literature (a minimal sketch is given at the end of this section). Nonetheless, as the name “poor man’s ensemble” suggests, this form of ensemble is pertinent in situations where dynamical ensembles are not available. Methodologically, a poor man’s ensemble is much easier and much faster to construct than a dynamical ensemble. It would not be unreasonable to conclude that the former is an inferior option to the latter, insofar as weather forecasting is concerned, for otherwise there would be no motivation for weather forecasters to pursue the more sophisticated ensemble approach when the simpler one can do the job just as well.

Since weather forecasting through dynamical ensembles, particularly on global and regional scales, is beyond the reach of most if not all energy forecasters, it is post-processing that, practically, deserves more attention. Stated differently, energy forecasters are users of weather forecasts, and their main task is to convert the available weather forecasts into usable forms that suit energy forecasting applications. Post-processing refers to both the task of calibrating the initial forecasts and the task of converting the calibrated weather forecasts to load, wind power, price, and solar power forecasts—it is also possible to perform both tasks simultaneously. What distinguishes good energy forecasters from bad ones is therefore the amount of domain knowledge they possess, which directly translates into how successful the energy forecasts would be.

Another facet of probabilistic energy forecasting is perpetually tied to the applications, e.g., how probabilistic forecasts can benefit power system operations such as load flow, unit commitment, or economic dispatch. However, as noted in numerous surveys (e.g., Li and Zhang, 2020; Hong et al., 2016), the industry adoption of probabilistic energy forecasts and related concepts is still limited. The reasons can be attributed, on the whole, to the disconnect between research and industry. The power system industry is a very conservative one. While probabilistic forecasting could reduce costs through leaner and better-tuned scheduling of reserve power plants, power system operators generally prefer minimizing risk over reducing costs, by planning for worst-case scenarios and using personal judgment rather than relying solely on the output of a probabilistic model. Given that power system operation procedures have evolved over decades, it is not realistic to expect power systems to update their operational procedures whenever new forecasting models are proposed. On the research side, since a vast majority of forecasting works are grossly detached from operational standards and submission requirements, as explained by Yang et al. (2021a) and Yang (2019a), the proposed methods and models cannot be used at any rate, and thus possess no intrinsic value (recall that one of the three types of goodness of forecasts is value, which is solely assigned by forecast users; if there is no user, there is no value).
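To make the notion of a poor man’s ensemble concrete, the following minimal sketch (our illustration; all values are hypothetical) pools point forecasts from several independently constructed models, taking the pooled mean as the combined forecast and empirical quantiles of the pool as a crude probabilistic forecast.

```python
# A minimal sketch of a poor man's ensemble: point forecasts from several
# independently built models are pooled; the pooled mean acts as a combined
# point forecast, and empirical quantiles act as a crude probabilistic one.
import numpy as np

# Hypothetical GHI forecasts (W/m2) for one timestamp from five models
member_forecasts = np.array([412.0, 388.0, 405.0, 430.0, 397.0])

point_forecast = member_forecasts.mean()                    # combined forecast
quantiles = np.quantile(member_forecasts, [0.1, 0.5, 0.9])  # predictive quantiles
print(point_forecast, quantiles)
```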

4.6.4 HIERARCHICAL FORECASTING

A power system operates as a unity, but like all systems, it consists of components whose behavior, when examined jointly, reveals insights into how the system is likely to behave. Very often, the information carried by the individual components cannot be directly observed at the system level, and that of the system is likewise opaque to observers at the component level. In this regard, effective information sharing has been a key emphasis in power system operation, and standardized supervisory control and data acquisition (SCADA) systems have been widely employed by power system engineers for more than two decades. In today’s power systems with high-level renewable penetration and highly distributed renewable generation, existing SCADA systems need to be upgraded into multi-directional, multi-level information exchange systems (Ilić et al., 2011). Since energy forecasting is an integral part of power system operation, it too can benefit from information sharing.

A convenient way to model the interaction among forecasts at various system levels is through hierarchical modeling. Time series, be it load, wind power, or solar power, can be allocated according to their natural temporal and/or geographical hierarchies or groupings. For instance, the sum of net loads (i.e., load minus renewable generation) at distribution feeders should equal the net load at the corresponding transmission bus minus losses. Similarly, the power output of individual strings should add up to the PV system output. Whereas power generation and load are, by nature, aggregate consistent, forecasts are not, owing to the uncertainties involved in the forecasting process. To resolve this inconsistency in aggregation, hierarchical forecasting, or grouped forecasting,8 reconciles base forecasts generated individually by different forecasters for different levels of a hierarchy or for different groups. Besides making forecasts aggregate consistent, another advantage of hierarchical or grouped forecasting consists in its improved accuracy over the base forecasts.

8 On terminology, when the hierarchy can be uniquely defined, the term “hierarchical forecasting” is used; otherwise, the term “grouped forecasting” is used.

Hierarchical and grouped time series forecasting may be unheard of by many solar forecasters, since there have evidently been only a handful of publications on the subject (Di Fonzo and Girolimetto, 2023; Yang, 2020f; Yagli et al., 2020c, 2019b; Yang et al., 2017b,c). In other energy forecasting domains, hierarchical forecasting can be traced back to a slightly earlier date (Hong et al., 2014), and in the more general statistical forecasting literature, to at least the 1960s (Grunfeld and Griliches, 1960). That said, the hierarchical forecasting to which we refer in this book has a modern flavor, in that it utilizes all information in the hierarchy, as opposed to the traditional top-down, bottom-up, and middle-out approaches, which only leverage information from a single level. The so-called “optimal reconciliation method,” first proposed by Hyndman et al. (2011), aligns perfectly with the future SCADA systems envisioned by Ilić et al. (2011). More importantly, it echoes the latest grid codes, which mandate forecast submissions by owners of renewable energy systems (Yang et al., 2021a; Luoma et al., 2014; Makarov et al., 2010). Stated differently, owners of renewable energy systems, who are motivated by both the obligation to comply with grid codes and the impulse to minimize penalties, are responsible for producing the best possible power output forecasts for their systems and submitting them to the power system operators, whereas power system operators, who have access to system-level information, are in charge of producing the transmission- and distribution-level forecasts, as well as reconciling these forecasts with the submitted ones. To that end, hierarchical forecasting is perhaps one of the most pressing research topics in energy forecasting, and the subject shall be fully expanded in Chapter 12 of this book. It should be noted at this stage that hierarchical forecasting not only can reconcile forecasts on different levels of a hierarchy, but also has the potential to bring together load, wind, and solar forecasting, so as to generate the most informative net load forecasts—net load being the final quantity of interest for many power system operations.
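To illustrate the idea of reconciliation, the following minimal sketch (our illustration, not a verbatim rendering of Hyndman et al. (2011)) applies the simplest, ordinary-least-squares variant of optimal reconciliation to a toy two-level hierarchy; all numbers are hypothetical.

```python
# A minimal sketch of OLS forecast reconciliation for a two-level hierarchy:
# one system-level series that is the sum of three plant-level series.
import numpy as np

# Summing matrix S maps bottom-level series to all series in the hierarchy;
# rows = [total, plant 1, plant 2, plant 3], columns = bottom-level series.
S = np.array([[1, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])

# Base forecasts produced independently at each level; note 10 + 4 + 3 = 17,
# so the base forecasts are not aggregate consistent with the total of 18.
y_hat = np.array([18.0, 10.0, 4.0, 3.0])

# OLS reconciliation: orthogonally project the base forecasts onto the
# coherent subspace spanned by the columns of S.
P = np.linalg.inv(S.T @ S) @ S.T
y_tilde = S @ (P @ y_hat)
print(y_tilde)   # reconciled forecasts now satisfy total = sum of plants
```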

4.7 COMMON ISSUES IN AND RECOMMENDATIONS FOR ENERGY FORECASTING RESEARCH

The present authors have observed the developments and changes occurring in the field of energy forecasting from the perspectives of authors, reviewers, and editors. Based on a rough count, we have written more than 300 journal papers on or related to energy forecasting, have reviewed another few hundred papers, and have handled many thousands of submissions over the past decade. It is on this account that the content of this section rests upon long and extensive practical experience. The mission here is to share with the readers those common pitfalls which might substantially lower the chance of getting manuscripts published. This, however, does not mean that published and peer-reviewed works do not fall victim to the same problems. Hence, a more important mission is to make recommendations and propose remedies that can help the readers avoid those common pitfalls, and to standardize, if possible, the views on what constitutes good energy forecasting research.

4.7.1 COMMON ISSUES

Chapter 2 of this book listed some of the common issues in solar forecasting, such as peeking into the future, hiding inconvenient facts (with Occam’s broom), exaggerating novelty (with the “novel” operator), unintelligible writing (with the smoke grenade), and low reproducibility, which have fueled the philosophical debate therein. This section adds to or reiterates some of the items on that list from the perspective of publishing quality energy forecasting papers.

4.7.1.1 Limited dataset

One of the major limitations of energy forecasting works has always been the choice of dataset, which gives rise to limited reproducibility and unverifiable performance claims. Since the performance of forecasting models depends highly upon the datasets used, when unique and context-specific datasets are used, the forecasting results obtained are not transferable to other scenarios. Stated differently, even if a method can be demonstrated to perform well, insofar as the dataset is restricted, the conclusions drawn ought not to be regarded as general.

By “unique” we mean two things. First, it refers to those datasets with highly specialized input variables that can never be fully generalized. For instance, instruments such as sunphotometers, spectrophotometers, or spectroradiometers are able to measure atmospheric composition or spectral irradiance, which are known to be beneficial to solar forecasting. Given the price and the scientific specialization required, one can never expect these instruments to receive widespread uptake. It follows that methods that depend on such instruments cannot be generalized. Second, the word “unique” also refers to proprietary datasets. While acknowledging that proprietary datasets do exist, owing to organizational policies, commercial considerations, and trade secrets, their existence must not be used as an excuse for circumventing reproducibility considerations. As we shall see in Chapter 6, publicly available datasets for solar forecasting exist in bulk, as do other energy forecasting datasets. If proprietary datasets have to be used, e.g., due to requirements from the sponsor, authors should at least add forecast results for a publicly available dataset to their paper.

Context-specific datasets are not recommended, as they are often subject to deliberate filtering, selection, and other manipulations that could distort the interpretation of results. More importantly, all data-driven forecasting methods require fitting (or training, in the machine-learning jargon). If the fitting dataset comes from a specific climate or weather context, it is likely that the fitted model only works well under that context, again limiting the uptake and transferability of the proposed model. One may argue that the method itself is general, and can be refitted using other datasets in other contexts. But given the multitude of available methods nowadays, there is no good reason to choose a method that has not been extensively tested. In other words, to ensure a method has the potential to be picked up by others, the forecaster who proposed the method must test it under diverse contexts.

4.7.1.2 Inadequate verification procedure

Superiority claims have become commonplace in energy forecasting, since such claims are viewed by many as a necessary and sufficient condition for novelty. While this view is flawed in itself, another hidden problem is inadequate verification procedures, which can lead to superficial results that prop up those superiority claims. Inadequate verification procedures fall into three types: (1) inappropriate error metrics, (2) weak benchmarks, and (3) small verification sets.

Error metrics are numerous, and each exists for a reason. Because different error metrics examine different aspects of forecast quality, a winning model under one metric might not perform well under another. Hence, a commonly practiced trick is to pick those error metrics that favor the proposed models. To counter the situation, some have advocated using a suite of metrics, which only adds complexity to an already messy situation—the dimensionality of the verification problem multiplies with the number of error metrics used. In other cases, those who are unfamiliar with the statistical properties of the error metrics may pick inadequate ones, e.g., using mean absolute percentage error (MAPE) for load, prices, or renewable power generation close to zero, or using mean absolute error (MAE) to assess the performance of a least-squares regression model. Another often overlooked aspect is testing the statistical significance of the error difference: when the observed differences in errors between one model and another are close to zero, statistical significance tests have to be performed, and even then, superiority is still not absolutely conclusive (a minimal sketch of one such test follows at the end of this section). The use of error metrics is to be revisited in the forecast verification chapters of this book.

The performance of any proposed model also depends on the benchmarks chosen. To compare forecasts generated at different locations and over different time periods, weather forecasters often employ naïve reference methods, which mainly refer to climatology, persistence, and their combination. In the statistical forecasting literature, similar benchmarks are available. Nonetheless, for time series with seasonality, the seasonality should be removed before applying the naïve references; this is often neglected in practice. Another issue, which arises when selecting benchmarks, is that some forecasters only compare the proposed model with weaker versions of models from the same family, e.g., comparing a deep neural network to a shallow one, a carefully tuned neural network to one that is barely tuned, a model with regularization to one without, or an ensemble model to a single-component model. All of these could no doubt make the proposed model appear strong, but in fact, the performance is exaggerated, and can be easily matched or overtaken by any other model of the same complexity.

The last pressing issue pertaining to forecast verification is the size of the verification dataset. It is not uncommon in the energy forecasting literature to see papers with small verification datasets of just tens of data points or even fewer. The exact words of Rob Hyndman, in commenting on this situation, are: “That is just silly, and anyone who believes it does not know anything about forecasting.” Indeed, a general rule in energy forecasting is to have at least a full calendar year of verification samples if the data is hourly. For data with higher granularity, the verification period may be shorter, but should nevertheless represent most if not all scenarios over a calendar year.
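To make the significance-testing point concrete, the following minimal sketch (our illustration; the text above does not prescribe a specific test) implements a basic Diebold–Mariano test for equal predictive accuracy under squared-error loss, without small-sample corrections; the error series are synthetic.

```python
# A minimal sketch of the Diebold-Mariano test for equal predictive accuracy,
# under squared-error loss and with a simple HAC variance estimate.
import numpy as np
from scipy import stats

def diebold_mariano(e1, e2, h=1):
    """Return the DM statistic and a two-sided normal p-value."""
    d = e1**2 - e2**2                  # loss differential series
    n = len(d)
    # autocovariances of d up to lag h-1 (lag 0 is the variance)
    gamma = [np.cov(d[k:], d[:n - k])[0, 1] for k in range(h)]
    var_dbar = (gamma[0] + 2 * sum(gamma[1:])) / n
    dm = d.mean() / np.sqrt(var_dbar)
    return dm, 2 * (1 - stats.norm.cdf(abs(dm)))

rng = np.random.default_rng(1)
e1 = rng.normal(0, 1.0, 8760)          # model 1: one year of hourly errors
e2 = rng.normal(0, 1.1, 8760)          # model 2: slightly larger errors
print(diebold_mariano(e1, e2))
```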

4.7.1.3 Inconsistent terminology

A phenomenon that can be observed in any quickly expanding scientific domain is literature with highly inconsistent terminology. Inventing new terms and jargon seems to be very tempting for novices, owing to their limited knowledge of the existing terminology. At other times, new terms and jargon are used just to make the proposed method look novel. These newly invented terms and jargon are not only taxing on the readers but also tend to propagate through the field. It is true that language is of an evolving nature, but it is also true that authors are responsible for a thorough literature survey before conducting research, such that the unnecessary invention of new terms and jargon can be minimized.

There are also cases where inconsistent terminology exists for historical reasons. The most representative pair of terms is “ensemble forecasting” and “forecast combination”: both terms were popularized in the 1960s, but originated from different circles of forecasters. In the field of solar energy, the words “beam” and “direct,” when used to describe irradiance components, are interchangeable; even the most experienced solar engineers today cannot fully trace the origin of this diversification.

Lack of consistent terminology can lead to unintelligible writing, which harms both the readers and the authors. The ambiguous description of forecasting models, methodologies, and processes makes papers hard to reproduce, and thus limits their chance of getting accepted. In an academic world where the literature grows bulkier each day, the only way to stay noticeable and gain recognition is to consider the reader’s viewpoint, and deliver an easy-to-read piece of work that contains decent innovation and novelty.

4.7.2 RECOMMENDATIONS

There is no formula for producing publications. Whereas scientific novelty is always the core of any proposal made for academic purposes, there are certain practices which, when followed, can increase the chance of acceptance. Additionally, these practices can facilitate a healthy environment for current and future researchers, such that the accumulated forecasting knowledge can stay in its most useful form and grow faster. To that end, we wish to make the following five recommendations pertinent to energy forecasting research.

4.7.2.1 Conducting literature reviews

The list of references, as a necessary part of any academic publication, reflects an author’s level of understanding of the subject. Experienced reviewers and editors are able to make decisions by examining the cited references. Indeed, an incomplete literature review has hitherto been a major reason for rejection. In any academic field, there are researchers who are well established for their substantial contributions to the field. Given that the reviewers and editors are likely from the same field, they expect to see discussions of those earlier works by established researchers, without which any subsequent analysis may be deemed to lack foundation. Genuine ideas do not come frequently; it is thus logical to follow the path laid out by the forerunners.

Some journals which publish energy forecasting research mandate binge self-citation of recent publications from their own journals, as a means to inflate their impact factors. This action not only violates publication ethics but also tends to isolate the journal from the rest of the literature. To hide the true underlying reason for rejecting a submission, those journals often use “lack of relevance to the journal” as a disguise; once binge self-citations to the journal are made, the submissions often pass easily. In antithesis to that, many journals that value innovation and novelty do not have a high impact factor, partly owing to the delay between the publication date and the time at which responsible citations arrive, and partly due to the technical depth usually associated with truly innovative and novel proposals, which prevents people in the domain of mediocristan from apprehending the work.

Discussions of the references should focus on the essence rather than the form. “Roster-like” reviews listing “who did what” have little meaning. Instead, authors should pay more attention to the logical flow of ideas, such as why certain things were proposed, what the benefits and drawbacks of the proposal are, what others have done, and what we can do to address the drawbacks. If this guideline is followed precisely, another desirable outcome follows: the irrelevant references on side topics can be kept to a minimum. The coverage density of the literature should decrease as the area moves further away from the core proposal.

Review papers are great sources of references. Whereas researchers are able to quickly gather enough references for their own papers, they are nevertheless encouraged to read the references at least once before citing them. We have seen references being included in a paper simply because someone else included them, whereas the actual references have long been lost in the mists of time (e.g., old proceedings that only appeared in print), or are in a foreign language not understood by most. This is just a bare display of irresponsibility.

4.7.2.2 Forecasting terminology

As mentioned earlier, the introduction of new terms and jargon should be kept to a minimum, since, given the maturity of energy forecasting, most problems, processes, and phenomena can be clearly explained by following existing terminology precisely. Not knowing the existing terminology may also lead to desk rejections, since it displays the incapacity of the authors to position their research in the correct part of the knowledge base. In many circumstances, a word or a phrase might already have been popularized by researchers in sister domains, so there is no need to reinvent the wheel. On this point, philosophers are perhaps the most responsible species in terms of the bookkeeping of terminology; the Stanford Encyclopedia of Philosophy is one of the centralized databases for philosophical terminology and concepts. Though the energy forecasting field lacks such an encyclopedia, and one is unlikely to come by any time soon, various International Organization for Standardization (ISO) or organizational (such as the Institute of Electrical and Electronics Engineers or the International Energy Agency) standards are highly functional in that respect.

Scientific forecasting has been around for a hundred years, and many forecasters along the way have summarized its history. Some deeply rooted practices, such as out-of-sample tests (Tashman, 2000) or cross-validation (Arlot and Celisse, 2010), have also been surveyed, from which novices can trace the origin and legitimate variations of well-accepted forecasting practices. A concept central to energy forecasting is the forecast horizon, which defines the length of time into the future for which forecasts are to be issued. The corresponding recommendation is to avoid using phrases such as “very short term,” “short term,” or “long term” without explanation, since these descriptions take different interpretations in different energy forecasting domains, as we have seen in earlier sections of this chapter. The forecast horizon alone is insufficient to narrate the forecasting process; for instance, “day-ahead forecast” can either mean a single forecast for the next day or 24 hourly forecasts for the next day—that is, the forecast resolution matters. The ambiguities regarding the forecast horizon, alongside other time parameters describing forecasts, are detailed in Section 5.6. Authors are encouraged to use the precise notation presented therein to describe their forecasting setup.

4.7.2.3 Enhancing reproducibility and falsifiability

The valuation of reproducibility has been steadily ramping up in academia, since reproducibility grants falsifiability to the research. This is for three reasons. The first attractive feature of reproducibility is that it allows swift replication of the results, which saves time, and is therefore critical in the fast-paced “publish or perish” regime. The second appealing reason is that reproducibility is a sign of confidence and honesty, which is the best disclaimer against fraud, dishonesty, and other dodgy businesses that are becoming increasingly harder to detect nowadays. Third, full reproducibility, e.g., via provided code, is able to resolve all linguistic ambiguities, which is particularly useful as ever more complex forecasting procedures become a characteristic of energy forecasting.

Yang (2019a) separates reproducibility in forecasting research into two kinds, one of which provides both the data and the code, and the other just the data. These two kinds of reproducibility are fundamentally distinct: one is achieved by a direct demonstration of the forecasting procedure, which is called ostensive reproducibility; the other requires the readers to articulate the descriptive text provided by the authors, which is called verbal reproducibility, and it presupposes a certain amount of forecasting knowledge on the part of the readers. Clearly, the former option is preferred over the latter. In fact, the best kind of contribution a forecaster can make is to share her code with the rest of the community. Forecasting as a service is a commendable mentality, the absence of which is hardly conducive to the progress of forecasting science—just imagine the ramifications of weather forecasters keeping NWP forecasts to themselves without disseminating them to the public. Of course, the value of any forecast is only defined by its user. Insofar as research papers on forecasting are concerned, the primary users are fellow forecasters in academia. If the proposed methods cannot be reproduced, the research papers have no intrinsic value.

4.7.2.4 Identifying a suitable journal

All journals need editors, and the academic standard of the editors directly affects the standard of the journal. Nature and Science have large boards of dedicated full-time editors to make sure the editorial expertise covers every aspect defined in the journal scope. Nonetheless, most journals do not have the luxury of hiring many editors, and a natural consequence is the lack of experts among the editorial board members on certain subjects. Many general energy journals and power system journals receive and publish energy forecasting papers. Scanning through the editorial board members, one may soon notice that it is possible that none of the editors has any experience in forecasting, which puts their ability to gauge the quality of energy forecasting submissions in doubt. Echoing the earlier point on mandatory self-citation, the two groups of journals often overlap.

Top journals should really be defined based on subject expertise, rather than on impact factors. Subject expertise in energy forecasting is reflected through both the number and the percentage of energy forecasting papers in a journal. In this regard, Hong et al. (2020) surveyed the recent literature, and identified several top energy forecasting journals, such as Wind Engineering, Wind Energy, International Journal of Forecasting, IEEE Transactions on Sustainable Energy, Journal of Renewable and Sustainable Energy, or Solar Energy. That said, not every manuscript can eventually be published in a top journal. To avoid disappointment and wasted time on the peer-review process, authors are encouraged to conduct a pre-submission survey to gauge their technical standards against those of the published works in those top journals.

4.7.2.5 Be persistent with what you do

It is believed that the understanding of a scientific subject is, in the main, proportional to the time spent, and this is true regardless of the level of intelligence of individuals. One may counter-argue that, with the accumulation of knowledge, the ability to comprehend improves, which can lead to a faster rate of acquiring knowledge. This may be true on one account, but on another, established researchers are often inundated with other tasks, such as teaching, management, or business administration, that divert them from conducting first-hand research. On top of that, if an established researcher in one department is to extend her research into another which requires a substantially different thinking style and knowledge from the ones to which she is accustomed, there is no reason to suppose that she is any better than an undergraduate degree holder in the new department.


Indeed, Russell (2017) argued that the scientific method is intrinsically simple, but it is engaged by only a minority, who themselves confine the engagement to a minority of problems upon which they have enough capacity and experience to make educated and informed judgments. He further argued that, owing to incapacity and lack of experience in other matters, even the most eminent man of science, when asked, would sooner or later express wholly untested opinions with dogmatism, on subjects such as politics, theology, or finance. The idea central to Russell’s argument is that no one is able to be an expert on everything, which holds at the level of faculties (i.e., major divisions of knowledge, such as science, engineering, or law), at that of departments (i.e., divisions of a faculty, such as electrical engineering, civil engineering, or mechanical engineering), and at that of subjects (i.e., divisions of a department, such as power system engineering, control & automation, or microelectronics).

Energy forecasters can be divided, according to their professions, into statisticians, meteorologists, and engineers, none of whom share the same educational background or trained way of thinking as the others. Hence, following the arguments presented above, one should not expect any individual energy forecaster to be familiar with all aspects of energy forecasting. If we are to project the distance between the origin and the frontier of energy forecasting onto a 0-to-100 scale, in which 0 marks the position of someone who has just entered the field, and 100 marks the collective knowledge of energy forecasters, it is estimated that even the best energy forecasters today would at most attain a position in the low tens, and this is only achievable with a lifetime of successive refinement of understanding and ideals. Clearly then, very few people are able to make useful contributions to energy forecasting research in their early careers. Nevertheless, the good thing is that the seasoned energy forecasters, who hold editorial and chair positions at various journals and conferences, know that well, and probably are willing to accept the fact that gaining experience is a gradual process that has to start somewhere.

In conclusion, one has to be persistent and patient in conducting energy forecasting research. The entrance is instantaneous, but the commitment has to be long-term. Only then does one have a chance to reach an elevated understanding in the field of energy forecasting.

4.8 CHAPTER SUMMARY

The movement of the field of energy forecasting, as is the case in any other scientific field, is partly cyclic and partly progressive. We may regard the cyclic part as repetitive and derivative, such as how a plain transfer of an existing method from one application area to another is resented nowadays, in that it is likely to attract criticism. Nonetheless, without the cyclic part, we cannot fix our attention upon what is progressive, upon what discriminates stellar works from mediocre ones. It is only by observing those occasions on which the movement deviates from cycling that we become aware of the advances made. On this point, although the research frontiers and common issues in energy forecasting have been identified and elaborated, it is not our hope to see these being followed by all. Instead, that information is only meant to provoke the thoughts of responsible forecasters.


This chapter started off by examining the position of each major domain of energy forecasting on a two-dimensional maturity scale that was proposed in 2016. It then went on to discuss the key considerations and defining characteristics of the individual domains. Through such an analysis of the state of the art, it should now be clear that solar forecasting has unique aspects in terms of seasonal behavior, statistical distribution, physically possible limits of irradiance and solar power, spatio-temporal dependence, and irradiance-to-power conversion methods. Indeed, these salient features of solar irradiance profoundly influence how solar forecasting should further develop.

The single most important consideration in solar forecasting is the removal of seasonality by means of a clear-sky model, which has the potential to explain all sources of variability in solar irradiance except for clouds. The next take-home point is that solar irradiance is a physical process that exhibits strong spatio-temporal dependence due to moving clouds, which can only be captured via physics-based methods with a minimum of a two-dimensional “view” of the space, which primarily refers to (1) image-based advection of cloud or irradiance fields, and (2) NWP. It follows that purely data-driven methods, especially those based on single-location data, are seriously handicapped, regardless of their complexity and sophistication. Deriving from this point, one can readily conclude that the best approach to solar power forecasting should always take two steps: first forecast the irradiance, and then convert the (post-processed) irradiance forecasts to solar power forecasts via a solar power curve. Whereas directly regressing solar power on forecast weather variables is one strategy for solar power curve modeling, using a model chain constitutes another, where the knowledge of solar energy meteorology may allow for more elaborate modeling of the conversion process. Last but not least, since grid integration concerns the power system as a whole, developing forecasting methods that demand very specific data, which cannot be generalized, seems futile.

5 A Guide to Good Housekeeping

“Everything should be made as simple as possible, but not simpler.”
— Albert Einstein

There are certain housekeeping chores that may be so familiar to expert solar forecasters that the narratives and descriptions of those chores, both in thinking and in writing, are often omitted or expressed only in brevity. With the increasing emphasis placed on scientific novelty in solar forecasting research, an excessive amount of effort is frequently spent on piling up mathematical and algorithmic details that are not only unnecessarily abstract but also ostentatious. Those who do so seem to believe that the number of equations and the length of flowcharts reflect the complexity and intricacy of a forecasting method, and thus possess more value than the actual execution, implementation, and uptake. The problem is exacerbated when complexity and intricacy are, at times, wrongfully associated with the technical skill of a forecaster, especially by those who know very little about solar forecasting. This phenomenon has been discussed in Chapter 2, in which using lay audiences as decoys has been proposed to parry the intentional smoke grenades and to emphasize reproducibility.

In the following pages, we confine ourselves, in the main, to those subtle practices and standard procedures that expert solar forecasters regard as preliminary, and perhaps not worthy of discussion, but that are profoundly important for novices to acknowledge and understand. In particular, the chapter proceeds from the most fundamental concern of solar forecasting in terms of practicality, namely, the choice of software; this is done in Section 5.1. Then in Section 5.2, we move on to give a full account of the terminology and notation of irradiance components and the k-indexes, which are often mixed up and used wrongly. After standardizing the terminology and notation, quality control of irradiance components, whose importance has often been discounted during solar forecasting and forecast verification, is elaborated in Section 5.3. The choice of clear-sky model and the ways to obtain clear-sky irradiance are detailed in Section 5.4. In Section 5.5, the reader is alerted to potential dangers that may arise due to temporal misalignment and incorrect averaging of data. Finally, in Section 5.6, ambiguities with the terms “forecast horizon” and “forecast lead time” are highlighted, and a quadruplet of variables is proposed to fully elucidate the elements of time in operational solar forecasting.


5.1 PICKING THE RIGHT SOFTWARE AND TOOLS

5.1.1 PYTHON AND R

Insofar as one wishes to work on data science, two programming languages are commonly acknowledged to be the most powerful, namely, Python and R. Furthermore, given that the packages and functions of either language can be called from the other, mastering just one of them is thought sufficient for solar forecasting research. The reasons for recommending Python and R are threefold.

First, one of the most prominent features of these languages is that they are associated with a large user base and numerous online forums, which offer strong support on various software-related issues from within the community itself. Stated differently, whenever a programming problem occurs, it is very likely that a solution has already been made available by those who encountered the same problem earlier. Consequently, Python and R users can expect to enjoy a much swifter programming experience than users of other software choices.

Second, both languages offer a multitude of packages (i.e., libraries), which contain functions written for specific tasks. At the moment, there are close to 20,000 R packages available on the Comprehensive R Archive Network (CRAN), which is a network of ftp and web servers around the world that store identical, up-to-date versions of code and documentation for R; the Python packages listed on the official Python Package Index (PyPI), a repository of software for Python, are roughly ten times as numerous. These packages and libraries are contributed by scientists working in various domains, many of whom have a forecasting background. A convenient way to learn about the availability of R packages is by accessing the CRAN task views (https://cran.r-project.org/web/views/), a centralized catalog of documents that provides guidance on which packages are relevant for which data science tasks. Otherwise, a Google search would usually suffice.

The third benefit of Python and R is their open-source nature, which allows the user to access the code in full. Before 2005, Matlab was perhaps the most popular research language among engineers, and it is still taught in many universities as the introductory programming language for scientific computing in general. Nevertheless, Matlab is a proprietary tool, meaning that the source code cannot be viewed by its users, who would then have a limited understanding of how a routine works and how to modify it if the routine does not exactly fit the task at hand. What exacerbates the situation further is that Matlab is also prohibitively expensive, which may be a problem for many people outside the academic setting. To that end, Matlab can no longer be accorded the same importance, at any rate, as Python and R.

Learning a new programming language can be time-consuming, even though all programming languages bear some degree of resemblance to one another. However, as data science is the foundation of forecasting, spending time learning a programming language that is likely to remain popular over the following decades is a highly rewarding investment. Considering that everyone learns things in a different way, this book does not attempt to list instructional references for Python or R.
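As a side note on the interoperability claim above, the following minimal sketch (assuming R, its forecast package, and the Python package rpy2 are all installed; the data are toy values) calls R’s forecast::auto.arima from Python.

```python
# A minimal sketch of calling R's forecast package from Python via rpy2;
# importr maps R's "." in function names to "_" by default.
from rpy2.robjects import FloatVector
from rpy2.robjects.packages import importr

stats = importr("stats")
forecast = importr("forecast")

# Hypothetical clear-sky index series
y = FloatVector([0.7, 0.8, 0.75, 0.9, 0.85, 0.6, 0.7, 0.8])
fit = forecast.auto_arima(stats.ts(y))   # R call: forecast::auto.arima(stats::ts(y))
fc = forecast.forecast(fit, h=3)         # 3-step-ahead forecasts
print(fc)
```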

5.1.2 SOLAR FORECASTING PACKAGES IN PYTHON AND R

There are many general-purpose data-science packages that are valuable to solar forecasting, among which the most iconic ones are Scikit-learn in Python (Pedregosa et al., 2011) and caret in R (Kuhn, 2021). Both Scikit-learn and caret are designed to streamline the process of constructing statistical and machine-learning models, in that they allow the fast realization of hundreds of supervised- and unsupervised-learning methods that have stood the test of time. Recall that one major caveat of data-science-based forecasting studies in the current literature lies in the generality of the superiority claims—there may be many other interchangeable, comparable, or better-performing method choices. But with the off-the-shelf implementations in Scikit-learn and caret, which are standardized in terms of how the functions are called, forecasters can include a large number of methods using a simple loop over the method names, and thus significantly extend the scope of comparison (a minimal sketch of such a loop is given after Table 5.1). The reader is referred to the Python code published by Pedro et al. (2019) and the R code by Yang et al. (2020b) for examples of how such loops can be realized in a solar forecasting context. Another notable work of this sort was carried out by Yagli et al. (2019a), who compared 68 machine-learning methods, based on the caret package, in a univariate solar forecasting case study.

Besides general-purpose data science packages, there are also packages and libraries that are specifically written for forecasting purposes. The forecast package in R (Hyndman and Khandakar, 2008) is the most representative one, and has hitherto been the most popular choice among statisticians when it comes to methods and tools for displaying, analyzing, and forecasting univariate time series. However, as explained in Section 4.5.4, the forecast skill associated with univariate time series methods is often limited, owing to the lack of consideration of the spatio-temporal dependence of solar irradiance. Hence, the methods implemented in the forecast package are usually taken as benchmarks for spatio-temporal statistical models (e.g., Yang et al., 2015b) or physical models with a spatio-temporal nature, such as those based on numerical weather prediction (NWP) (e.g., Zhang et al., 2022). Since the time series methods in the forecast package have undergone decades of refinement and upgrades, the level of optimization of its code is the highest, which guarantees the quality of the benchmarks.

Speaking of spatio-temporal statistics, it is not just highly relevant to solar forecasting but also an important domain of statistics. To that end, there is a CRAN task view topic dedicated to it.1 This task view aims at enumerating and summarizing R packages that are useful for analyzing spatio-temporal data. Some packages thought pertinent to solar forecasting are given in Table 5.1, though the list is far from complete. Particularly worth mentioning are gstat (Gräler et al., 2016) and FRK (Zammit-Mangion, 2020), which perform spatio-temporal kriging and fixed-rank kriging through spatio-temporal basis functions, respectively. In solar forecasting, spatio-temporal kriging has gained popularity (Jamaly and Kleissl, 2017; Aryaputera et al., 2015; Yang et al., 2013a), especially for forecasting when irradiance sensor networks are present. In contrast, fixed-rank kriging, which is able to handle massive spatio-temporal data such as that from remote sensing, has yet to make an appearance in the solar forecasting literature. For an overview of spatio-temporal statistics with R, the reader is referred to the book by Wikle et al. (2019).

1 https://cran.r-project.org/web/views/SpatioTemporal.html

Table 5.1 Selected R packages for handling and analyzing spatio-temporal data.

Package    | Author                      | Description                                                                     | Remark
atakrig    | Hu and Huang (2020)         | Area-to-area kriging                                                            | Enables data fusion of remote-sensing datasets at different resolutions
EnviroStat | Le et al. (2006)            | Statistical analysis of environmental space-time processes                      | Contains routines to handle spatio-temporal anisotropy, which is common in irradiance
fields     | Nychka et al. (2017)        | Tools for spatial data                                                          | One of the most essential packages for spatial data
FRK        | Zammit-Mangion (2020)       | Fixed rank kriging                                                              | Performs kriging under high dimensionality
gstat      | Gräler et al. (2016)        | Spatial and spatio-temporal geostatistical modeling, prediction and simulation  | Performs spatio-temporal kriging
hts        | Hyndman et al. (2020)       | Hierarchical and grouped time series                                            | Models and reconciles spatially distributed PV power forecasts
lars       | Efron et al. (2004)         | Least angle regression, lasso and forward stagewise                             | Predictor selection for spatio-temporal data
ncdf4      | Pierce (2019)               | Interface to Unidata netCDF (version 4 or earlier) format data files            | netCDF is the common format for weather data
raster     | Hijmans (2020)              | Geographic data analysis and modeling                                           | Handles remote-sensing data
openair    | Carslaw and Ropkins (2012)  | Tools for the analysis of air pollution data                                    | Manipulates wind direction and speed
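The loop over method names promised above can be sketched as follows (our illustration with synthetic data; the cited works by Pedro et al. (2019) and Yang et al. (2020b) contain fuller, solar-specific versions).

```python
# A minimal sketch of looping over interchangeable scikit-learn regressors,
# exploiting their standardized fit/predict interface.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = rng.uniform(size=(1000, 5))        # hypothetical lagged clear-sky indexes
y = X @ np.array([0.5, 0.2, 0.1, 0.1, 0.05]) + 0.1 * rng.standard_normal(1000)
X_train, X_test, y_train, y_test = X[:800], X[800:], y[:800], y[800:]

models = {"ridge": Ridge(),
          "rf": RandomForestRegressor(n_estimators=100),
          "gbr": GradientBoostingRegressor()}
for name, model in models.items():
    model.fit(X_train, y_train)
    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    print(f"{name}: RMSE = {rmse:.4f}")
```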

Global- and continental-scale NWP models are run and disseminated by national and international weather centers. Solar forecasters, on the other hand, are mainly concerned with acquiring weather forecasts, post-processing and/or downscaling them, and converting the post-processed forecasts to solar power forecasts. As the current literature shows, weather forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) and the National Oceanic and Atmospheric Administration (NOAA) have received the most attention. Whereas the detailed procedures for acquiring NWP forecasts are deferred to Chapter 6, two R packages are mentioned here, namely, ecmwfr and rNOMADS, which provide programmatic interfaces to ECMWF’s dataset web services and to NOAA’s Operational Model Archive and Distribution System (NOMADS). Given that the NWP models are operational, it is attractive to call the functions in these packages from cron jobs, which can schedule the automatic downloading of relevant weather variables right after dissemination, such that the final solar forecasts are also operational.

The principle of image-based solar forecasting rests on deriving dynamic cloud, and thus irradiance, information from two or more consecutive snapshots. Such methods generally comprise three steps: (1) cloud motion estimation, (2) motion extrapolation, and (3) cloud-to-irradiance conversion. Block matching and optical flow are two popular approaches to estimating cloud motion vectors (CMVs), which jointly describe a field representing the motion of clouds at each pixel of the image. These approaches to motion field estimation have been studied by computer scientists for decades, so it is no surprise that implementations of a majority of the popular variants of optical flow can be found in Python packages such as OpenPIV (Liberzon et al., 2021) or OpenCV (a minimal sketch follows at the end of this section). The reader is referred to the books by Howse (2013) and Villán (2019) for OpenCV-related information, and to the recent article by Aicardi et al. (2022) for solar forecasting case studies that compared five different methods for CMV estimation.

Converting weather forecasts to photovoltaic (PV) power forecasts requires extensive knowledge of solar energy meteorology. Section 4.5.5 has introduced the concept of the model chain, which leverages a sequence of physically appealing models for irradiance-to-power conversion. At the moment, there is no package in R that offers a complete model-chain workflow (see Table 5.2 for the available packages specifically written for solar applications), but the pvlib package (Holmgren et al., 2018) in Python does just that. That said, given the intricacy of the task that pvlib is designed to handle, the content of the package is too rich to be mastered in just a short period of time. Stated differently, one should anticipate a somewhat steep learning curve in order to exploit the package to its full potential. It is for that reason that this particular package is revisited in Chapter 11, in which the scientific theory and its software counterpart are discussed in conjunction.
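As an illustration of the CMV step, the following minimal sketch (our own, with hypothetical file names) estimates a dense motion field with the Farnebäck optical flow implementation in OpenCV and performs a naive one-frame motion extrapolation.

```python
# A minimal sketch of dense CMV estimation with Farnebäck optical flow;
# "sky_t0.png" and "sky_t1.png" are hypothetical consecutive sky images.
import cv2
import numpy as np

prev_img = cv2.imread("sky_t0.png", cv2.IMREAD_GRAYSCALE)
next_img = cv2.imread("sky_t1.png", cv2.IMREAD_GRAYSCALE)

# flow[y, x] holds the (dx, dy) displacement of each pixel between frames
flow = cv2.calcOpticalFlowFarneback(prev_img, next_img, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)

# Naive semi-Lagrangian extrapolation: the forecast scene at each pixel is
# the latest scene sampled one displacement upstream
h, w = prev_img.shape
grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
map_x = (grid_x - flow[..., 0]).astype(np.float32)
map_y = (grid_y - flow[..., 1]).astype(np.float32)
forecast_img = cv2.remap(next_img, map_x, map_y, cv2.INTER_LINEAR)
```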
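As a preview of Chapter 11, a minimal model-chain sketch with pvlib follows; it uses the simplified PVWatts models rather than a fully physical chain, assumes a recent pvlib version (≥ 0.9), and all system parameters and weather values are hypothetical.

```python
# A minimal pvlib model-chain sketch using the simplified PVWatts models;
# parameters and weather inputs below are hypothetical placeholders.
import pandas as pd
from pvlib.location import Location
from pvlib.modelchain import ModelChain
from pvlib.pvsystem import PVSystem
from pvlib.temperature import TEMPERATURE_MODEL_PARAMETERS

location = Location(latitude=32.87, longitude=-117.23, tz="Etc/GMT+8")
tparams = TEMPERATURE_MODEL_PARAMETERS["sapm"]["open_rack_glass_glass"]
system = PVSystem(surface_tilt=20, surface_azimuth=180,
                  module_parameters={"pdc0": 5000, "gamma_pdc": -0.004},
                  inverter_parameters={"pdc0": 5000},
                  temperature_model_parameters=tparams)
mc = ModelChain.with_pvwatts(system, location)

# "weather" holds (forecast) irradiance components indexed by time
times = pd.date_range("2020-06-01 06:00", periods=12, freq="1h", tz=location.tz)
weather = pd.DataFrame({"ghi": 500.0, "dni": 600.0, "dhi": 120.0,
                        "temp_air": 25.0, "wind_speed": 2.0}, index=times)
mc.run_model(weather)          # runs the full irradiance-to-power conversion
print(mc.results.ac.head())    # AC power forecasts
```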

5.2 TERMINOLOGY OF IRRADIANCE COMPONENTS AND THE K-INDEXES

This section covers the terminology of solar irradiance components and k-indexes. Two sets of solar irradiance components are of interest in solar forecasting: those that fall on a horizontal plane, and those that fall on a tilted plane. The rationale behind this division owes to the fact that most irradiance data are collected or produced for horizontal surfaces, whereas PV systems are, more often than not, installed at a tilt angle, in order either to maximize the annual energy production without incurring the cost of sun-trackers or to comply with the orientation constraints of the rooftop or landscape on which the panels are installed. From the various irradiance components, one can derive a set of k-indexes, which can be thought of as normalized versions of the irradiance components. For instance, the clear-sky index is in fact a k-index. In the following pages, irradiance components and k-indexes are formally presented.

Table 5.2 Some notable R packages written specifically for solar applications.

Package   | Author                            | Description                                         | Remark
camsRad   | Lundstrom (2016)                  | Client for CAMS radiation service                   | Provides interface to CAMSRad and McClear web services
insol     | Corripio (2021)                   | Functions to compute insolation on complex terrain  | Performs solar positioning
SiteAdapt | Fernández-Peruchena et al. (2020) | Site adaptation of solar irradiance modeled series  | Removes bias in gridded irradiance products based on a short period of ground data
SolarData | Yang (2018c, 2019e)               | Easy access of publicly available solar datasets    | Downloads data from several popular databases such as the Baseline Surface Radiation Network
solmod    | Yang (2016)                       | Transposition modeling of solar irradiance          | Implements 26 transposition models

5.2.1 IRRADIANCE COMPONENTS

The global horizontal irradiance (GHI), that is, the total irradiance received on a horizontal surface, can be split into a diffuse component and a beam component, called diffuse horizontal irradiance (DHI) and beam horizontal irradiance (BHI).2 This is the most fundamental decomposition of irradiance, which is also what is understood whenever the term closure equation is referred to, that is:

Gh = Dh + Bh ,     (5.1)

where Gh, Dh, and Bh denote GHI, DHI, and BHI, respectively. One (minor) complication is that the beam irradiance component on horizontal surfaces is not directly observable, since one needs to point the radiometer at the sun to account for the beam component. In other words, when beam irradiance is measured, the radiometer strictly reports what is known as the beam normal irradiance (BNI), which relates to BHI through:

Bn = Bh / cos Z ,     (5.2)

with Z being the solar zenith angle, which is the angle between the sun and the zenith. Therefore, the closure equation can also be written as:

Gh = Dh + Bn cos Z .     (5.3)

2 As mentioned in Chapter 4, the word “beam” is used interchangeably with “direct,” but the former is preferred so as to avoid notation overload.

The second set of irradiance components, i.e., the ones that describe the irradiance on a tilted plane, can be termed analogously. The global tilted irradiance (GTI), which is also known as the in-plane irradiance or plane-of-array (POA) irradiance, can be split into a diffuse component and a beam component, but in addition, there is a third component due to ground reflection. GTI, or Gc, is related to the horizontal irradiance components through the transposition equation, which in its most general form reads:

Gc = Bn cos θ + Dh Rd + ρg Gh Rr ,     (5.4)

where the three terms on the right-hand side are the beam, diffuse, and ground-reflected components, respectively; θ is the incidence angle, which is the angle between the normal of the tilted surface and the sun; Rd is the diffuse transposition factor; Rr is the transposition factor due to ground reflection; and ρg is the surface albedo, which is the fractional amount of sunlight reflected back from the ground (see Gueymard et al., 2019, for a more precise definition). Whereas ρg can be measured or remote-sensed, Rd and Rr are subject to transposition modeling. In fact, the transposition models referred to in the solar energy meteorology literature only differ from each other in the way Rd is modeled.

It is critically important to note that the power output of a PV system holds a linear relationship with GTI, but not with GHI. On this point, if the data permits the modeling of GTI, one has little reason to consider GHI as a direct input of the forecasting model during PV power forecasting. The best-known transposition model, that is, the 1990 version of the Perez model (Perez et al., 1990), has long been shown to have reached an asymptotic level of optimization in terms of its predictive ability (Yang, 2016; Yang et al., 2014b), which gives all the more motivation for using it to obtain GTI. Unfortunately, this rule of thumb is rarely, if at all, respected by solar forecasters. The under-utilization of the Perez model in solar forecasting can be attributed to: (1) ignorance—transposition modeling, as a solar energy meteorology topic, is less known to forecasting novices; and (2) untested premises—people place strong trust in machine-learning methods, in terms of their ability to figure out the GHI–GTI–power relationship without explicit use of domain knowledge. Both ignorance and untested premises are harmful for obvious reasons. Hence, it seems necessary, in the remaining part of the book, to distinguish irradiance-to-power conversion (i.e., solar power curve modeling) by means of two types of procedures, one direct and one indirect, of which the latter is usually preferred. As introduced in Section 4.5.5, the indirect procedure to acquire the solar power curve is called the model chain.

Combining Eqs. (5.3) and (5.4), it is apparent that two out of the three horizontal irradiance components are needed to uniquely determine GTI. In situations where only GHI is available, forecasters need to split DHI and BNI from it. This procedure is known as separation modeling. Nonetheless, the accuracy of separation models is much lower than that of transposition models, due to the non-injective mapping between GHI and DHI, which can be primarily attributed to clouds. That is, for the same GHI value, there could be infinitely many DHI values (Yang and Gueymard, 2020). Separation modeling is the bottleneck when it comes to indirect irradiance-to-power conversion. To improve its accuracy, researchers have developed new frameworks leveraging remote-sensing data and ensemble modeling in reference to the traditional empirical approaches, but have only attained limited success. At present, the most accurate deterministic separation model is the one proposed by Yang (2021a), whereas the most accurate probabilistic one was developed by Yang and Gueymard (2020). We shall circle back to transposition and separation modeling in a later chapter.

Each of Gh, Dh, Bn, and Gc has a clear-sky counterpart, which has already been annotated in Section 4.5.1—an additional subscript “c” is used to denote “clear-sky,” giving Ghc, Dhc, Bnc, and Gcc, which are known as the clear-sky irradiances or clear-sky expectations. Besides clear-sky irradiance, there is extraterrestrial irradiance, which denotes the irradiance before entering the earth’s atmosphere. Extraterrestrial irradiance comes in two types, extraterrestrial GHI (E0) and extraterrestrial BNI (E0n). Terminology-wise, when the phrase “extraterrestrial irradiance” is mentioned without specification, it usually refers to E0—there are nevertheless many who do not follow this convention, so one just needs to be more careful in interpreting the intended meaning—whereas E0n, in this case, is referred to as the normal-incident extraterrestrial irradiance. E0 and E0n differ by a factor of cos Z, and both can be computed via a solar positioning algorithm to a very high degree of accuracy. Hence, they are regarded as known quantities in solar energy meteorology.
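To exemplify Eq. (5.4), the following minimal sketch computes GTI with the simple isotropic-sky model for Rd (not the Perez model recommended above, whose Rd is more elaborate); the geometry and irradiance values are hypothetical, and the incidence angle is assumed given.

```python
# A minimal sketch of the transposition equation (5.4) with the isotropic-sky
# diffuse transposition factor; all inputs are hypothetical.
import numpy as np

beta = np.radians(30.0)        # panel tilt
theta = np.radians(35.0)       # incidence angle (assumed given here)
Z = np.radians(40.0)           # solar zenith angle
Bn, Dh, rho_g = 600.0, 140.0, 0.2   # BNI, DHI (W/m2), surface albedo

Gh = Dh + Bn * np.cos(Z)              # GHI via the closure equation (5.3)
Rd = (1 + np.cos(beta)) / 2           # isotropic diffuse transposition factor
Rr = (1 - np.cos(beta)) / 2           # ground-reflection transposition factor
Gc = Bn * np.cos(theta) + Dh * Rd + rho_g * Gh * Rr   # GTI, Eq. (5.4)
print(Gc)
```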

5.2.2 K-INDEXES

Through the irradiance components, a series of indexes can be defined, which are known as the k-indexes. The most used k-index in solar forecasting, as has been repeatedly stressed, is the clear-sky index. A common confusion in regard to the clear-sky index is that many think it only refers to the ratio of GHI and clear-sky GHI. On the contrary, the clear-sky index is not limited to that, but can also refer to the ratios between other irradiance components and their clear-sky expectations, of which the modeling strategy has been depicted in Section 4.5.1, as well as the ratio between solar power and its clear-sky expectation (Engerer and Mills, 2014). Frequently denoted with κ, clear-sky index is best to be accompanied by a subscript, whenever necessary, to clarify the original quantity before transformation, i.e., κGHI is the clear-sky index of GHI, κBNI is the clear-sky index of BNI, κPV is the clear-sky index of PV power, et cetera. Following the arguments presented in Chapter 4, clearsky index, may it be κGHI , κBNI , or κPV , would be the most indispensable feature of any solar forecasting method. The next k-index which we should introduce is the clearness index, which is defined as the ratio of GHI and extraterrestrial GHI. Denoted with kt , the clearness

A Guide to Good Housekeeping

137

index is:
\[ k_t = \frac{G_h}{E_0}. \tag{5.5} \]
In the literature, researchers who confuse kt with κ are great in number. Given the fact that E0 also has a bell-shaped transient, kt is also a normalized measure of the overall amount of surface irradiance. Nonetheless, E0 does not account for the attenuation of irradiance by the atmosphere; hence, the normalization is not as thorough as that provided by clear-sky GHI. In solar forecasting, kt may be used as the forecast variable, on which the model is built, but it is nevertheless suboptimal, owing to the less thorough normalization. One place where kt remains particularly useful is separation modeling, mainly for historical reasons—before clear-sky models were popularized, kt was the only option for normalization. The third k-index is called the diffuse fraction, or diffuse ratio, which is the ratio between DHI and GHI. Mathematically, denoting the diffuse fraction by k, one has:
\[ k = \frac{D_h}{G_h}. \tag{5.6} \]

The diffuse fraction marks the percentage of total surface irradiance that is attributed to diffuse light, so naturally, it takes values from the [0, 1] interval. Since diffuse light is primarily due to clouds, k is a coarse measure of cloudiness, in that an overcast sky corresponds to a k near unity, and the value generally decreases as the sky becomes less cloudy. However, k can never reach zero, because, even under clear skies, there would be a portion of diffuse irradiance due to atmospheric constituents and particulates—recall the discussion in Section 4.5.1, particularly Eq. (4.9), i.e., Ghc = Bnc cos Z + Dhc. The diffuse fraction has been the subject of separation modeling, and almost all models seek to predict k. First, this is because k is a normalized quantity, which isolates the effect of the zenith angle, and thus the air mass, to a large extent. Second, once k is known, one can readily obtain Dh, since the premise of separation modeling is the availability of Gh. Last but not least, there are two other k-indexes that can be defined as analogous to kt. One of those is the ratio between BNI and extraterrestrial BNI, or equivalently, the ratio between BHI and extraterrestrial GHI:
\[ k_b = \frac{B_n}{E_{0n}} = \frac{B_h}{E_0}. \tag{5.7} \]
Since kb describes how much beam radiation is transmitted through the atmosphere, it is referred to as the direct transmittance (Gueymard, 2005), which reflects the optical state of the atmosphere related to aerosols, water vapor, and other gases. Similarly, one can write the ratio between DHI and extraterrestrial GHI:

\[ k_d = \frac{D_h}{E_0}, \tag{5.8} \]

and name it the diffuse transmittance (Gueymard, 2005). With these definitions, the k-indexes are unequivocally interrelated, e.g., k = kd/kt or kt = kd + kb. In any case, kd and kb are not used as frequently as the others, but they find their application in quality control of irradiance, which shall be detailed next.
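To make these definitions concrete, the following is a minimal sketch that computes all four k-indexes from time-synchronized arrays; the function name and the small floor eps, which guards against division by zero near sunrise and sunset, are our own illustrative choices:

```python
import numpy as np

def k_indexes(Gh, Dh, Bn, E0, E0n, eps=1e-6):
    """Compute the four k-indexes from time-synchronized numpy arrays."""
    kt = Gh / np.maximum(E0, eps)    # clearness index, Eq. (5.5)
    k  = Dh / np.maximum(Gh, eps)    # diffuse fraction, Eq. (5.6)
    kb = Bn / np.maximum(E0n, eps)   # direct transmittance, Eq. (5.7)
    kd = Dh / np.maximum(E0, eps)    # diffuse transmittance, Eq. (5.8)
    return kt, k, kb, kd
```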

5.3 PERFORMING QUALITY CONTROL

The three complementary sources of solar radiation data are ground-based radiometers, remote sensing, and dynamical weather models, among which the measurements from well-maintained ground-based radiometers are commonly regarded as the most reliable, and therefore are used to gauge the quality of data from the latter two. Nevertheless, it is characteristic of ground-based radiometers that low-quality and spurious measurements are produced over certain periods, which may be due to prolonged calibration cycles, instrument failures, logging errors, and maintenance deficiencies, among other factors. Therefore, compelled by the necessity to prevent such low-quality and spurious data points from affecting the subsequent results, quality control (QC) is viewed as the foremost step in all solar resource assessment and forecasting tasks, inasmuch as ground-based irradiance measurements are involved. It should be noted that QC for PV data is also required, but since the idea of performing QC for PV is analogous to performing QC for irradiance, the former is not explicitly discussed here, and the reader can expect to find most of the relevant issues pertaining to it from Killinger et al. (2017), an article that won the best paper award of Solar Energy in 2018.

Performing QC on data of environmental processes is a necessarily challenging task, for the true value of the process is never known. In an ideal case, one should wish to conduct repeated measurements of the same event, in order to minimize random errors or to eliminate obvious outliers. Given the fact that environmental processes are dynamic in nature and their values change with location and time, repeated measurements can only be realized if multiple identical instruments are collocated and used concurrently, which is highly inefficient, especially when a single instrument can cost thousands of dollars (see Table 6.1 in Chapter 6). To that end, besides a handful of research institutes, which have the luxury (and tenaciousness!) of setting up platforms with multiple radiometers for research purposes (e.g., Habte et al., 2016), operational radiometry almost always relies on just one radiometer for each irradiance component at each weather station. And because conducting repeated measurements is not an economically viable option, one has to seek alternative ways to infer the range in which the true value of the process lies. Indeed, the philosophy of performing QC is to design a set of filters narrating what constitutes an acceptable value. Whenever judgment on what is and what is not acceptable is concerned, subjectivity is inevitably involved. The filters adopted by one individual may be regarded as too stringent or too relaxed by another. Hence, there is no QC procedure that can be universally agreed upon. Furthermore, because QC, as mentioned earlier, is an essential step in solar resource assessment and forecasting, the variants of QC procedures in the literature are too numerous to be fully consolidated. That said, the bulk of those variants can be traced back to the procedure outlined by Long and Shi (2008), who developed an automatic QC algorithm using data from the United States Department of Energy Atmospheric Radiation Measurement Program (ARM). Indeed, this QC procedure is also used as a basis for the content of this section.


The word "automatic" refers to the fact that the QC tests can be completed without human interpretation. This is nevertheless problematic for two reasons. First, as Long and Shi (2008) developed the procedure using the ARM data, which only covers some climate types in the United States, the empirical values selected in the tests do not reflect all possible climatic situations and are unable to accommodate all kinds of instruments (pers. comm. with Chris Gueymard, Solar Consulting Services, 2021). Second, the automatic tests alone do not cover all types of errors on the one hand, and may mislabel valid data as invalid on the other. Therefore, expert visual inspections are necessary, but the problem with subjective judgment is still of concern. Anyone who is conducting QC faces the dilemma of keeping only the ultra-best quality data, which leads to many data gaps in the end, versus retaining as many data points as possible, which results in a more complete dataset that is easier to work with, but of possibly lower quality.

5.3.1 AUTOMATIC TESTS

The automatic tests proceed with a set of inequalities, each defining a range beyond which a data point is more likely to be spurious than to be valid. One should take special note that the tests discussed in this section apply only to high-resolution (e.g., 1 min) measurements, since irradiance data with lower temporal resolution behave substantially differently from the high-resolution ones, in a statistical sense. For instance, due to cloud-enhancement events, the 1-min Gh can be much higher than its clear-sky expectation, but this is rarely the case for 1-h Gh. What this suggests is that when ground-based measurements are used to validate or verify gridded irradiance or irradiance forecasts at low temporal resolutions (e.g., 15 min or 1 h), QC needs to precede averaging. This will inevitably leave many gaps in the after-QC data sequence, and gap-filling techniques are the subject of Section 5.3.3.

The physically possible limits (PPL) tests employed by the Baseline Surface Radiation Network (BSRN), the world's largest radiometry network, are first discussed. They are described by three inequalities, one for each irradiance component:

• −4 W/m2 ≤ Gh ≤ 1.5 E0n cos^1.2 Z + 100 W/m2,
• −4 W/m2 ≤ Dh ≤ 0.95 E0n cos^1.2 Z + 50 W/m2,
• −4 W/m2 ≤ Bn ≤ E0n,

and if any condition is not fulfilled, the corresponding irradiance component should be rejected, or at least flagged. These physically reasonable values were set to accommodate the wide range of irradiance conditions found over the entire globe (Long and Shi, 2008). The lower bounds in these tests are negative because thermopile-type radiometers produce negative nighttime offsets due to thermal (infrared) loss to the surroundings, e.g., through the receiver disk or the protective domes. On the other hand, the procedure of setting the upper limits was not explicitly explained in the original documents. However, as Long and Dutton (2010) noted, all limits are subject to further refinement, if one wishes to achieve better QC results on data from


some specific location. The ways in which the tests can be customized are discussed in the next section. Very similar to the PPL tests are the extremely rare limits (ERL) tests, which are as follows:

• −2 W/m2 ≤ Gh ≤ 1.2 E0n cos^1.2 Z + 50 W/m2,
• −2 W/m2 ≤ Dh ≤ 0.75 E0n cos^1.2 Z + 30 W/m2,
• −2 W/m2 ≤ Bn ≤ 0.95 E0n cos^0.2 Z + 10 W/m2.

As the name suggests, irradiance values that fall outside of the bounds of these tests are extremely rare, in that the tests describe the minimum acceptable values for shortwave radiation measurements (Long and Shi, 2008). The lower bound of −2 W/m2 in the ERL tests is tighter than that of the previous tests. Although the thermal offset of some radiometers can go beyond −4 W/m2, the larger offset should nevertheless be corrected. Hence, a tighter lower bound is sufficient for well-calibrated instruments. For the upper bounds, the BSRN guidelines clearly indicate that their ERL should be set based on the climatology of the location (Lanconelli et al., 2011). Limits valid for one location are not necessarily applicable to another, as they may reject too many or too few data points, which echoes the general dilemma of QC stated earlier.

The PPL and ERL tests both seek to perform QC by defining bounds for individual irradiance components. Since the PPL tests are simply more relaxed versions of the ERL tests, choosing one set of tests during solar forecasting is sufficient. Notwithstanding, the closure relationship allows another type of QC which assesses the quality of irradiance components jointly; the test, therefore, is known as the three-component closure (TCC) test:

• abs(closr) ≤ 8%, for Z ≤ 75° and Gh > 50 W/m2;
• abs(closr) ≤ 15%, for 93° > Z > 75° and Gh > 50 W/m2.

The quantity closr = Gh/(Bn cos Z + Dh) − 1 is the decimal difference of the closure relationship. It should be noted that, in theory, the numerator and denominator of closr are interchangeable, but such an action alters the results in practice. Hence, following the original setting of Long and Shi (2008) is recommended. Furthermore, as the conditions for the TCC test set a minimum value for Gh, Gh values smaller than 50 W/m2 render the test inconclusive. In irradiance QC, data points that lead to inconclusive results are not rejected, which, informally, means that one should give the benefit of the doubt to those data points. Lastly, since the TCC test evaluates all three components together, it is not possible to isolate the problematic component if the conditions of the test are not met. Stated differently, since the test is nondefinitive, all components must be rejected simultaneously if need be.

There are also tests designated for the QC of the various k-indexes defined in Section 5.2.2. Some of the k-index tests are comparative, while others, like PPL and ERL, use bounds. Some of the most popular k-index tests include:


• kb < kt, for Gh > 50 W/m2, kt > 0, and kb > 0;
• kb < (1100 W/m2 + 0.03 × altitude in m.a.s.l.)/E0n, for Gh > 50 W/m2 and kb > 0;
• kt < 1.35, for Gh > 50 W/m2 and kt > 0;
• k < 1.05, for Z < 75°, Gh > 50 W/m2, and k > 0;
• k < 1.10, for Z ≥ 75°, Gh > 50 W/m2, and k > 0;
• k < 0.96, for kt > 0.6, Z < 85°, Gh > 150 W/m2, and k > 0.

As with all of the aforementioned tests, if any condition is not met, the corresponding k-index test rejects the involved component(s). In the literature, some of the k-index tests go by other names, e.g., the tests involving k are known as the diffuse ratio tests. For physical interpretations of these tests, the reader is referred to Geuder et al. (2015).
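To tie the preceding tests together, the following is a minimal software rendition of the PPL, ERL, and TCC tests as boolean flags. The clipping of cos Z at night and the guard on the closure denominator are our own additions for numerical safety, not part of the original specification:

```python
import numpy as np

def automatic_qc(Gh, Dh, Bn, E0n, Z):
    """Return boolean masks flagging PPL, ERL, and TCC violations.
    Inputs are numpy arrays of 1-min data; Z is the zenith angle in degrees."""
    mu = np.maximum(np.cos(np.radians(Z)), 0.0)  # clip so mu**1.2 is defined at night

    # Physically possible limits (PPL)
    ppl = ((Gh < -4) | (Gh > 1.5 * E0n * mu**1.2 + 100) |
           (Dh < -4) | (Dh > 0.95 * E0n * mu**1.2 + 50) |
           (Bn < -4) | (Bn > E0n))

    # Extremely rare limits (ERL)
    erl = ((Gh < -2) | (Gh > 1.2 * E0n * mu**1.2 + 50) |
           (Dh < -2) | (Dh > 0.75 * E0n * mu**1.2 + 30) |
           (Bn < -2) | (Bn > 0.95 * E0n * mu**0.2 + 10))

    # Three-component closure (TCC); inconclusive cases (Gh <= 50 W/m2) are not flagged
    closr = Gh / np.maximum(Bn * mu + Dh, 1e-6) - 1
    tcc = (((Z <= 75) & (Gh > 50) & (np.abs(closr) > 0.08)) |
           ((Z > 75) & (Z < 93) & (Gh > 50) & (np.abs(closr) > 0.15)))

    return ppl, erl, tcc
```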

5.3.2 VISUAL SCREENING

Automatic QC tests must be complemented with visual screening. The benefits of visual inspection during QC have been highlighted before (Lanconelli et al., 2011; Yang et al., 2018b). Worth mentioning is PanPlot2 (Sieger and Grobe, 2013), which is a time series visualization software specifically designed for data hosted on PANGAEA, on which many datasets of earth and environmental science, including the BSRN data, are published. PanPlot2 has been recommended by BSRN for QC purposes (Driemel et al., 2018). Aside from PanPlot2, other QC software and procedural efforts are available, such as the bias-based quality control (BQC) web service (Urraca et al., 2020) or the SERI-QC procedure employed by the National Renewable Energy Laboratory. Notwithstanding, performing visualization through programming would always be the most flexible option. In what follows, several of the most commonly used visualizations are exemplified through a one-year dataset collected at Cairo University, Egypt (30.036°N, 31.009°E, 104 m), which consists of 1-min measurements of three horizontal irradiance components.

Figure 5.1 shows the scatter plots of GHI, DHI, and BHI versus the zenith angle. In each subplot, the two eyebrows denote the upper bounds of the PPL and ERL tests for the respective irradiance component. From this figure, the way to customize the tests is immediately obvious. Take, for instance, the upper bound of the PPL test for GHI, namely, 1.5 E0n cos^1.2 Z + 100 W/m2: the parameters 1.5, 1.2, and 100 jointly control the shape of the eyebrow, which can be adjusted according to the local data. In the particular case shown in Fig. 5.1, the ERL tests appear to be too loose, and are thus likely to admit low-quality data. More specifically, from the scatter plot of DHI, one can see that there is an excess amount of data points located just below the ERL upper bound, which have no issue passing the test but are conspicuously faulty.

Figure 5.1 Scatter plots of GHI, DHI, and BHI (i.e., BNI × cos Z), versus zenith angle, as well as their corresponding physically possible limits (black eyebrow) and extremely rare limits (gray eyebrow). The data is collected in Cairo, Egypt (30.036°N, 31.009°E, 104 m), over 2019.

The visualizations for the three-component closure test for the Cairo data are shown in Fig. 5.2. In both plots, black dots mark the rejected data points, whereas the gray dots are the retained ones. Figure 5.2 (a) helps to visualize how well the three components agree with the closure relationship—perfect measurements would coincide with the identity line. In this regard, the plot depicts the magnitude of deviations, and the range of GHI over which the deviations occur. Figure 5.2 (b) is a time series plot of the ratio between measured and calculated GHI. Through this kind of time series, one can detect possible drift in the GHI ratio, which calls for targeted further investigation.


Figure 5.2 (a) Scatter plots of measured GHI versus calculated GHI. (b) The ratio of measured and calculated GHI. In both plots, black dots indicate those data points rejected by the three-component closure test. The same data as Fig. 5.1 is used for plotting.

Visualizations for the k-indexes are typified by Fig. 5.3, in which subplot (a) shows a kt–k plot commonly used in separation modeling, and (b) shows a kt–kb plot. Recall that in theory:
\[ k = \frac{D_h}{G_h} \le 1, \tag{5.9} \]

which means that DHI cannot exceed GHI. In practice, however, a conditional upper bound is used, which is to account for measurement uncertainties in both the GHI and DHI pyranometers; this conditional upper bound is indicated by the solid and dashed lines in Fig. 5.3 (a)—the three horizontal segments correspond to the three k-index tests pertaining to k, as listed in Section 5.3.1. One should note that not all points beyond the bound are rejected, e.g., for Gh ≤ 50 W/m2, the tests are inconclusive and the data points are retained. These retained points appear near the low-kt–high-k region of Fig. 5.3 (a), which violate the physical law, and could be removed if a better QC outcome is desired. Similar to the case of k, kb also does not exceed kt in theory, because
\[ k_b = \frac{B_h}{E_0} \le \frac{G_h}{E_0} = k_t. \tag{5.10} \]
In practice, however, kb may occasionally exceed kt, e.g., during a cleaning event of the GHI pyranometer (Geuder et al., 2015). The line segments in Fig. 5.3 (b) show the bounds of the k-index tests relevant to kb and kt, outside of which the data points need to be rejected (as indicated by the black dots).


Figure 5.3 (a) Diffuse ratio (k) versus clearness index (kt ). (b) Direct transmittance (kb ) versus clearness index (kt ). The line segments mark the bounds described by various k-index tests. The same data as Fig. 5.1 is used for plotting.

Aside from the plots related to the automatic tests, there are many other routes that can lead to the visualization of data quality as a whole. For instance, one of the most popular techniques, namely, the irradiance calendar heat map, is exemplified in Fig. 5.4. GHI (or BNI) values are represented by the colors of pixels arranged according to the day of the year and the time of the day. This type of heat map is proficient in detecting gaps in measurement and possible shading events. In Fig. 5.4, the vertical white stripes mark the times at which measurement gaps occur, and the arched white stripes during summer mornings indicate another systematic issue, which warrants


further inspection. The heat maps can also be arranged based on the sun path, which is shown in Fig. 5.5. The dotted line at low solar elevation denotes the horizon line seen from the measurement site. This horizon line is derived from the Shuttle Radar Topography Mission (SRTM) digital elevation model, which can be accessed from the SoDa website (https://www.soda-pro.com/web-services#altitude). SoDa delivers solar radiation and meteorological data services for solar photovoltaic and thermal applications.
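Producing a calendar heat map of the kind shown in Fig. 5.4 takes only a few lines. The minimal sketch below assumes ghi is a pandas Series of 1-min GHI on local time stamps; missing data show up as blank pixels, which is precisely what makes the plot useful for QC:

```python
import matplotlib.pyplot as plt

def calendar_heatmap(ghi):
    """ghi: 1-min pandas Series indexed by local time stamps (assumption)."""
    df = ghi.to_frame("ghi")
    df["doy"] = df.index.dayofyear
    df["tod"] = df.index.hour + df.index.minute / 60  # fractional hour of day
    mat = df.pivot_table(index="tod", columns="doy", values="ghi")
    plt.pcolormesh(mat.columns, mat.index, mat.values, shading="auto")
    plt.xlabel("Day of year")
    plt.ylabel("Time of day")
    plt.colorbar(label="GHI [W/m$^2$]")
    plt.show()
```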


Figure 5.4 Heat maps of (a) GHI and (b) BNI.

The last visualization tool which we wish to discuss is the routine time series plot of the clear-sky index. Examining the time series, both as a whole and over selected time intervals, is an often neglected aspect of QC. When the time series is viewed as a whole, one can become aware of calibration issues or sensor changes. For instance, the GHI clear-sky index plotted in Fig. 5.6 obviously attracts skepticism. There are two unusual aspects. First, from January to July, the frequency at which κ exceeds 1 is too high, for such a high occurrence of cloud enhancement is very unlikely. Second, there appears to be a drift in κ values since July. In any case, these unusual observations can be further checked by plotting a shorter time series, e.g., over a window of a few days, such that the intra-day transient of κ becomes observable.



Figure 5.5 Heat maps of (a) GHI and (b) BHI plotted based on the sun path. Dotted lines mark the horizon, which reflects potential shading.


Figure 5.6 Time series plot of GHI clear-sky index, κ. The dashed line indicates κ = 1.

5.3.3 GAP-FILLING TECHNIQUES

It is desirable to have serially complete data for forecasting. One main reason is that many algorithms, and therefore their software implementations, require gap-free data to work. For example, the 1-step-ahead persistence model takes the observation at time t − 1 as the forecast for t. If that observation at t − 1 is missing, persistence would have to rely on an observation made at an earlier time. Although this can still result in a forecast value, the forecast horizon is no longer 1 step ahead, which, however small the effect may be, affects the quality of persistence. To that end, gaps in after-QC ground-based data are often filled via statistical means, which are referred to as data imputation techniques. Numerous statistical gap-filling methods have been proposed (e.g., Yelchuri et al., 2021; Demirhan and Renwick, 2018). These methods are either based on interpolation of various kinds, or based on time series modeling, such as Kalman filtering


or moving average. According to a comparison of nine different methods, the errors of the imputed data using both classes of methods are similar, and more elaborate methods do not necessarily outperform the simple ones (Yelchuri et al., 2021). This can be expected, as the deficiencies of using statistical models to capture the dynamics of clouds are well known. Besides the statistical gap-filling methods, a couple of alternatives could be thought of. First, if data other than irradiance itself, such as cloudiness or aerosol, are present, one can opt for a similarity-based method to identify similar conditions in the recent past, and use the past observations to fill the current data gap. For instance, the production of the National Solar Radiation Data Base (NSRDB) fills the missing irradiance values by borrowing cloud properties from the most recent valid time stamp with the same cloud type as the time stamp being imputed (Sengupta et al., 2018). However, as cloud type does not have a strong correspondence with irradiance, the filled irradiance values often contain artifacts, such as unnatural and sudden downward spikes on a clear day (Yang, 2021b). Next, based on the strong empirical evidence that satellite-derived irradiance data has a much smaller uncertainty as compared to NWP forecasts, one may consider filling those gaps in a ground-based measurement sequence using satellite-derived irradiance. In a more general sense, there have already been some studies that investigate the possibility of replacing ground-based measurements with satellite-derived irradiance for certain solar applications. For instance, Yang and Perez (2019) and Perez et al. (2016b) compared ground-based observations and satellite-derived irradiance in terms of their ability to gauge the quality of NWP forecasts. In both studies, the root mean square errors (RMSEs) of the NWP forecasts under scrutiny were computed with respect to both sources of reference, and the two RMSEs were found to be approximately equal, suggesting that satellite-derived irradiance is as effective as ground-based measurements in gauging forecasts, insofar as RMSE is taken as the accuracy measure. However, as further explained in Section 6.1.2, some caveats should be added to this conclusion—we should let this serve as a preemptive statement. Regardless of whether or not, or to what extent, ground-based measurement and satellite-derived irradiance are interchangeable, since gap filling only requires a small proportion of missing ground-based data to be filled by something slightly more inaccurate, we have every confidence that employing satellite-derived irradiance is the most preferred gap-filling technique.
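When satellite-derived estimates are not at hand, the interpolation route is easy to implement. A minimal sketch is given below; it fills short gaps in the clear-sky index rather than in the irradiance itself, so that imputed values follow the diurnal transient. The series names, the 30-sample gap limit, and the daytime masking are illustrative assumptions:

```python
def fill_short_gaps(ghi, ghi_cs, max_gap=30):
    """ghi, ghi_cs: pandas Series on a regular 1-min index (assumption).
    Interpolate the clear-sky index over gaps of up to max_gap samples,
    then reconstruct GHI; longer gaps are left unfilled."""
    kappa = ghi / ghi_cs.where(ghi_cs > 0)  # clear-sky index, daytime only
    kappa = kappa.interpolate(limit=max_gap, limit_area="inside")
    filled = kappa * ghi_cs
    return ghi.fillna(filled)  # keep original measurements wherever present
```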

5.4 CHOICE OF CLEAR-SKY MODELS IN SOLAR FORECASTING

There are close to one hundred clear-sky models proposed in the literature with varying complexity (Sun et al., 2019, 2021a), hence, the choice of clear-sky model for solar forecasting purposes calls for attention. On a very general account, clear-sky models are either physical or empirical, and the trade-off is one between complexity and quality. Physical models are of higher quality, but they require as input several atmospheric parameters that need to be solicited from heterogeneous sources; in contrast, empirical models are generally less accurate, but can be computed with just a few lines of code with easily obtainable parameters. When clear-sky models are used


for resource assessment purposes, for example, in producing satellite-derived irradiance, the quality is of primary concern. However, when clear-sky models are used during solar forecasting, i.e., for deseasonalization and normalization purposes, one must inquire about two other aspects. The first aspect is how much the quality of the clear-sky model is able to propagate into, and thus affect, forecast quality, and the second is to what extent the statistical properties of the clear-sky index hold to justify the initial proposition of using the clear-sky model to normalize irradiance. On the first aspect: Compared to the uncertainty of forecasting models, one can deem that the uncertainty of clear-sky models is much lower. Hence, whether a high-performance clear-sky model would lead to quantifiable benefits in forecast quality is unknown. Although it is logically true that better input (i.e., high-quality clear-sky index) often leads to better output (i.e., clear-sky index forecast), the changes introduced by employing a good clear-sky model on top of employing an average one might not always be significant, and statistical analysis is thought to be necessary if any conclusion is to be drawn. On the second aspect: The initial proposition of using a clear-sky model to deseasonalize/normalize irradiance time series is to hope that the clear-sky index can be independent of the zenith angle, such that the conditional heteroscedasticity in irradiance can be excluded from the analysis. In other words, one wishes the clear-sky index to be stationary, which is a desirable statistical property on many accounts, particularly during model building and diagnosis (Box et al., 2015). Combining the discussions so far, this section aims to investigate the choice of clear-sky model in solar forecasting from three perspectives: (1) accessibility, (2) forecast performance, and (3) statistical properties. Accessibility, in the present case, refers to the effort involved in acquiring clear-sky irradiance via a clear-sky model—for example, a model has low accessibility if it requires time-consuming coding effort or input preparation, and high accessibility if its model output can be downloaded directly. Forecast performance is analyzed through a concept called "mean square error (MSE) scaling," which allows a forecaster to decompose the MSE of reference irradiance forecasts into three terms, each carrying a notion of predictability. The degree of validity of the decomposition, however, depends on how much the clear-sky index time series deviates from stationarity. To that end, stationarity must be examined via hypothesis testing.

5.4.1 REST2, MCCLEAR, AND INEICHEN–PEREZ

Instead of bringing a long list of clear-sky models into the discussion, which would not be efficient, three representative models with varying predictive performance are selected and put to analysis, in the following pages. The first candidate is the REST2 model, as has been introduced in Section 4.5.1, which is the highest-performance model (Sun et al., 2019). REST2 is a physical model that leverages a simplified radiative transfer in modeling the attenuation of extraterrestrial radiation as it traverses the atmosphere. Since the amount of attenuation depends on a series of atmospheric absorption, emission, and scattering processes, a total of nine input parameters are needed by REST2, namely, extraterrestrial BNI (E0n ), zenith angle (Z),


ground albedo (ρg), surface pressure (p), aerosol optical depth at 550 nm (τ550), Ångström exponent (α), total column ozone (uO3), total nitrogen dioxide amount (uNO2), and total precipitable water vapor (uH2O). Among these nine parameters, E0n and Z can be calculated via a solar positioning algorithm, whereas the remaining ones are to be obtained from other sources. On this point, atmospheric remote-sensing and reanalysis products, such as the Moderate Resolution Imaging Spectroradiometer (MODIS) atmosphere gridded product (Levy et al., 2013; Remer et al., 2005), the Copernicus Atmosphere Monitoring Service (CAMS) reanalysis (Inness et al., 2019; Flemming et al., 2017), or the Modern-Era Retrospective analysis for Research and Applications, version 2 (MERRA-2; Gelaro et al., 2017), which offer long-term global-coverage estimates of atmospheric composition, are the obvious options. One practical limitation of employing remote-sensing products is that they do not offer all required REST2 inputs, so that one has to seek any remaining variables from additional data sources; considering reanalyses offer all required inputs, the remote-sensing option is not considered further. Between the two aforementioned reanalyses, namely, CAMS and MERRA-2, the latter might be deemed more adequate for REST2 clear-sky modeling for two reasons. One of those is that the temporal resolution of MERRA-2 is hourly, which is higher than the 3-hourly resolution of CAMS; since clear-sky irradiance is often needed at 1-min resolution (see Section 5.5.2), higher-resolution input is preferred (Fu et al., 2022). Another reason lies in accuracy. Although both reanalyses host a wide range of variables with errors being spatially heterogeneous, given the fact that τ550 and α are the most influential factors in determining the clear-sky irradiance, MERRA-2 aerosol optical depth (AOD), which has been verified globally to be of slightly better quality than CAMS AOD (see Fig. 5.7; also Gueymard and Yang, 2020), due to its advanced AOD data assimilation technique (Randles et al., 2017; Buchard et al., 2017), is again preferred. Whereas the preference of MERRA-2 over CAMS can be regarded as general, some studies, which compared MERRA-2- and CAMS-powered REST2 models, have indeed shown that CAMS is more advantageous over certain specific regions, more particularly those with high anthropogenic emissions and daily biomass burning, owing to its timely assimilation of aerosol (Fu et al., 2022). Beyond using those existing aerosol products, further refinements in REST2 accuracy may result from applying data-driven corrections to remote-sensed or modeled AOD (e.g., Fu et al., 2023; Zhang et al., 2020).

In contrast to REST2, a clear-sky model with a much simpler construct but decent performance is the one proposed by Ineichen and Perez (2002), which is known as the Ineichen–Perez clear-sky model. Instead of the multi-parameter modeling of the attenuation effect on radiation, as in the case of REST2, this clear-sky model approximates the atmospheric transmittance by a single exogenous parameter called the Linke turbidity factor (TL); by "exogenous" we mean parameters other than those calculable ones, such as the zenith angle. The history of using TL to express the total optical thickness of a cloud-free atmosphere traces back to 1922, when Linke first proposed the relationship:
\[ B_{nc} = E_{0n} \exp\left(-\delta_{cda} \cdot T_L \cdot \mathrm{AM}\right), \tag{5.11} \]



where δcda is the optical thickness of an aerosol- and water-free atmosphere—subscript "cda" stands for clear and dry atmosphere—and AM is the air mass. Based on this formula, one can see that the optical thickness of a cloud-free atmosphere is the product of δcda, TL, and air mass, and the dependence on the last item is thought problematic. It is motivated by this fact that Ineichen and Perez (2002) proposed an air-mass-independent formulation of TL, which led to the Ineichen–Perez clear-sky model. The Ineichen–Perez clear-sky model can be implemented with just a few lines of code on top of the solar-positioning result, and is thus popular among solar forecasters (Yang, 2020a). However, based on the survey by Sun et al. (2019), this model only attained an average performance ranking among the 75 models compared.

Figure 5.7 Mean bias error (MBE) and root mean square error (RMSE) of 3-h AOD at 550 nm (τ550) and Ångström exponent (α) over 2003–2017 at 793 AERONET sites worldwide (CAMS and MERRA-2 gridded estimates against AERONET ground-based measurements). Tukey's boxplots are used for visualization. Black dots indicate outliers. The evaluation is grouped by continent. The height of the boxes is proportional to the number of sites.

Standing between the predictive performance of REST2 and Ineichen–Perez is the McClear clear-sky model (Lefèvre et al., 2013; Gschwind et al., 2019), which is another physical model. McClear simplifies the radiative transfer calculation via


a look-up table (LUT). The advantage of the LUT approach consists in its ability to speed up the radiative transfer calculation by five orders of magnitude, without sacrificing too much accuracy (Lefèvre et al., 2013). The McClear model is parameterized by ρg, Z, parameters describing the optical state of the atmosphere (such as uO3, uH2O, τ550, or α), and those solar-positioning parameters. Particularly, the latest version of McClear (version 3) exploits the aerosol properties offered by CAMS using the approach of Ceamanos et al. (2014), which has led to a smoother transition between different mixtures of aerosols, and thus has resolved the issue of having discontinuities in irradiance time series as observed in the previous versions (Gschwind et al., 2019). Besides the better exploitation of aerosol properties, McClear v3 has several other improvements over its predecessors, of which the details can be found in Gschwind et al. (2019).

5.4.2 ACCESS AND COMPUTATION OF CLEAR-SKY IRRADIANCE

It has been argued that Ineichen–Perez, McClear, and REST2 typify the average-, good-, and top-performance clear-sky models in the literature. However, not all solar forecasters are familiar with the specifics of accessing these models. Take, for instance, the original publication on REST2 (Gueymard, 2008): it does not have the most user-friendly presentation—i.e., a step-by-step guide—and the numerous equations presented therein can be overwhelming for people without an atmospheric science background. It is perhaps due to this reason that the adoption of the REST2 model in solar forecasting had been dire for more than a decade since its proposal. Similarly, the pair of documents on the McClear model (Gschwind et al., 2019; Lefèvre et al., 2013) focuses more on the science behind the model than on execution. Hence, it is necessary to discuss these models with pragmatism, since most forecast practitioners would pay more attention to implementation than to radiative transfer theory.

Before 2019, the implementation of REST2 was attempted by only a handful of researchers in the field (e.g., Engerer and Mills, 2015; Zhong and Kleissl, 2015). Since publishing reproducible solar energy research with open-source data and code was not a popular practice at that time, REST2 remained more or less proprietary to those who spent a significant amount of time on implementing it, and anyone else who wished to use it would necessarily need to spend more or less the same effort. It was not until the paper by Sun et al. (2019) that the situation was completely altered. In that paper, the R code for REST2v5 and 72 other clear-sky models was released, owing to the authors' endorsement of reproducible research. With that piece of code, anyone with basic programming literacy is able to generate REST2 results with ease. The model input, as used by Sun et al. (2019), comes from the MERRA-2 reanalysis, which implies that the REST2 model can now serve to compute clear-sky irradiance for any location on earth, over any period from 1980 to now, which is the temporal coverage of MERRA-2. MERRA-2 data can be downloaded from NASA's Goddard Earth Sciences Data and Information Services Center (GES DISC, https://disc.gsfc.nasa.gov/), which hosts a wide range of global climate data, with a particular focus


on atmospheric composition and dynamics, precipitation, and solar irradiance. Data hosted on GES DISC can be downloaded fairly conveniently through the built-in interactive subsetting tool. Once the subset of data is selected, one may proceed with the download either from the browser itself or via command line tools, by following the foolproof instructions for using wget or curl on Windows, Linux, or Mac. Besides the official downloading tool, Bright et al. (2020) have published a Python package for MERRA-2 downloading.

McClear clear-sky irradiance can be obtained from the SoDa website (http://www.soda-pro.com/), which is a web service that grants access to numerous solar, meteorological, astronomical, and atmospheric datasets. On a positive note, this web service allows browser-based downloading of all three clear-sky irradiance components (Ghc, Bnc, and Dhc) for global locations, for a time range of 2004 to two days ago, in 1-min to 1-month resolutions. Furthermore, because surface radiation is also marginally affected by altitude, the web service applies an on-the-fly height correction to the radiation values before disseminating them to users. Although the service is occasionally unavailable due to scheduled maintenance or server downtime, it is free of charge with unlimited access. In situations where the McClear irradiance needs to be integrated with other computer experiments, one can also access McClear via a programming means, with help from the official R package camsRad (Lundstrom, 2016). The drawback of this web service, however, is that it does not allow users to scrutinize McClear from a technical perspective, which prevents interested individuals from modifying and extending the model. What this also implies is that the McClear clear-sky model is only able to support solar forecasting in a research setting, but not operationally, since operational forecasting demands future clear-sky irradiance. (A 2-day-ahead version of McClear is available in the HelioClim-3 database on the SoDa website—pers. comm. with Philippe Blanc, MINES ParisTech, 2022—but that horizon is still shorter than the one demanded by most grid integration needs.) On this account, MERRA-2-powered REST2 suffers from the same limitation, but the option of using forecast model input from other sources remains open—e.g., one can use the forecasts issued by ECMWF.

The Ineichen–Perez model is less elaborate in terms of formulation, and thus welcomes direct implementation. The only exogenous variable, namely, the Linke turbidity, is however not a quantity commonly available in real time or at high temporal resolution; instead, the climatology value is often used. The gridded monthly average of Linke turbidity can be obtained from the SoDa website, in the format of georeferenced TIFF maps. Since these maps have global coverage, one is able to read off the value at any location of interest. Standard implementations of the Ineichen–Perez model can be found in both Python (pvlib package; Holmgren et al., 2018) and R (SolarData package; Yang, 2018c). Unlike REST2 and McClear, Ineichen–Perez is not limited in the period over which clear-sky irradiance can be generated, since the same climatology values are shared across all years. This, of course, completely ignores the inter-annual variability in clear-sky irradiance.
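For instance, using the pvlib implementation (Holmgren et al., 2018), Ineichen–Perez clear-sky irradiance for any site is a few lines away; the monthly Linke turbidity climatology bundled with the package is looked up automatically. The coordinates below are those of the Table Mountain station used later in this section, and the time zone choice is illustrative:

```python
import pandas as pd
from pvlib.location import Location

# Table Mountain, Colorado; fixed-offset time zone chosen for illustration
site = Location(40.125, -105.237, altitude=1689, tz="Etc/GMT+7")
times = pd.date_range("2018-06-01", "2018-06-08", freq="15min", tz=site.tz)
cs = site.get_clearsky(times, model="ineichen")  # DataFrame with columns ghi, dni, dhi
```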

5.4.3 MSE SCALING

To investigate the effect due to the choice of clear-sky model on solar forecast performance, a forecasting method has to be specified beforehand. Because the quality of forecasts is method-dependent, it would be useful to decouple such dependence as far as possible, such that the intrinsic relationship between the choice of clear-sky model and forecasting performance can be isolated. On this point, reference solar forecasting methods, such as the optimal convex combination of climatology and persistence (CLIPER; Yang, 2019b,f), which are naïve in construct and do not require training, are thought to be appropriate. As these reference forecasting methods are to be discussed in detail in Chapter 9, only key results are listed here to facilitate the present discussion. To avoid notation overload, the reference irradiance forecast and the corresponding irradiance observation are denoted using r and x, respectively, whereas the clear-sky irradiance is denoted using c. Hence, by definition, the reference forecast of clear-sky index and the clear-sky index observation are to be written as ρ ≡ r/c and κ ≡ x/c. The first result is that the h-step-ahead clear-sky index forecast made for time t using CLIPER can be expressed as:
\[ \rho_t = \gamma_h \kappa_{t-h} + (1 - \gamma_h)\,\mathrm{E}(\kappa), \tag{5.12} \]

where γh = corr(κt−h, κt), which is the lag-h autocorrelation of the clear-sky index time series, and E(·) denotes the expectation of the argument. Stated differently, the CLIPER forecast is a convex combination of γh times the persistence forecast and (1 − γh) times the climatology forecast. With that, the second result can be obtained via a straightforward derivation—the MSE of ρ gauged by κ is:
\[ \mathrm{MSE}(\rho, \kappa) = \left(1 - \gamma_h^2\right) \mathrm{V}(\kappa), \tag{5.13} \]
where V(·) denotes the variance of the argument. The third result is established upon an assumption. Suppose the clear-sky index time series {κt} is stationary—this is the assumption—it follows that the forecast clear-sky index time series {ρt} is also stationary. Consequently, the MSE of the irradiance forecast, MSE(r, x), is:
\[ \mathrm{MSE}(r, x) = \mathrm{E}\left[(r - x)^2\right] = \mathrm{E}\left[c^2 \cdot (\rho - \kappa)^2\right] = \mathrm{E}\left(c^2\right) \cdot \mathrm{E}\left[(\rho - \kappa)^2\right] = \mathrm{E}\left(c^2\right) \cdot \mathrm{MSE}(\rho, \kappa). \tag{5.14} \]
It should be noted that the above equation is only valid if {κt} is stationary, which implies a constant variance and an autocorrelation that is only a function of the time lag, such that the E[(ρ − κ)²] term is a constant and can be taken out of the summation. Yang (2020a) refers to Eq. (5.14) as the exact MSE scaling, which suggests that when the clear-sky index is stationary, the MSE of clear-sky index forecasts can be scaled to the MSE of irradiance forecasts by the sum of squared clear-sky irradiance. From a forecast verification viewpoint, it is clear from Eq. (5.14) that the MSE of irradiance is the product of three terms, namely, (1 − γh²), V(κ), and E(c²), which


can be examined separately. A better clear-sky model should correspond to a smaller MSE, which, under the current MSE decomposition, translates to a higher γh, a lower V(κ), and/or a lower E(c²). The remaining task is thus to verify whether or not, or to what extent, the stationarity assumption on the clear-sky index is valid.
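The decomposition invites a direct numerical check. In the minimal sketch below, kappa and c are placeholder arrays of clear-sky index observations and clear-sky GHI; the function forms the CLIPER forecast of Eq. (5.12) and returns both the directly computed MSE(r, x) and its reconstruction via Eq. (5.14). Under stationarity, the two returned values should approximately coincide:

```python
import numpy as np

def mse_scaling_check(kappa, c, h=1):
    """kappa: clear-sky index observations; c: clear-sky GHI; h: horizon in steps."""
    kappa, c = np.asarray(kappa), np.asarray(c)
    k_lag, k_now, c_now = kappa[:-h], kappa[h:], c[h:]
    gamma_h = np.corrcoef(k_lag, k_now)[0, 1]                # lag-h autocorrelation
    rho = gamma_h * k_lag + (1 - gamma_h) * kappa.mean()     # CLIPER, Eq. (5.12)
    mse_direct = np.mean((rho * c_now - k_now * c_now) ** 2)       # MSE(r, x), with r = rho*c, x = kappa*c
    mse_scaled = np.mean(c_now**2) * np.mean((rho - k_now) ** 2)   # E(c^2) * MSE(rho, kappa), Eq. (5.14)
    return mse_direct, mse_scaled
```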

5.4.4 STATIONARITY TEST


Stationarity as a fundamental concept in time series analysis has a rigorous mathematical definition (Box et al., 2015). In words, a time series is said to be stationary if its statistical properties do not change with time. Indeed, as the central aim of normalizing irradiance is to remove the zenith dependence and thus the bell-shaped seasonality, one expects a clear-sky index time series to exhibit low instability, which is herein defined to be the variance of the means of sub-series, and low lumpiness, which is herein defined as the variance of the variances of the sub-series. Hyndman and Athanasopoulos (2018) offered several heuristics to rule out stationarity through visual inspection: (1) observe for periodicity, (2) observe for trend, and (3) observe for a change in variance. Based on these heuristics, one is able to quickly identify time series that are obviously not stationary.
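Instability and lumpiness, as just defined, can be computed directly. In the sketch below, kappa is assumed to be a pandas Series of clear-sky index values on a datetime index, and calendar months serve as the sub-series:

```python
def instability_lumpiness(kappa):
    """kappa: pandas Series of clear-sky index with a DatetimeIndex (assumption)."""
    monthly = kappa.resample("MS")       # calendar-month sub-series
    instability = monthly.mean().var()   # variance of the sub-series means
    lumpiness = monthly.var().var()      # variance of the sub-series variances
    return instability, lumpiness
```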


Figure 5.8 Time series of 15-min clear-sky index, retrieved using REST2, McClear, and Ineichen–Perez models, at Table Mountain, Colorado (40.125°N, 105.237°W, 1689 m), over 2015–2018.

Figure 5.8 shows the 15-min clear-sky index time series retrieved using REST2, McClear, and Ineichen–Perez models, from a GHI time series collected at Table Mountain, Colorado, over a four-year period from 2015 to 2018. Whereas there is neither an increasing nor a decreasing trend in these time series, one can argue for the existence of a weak annual periodicity, which is evidenced by the minor reductions in variance during the winter months. In this regard, if these observations can be considered admissible, we can readily conclude that the clear-sky index is not stationary. That said, just like the case of all methods based on visual inspection, the conclusion may be deemed subjective, since a pattern obvious to one observer might


not be agreed upon by another. To add more scientific flavor to the analysis, solar engineers often turn to hypothesis testing—Dong et al. (2013) used the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test, and Ekström et al. (2017) used the augmented Dickey–Fuller (ADF) test to determine whether or not a clear-sky index time series is stationary. Unfortunately, both tests are tailored for detecting nonstationarity in the form of a unit root in the process (see Shin and Schmidt, 1992; Dickey, 2011), and are not suitable for detecting other forms of nonstationarity, such as that due to seasonality. Hence, if we are to use hypothesis testing, alternatives have to be sought. Stationarity in the present context can be related to distribution, that is, if the κ time series is stationary, then the conditional distributions of κ given different c values should be identical. To test the equality of continuous one-dimensional probability distributions, the two-sample Kolmogorov–Smirnov (KS) test comes in handy; here "two samples" refers to the fact that both probability distributions being compared are empirical distributions, as opposed to the one-sample case in which one of them is a theoretical distribution. Since clear-sky irradiance is a continuous random variable, binning is required. More specifically, the range over which c varies is divided into N nonoverlapping intervals, and all values that fall inside each interval take the value of the bin center. Subsequently, the N conditional distributions of κ allow a total of N(N − 1)/2 KS tests to be carried out, each comparing a pair of conditional distributions. As an example, Fig. 5.9 shows the conditional distributions of κ given c, with N = 6, i.e., c1 ∈ [0, 200], c2 ∈ [200, 400], . . . , and c6 ∈ [1000, ∞) W/m2.


Figure 5.9 Visualization of the conditional distributions of κ given c, using the same data as Fig. 5.8.

The two-sample KS test has the test statistic:
\[ D_{n,m} = \sup_{\kappa} \left| F_{1,n}(\kappa) - F_{2,m}(\kappa) \right|, \tag{5.15} \]
where n and m are the sizes of the two samples; F1,n and F2,m are the empirical cumulative distribution functions (ECDFs) of the two samples. For large samples,


the null hypothesis is rejected at level α if
\[ D_{n,m} > \sqrt{-\frac{1}{2}\ln\frac{\alpha}{2}} \cdot \sqrt{\frac{n+m}{nm}}. \tag{5.16} \]


Without loss of generality, an α value of 0.05 is used in what follows. Based on the data shown in Fig. 5.8, KS tests are conducted, and the results are shown in Fig. 5.10.
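The binned pairwise tests are straightforward to run with scipy, whose ks_2samp routine implements the statistic of Eq. (5.15) and its p-value, so the explicit threshold of Eq. (5.16) need not be coded. In the minimal sketch below, kappa and c are placeholder arrays, and the default bin edges reproduce the 21 bins used here:

```python
import numpy as np
from scipy.stats import ks_2samp

def binned_ks_matrix(kappa, c, edges=np.arange(50, 1001, 50), alpha=0.05):
    """Pairwise two-sample KS tests on conditional distributions of kappa given c.
    Returns a boolean matrix; True marks a rejected null hypothesis."""
    bins = np.digitize(c, edges)                  # 21 bins for the default edges
    samples = [kappa[bins == b] for b in np.unique(bins)]
    N = len(samples)
    reject = np.zeros((N, N), dtype=bool)
    for i in range(N):
        for j in range(i + 1, N):
            _, p = ks_2samp(samples[i], samples[j])
            reject[i, j] = reject[j, i] = p < alpha
    return reject
```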


Figure 5.10 Result of the two-sample KS test on the pairwise equivalence of conditional distributions of κ given c, using the same data as Fig. 5.8. Each matrix has a dimension of 21×21, which corresponds to 21 c bins: c1 ∈ [0, 50], c2 ∈ [50, 100], · · · , c21 ∈ [1000, ∞) W/m2.

In this case study, as there are ample data points (four years of 15-min data), N is taken to be 21 to ensure a refined binning, i.e., c1 ∈ [0, 50], c2 ∈ [50, 100], · · · , c21 ∈ [1000, ∞) W/m2. Therefore, for each set of clear-sky index generated using the REST2, McClear, or Ineichen–Perez model, a total of 210 KS tests can be conducted, and they correspond to the 420 symmetrical off-diagonal entries in each matrix in Fig. 5.10. The test results indicate that most tests rejected the null hypothesis, meaning that the conditional distributions are not equivalent. Exceptions take place at the near-diagonal entries, which implies that when the clear-sky irradiance values are close enough, the conditional distribution of the clear-sky index is also highly similar. This kind of phenomenon can be regarded as local stationarity, under which the statistical properties of the time series change slowly with time (Das and Nason, 2016). In the case of clear-sky index time series, local stationarity is with respect to clear-sky irradiance, which in itself is a function of time.

The hypothesis testing results are not in favor of the exact MSE scaling, since the clear-sky index is nonstationary in a wide sense. More specifically, Fig. 5.10 reveals that if the range of clear-sky irradiance of the verification samples is within 100 W/m2, e.g., all clear-sky irradiance falls within the [700, 800] W/m2 interval, MSE scaling is valid. Considering the clear-sky irradiance at a mid-latitude site, which has a typical range of [0, 1200] W/m2, one can at most expect MSE scaling to be an


approximation, that is,
\[ \mathrm{MSE}(r, x) \approx \mathrm{E}\left(c^2\right) \cdot \mathrm{MSE}(\rho, \kappa). \tag{5.17} \]

The degree to which this approximation is true is investigated next.

5.4.5 COMPARING DIFFERENT CLEAR-SKY MODELS

Since MSE(r, x) can be decomposed into three multiplicative terms under the exact MSE scaling, this section compares (1 − γh²), V(κ), and E(c²) computed from clear-sky index time series retrieved using different clear-sky models. The data of choice comes from the Surface Radiation Budget Network (SURFRAD), which consists of seven highest-grade ground-based monitoring stations—see Section 6.4.2 for station listing. Using 15-min GHI collected at all seven stations, over a four-year period of 2015–2018, the statistics are computed and shown in Table 5.3. Since γh depends on the forecast horizon (i.e., time lag), four h values, namely, h = 1, 3, 12, 24, are selected as examples, which, given the data resolution, translate to forecasting horizons ranging from 15 min to 6 h; these horizons are typically required for grid integration (Yang et al., 2021a, 2019).

Low MSE corresponds to high γh, low V(κ), and low E(c²). Then, one can see from Table 5.3 that the result is inconclusive, for none of the clear-sky models leads to dominating forecasting performance on all statistics. For instance, κ generated with REST2 at the BON station has the lowest γh, which does not possess any advantage over the competitors, but the model also gives the lowest E(c²), which compares favorably against the other models. Another important observation from Table 5.3 is that the differences in the statistics across different clear-sky models are far less significant than those across locations, which may allow one to conclude that the choice of clear-sky model is only of marginal importance in solar forecasting, since much of the variation in forecasting performance is explained by the differences in locations, which may be interpreted as a proxy of climate, weather, and sky conditions.

We end this section with an analysis of the validity of MSE scaling. Inequality (5.17) suggests two ways to compute the MSE of irradiance forecasts: one is to compute it directly, and the other is to first compute the MSE of clear-sky index forecasts and then scale it to the MSE of irradiance, by multiplying by the sum of squared clear-sky irradiance values. Figure 5.11 shows the comparison of RMSEs calculated in these two ways. In each panel, RMSEs computed based on κ time series retrieved using the three clear-sky models of interest are annotated with different symbols, and there are seven sets of symbols, corresponding to the results at the seven SURFRAD stations. It can be seen that the MSE scaling, as theoretically argued, is not exact, particularly for low-sun conditions. The result here further confirms the previous conclusion that the choice of clear-sky model has a negligible effect on forecast performance, for the data points in Fig. 5.11 come in clusters of threes. Next, it has been argued earlier that the exact MSE scaling happens only when the verification is performed over a smaller range of c values, such as 100 W/m2. On this point, Fig. 5.12 repeats the analysis, but with only c ∈ [700, 800] W/m2.


Table 5.3
Values of γh, V(κ), and E(c²) using 15-min GHI measurements at seven SURFRAD stations, over 2015–2018. REST2, McClear, and Ineichen–Perez models are considered for clear-sky index retrieval.

Stn.  Clear-sky model   γ1     γ3     γ12    γ24    V(κ)   E(c²)
----------------------------------------------------------------
BON   REST2             0.912  0.820  0.629  0.468  0.106  562
      McClear           0.914  0.823  0.638  0.478  0.106  571
      Ineichen–Perez    0.920  0.836  0.659  0.510  0.110  585
DRA   REST2             0.858  0.681  0.442  0.334  0.050  616
      McClear           0.858  0.683  0.459  0.359  0.050  625
      Ineichen–Perez    0.868  0.706  0.485  0.381  0.047  657
FPK   REST2             0.885  0.755  0.529  0.374  0.085  520
      McClear           0.886  0.756  0.542  0.395  0.090  521
      Ineichen–Perez    0.893  0.772  0.553  0.390  0.080  558
GWN   REST2             0.904  0.805  0.617  0.462  0.107  589
      McClear           0.904  0.806  0.623  0.469  0.106  599
      Ineichen–Perez    0.913  0.823  0.650  0.504  0.106  619
PSU   REST2             0.895  0.785  0.588  0.437  0.111  559
      McClear           0.897  0.790  0.596  0.446  0.110  570
      Ineichen–Perez    0.902  0.800  0.614  0.468  0.111  589
SXF   REST2             0.922  0.827  0.634  0.471  0.102  544
      McClear           0.922  0.827  0.639  0.476  0.103  549
      Ineichen–Perez    0.930  0.844  0.670  0.516  0.106  569
TBL   REST2             0.880  0.713  0.459  0.297  0.102  595
      McClear           0.879  0.711  0.462  0.309  0.101  604
      Ineichen–Perez    0.888  0.733  0.499  0.350  0.096  648

Exact MSE scaling is evident. The computation could be repeated for other ranges of c, such as c ∈ [400, 500] or c ∈ [900, 1000] W/m2, but the same finding can be anticipated.

5.4.6 SECTION SUMMARY

Clear-sky models have cardinal importance in solar forecasting, since they can effectively remove the seasonality and diurnal cyclic components from the irradiance time series. To that end, a substantial number of pages have been allocated to the discussion pertaining to the choice of clear-sky model in solar forecasting. The main take-home points are as follows:

• Even the best clear-sky models, such as the REST2v5 model, are unable to completely remove the zenith dependence, leaving a nonstationary clear-sky index time series.


• The clear-sky index time series is nonstationary in a wide sense, but is locally stationary, and its statistical properties change slowly with clear-sky irradiance.
• MSE scaling is able to approximately link the MSEs of irradiance and clear-sky index, by a multiplicative factor that takes the value of the sum of squared clear-sky irradiance. Mathematically, MSE(r, x) ≈ E(c²) · MSE(ρ, κ).
• The degree to which the MSE scaling is valid can be used as a gauge for the performance of the clear-sky model. If a clear-sky model performs well at a location, the difference between the directly calculated and the reconstructed MSEs is expected to be small.
• All three clear-sky models investigated have good accessibility, since open-source code is available for the REST2 and Ineichen–Perez models, and McClear clear-sky irradiance can be obtained via the SoDa web service.

Figure 5.11 RMSEs of reference forecasts using the optimal climatology–persistence combination method on three κ time series retrieved using REST2, McClear, and Ineichen–Perez. 15-min data from seven SURFRAD stations, over 2015–2018, are used. The RMSEs are computed in two ways: (1) direct calculation (on the abscissa) and (2) reconstruction using MSE scaling (on the ordinate).



Figure 5.12 Same as Fig. 5.11, but only verification samples with c ∈ [700, 800] W/m² are used for error calculation.

5.5 TEMPORAL ALIGNMENT AND AVERAGING OF DATA

Solar forecasting requires data of different types and from independent sources, such as irradiance measurements from ground-based radiometers, irradiance estimates from remote-sensing products, irradiance forecasts from NWP models, and power output measurements from power meters at PV plants. In a vast majority of cases, the temporal resolutions of various types of data do not match, and the temporal alignment and averaging of data become relevant. The task of aligning and averaging data is often perceived as trivial, and thus very few people actually spend effort in describing it. However, not paying close attention to data alignment and averaging can lead to adverse effects. Worse still, since open research is still not a common practice in solar forecasting, the mistakes and oversights related to data preparation often go unnoticed. In this section, two issues, which take place in virtually all solar forecasting works but are often omitted in the description, are used to exemplify the dangers of misaligned and wrongfully averaged data.

5.5.1 DATA ALIGNMENT AND IMPACTS ON FORECAST VERIFICATION

When two datasets of different temporal resolutions are used simultaneously, one can either average the higher-resolution data to match the lower-resolution data, or


interpolate the lower-resolution data to match the higher-resolution data. In forecast verification, for instance, the high-resolution 1-min ground-based measurements are averaged to hourly resolution, which is the typical resolution of NWP forecasts. NWP output can be either instantaneous or time-averaged. For example, the National Centers for Environmental Prediction's (NCEP's) Rapid Refresh (RAP) and High-Resolution Rapid Refresh (HRRR) models output instantaneous (i.e., "snapshot") forecasts, so a forecast with a time stamp of 12:00 would be the expected value at that time. On the other hand, the ECMWF's High-Resolution (HRES) model issues time-averaged forecasts, in that a forecast time stamp of 12:00 corresponds to the average value over the time period from 11:01 to 12:00.

Given that the solar irradiance, and thus solar power, time series takes the shape of a bell over a day, there would be an offset between instantaneous and time-averaged clear-sky values for the same hour. What this implies is that when a low-resolution data point is a snapshot in time, the averaging must take place both before and after the time stamp, e.g., for a RAP forecast made for 12:00, the corresponding ground-based irradiance needs to be aggregated from 11:31 to 12:30, such that the averaged clear-sky expectation can match that of the forecast. As for an HRES forecast made for 12:00, the corresponding ground-based irradiance needs to be aggregated from 11:01 to 12:00.

One can also approach the issue from the perspective of aggregation schemes. Most generally, there are three aggregation schemes, namely, floor, ceiling, and round. A floor-aggregation scheme means that the average value of several high-resolution data points within an interval is time stamped at the beginning of that interval, e.g., 1-min data from 11:00 to 11:59 would correspond to an hourly time stamp of 11:00 after floor aggregation. A ceiling-aggregation scheme takes the exact opposite of floor aggregation, that is, the average value of several high-resolution data points within an interval is time stamped at the end of that interval, e.g., 1-min data from 11:01 to 12:00 would correspond to an hourly time stamp of 12:00 after ceiling aggregation. Lastly, a round-aggregation scheme puts the time stamp in the center of an interval, e.g., the average value of 1-min data from 11:01 to 12:00 (or 11:00 to 11:59) would correspond to a time stamp of 11:30.

In practice, floor aggregation is seldom used, but ceiling and round aggregation are both common. Referring back to the RAP and HRES examples, round aggregation must be used for the verification of the former, and ceiling aggregation for the latter. To give a visual representation of the effect of different aggregation schemes, Fig. 5.13 overlays the averaged 1-min ground-based GHI measurements using floor, round, and ceiling operators, on August 4, 2020, at Desert Rock, Nevada, with the corresponding RAP and HRES forecasts on that day at that location. Since August 4, 2020, was a clear day at Desert Rock, both NWP models generated good forecasts. However, it is also evident from Fig. 5.13 that, when an incorrect aggregation scheme is used, the error can be quite substantial. For a one-year period (2020), the RMSEs of RAP forecasts are 118, 112, and 90 W/m² when gauged on ground-based measurements under floor-, ceiling-, and round-aggregation schemes, respectively; the numbers are 152, 71, and 98 W/m² for HRES; the correct aggregation scheme gives much lower error than the incorrect ones.
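To make the three schemes concrete, the sketch below, a minimal example with a hypothetical 1-min GHI series, produces hourly averages under each convention using pandas; the data values are placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical 1-min GHI series for one day (values are placeholders)
idx = pd.date_range("2020-08-04 00:00", periods=24 * 60, freq="min")
ghi_1min = pd.Series(np.random.default_rng(1).uniform(0, 1000, idx.size), index=idx)

# Floor: mean of 11:00-11:59 is stamped 11:00
floor_agg = ghi_1min.resample("1h", label="left", closed="left").mean()

# Ceiling: mean of 11:01-12:00 is stamped 12:00 (matches HRES-style averages)
ceil_agg = ghi_1min.resample("1h", label="right", closed="right").mean()

# Round: mean of 11:31-12:30 is stamped 12:00 (matches RAP/HRRR snapshots);
# shifting the index back by 30 min turns the centered window into a
# right-closed, right-labeled one
round_agg = (
    ghi_1min.shift(freq="-30min")
    .resample("1h", label="right", closed="right")
    .mean()
)
```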

Figure 5.13 Effect of different temporal aggregation schemes. One-min GHI measurements on August 4, 2020, at Desert Rock (36.624°N, −116.019°W), Nevada, are averaged using floor-, round-, and ceiling-aggregation schemes, which are presented as dotted, dashed, and dot-dash lines from left to right, in both sub-figures. The corresponding hourly NCEP's RAP (top) and ECMWF's HRES (bottom) forecasts for the same location and same day are represented using solid gray lines.

While aggregation differences for hourly intervals become obvious in graphs like Fig. 5.13, aggregation into shorter intervals may result in smaller differences that can easily escape the eye. Therefore, for any irradiance dataset, it is also advisable to plot the irradiance versus the solar zenith angle, where the solar zenith angle is calculated based on the time stamp. If, on a clear day, the rising path (morning) of the plot does not coincide with the falling path (afternoon) of the curve, it is advisable to reexamine the assumptions on the time stamp. One particular advantage of this quality check is that it does not require any secondary data; rather, it is a self-consistency check.

The temporal alignment issue is also applicable to other gridded products, namely, satellite-derived irradiance and reanalysis. For instance, the NSRDB is a half-hourly satellite-derived irradiance product, which is derived primarily from the instantaneous cloud properties retrieved from NOAA's Geostationary Operational Environmental Satellite. Given the instantaneous nature of this product, the round-aggregation scheme should be used. One related investigation has been conducted by Yang (2018a), who repeated the validation of the NSRDB, and found that the higher apparent errors reported originally by Sengupta et al. (2018) might be due to a data-aggregation error, and upon correction, the accuracy was significantly improved.
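The sketch below illustrates the zenith-angle self-consistency check. It assumes a pandas DataFrame df of 1-min data with hypothetical columns ghi and zenith, the latter computed from the time stamps (e.g., with the pvlib library); the function and argument names are illustrative.

```python
import matplotlib.pyplot as plt

def zenith_consistency_plot(df, clear_day):
    """Scatter GHI against solar zenith angle for one clear day.

    If the time stamps are interpreted correctly, the morning (rising)
    and afternoon (falling) branches should coincide; a systematic gap
    between the two branches hints at a time-stamp or averaging issue.
    """
    day = df.loc[clear_day]
    am = day.between_time("00:00", "11:59")
    pm = day.between_time("12:00", "23:59")
    plt.scatter(am["zenith"], am["ghi"], s=5, label="morning")
    plt.scatter(pm["zenith"], pm["ghi"], s=5, label="afternoon")
    plt.xlabel("solar zenith angle [deg]")
    plt.ylabel("GHI [W/m$^2$]")
    plt.legend()
    plt.show()
```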

5.5.2 CLEAR-SKY IRRADIANCE AVERAGING AT SUNRISE AND SUNSET HOURS

Another question relevant to data averaging is this: Given an all-sky irradiance or PV power time series of an hourly resolution, how should one proceed with calculating the clear-sky expectation? It is believed that most solar forecasters would simply compute clear-sky expectations with inputs of hourly resolution, and there does not seem to be anything worthy of discussion. Take, for instance, an hourly GHI value representing the average condition between 11:01 and 12:00: it is logical to compute the hourly clear-sky GHI based on the average zenith angle of that hour, which would be a value approximately equal to the instantaneous zenith angle at 11:30, due to the averaging effect. Whereas this procedure is adequate8 for most hours of a day, and most days of a year, it is problematic for about tens of hours throughout a year, all of which are either sunrise or sunset hours. A sunrise hour is defined to be the hour in which the zenith angle turns from >90° to <90°, and a sunset hour is the hour in which it turns from <90° to >90°. If the average zenith angle is >90° for a sunrise or sunset hour, the resultant clear-sky expectation would be zero, but in reality, there is a small integration of energy over that hour. More specifically, if sunrise takes place in the later half of the sunrise hour, the average zenith angle for that hour would be >90°, resulting in an hourly clear-sky expectation value of zero; there are nevertheless irradiance values from the sunrise instance to the end of that hour, which integrate to some energy. Similarly, if sunset takes place in the first half of the sunset hour, the average zenith angle for that hour would again be >90°, resulting in an hourly clear-sky expectation value of zero; there are nevertheless irradiance values from the beginning of that hour to the sunset instance, which again integrate to some energy.

Figure 5.14 provides a visualization of the situation, using a sunrise hour as an example, from which the case of the sunset hour can be readily inferred. Hourly ECMWF's HRES GHI forecasts for September 7, 2020, at Bondville, Illinois, are considered. The corresponding clear-sky GHI is calculated using two approaches. One of those follows the "usual" way of computing the average zenith angle for each hour, and deriving the Ineichen–Perez clear-sky GHI (Ineichen and Perez, 2002) from there. The other approach first computes 1-min zenith angles, derives the 1-min Ineichen–Perez clear-sky GHI, and then aggregates the 1-min values into hourly clear-sky GHI. From Fig. 5.14, it is apparent that the first approach causes an artifact at 6:00, whereas the second approach is not affected by the aggregation effect.

The pitfall of computing clear-sky expectation once per hour is that the resultant clear-sky expectations can be zero at some sunrise and sunset hours, which would then lead to undefined clear-sky indexes at those hours, due to the division by zero. Expanding the situation further, even if the computed clear-sky expectations are not zero at those hours, i.e., sunrise occurring in the first half of the sunrise hour or sunset occurring in the second half of the sunset hour, there would always be some discrepancies between the computed hourly clear-sky expectation and the actual value, due to the missing portion of irradiance during integration. This may cause extremely large clear-sky indexes, which, in turn, can lead to many kinds of problems during forecasting model fitting, parameter estimation, and forecast verification.

8 Using the average zenith angle is not exact, as solar zenith angle and irradiance are not linear time series. If higher accuracy is desired, then clear-sky irradiance for any hour of the day should be aggregated based on minute clear-sky model outputs, as described below.

Figure 5.14 Effect of clear-sky irradiance averaging at the sunrise hour. ECMWF's HRES GHI forecasts for September 7, 2020, at Bondville (40.052°N, −88.373°W), Illinois, alongside Ineichen–Perez clear-sky GHI calculated from both hourly data and 1-min averages are shown. The panels are: (top-right) time series on that day; (top-left) zoom on the ordinate; (bottom-right) zoom on the abscissa; and (bottom-left) zoom on both the ordinate and abscissa, showing the clear-sky GHI calculated from hourly data is inadequate for the sunrise hour.

In summary, the problem raised at the beginning of this section lends itself to two solutions: (1) computing a single hourly clear-sky index using hourly averaged inputs, and (2) computing sixty 1-min clear-sky indexes using 1-min inputs and averaging them subsequently. The latter approach should be strictly preferred, though it may require more effort in data processing; a minimal recipe is sketched below. On a more general account, it is always worthwhile to check the temporal alignment and aggregation of data in solar forecasting, which may be done via a simple time series plot as demonstrated in Figs. 5.13 and 5.14. Furthermore, such a visual check should not be done just once; instead, it should be performed over several selected days, spanning the entire dataset of interest. Indeed, maintaining this kind of academic rigor is time-consuming but profoundly beneficial.
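A minimal sketch of the preferred procedure, computing clear-sky irradiance at 1-min resolution and only then averaging to hourly, is given below. It relies on the pvlib library; the site (Bondville, as in Fig. 5.14) and the clear-sky model choice are illustrative rather than prescriptive.

```python
import pandas as pd
from pvlib.location import Location

# Bondville, Illinois (the site used in Fig. 5.14)
site = Location(latitude=40.052, longitude=-88.373, tz="Etc/GMT+6")

# 1-min clear-sky GHI over one day, then ceiling-aggregated to hourly
times = pd.date_range("2020-09-07 00:01", "2020-09-08 00:00",
                      freq="min", tz=site.tz)
cs_1min = site.get_clearsky(times, model="ineichen")["ghi"]
cs_hourly = cs_1min.resample("1h", label="right", closed="right").mean()

# The naive alternative evaluates the model once per hour at the half-hour
# instants; it returns exactly zero for a sunrise or sunset hour whose mean
# zenith angle exceeds 90 degrees
mid_hours = pd.date_range("2020-09-07 00:30", periods=24, freq="1h", tz=site.tz)
cs_naive = site.get_clearsky(mid_hours, model="ineichen")["ghi"]
```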

5.6 MAKING FORECASTS OPERATIONAL

In today's solar forecasting literature, a vast majority of publications are motivated from a grid integration perspective, which implies that the proposed forecasting methods are intended to be used in conjunction with other grid-side operations, such as unit commitment in day-ahead markets (Yang and Dong, 2018), economic dispatch in real-time markets (Yang et al., 2019), or automatic load–generation balance at minute or sub-minute time scales using flexible resources for regulation purposes (Yang et al., 2022e). In the remaining cases, solar forecasting is motivated by off-grid applications, such as devising operation management strategies for hybrid energy systems with solar, diesel, and energy storage (Rodríguez-Gallegos et al., 2021), or stabilizing and regulating variability in solar generation through coordinating electric vehicle charging schedules (Wang et al., 2019a). Although there are exceptions, such as those aforementioned references, most solar forecasting works tend to idealize and simplify the forecasting setup, by ignoring the real-life operational requirements and constraints. For instance, hourly solar forecasting is arguably the most studied time scale thus far, and thousands of algorithms can be deemed suitable, and in fact have been used, to generate 1-step-ahead forecasts with hourly data. Unfortunately, the practicality of this kind of research is quite restricted, because those hourly forecasts cannot be integrated into any power system operation in their existing forms, and thus any inference from the verified superiority in research papers to the expected real-life benefits would be utterly invalid.

We should anticipate objections, at this moment, to our seemingly outrageous claim. Objectively, such objections may arise from the fact that forecasts at a time scale of 1 h are indeed needed by grid operators to perform real-time economic dispatch (Makarov et al., 2011). Subjectively, such objections may be raised just based on a "voters know best" mentality—if so many solar forecasters have found 1-step-ahead hourly forecasting worth publishing, how can it be not useful? This contradiction between the overriding evidence from the literature and our current claim can be resolved if the context of operational forecasting is taken into consideration. In short, the phrase "forecast horizon" is ambiguous, and thus is unable to fully narrate the time specifications of an operational forecasting context.

5.6.1 TIME PARAMETERS IN OPERATIONAL FORECASTING

The phrase “forecast horizon” is almost always used interchangeably with “lead time,” not just in the solar forecasting literature, but in all forecasting domains; and it is customary to use h to denote forecast horizon. In the hourly forecasting context, 1-step-ahead forecasting with hourly data corresponds to a forecast horizon, or lead time, of h = 1. Whereas this definition is no problem by itself, one ought to realize that it is also possible to generate forecasts over the same horizon, by either aggregating or averaging multiple-step-ahead forecasts, using data of higher resolutions. For example, the forecast for the time stamp 1-h ahead of the forecast issuing time can be obtained by adding (or averaging, depending on whether or not the forecast variable is an accumulated field) two half-hourly forecasts produced using half-hourly data,


which correspond to h = 1, 2, or four 15-min forecasts produced using 15-min data, with h = 1, 2, 3, 4. Since all of these alternatives can be said to be 1-h-ahead forecasts, there is a high degree of ambiguity with using the h notation alone. This problem is known as multiple aggregation in the field of statistical forecasting (Kourentzes et al., 2014; Kourentzes and Petropoulos, 2016), and can be efficiently reconciled with a temporal hierarchy (Athanasopoulos et al., 2017; Yang et al., 2017c), of which the discussion is deferred to a later chapter.

The situation is further complicated when other time factors are involved. It is well known that operational forecasts need to be submitted in advance, such that there can be a time window reserved for system operators to respond to and make decisions based on the forecasts. The size of this time window is usually not negligible as compared to the forecast horizon, and may not be an integer multiple of the data resolution. For example, if the 1-h-ahead forecasts are to be submitted half an hour before the operating hour, the horizon h = 1.5 is clearly problematic when only hourly data is present. One can subsequently argue to produce an h = 2 forecast using hourly data, to satisfy the lead time requirement, but how to describe that forecast—should it still be called hourly forecasting—is again ambiguous.

A third factor that complicates the description of the operational forecasting setup lies within its rolling nature. Regardless of whether it is in research or in practice, a forecasting procedure has to be repeated over a sufficiently long period of time, before any claim on its performance can be concluded. In most research settings, there is no overlap between forecasts, e.g., at time t, a forecast for t + 1 is made, then with the observation made at t + 1, a forecast for t + 2 is made, and so on and so forth. However, in multiple-step-ahead forecasting, the forecasts often overlap, e.g., at time t, forecasts for both t + 1 and t + 2 are made, then with the observation made at t + 1, forecasts for t + 2 and t + 3 are made, which leads to two different forecasts made for t + 2 at different lead times.

All the above-mentioned time-related issues in describing a forecasting setup are seldom thought about and thus seldom discussed. When these issues are discussed, the discussion is only textual, and the intelligibility of textual expressions depends on the writer's level of English and her ability to organize the content of her manuscript. Speaking from experience, it is not at all uncommon that one has to spend hours trying to figure out the exact forecasting setup which the authors used, by cross-comparing textual descriptions scattered in different parts of an often very long paper. However, all these can be avoided if the right tool is used; a four-parameter symbolic representation of time-related matters in forecasting setup has been introduced by Yang et al. (2019); Yang (2020b), which is able to fully specify all time parameters involved in operational forecasting.

The representation uses a quadruplet $(\mathcal{S}, \mathcal{R}_f, \mathcal{L}, \mathcal{U})$ to denote span, resolution, lead time, and update rate, respectively. Forecast span $\mathcal{S}$ corresponds to the time between the first and last forecasts in a single forecast submission. $\mathcal{R}_f$ denotes the resolution of the forecasts, where the subscript "f" stands for forecast, which is necessary since the data resolution need not be the same as the forecast resolution. Lead time $\mathcal{L}$ marks the time difference between the forecast submission deadline and the first forecast. Finally, the forecast update rate $\mathcal{U}$ denotes the frequency at which new submissions are to be made.


Next, the actual operational forecasting requirements of two independent system operators (ISOs), namely, the California ISO (CAISO) and China Southern Power Grid (CSG), are considered, so as to exemplify the usage of the quadruplet.

5.6.2 CAISO'S AND CSG'S OPERATIONAL FORECASTING REQUIREMENTS

The generation scheduling process at CAISO has been fully explained by Makarov et al. (2010). Although the date of publication of that article is not recent, the operations therein described are unlikely to change drastically, and thus can still be used as a reference for operational solar forecasting studies. In brief, CAISO's scheduling timeline can be divided into four main scales, based on the operations that take place in each scale. In the day-ahead market (DAM), CAISO performs day-ahead load forecasting, which in turn results in an hourly block energy schedule out to 3 days, with a 20-min ramp between hours. In the real-time market (RTM), CAISO performs short-term unit commitment (STUC) and real-time economic dispatch (RTED). STUC has a time granularity of 15 min and needs to be planned 75 min before the operating hour, whereas RTED is provided 7.5 min before the dispatch operating target (DOT) at a 5-min resolution. On top of that, the STUC is conducted hourly, and in each iteration, a 5-h look-ahead is provided, whereas RTED is conducted every 5 min, with a 65-min look-ahead. Finally, the real-time regulation with automatic generation control (AGC) takes place every 4 s in a non-overlapping fashion. Figure 5.15 summarizes these operations.

Figure 5.15 A schematic diagram depicting the timelines of various power system operations in CAISO: DAM (1-h resolution, 72-h span, updated daily, 14-h lead time), STUC (15-min resolution, 5-h span, updated hourly, 75-min lead time), RTED (5-min resolution, 65-min span, updated every 5 min, 7.5-min lead time), and AGC (every 4 s). (Figure is not drawn to scale.)

CAISO's generation schedules imply that solar forecasts, if they are to be involved in various scheduling processes, ought to conform to the time requirements.


Using the quadruplet introduced earlier, the kind of solar forecasts that can be used for RTED is $\left(\mathcal{S}^{65\,\text{min}}, \mathcal{R}_f^{5\,\text{min}}, \mathcal{L}^{7.5\,\text{min}}, \mathcal{U}^{5\,\text{min}}\right)$, which means that thirteen 5-min forecasts need to be submitted 7.5 min before the operating time window, and the submission needs to be repeated every 5 min. Similarly, one can use $\left(\mathcal{S}^{5\,\text{h}}, \mathcal{R}_f^{15\,\text{min}}, \mathcal{L}^{75\,\text{min}}, \mathcal{U}^{1\,\text{h}}\right)$ to denote the kind of solar forecasts that can be of use for STUC; for DAM, the quadruplet is $\left(\mathcal{S}^{72\,\text{h}}, \mathcal{R}_f^{1\,\text{h}}, \mathcal{L}^{14\,\text{h}}, \mathcal{U}^{24\,\text{h}}\right)$; and for AGC, the quadruplet is $\left(\mathcal{S}^{4\,\text{s}}, \mathcal{R}_f^{4\,\text{s}}, \mathcal{L}^{0\,\text{s}}, \mathcal{U}^{4\,\text{s}}\right)$. It is immediately obvious that the quadruplet can completely remove all ambiguities, inasmuch as the time parameters of a forecasting setup are to be described. Besides ensuring transparency in description, the quadruplet is also able to make the presentation concise. Here is an example demonstrating how easy it can be: Yang et al. (2021a) provided a review on the grid code issued by China Southern Power Grid (CSG) alongside other ISOs of China, in which requirements on operational solar forecasting are listed. More specifically, CSG demands two kinds of solar forecasts, namely, $\left(\mathcal{S}^{24\,\text{h}}, \mathcal{R}_f^{15\,\text{min}}, \mathcal{L}^{12\,\text{h}}, \mathcal{U}^{24\,\text{h}}\right)$ for DAM and $\left(\mathcal{S}^{4\,\text{h}}, \mathcal{R}_f^{15\,\text{min}}, \mathcal{L}^{15\,\text{min}}, \mathcal{U}^{15\,\text{min}}\right)$ for RTM.
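As a side note, the quadruplet maps naturally onto a small data structure. The sketch below is hypothetical (the class and field names are not from any grid code) and simply encodes two of the specifications above, deriving the number of forecasts per submission as S/R_f.

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass(frozen=True)
class ForecastSpec:
    """Operational forecasting specification (S, R_f, L, U)."""
    span: timedelta         # S: first to last forecast in one submission
    resolution: timedelta   # R_f: forecast resolution
    lead_time: timedelta    # L: submission deadline to first forecast
    update_rate: timedelta  # U: how often submissions repeat

    @property
    def n_forecasts(self) -> int:
        # Number of forecasts per submission, e.g., 13 for CAISO's RTED
        return int(self.span / self.resolution)

rted = ForecastSpec(timedelta(minutes=65), timedelta(minutes=5),
                    timedelta(minutes=7.5), timedelta(minutes=5))
csg_dam = ForecastSpec(timedelta(hours=24), timedelta(minutes=15),
                       timedelta(hours=12), timedelta(hours=24))

print(rted.n_forecasts, csg_dam.n_forecasts)  # 13 96
```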

5.6.3 VERIFICATION OF OPERATIONAL FORECASTS

Whereas the methodology for forecast verification is the theme of Chapters 9 and 10, this section reveals some time-related complications that may arise during the collection of verification samples. Foremost is the fact that verification only makes sense when there is a large-enough pool of forecast–observation pairs. If the pool is too small, i.e., it contains just a few samples, the verification outcome is neither representative nor reliable. Typically, one year of hourly data is regarded as a minimum requirement for the error statistics to be considered as representative (Mathiesen and Kleissl, 2011; Perez et al., 2013b). However, if the data resolution is high, such as sub-minute forecasting, conducting verification over just a few days may be deemed sufficient (see Chow et al., 2011; Yang et al., 2015b, for example). This is because the number of 1-min forecast–observation pairs in a few days is comparable to, if not more than, the number of hourly forecast–observation pairs in a year.

Forecast verification examines the correspondence between forecasts and observations either through accuracy measures, of which the framework is said to be measure-oriented, or through the joint distribution of forecasts and observations, of which the framework is said to be distribution-oriented (Murphy and Winkler, 1987). In either case, forecast–observation pairs need to be prepared. Since the forecasts contained in a single forecast span $\mathcal{S}$ are fewer than the samples in a verification pool, the latter needs to be constructed by rearranging forecasts produced at different forecast issuing times. To elaborate, consider two forecasting tasks with specifications:

$$\left(\mathcal{S}^{24\,\text{h}}, \mathcal{R}_f^{1\,\text{h}}, \mathcal{L}^{1\,\text{h}}, \mathcal{U}^{24\,\text{h}}\right) \quad\text{and}\quad \left(\mathcal{S}^{24\,\text{h}}, \mathcal{R}_f^{1\,\text{h}}, \mathcal{L}^{1\,\text{h}}, \mathcal{U}^{1\,\text{h}}\right), \tag{5.18}$$


which differ only in terms of $\mathcal{U}$. Whereas task one is the kind of day-ahead forecasts typically used in a time series setting (e.g., see Yang and Dong, 2018), task two is the setup of NCEP's RAP and HRRR models (Zhang et al., 2022). If both forecasting tasks were run to cover a one-year period over which verification is to be performed, the first forecasting task would issue 24 × 365 = 8760 forecasts, one for each hour of the verification with no overlap, but the second forecasting task issues many more forecasts:

$$\begin{array}{c|ccccc}
 & \multicolumn{5}{c}{\text{Forecast time stamp}} \\
\text{Forecast issuing time} & \text{day 1, 01} & \text{day 1, 02} & \text{day 1, 03} & \cdots & \text{day 366, 00} \\ \hline
\text{day 1, 00} & h=1 & h=2 & h=3 & \cdots & \\
\text{day 1, 01} & & h=1 & h=2 & \cdots & \\
\text{day 1, 02} & & & h=1 & \cdots & \\
\vdots & & & & \ddots & \vdots \\
\text{day 365, 23} & & & & & h=1
\end{array} \tag{5.19}$$

In (5.19), the time convention is the day number followed by the two-digit hour of the day, i.e., "day x, HH." One can see that the verification sample pool of task one is in fact just a single long time series, whereas the pool of task two consists of many short and overlapping time series. Although the two forecasting tasks in (5.18) can both be described as "24-h-ahead forecasts over a year," forecast errors computed based on all-sample verification are not comparable. Furthermore, in task two, since each observation corresponds to multiple forecasts, it leads to numerous ways of rearranging forecasts into verification groups, e.g., by non-overlapping 24-h blocks (as in Perez et al., 2013b), or by h values in (5.19) (as in Zhang et al., 2022).

The above example illustrates the complications in verification caused by the forecast update rate $\mathcal{U}$ alone, notwithstanding that the other members of the quadruplet, namely, $\mathcal{S}$, $\mathcal{R}_f$, and $\mathcal{L}$, can lead to other complications of a like nature. This type of time-induced ambiguity exists in bulk quantities in the literature. During the literature review, some forecasters tend to lump RMSEs reported in different papers together, for comparison purposes, without explicitly factoring in these time-induced ambiguities (e.g., Blaga et al., 2019), which is a very bad practice. Care must be taken when interpreting verification results, especially when the samples come from an operational forecasting procedure, which circles back to the motivation for the invention of the quadruplet.
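The bookkeeping implied by (5.19) is easy to get wrong, so a hypothetical sketch is given below: it builds the overlapping forecast–observation pool of task two with placeholder data and groups the errors by horizon h, in the spirit of the grouping used by Zhang et al. (2022); all names and values are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
issue_times = pd.date_range("2020-01-01 00:00", periods=24 * 30, freq="1h")

# Placeholder hourly observations covering the period plus one extra day
obs_index = pd.date_range(issue_times[0], periods=issue_times.size + 24, freq="1h")
obs = pd.Series(rng.uniform(0, 1000, obs_index.size), index=obs_index)

# Task two: every hourly run issues forecasts for horizons h = 1..24
rows = []
for t in issue_times:
    for h in range(1, 25):
        stamp = t + pd.Timedelta(hours=h)
        rows.append((t, stamp, h, obs[stamp] + rng.normal(0, 50)))
pool = pd.DataFrame(rows, columns=["issue", "stamp", "h", "fcst"])
pool["obs"] = obs.loc[pool["stamp"]].to_numpy()

# Each stamp is covered by up to 24 forecasts, so all-sample scores mix lead
# times; grouping by h (or by non-overlapping 24-h blocks) keeps them comparable
rmse_by_h = ((pool["fcst"] - pool["obs"]) ** 2).groupby(pool["h"]).mean() ** 0.5
```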

6 Data for Solar Forecasting

"One of my biggest problems has been the uncertainty and systematic errors in the base data. That has prevented me from deriving concise models that incorporate systematic biases in the analysis. Forecasting is a growing field and has economic benefits. But there is a need to show people how to use the information and develop ways to facilitate its use. Also one should clearly identify the limits and problems."
— Frank Vignola

It is customary in science to regard information as "data," from which knowledge is inferred and then transformed into a justified course of action during decision-making. In forecasting, the sort of inference which forecasters make is not just concerned with the present but also the future. It follows that the validity of any decision reached based on such inference procedures may be affected by three main factors: (1) the validity of the inference itself, (2) the dissimilarity, or non-uniformity, between the current and future states of affairs, and (3) the quality of the data. Forecasters only build models that are aligned with their preconceived theories or beliefs. These theories or beliefs are nothing but laws of association and of "learned reaction" acquired over time. Stated simply, the inference procedures all reduce to the form: Based on data D, model M is thought appropriate in accordance with the previously observed facts. To reveal the potential danger of such procedures, consider this analogy: Any rational person would have no trouble in discriminating what is and what is not edible among the items on a dining table; in contrast, a toddler may pick up anything and put it in the mouth. The same visual appearance of the items on the table triggers different reactions—the adult could associate various stimuli to their taste and thus decide what to pick up, whereas the toddler could not. Similarly, because the amount of forecasting experience differs, even if the same data is presented to them, the validity of the models selected by an expert and a novice would most certainly be different. However, a good sense of modeling choice, it is believed, can be trained over time; thus, we may ignore, temporarily, this complication on the validity of inference.

Change is the only constant. What has been shown to be able to explain the past and present might not be capable of describing the future, and factors that can cause the state of affairs to evolve could be many. There is no doubt that forecasters usually attribute a significant percentage of their time to the so-called exploratory analysis, with the goal of identifying useful cause-and-effect relationships between the data and the state of affairs. This does not, however, preclude the possibility of overlooking certain important ingredients that contribute to the non-uniformity of nature. Due to this fact, the operational dynamical weather models, as an example that we have seen earlier, are constantly being updated for optimal performance.


Hence, there is a practical difficulty, a rather obvious one, in making inferences about the future—one can never be sure how far into the future the current model could be deemed adequate. Unfortunately, we, as forecasters, have no option but to assume that past situations resemble the future insofar as rationality holds. On this point, the long-term representativeness of the data becomes all the more critical. In the elaboration of arguments for and against the use of some particular dataset, one central point has hitherto been that concerning the “representativeness” of that dataset. Similar to “goodness,” as discussed in Section 2.3, “representativeness” is another abstract concept that may benefit from invoking a philosophical thinking tool—reduction. It is herein argued that the representativeness of a dataset, as to its use in solar forecasting, should be assessed on three fronts: (1) coverage, (2) consistency, and (3) accessibility. Coverage can be subdivided into temporal coverage, spatial coverage, climate coverage, and sky-condition coverage; consistency constitutes that as with forecast horizon, with time scale, with space scale, and with forecaster’s internal judgment; and accessibility refers to how easy the data, as well as similar data of its kind, can be obtained. Aside from coverage, consistency, and accessibility, some other fundamental properties of a representative dataset may be thought of, for instance, bankability1 (Vignola et al., 2012). This latter property is, rather than forecasting, more concerned with solar resource assessment, which is another major branch of solar energy meteorology. The above introduction to the representativeness of a dataset reveals several topics needing discussion. The present chapter is, therefore, structured according to these topics. Three major types of solar irradiance data—ground-based measurement, remote-sensing data, and dynamical weather model output—are introduced and their complementarity is explained in Section 6.1. The three elements of representativeness of a solar forecasting dataset are reiterated, but with more details, in Section 6.2. Finally, a catalog of the most popular datasets relevant to solar forecasting research and practices is presented in the remaining sections.

6.1 COMPLEMENTARITY OF SOLAR DATA

Surface solar radiation data, just like the cases of many other atmospheric variables, comes from three complementary sources: First, irradiance may be measured by ground-based radiometers and reference cells; the second source is remote sensing data, from which satellite-derived irradiance databases are produced; and the third is the output of dynamical weather models. The characteristic differences among these three sources of data, for the most part, are in terms of their accuracy, accessibility, coverage, and resolution. It follows that the three sources of solar irradiance data are complementary, and one needs to seek a balance between these desirable yet incompatible characteristics.

1 The essential quality of a solar radiation dataset that enables securing competitive financing for solar power projects.

6.1.1 GROUND-BASED INSTRUMENTATION

A ground-based radiometer or a reference cell measures surface solar irradiance at a point location with typically a 1-s sampling frequency. Ground-based measurement is the most accurate source of information, subject to the precondition of proper instrumentation and maintenance. One should note that the highest compliance standard to this precondition is far from trivial to achieve, and can only come at a cost of about $200,000 USD, which is the setup cost for a research-grade radiation monitoring station; see Table 6.1 (Yang and Liu, 2020). Adding to the situation is the delicate science involved in performing radiometry, which has only been mastered by a tiny fraction of solar engineers—the reader can expect the most comprehensive account of radiometry from the book by Vignola et al. (2020) which, being the only one on the market, is certainly the most authoritative reference on the subject. Besides, for very high-caliber radiation monitoring programs such as the Baseline Surface Radiation Network (BSRN), the requirements for site selection and maintenance are so stringent that very few stations can fulfill the requirements. Figure 6.1 shows the setup of the Xianghe radiometry station, which became a BSRN station in 2005, but was closed in 2016, due to the unforeseeable and unstoppable urban expansion with tall buildings surrounding the station, which shaded the east side of the Xianghe station and disqualified it as a BSRN station.

Table 6.1 Price breakdown of a Baseline Surface Radiation Network (BSRN) station.

Item              #num  Unit price  Tot. price  Note
Pyranometer       7     7,000       49,000      3 in the field, 4 spare
Pyrheliometer     3     3,000       9,000       1 in the field, 2 spare
Pyrgeometer2      5     6,000       30,000      2 in the field, 3 spare
Sun tracker       2     25,000      50,000      1 in the field, 1 spare
Ventilation unit  6     1,500       9,000       5 in the field, 1 spare
Housing           2     5,000       10,000      1 set of electrical cabinet with a data logger, power supplies, data transmission hardware in the field, 1 set spare
Computer          2     1,500       3,000       1 in service, 1 spare
Miscellaneous     —     —           4,000       Mast for solar tracker, mounting material, cables
Total price                         164,000

Note: All prices are in Euro, €. Source: Pers. comm. with Bernd Loose, Alfred-Wegener-Institut, 2019.

2 A pyrgeometer is a device that measures infrared (longwave) radiation.

Figure 6.1 The BSRN station setup at Xianghe, China. The "station" refers to the rightmost group of instruments in the photo. [And the small tilted cylinder in the middle—a sunphotometer—is an Aerosol Robotic Network (AERONET) station, which measures aerosol optical depth at different wavelengths.] Photo courtesy of Xiang'ao Xia, Chinese Academy of Sciences, China.

For these reasons, the density of ground-based radiation measurement stations is extremely sparse as compared to those of more fundamental weather variables, such as temperature or humidity; e.g., there are about 100 radiometry stations operated by the China Meteorological Administration, whereas measurements of the other basic meteorological variables, such as temperature, humidity, pressure, or wind speed, are available at more than 50,000 (manned and unmanned) stations (Yang et al., 2022d). One cannot expect accurate solar radiation measurements by purchasing a 20-dollar photodiode from the local electronics store. The instrument that measures global and diffuse radiation is called a pyranometer, whereas that which measures beam radiation is called a pyrheliometer. Pyranometers can be installed on horizontal and tilted surfaces to collect global horizontal irradiance (GHI) and global tilted irradiance (GTI), respectively. When a shading ball or a rotating shadowband is installed together with a horizontally installed pyranometer, shading the thermopile from direct sunlight, the instrument takes diffuse horizontal irradiance (DHI) readings. On the other hand, pyrheliometers must be installed on sun-trackers that follow the apparent movement of the sun for beam normal irradiance (BNI) measurements. The measurement uncertainty of both types of instruments varies considerably as a function of sun position, sky condition, and other meteorological conditions (Habte et al., 2016; Ferrera Cobos et al., 2018), but for practical simplicity, one can take the expanded uncertainty of best-quality GHI, DHI, and BNI measurements to be around 5.4%, 5.0%, and 2.0%, respectively, at a 95% confidence interval (Gueymard, 2009). Although there are radiometers that can measure both GHI and DHI simultaneously, and thus allow one to calculate BNI through the closure equation, the best-practice recommendation is to measure all three components individually for quality control purposes. As first pointed out in Michalsky et al. (1999) and reiterated by Gueymard and Myers (2009), the "optimal" way of pyranometry is the

173

component sum methodology, which constructs GHI using DHI and the horizontal projection of BNI (see Section 5.3). Aside from thermopile instruments, for photovoltaic (PV) system monitoring, control, and performance evaluation, it is customary to install a reference cell at the PV array tilt. Reference cells have a different spectral response, time response, temperature response, and angle-of-incidence response from the thermopile radiometers, and thus warrant post-measurement correction (Meydbray et al., 2012); in that, reference cells are suboptimal devices for radiometry. However, they may have the advantage of being capable of mimicking the spectral response of a PV panel, which is advantageous for estimating the effective irradiance on the PV inclination and the resulting PV power. A rather inconvenient fact about ground-based measurement is that even if the current best practices are followed, the uncertainty in the base data3 is still substantial—this echoes the quote by Frank Vignola at the beginning of this chapter. Solar forecasters, unfortunately, do not have any other fancy trick up their sleeves but assume these ground-based measurements as the indisputable “truth” during forecast verification. Therefore, the verification results should always be subjected to questioning as to the validity of the base data. Needless to say, whether data is used as input or for validation of solar forecast models, quality control of irradiance data must constitute an essential part of any solar forecasting work, as discussed in Section 5.3. 6.1.2

REMOTE SENSING

Suppose one is willing to trade part of the accuracy of base data with coverage, the option of satellite-derived irradiance presents. Infrared and visible images taken by geostationary and polar-orbiting satellites are an excellent source of information when it comes to sensing the top-of-atmosphere (TOA) reflectance. These images, which contain rich information on a wide range of atmospheric phenomena including clouds, can be subsequently converted to irradiance values by means of a cloud-toirradiance algorithm, in cooperation with other remote-sensed information such as aerosol properties or surface albedo. This process of deriving values of certain atmospheric parameters from satellite images is known as retrieval; sometimes the word “retrieval” also means the retrieved value itself. Because several geostationary weather satellites jointly provide a field-of-view that completely covers the lowand mid-latitude regions of the globe (±60◦ in latitude, see Fig. 6.2), remote-sensed irradiance retrievals are gridded, i.e., on a regular lattice. For high-latitude regions, where images of geostationary satellites do not resolve, retrievals from polar orbiters can be used, but due to the orbiting nature of those satellites, the temporal resolution of such data is lower (days to weeks) than that from geostationary satellites (subhourly). To help visualize the sensing procedure of polar orbiters, Fig. 6.3 shows the aerosol optical depth (AOD) data retrieved from the Multi-angle Imaging SpectroRadiometer (MISR) onboard the Terra satellite, with a 380-km swath and a 16-day 3 The term “base data” herein refers to those reference data on which forecasts and verification are based. It is the antonym of “modeled data.”

174

Solar Irradiance and Photovoltaic Power Forecasting

GOES−West

[Map: fields-of-view of GOES−West, GOES−East, Meteosat, Meteosat−East, and Himawari, overlaid on annual GHI (500–2500 kWh/m²).]

repeat cycle of its ground-track (Garay et al., 2020). From the left panel of Fig. 6.3, the MISR swaths can be clearly seen, and the right panel shows that the area of the globe can only be completely covered after a few days, substantiating the low temporal resolution of the data. 2019 January 1

AOD550

2019 January 1−16

0.1 0.2 0.3 0.4 0.5

Figure 6.3 Multi-angle Imaging SpectroRadiometer (MISR) level-3 aerosol optical depth at 550 nm (AOD550). (Left) 2019 January 1: the 14.56 orbits made during the day are visualized. (Right) 2019 January 1–16: data collected during a complete repeat cycle of the satellite’s ground-track is shown. Holes and white patches are missing data.

Satellite-derived irradiance data may be prone to bias and under-dispersion,4 and thus requires correction. The correction procedure is known as site adaptation, which is a procedure identical to that of measure–correlate–predict (MCP) as used in wind engineering; the difference is mostly in terminology. The central aim of site 4 Dispersion is a general notion of spread. In solar resource assessment, the spread of data is quantified through variance. Hence, saying that satellite-derived irradiance is under-dispersed is equivalent to saying that the variance of satellite-derived irradiance is smaller than that of ground-based observations. This is also related to the spatial scale of satellite-derived irradiance, which is larger than that of ground-based stations.

Data for Solar Forecasting

175

adaptation is to correct a long period of satellite-derived irradiance with respect to a short period (typically one year) of ground-based measurements. Until 2020, siteadaptation was perceived as a deterministic procedure (see Polo et al., 2020, 2016, for reviews), in that, analogous to a deterministic forecast, the site-adapted irradiance is simply a number rather than a predictive distribution. The works of Yang (2020c,d) extended such deterministic procedure into the probability space, which has henceforth allowed formal uncertainty quantification of the site-adapted irradiance in a similar manner as to how probabilistic forecasts have been handled. That said, there has been significant progress over the past decade in respect of the science of irradiance retrieval algorithms. As a result, researchers not only have started to advocate the dismissal of site adaptation during solar forecasting (Andr´e et al., 2019), but the prospect of using satellite-derived irradiance to replace completely the in situ measurements is also gaining traction. Denoting the satellitederived irradiance and ground-based measurements by S and G , an unphilosophical way to express their identity could be: If S and G are identical, whatever can be achieved with one is achievable with the other, and either may be substituted for the other without altering the conclusion about a proposition. In solar forecasting, we are concerned with three major tasks when it comes to univariate applications: (1) forecast generation, (2) forecast post-processing, and (3) forecast verification. Therefore, if one wishes to make an assertion about the identity between the two sources of data—in a univariate setting, of course—one must proceed to investigate the performance of S and G in terms of those three applications. Yang and Perez (2019) were most likely the first to investigate this topic. They delivered several case studies on forecast verification, in which both S and G were used to gauge the accuracy of the same set of numerical weather prediction (NWP) forecasts. To ensure the conclusion is objective and is not affected by subjective choices of error metric, the Murphy–Winkler distribution-oriented verification framework, which assesses various aspects of forecast quality based on the joint distribution of forecast and observation, were employed. Motivated by that work, the exact procedure of investigating the identity of the two sources of data, based on the Murphy–Winkler framework, was transferred by Yagli et al. (2020a) to a different application—generating univariate forecasts using both S and G . The third piece of the puzzle—forecast post-processing—was put into place by Yang (2019d), who post-processed several sets of NWP forecasts using both S and G . Unfortunately, the overwhelming consensus of these three works rejects the hypothesis that the two sources of data are interchangeable, in the strictest sense, that is, in terms of every aspect of quality outlined in the Murphy–Winkler framework. Similar results have also been obtained, at later dates, by other research groups (e.g., Jim´enez et al., 2022). Notwithstanding, the difference in forecasting, post-processing, and verification results caused by using satellite-derived irradiance in place of ground-based measurements is small relative to the forecast error, which offers some support to interchangeability in practice, with appropriate caveats. 
Particularly in the case of areal forecasting, satellite-derived irradiance is the only option when it comes to post-processing of NWP forecasts; this has been demonstrated by Watanabe et al. (2021).


6.1.3 DYNAMICAL WEATHER MODEL

The third source of irradiance data is dynamical weather models, which can be further split into two kinds: one is NWP and the other is reanalysis. In essence, NWP and reanalysis have no difference, except that the former is operational and issues forecasts out to longer horizons, whereas the latter deals with history and usually issues "forecasts" out to 12 h with a "frozen" model.5 As compared to ground-based measurements and satellite-derived irradiance, NWP and reanalysis fall short on accuracy and spatio-temporal resolution. They are, nevertheless, useful in at least two other aspects, in addition to the obvious one, which, of course, is to issue forecasts.

The first of the two aspects is that the datasets generated by dynamical weather models often span, temporally, several decades, and geographically, the entire globe. Although some satellite-derived irradiance databases, as we shall see shortly after, have global coverage, the majority of such databases are available only over the disk area corresponding to the field-of-view of one particular satellite. Moreover, when raw data captured by different satellites are merged, it often brings about artifacts at the boundaries, as if the images are stitched together very poorly; see Fig. 6.4 for one illustration.6 On that note, if one desires a high degree of spatial and temporal continuity in worldwide irradiance data, reanalysis would no doubt be the only choice that is currently available.

The other aspect is that both NWP and reanalysis offer a fortune of output variables, which are all self-consistent in accordance with the physical laws of the atmosphere. That is something that can hardly be achieved by any synthetic data-generation method, and therefore possesses great value for studies that desire to examine the interactions among those variables. Forecasters, not just the solar ones, are profoundly attracted to the idea of using explanatory variables to account for the variable of interest. However, a satellite-derived solar radiation dataset seldom goes beyond offering the three main irradiance components and their clear-sky expectations, but leaves out other important determinants of PV power generation, such as temperature, surface wind, or aerosols. Therefore, NWP and reanalysis, providing forecasts or historical hindcast data of such ancillary information in a centralized manner, must be commended.

6.2 WHAT IS A REPRESENTATIVE DATASET?

6.2.1 COVERAGE

Solar forecasters are concerned with the coverage of a dataset because a few seemingly accurate forecasts, made at a specific location and over a specific time period, need to be applied at more locations and conditions before they can be claimed to

5 Recall in Section 3.3.1, we have explained what is a "frozen" dynamical weather model, which is one that has a particular fixed setup, as opposed to the operational ones that are constantly modified to improve weather forecast accuracy. Since weather forecast accuracy is primarily measured by extreme weather events, the impact of a modification on solar forecast accuracy can be good or bad.

6 This discontinuity problem has persisted for more than two decades, and no one seems to have given any care to it thus far.




Figure 6.4 Root mean square error of a 1-h-ahead naïve reference forecasting method, over a period of one year, based on the CERES satellite-derived irradiance data. Artifacts at the boundaries of the fields-of-view of five different satellites are clearly visible.

be useful in decision-making. Section 2.10 suggests that science is entirely about inferring general laws from specific facts. On this point, if a forecaster needs to pitch to others the generality and therefore value of her method, gathering a reasonable amount of specific facts is essential. Moreover, it is not just the amount that matters, but the miscellany of those specific facts also weighs in. At the bare minimum, solar forecasts need to be made and verified at a handful of climatically diverse locations over at least a year,7 in order to demonstrate any potential for the method to be useful. Most of the existing works on solar forecasting, thankfully, conform to this guideline.

Indeed, the generality of a solar forecasting method in geography is commonly ensured by selecting datasets with different Köppen–Geiger climate classifications (KGC). The main assumption with KGC is that different regions in a similar class share common climatic characteristics. So instead of testing a forecasting method at every location on earth, which would be impossible, one can now identify a location from a class, test the method there, and assume the behavior of the method can be extrapolated to other regions in that class. However, climates in KGC are determined based on seasonal temperature and precipitation, which have only a marginal faculty in discerning solar irradiance processes with different predictability (Liu et al., 2023; Yang, 2022b). We know, now, with high certainty, that irradiance in desert and semi-arid climates is easier to forecast than that in the temperate and continental cli

7 As argued in Section 5.6.3, this guideline applies to hourly data. For data with higher temporal resolutions, such as 1-min data, the temporal coverage can be relaxed accordingly, e.g., to tens of days.



mates, which is easier to forecast than irradiance in the tropics, but other than that, the information acquired thus far about the predictability of solar irradiance has been quite limited (Yang et al., 2021c; Voyant et al., 2021). On account of that, Ascencio-Vásquez et al. (2019) has proposed the so-called Köppen–Geiger–photovoltaic (KGPV) climate classification to tone up KGC with particular inclusion of solar radiation, with the hope that the new classification could improve the comparability needed during performance evaluations of PV systems. Although KGPV was originally proposed for resource assessment purposes, it might be extended to forecasting.

Equally important to the diversity in geography is the diversity in time. The most challenging aspect of solar forecasting is the identification of ramp events, during which the irradiance at a single point can swing up or down a few hundred W/m² in a matter of seconds. In that, forecasters need to ensure their methods are adequate not only under the more stable clear and overcast skies but also under skies with moving and broken clouds. Arguments given to seasonal variations are similar in that respect, because forecast performance can vary largely at different times of a year. Although a vast majority of solar forecasting works, at present, do not segregate results based on seasons or sky conditions, it would not be difficult to find works that do (e.g., Yagli et al., 2019a; Huang et al., 2018; Lima et al., 2016).

6.2.2 CONSISTENCY

In meteorology, the notion of scale is vital. According to Blöschl and Sivapalan (1995), the word "scale" refers to a characteristic length of a process, an observation, or a model. Formally, scale differs from extent, resolution, or support, on their individual account, but it would not be imprecise to think of them as the three connected elements constituting the one thought. In other words, a small scale often implies a small extent, high resolution, and short support, and a large scale implies a large extent, low resolution, and long support. Scale is not merely used to describe the division of space but also of time. Scale often couples the characteristic length of space with that of time. One well-accepted subdivision of scale was set forth by Orlanski (1975), in which three main categories of space scale8 were defined—macroscale (>2000 km), mesoscale (2–2000 km), and microscale (<2 km)—with the corresponding time scales being climate scale (>2 wk), synoptic and planetary scale (2 wk–1 d), mesoscale (1 d–1 h), and microscale (<1 h).

[…]

The Heliosat method links the clear-sky index κ to the cloud index ν through a piecewise function (Rigollier et al., 2004):

$$\kappa = \begin{cases} 1.2, & \nu < -0.2,\\ 1-\nu, & -0.2 \le \nu \le 0.8,\\ 2.0667 - 3.6667\,\nu + 1.6667\,\nu^2, & 0.8 < \nu \le 1.1,\\ 0.05, & \nu > 1.1. \end{cases} \tag{7.41}$$

In another case, Perez et al. (2002) used an even simpler option:

$$g(\nu) = 0.02 + 0.98\,(1 - \nu). \tag{7.42}$$

Figure 7.11 compares these two functions.

Figure 7.11 Heliosat's piecewise function linking cloud index, ν, to clear-sky index, κ.

The cloud index ν over a particular pixel is derived as (Rigollier et al., 2004):

$$\nu = \frac{\rho - \rho_{\text{clr}}}{\rho_{\max} - \rho_{\text{clr}}}, \tag{7.43}$$

where ρ is the reflectance (or apparent albedo) observed by the satellite for that pixel of interest, ρ_clr is the reflectance of the ground under clear skies, and ρ_max is the reflectance of the brightest clouds. By the word "apparent," we mean a cosine-corrected value of the observed radiance (see Rigollier et al., 2004; Perez et al., 2002). The concerning question here is that one has to carefully estimate the two

262

Solar Irradiance and Photovoltaic Power Forecasting

parameters from satellite observations representing the two extreme situations, that is, ρmax and ρclr , because they cannot be directly observed in real-time. The practical procedure to obtain ρmax and ρclr is to indirectly estimate them from ρ within a time window, which is typically taken to be a few weeks. Such historical data provides information on the so-called dynamic range, which represents the range of values a normalized pixel can assume at a given location from its lowest to its highest value. The dynamic range is conceptually simple, but its practical implementation requires some thought. First, the reflectance depends on the sun–pixel– satellite angle, which varies significantly with the time-of-day and varies slightly over consecutive days. Second, land cover changes also change the reflectance. The largest change of that kind stems from snow cover, but the seasonal growth of vegetation and changes in soil moisture can also affect the reflectance. Third, some places on earth rarely experience cloudless skies, and surface reflectance cannot be determined. Fourth, the reflectance measurements decrease over months to years due to satellite calibration decay. Therefore, most semi-empirical satellite methods (1) track a separate minimum reflectance for each time of day; (2) leverage independent snow cover datasets (Perez et al., 2010b) or other satellite data to adjust the methodology over snow cover;13 (3, 4) measure the minimum reflectance over a time frame of one to two months to ensure that there is at least one instance of clear conditions while minimizing seasonal changes in surface reflectance and sun–pixel–satellite angle. In any case, a wide range of methods for determining ρmax and ρclr is available from the existing literature, such as using a single value for the whole region of interest (Valor et al., 2023) or using the mean of several (10–20) highest values (Harty et al., 2019; Chen et al., 2022). There does not seem to be at the moment any concrete evidence suggesting some approaches are superior to others. The data-driven nature of semi-empirical models allows one to fine-tune the model, which can be accomplished by considering region-specific ground-based data. Two possibilities seem immediately obvious. One of those is to refit the model coefficients such as the ones in Eq. (7.41) using local data, which is what Huang et al. (2023) have done. The other option is to perform site adaptation, which constitutes a post-processing procedure (Fern´andez-Peruchena et al., 2020; Polo et al., 2020). Besides those, another style of refinements of semi-empirical models is downscaling in complex terrain (Ruiz-Arias et al., 2009), and geometrical corrections for satellite navigation errors and pixels close to the rim of the satellite disk. Be that as it may, one should be aware that semi-empirical models can only retrieve GHI but not the beam and diffuse components. Instead, BNI and DHI are calculated with a separation model (or decomposition model), which splits the satellite-derived GHI into two additive parts governed by the closure relationship. The separation models of choice in the satellite-based solar resourcing community are the DIRINDEX model (Perez et al., 2002) and the DISC model (Maxwell, 1987), both being hourly models due to legacy reasons. 13 Snow cover can cause the lower bound to be close to the upper bound, greatly limiting the dynamic range associated with clouds. Therefore, both the SolarAnywhere and SolarGIS models abandon the concept of Eq. 
(7.43) and instead determine CI from an empirical model based on the satellite infrared channels.
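To make the cloud-index bookkeeping concrete, the following is a minimal Python sketch of Eqs. (7.42) and (7.43); the percentile-based estimation of ρclr and ρmax, the 60-day window, and all numerical settings are illustrative assumptions, not the recipe of any operational product.

```python
import numpy as np

def cloud_index(rho, rho_clr, rho_max):
    """Eq. (7.43): normalize apparent albedo into a cloud index."""
    nu = (rho - rho_clr) / (rho_max - rho_clr)
    return np.clip(nu, 0.0, 1.0)  # guard against out-of-range pixels

def clear_sky_index_perez(nu):
    """Eq. (7.42): the simple Perez et al. (2002) mapping,
    g(nu) = 0.02 + 0.98 (1 - nu)."""
    return 0.02 + 0.98 * (1.0 - nu)

# rho: reflectance series for one pixel at a fixed time of day over,
# say, 60 consecutive days (simulated here for illustration only).
rng = np.random.default_rng(42)
rho = 0.15 + 0.6 * rng.random(60)

# Dynamic range per pixel and per time of day: the lower bound from
# the darkest scenes (assumed clear ground), the upper bound from the
# brightest scenes (assumed thick cloud). The 5th/95th percentiles are
# an assumption standing in for the min/max tracking described above.
rho_clr = np.percentile(rho, 5)
rho_max = np.percentile(rho, 95)

nu = cloud_index(rho, rho_clr, rho_max)
kappa = clear_sky_index_perez(nu)

# GHI then follows by scaling a clear-sky model output (not shown):
# ghi = kappa * ghi_clear
print(kappa[:5])
```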


Semi-empirical models fall in between physical models and purely data-driven (or fully empirical) models. Purely data-driven models directly regress the satellite records (and other auxiliary predictors) onto the measured GHI at the earth's surface. These models emerged with the advent of modern machine learning (Ma et al., 2020; Jiang et al., 2019; Yeom et al., 2019; Kaba et al., 2018; Eissa et al., 2013). In fact, machine learning as a retrieval tool is not limited to just irradiance but has been used ubiquitously for countless meteorological variables, such as AOD (Huttunen et al., 2016; Kolios and Hatzianastassiou, 2019), cloud-top temperature and cloud-top height (Yang et al., 2022f; Min et al., 2020), oceanic particulate organic carbon concentrations (Liu et al., 2021), or snow depth (Wang et al., 2020c). The reader is referred to Yuan et al. (2020) for a recent compendium of deep-learning applications in environmental remote sensing.

Machine learning, as in all its other applications, depends heavily for its success upon the amount and quality of training data, which implies that the generalizability and representativeness of these retrieval models are limited by the scarcity of ground-based GHI measurements and the poor quality of input features. Some former works, as a matter of fact, consider only a limited number of ground-based stations (e.g., Jiang et al., 2019) or daily measurements (Hou et al., 2020), which gravely restricts the usefulness of the proposed models for solar forecasting purposes, which demand wide-area and (at least) hourly data. Nonetheless, as ground-based radiometers are becoming an essential requirement in large PV plants across the world, the accuracy of purely data-driven models can be expected to increase with the number and quality of available radiometers. Another common pitfall of data-driven irradiance retrieval algorithms is the complete abandonment of physical considerations (e.g., Jiang et al., 2019), even though some information, such as the solar zenith and azimuth angles, is so undeniably important to radiation while being so easy to compute. Comparable to lacking suitable inputs, using excessive inputs is no better—e.g., Quesada-Ruiz et al. (2015) considered all 11 of the 3-km-resolution channels of the Spinning Enhanced Visible and Infrared Imager (SEVIRI) onboard the Meteosat Second Generation (MSG) satellite—as irrelevant inputs may lead to overfitting, especially when it is already known that not all channels are relevant and of equal importance to radiation retrieval. In summary, if we are to maximize the accuracy and serviceability of data-driven irradiance retrieval models, ensuring both the abundance and the relevance of training data is imperative. On the choice of machine-learning method, the preference is not clear, because few works go to the extent of comparing multiple methods head-to-head, but general guidelines concluded from other solar engineering tasks do apply, such as tree-based methods having superior accuracy to other "shallow" network structures (Yagli et al., 2020a, 2019a).

In the following pages of this section, we present a short case study on irradiance retrieval from FY-4A's AGRI, and compare its accuracy, superficially, to some numbers reported in previous works. The method of choice is random forest, which is not intended to demonstrate state-of-the-art accuracy but rather to act as a baseline for future model developments. The content below follows in large part the work of Shi et al. (2023).
The ground-based data used in the case study come from a total of 166 radiometry stations spanning two monitoring networks in China, namely, the China Meteorological Administration (CMA; 128 stations) radiation observation network and the Chinese Ecosystem Research Network (CERN; 38 stations), over the course of one year (2018); see Fig. 7.12. The radiometers are regularly calibrated, and a simple quality-control sequence following Liu et al. (2017) is performed to ensure a basic level of soundness, such that the data can be used as training and validation targets. Since only one year of data is used, the data from the 166 stations are split into a training set (106 stations) and a testing set (60 stations); both the training and testing sets cover all land-cover types, so as to ensure an all-inclusive, fair validation. As for the input features, data from all 14 channels of AGRI are initially considered, alongside other auxiliary data, such as geographic location, elevation, or sun–earth distance. Data from those satellite pixels that are collocated with the ground-based stations are extracted, and the auxiliary variables are calculated. Owing to the aforementioned need for feature selection, the Pearson correlation is computed between each input feature and the target. By removing the less correlated features, 11 variables are retained; they are: the spectral bands of 0.55–0.75, 3.5–4.0, 6.9–7.3, and 11.5–12.5 μm, the sun–earth distance, the solar zenith, azimuth, and glint angles, latitude, longitude, and elevation.

Figure 7.12 Spatial distribution of the 166 radiometer stations in China, among which 106 are used for training (represented by circles), and the remaining 60 are used for validation (triangles).
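As a companion to the above description, a minimal sketch of the retrieval model using scikit-learn's random forest follows; the simulated arrays and the hyperparameters are hypothetical stand-ins and not those used by Shi et al. (2023).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)

# Placeholder design matrices: one row per (station, timestamp) sample
# and one column per retained feature (11 of them, cf. the main text).
X_train, y_train = rng.random((5000, 11)), rng.random(5000) * 1000
X_test, y_test = rng.random((2000, 11)), rng.random(2000) * 1000

# 500 trees is an illustrative setting, not the chapter's.
rf = RandomForestRegressor(n_estimators=500, n_jobs=-1, random_state=1)
rf.fit(X_train, y_train)

# Predict hourly GHI at the held-out stations and score the retrieval.
pred = rf.predict(X_test)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
print(f"hourly RMSE = {rmse:.1f} W/m2")
```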

Figure 7.13 Scatter plots of (a) hourly and (b) daily GHI retrieved from FY-4A using a random forest model and ground-based measurements, at 60 validation stations, over 2018. The dotted lines show the identity lines, whereas the solid lines show the linear fit to the scatters. Brighter colors correspond to areas with higher point densities.

Table 7.2 Performance comparison of several satellite-derived irradiance databases over China developed using data-driven methods. Both RMSE and MBE are in W/m2, whereas the correlation (R) is dimensionless.

Method (satellite)         | Hourly performance                   | Daily performance                    | Reference
CNN+MLP (MTSAT)            | RMSE = 90.38, MBE = 5.23, R = 0.87   | RMSE = 58.1, MBE = 5.0, R = 0.91     | Jiang et al. (2019)
RTM+DNN (Himawari-8)       | RMSE = 106.4, MBE = 28.6, R = 0.87   | RMSE = 33.3, MBE = 15.1, R = 0.89    | Ma et al. (2020)
Random forest (Himawari-8) | —                                    | RMSE = 35.38, MBE = 0.01, R = 0.92   | Hou et al. (2020)
Random forest (FY-4A)      | RMSE = 147.02, MBE = −5.64, R = 0.85 | RMSE = 29.20, MBE = −2.1, R = 0.95   | This chapter

Using the 11 input features and the targets from the 106 training stations, the training of the random forest is straightforward. With the trained model and the new feature entries from the validation stations, hourly GHI at those stations can be retrieved. The scatter plots of the hourly and daily (averaged from hourly values) GHI retrievals against the validation measurements are shown in Fig. 7.13. One may immediately see the reduction in spread when the hourly retrievals are averaged, owing to the cancellation of errors. For the hourly GHI retrievals, the values of the correlation coefficient (R), root mean square error (RMSE), and mean bias error (MBE) are 0.85, 147.0 W/m2 (or 35.2%), and −5.6 W/m2 (or −1.4%), respectively, whereas the numbers for the daily retrievals are 0.95, 29.2 W/m2 (or 18.0%), and −3.0 W/m2 (or −1.3%). Table 7.2 shows a comparison of the accuracy of the current retrievals with those reported in other studies. Although the sites and validation samples used in the various works differ, and the comparison is therefore superficial, one may still gain some understanding of the typical error of data-driven irradiance retrieval methods. As compared to the accuracies of commercial products (such as Solcast) and products obtained using physical methods (such as the NSRDB), as reported by Yang and Bright (2020), substantial improvements can be expected, but it is not the goal of this chapter to inquire further.

To close this section, we present Fig. 7.14, showing consecutive maps of irradiance derived from FY-4A's AGRI, which capture cloud motion over the daylight hours of July 1, 2018. Indeed, as satellite-derived irradiances are spatially complete, one may forecast the next snapshot of the irradiance field by projecting the current snapshot forward, assuming the currently observed cloud motion persists.

7.2.2 FORECASTING USING SATELLITE-DERIVED IRRADIANCE

After elaborating on satellite-to-irradiance conversion techniques, we now proceed to make forecasts from the derived irradiance. Satellite-derived irradiance may be regarded as a temporal collection of lattice processes, each representing the 2D irradiance field at a particular instance. Alternatively, it is also possible to view satellite-derived irradiance as a spatial collection of time series, each describing the temporal transient of irradiance at a fixed location. If the latter perspective is taken, all time series forecasting methods applicable to in situ measurements can be deemed suitable (Blanc et al., 2017a). This puts into relevance a large number of univariate and multivariate methods, which are to be discussed in Section 7.3. That said, it should also be well understood by now that univariate forecasting—or, more generally, forecasting using information from a single location—is not at all attractive in the case of satellite-derived irradiance, because spatial information is readily accessible, and there is no excuse for not taking advantage of it. When time series forecasting methods are applied to satellite-derived irradiance, the usual housekeeping rules, such as training the model on the clear-sky index instead of the irradiance, are to be respected. The reader is referred to Yagli et al. (2022, 2020a); Singh Doorga et al. (2019); Huang et al. (2019a); Voyant et al. (2014); Dong et al. (2014); Dambreville et al. (2014) for some recent and some not so recent examples of satellite-based forecasting under a (multivariate) time series viewpoint. In contrast to the general time series forecasting procedures, which are statistical, the physical alternative, i.e., obtaining cloud motion, is more instinctive, and is what fundamentally motivates the involvement of satellite data in solar forecasting. Cloud motion vectors (CMVs) explicitly seek to capture the advection of the irradiance field solely due to clouds.


Figure 7.14 The evolution of the hourly FY-4A GHI retrievals by the random forest model, from 00:00 UTC to 11:00 UTC on July 1, 2018, over China.

CMVs can be obtained in two primary styles: one through processing consecutive images, and the other from NWP. Deriving motion vectors from a sequence of frames is a classical problem in computer vision, for which optical flow methods of various kinds, such as the method proposed by Lucas and Kanade (1981), have long been known and widely applied. In the meteorological community, however, the movement of a block of pixels has also often been identified by block matching and other statistical means that seek to determine the motion with the highest probability (Hammer et al., 1999). More specifically, by searching for the best-matching blocks of pixels in two consecutive frames, e.g., as defined by the pair with the highest cross-correlation, displacement vectors can be obtained by connecting the centers of the blocks; one algorithm that does this quite efficiently is the three-step search (Li et al., 1994). In Section 7.2.2.1, the Lucas–Kanade method and the three-step search algorithm are explained, to typify the optical-flow-based and block-matching-based approaches to CMV derivation. Then, Section 7.2.2.2 is concerned with obtaining CMVs from NWP, and several representative works are enumerated and reviewed. Considering the increasing attention placed upon learning-based solar forecasting, Section 7.2.2.3 discusses very briefly, on top of the traditional CMV-based methods, how to generate satellite-based forecasts with deep learning.

7.2.2.1 Cloud motion vectors from image processing

In a nutshell, both the Lucas–Kanade method and the three-step search algorithm operate in 2D space; as such, they are concerned with movements in the x and y directions, and with time t. The intensity of a pixel may therefore be denoted as I(x, y, t). Furthermore, both methods use the assumption that the movement of all pixels in a small neighborhood of a point p in the frame is the same. With such preliminaries, the optical flow problem is first defined: optical flow methods try to calculate the motion between two consecutive image frames, which are taken at times t and t + Δt, at every pixel position. Optical flow methods can be categorized into dense optical flow and sparse optical flow, differing from each other by whether the flow vectors of all pixels in the entire frame are processed. Optical flow methods are differential methods that follow the brightness constancy constraint:

I(x, y, t) = I(x + Δx, y + Δy, t + Δt),  (7.44)

that is, the brightness of the pixel (i.e., the lighting condition) remains the same as it moves. Besides the brightness constancy constraint, another important assumption of optical flow is that the movement between the two frames is small, such that the image constraint at I(x, y, t) can be developed with a Taylor series to get:

I(x + Δx, y + Δy, t + Δt) = I(x, y, t) + (∂I/∂x)Δx + (∂I/∂y)Δy + (∂I/∂t)Δt + h.o.t.,  (7.45)

where "h.o.t." stands for higher-order terms. By truncating the higher-order terms, one obtains:

(∂I/∂x)Δx + (∂I/∂y)Δy = −(∂I/∂t)Δt,  (7.46)

or, after dividing both sides by Δt:

(∂I/∂x)(Δx/Δt) + (∂I/∂y)(Δy/Δt) = −∂I/∂t.  (7.47)

Defining U ≡ Δx/Δt and V ≡ Δy/Δt as the x and y components of the displacement (or optical flow) of I(x, y, t), and also defining Ix ≡ ∂I/∂x, Iy ≡ ∂I/∂y, and It ≡ ∂I/∂t as the partial derivatives of the image I with respect to position x, y, and time t, the optical flow equation is:

Ix U + Iy V = −It.  (7.48)

It is clear from Eq. (7.48) that U and V are what we need to solve for, but the problem is under-constrained, i.e., one equation with two unknowns. The good news is that we are working with a group of pixels, which provides a system of equations from which U and V can be solved. Following the main assumption of the Lucas–Kanade method, that is, that the optical flow of all pixels within a small neighborhood is constant, U and V must satisfy:

Ix(p1)U + Iy(p1)V = −It(p1),
Ix(p2)U + Iy(p2)V = −It(p2),
...
Ix(pn)U + Iy(pn)V = −It(pn),  (7.49)

where p1, ..., pn are pixels within the neighborhood. Since the problem now has more equations than unknowns, it can be solved using least squares. With basic linear algebra, one yields:

\begin{pmatrix} U \\ V \end{pmatrix} =
\begin{pmatrix}
\sum_{i=1}^{n} I_x^2(p_i) & \sum_{i=1}^{n} I_x(p_i) I_y(p_i) \\
\sum_{i=1}^{n} I_y(p_i) I_x(p_i) & \sum_{i=1}^{n} I_y^2(p_i)
\end{pmatrix}^{-1}
\begin{pmatrix}
-\sum_{i=1}^{n} I_x(p_i) I_t(p_i) \\
-\sum_{i=1}^{n} I_y(p_i) I_t(p_i)
\end{pmatrix}.  (7.50)

An obvious condition for the above equation to work is that the first matrix on the right-hand side of the equation—also known as the structure tensor—is invertible. When this invertibility issue is interpreted with respect to optical flow, it suggests that one cannot reliably compute the flow for a block of pixels with very similar intensity.

Whereas the Lucas–Kanade method typifies sparse optical flow, the Farnebäck algorithm (Farnebäck, 2003) is a popular dense optical flow method that appears frequently in solar forecasting works (e.g., see Aicardi et al., 2022; Kallio-Myers et al., 2020). The technical specifics of the Farnebäck algorithm are omitted here, but it should be noted that a standard implementation of it can be found in OpenCV, which is a powerful, real-time-optimized computer vision library available in both C++ and Python. In principle, inasmuch as two consecutive images are provided as input, the optical flow routines can be executed and motion vectors would result. Figure 7.15 shows the motion vectors derived using the Farnebäck algorithm based on the hourly FY-4A GHI retrievals at 06:00 UTC and 07:00 UTC on July 1, 2018, over China—in this case, images of irradiance are used for demonstration purposes. Practically, however, it is thought important to apply optical flow to the clear-sky index field or the cloud index field, such that the diurnal cycle of irradiance is better accounted for. In other circumstances where a radiative transfer model is available, the cloud and aerosol properties can also be used as input and combined with environmental (temperature and humidity) information from NWP to provide a (slightly) superior forecast. However, such a sophisticated approach is not frequently seen in the literature, and one possible reason is that applying optical flow to several variables may result in inconsistent sets of motion vectors.
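Since a standard implementation of the Farnebäck algorithm ships with OpenCV, a minimal sketch is given below; the two random arrays stand in for consecutive clear-sky index (or cloud index) fields, and the parameter values are illustrative defaults rather than tuned choices.

```python
import cv2
import numpy as np

# Two consecutive fields (e.g., clear-sky index), scaled to 8-bit
# grayscale as expected by the dense optical flow routine.
frame0 = (np.random.rand(256, 256) * 255).astype(np.uint8)  # placeholder
frame1 = (np.random.rand(256, 256) * 255).astype(np.uint8)  # placeholder

# Dense optical flow: returns an (H, W, 2) array holding the per-pixel
# displacement components (u, v) between the two frames.
flow = cv2.calcOpticalFlowFarneback(
    frame0, frame1, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

u, v = flow[..., 0], flow[..., 1]
print(u.mean(), v.mean())
```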


Figure 7.15 Motion vectors derived using the Farnebäck algorithm based on the hourly FY-4A GHI retrievals at 06:00 UTC and 07:00 UTC on July 1, 2018, over China, cf. Fig. 7.14. (The lengths of the vectors are not drawn according to the actual displacements but for better visualization.)

In contrast to optical flow, which solves the optical flow equation, the working principle of block matching is to search for a block of pixels in the second frame that best resembles the block of pixels of interest in the first frame. On this point, a measure of similarity has to be defined first; some choices that can easily be thought of are the cross-correlation coefficient (Yang et al., 2013b), the mean absolute difference, and the mean square difference (Kuehnert et al., 2013; Lorenz et al., 2004). Regardless of which measure is eventually used, there is a general trade-off in selecting the size of the block. On the one hand, the block of pixels needs to be large enough to contain sufficient cloud features to ensure a robust matching outcome. On the other hand, the block of pixels should be small enough that a single motion vector could suffice in describing all motions of the individual pixels within that block. Similar arguments apply to the search region—it should be large enough to cater for the strongest advection within unit time, but small enough to avoid spuriously (unphysically) large motions that may result from a coincidental high similarity found between two distant blocks. Determining the optimal choice therefore often requires one to test several options or to rely on previously gathered empirical evidence (Roy et al., 2022; Kuehnert et al., 2013). For Europe, Kuehnert et al. (2013) determined the ideal parameters for the block-matching method as: a vector grid spacing of 43 km, an image area to be compared of 110 km × 110 km, and a maximum possible speed of cloud movement of 25 m/s. This agrees well with the findings of Lorenz et al. (2004), who suggested a vector grid spacing of 25 km and a pixel block of 90 km × 90 km.

A general problem which one may face when performing block matching is the computation speed. An exhaustive search over all candidate blocks, each differing from the next (overlapping) candidate by just one row or one column of pixels, can be quite slow. In this regard, the three-step search is an elementary way to bring speed improvements over an exhaustive search. Suppose the block of pixels to be matched is centered at coordinates (x, y) in the previous frame. The three-step search algorithm proceeds by matching that block to the nine blocks in the current frame centered at coordinates (x − 4rx, y − 4ry), (x − 4rx, y), (x − 4rx, y + 4ry), (x, y − 4ry), (x, y), (x, y + 4ry), (x + 4rx, y − 4ry), (x + 4rx, y), and (x + 4rx, y + 4ry), where rx and ry denote the search resolution in the x and y directions; these initial nine blocks are referred to as the (±4rx, ±4ry) neighborhood. Once the first-round best-matched coordinates are identified within the (±4rx, ±4ry) neighborhood, the algorithm reiterates the search in the (±2rx, ±2ry) neighborhood of the first-round best-matched coordinates. Finally, with the second-round best-matched coordinates found in the (±2rx, ±2ry) neighborhood, the third-round search is conducted within the (±rx, ±ry) neighborhood. This process is illustrated in Fig. 7.16, and a minimal code sketch follows the figure.


Figure 7.16 An illustrative diagram for the three-step search algorithm. The dotted arrows represent an example path for convergence, whereas the solid arrow denotes the final motion vector.
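To complement Fig. 7.16, the following is a minimal NumPy sketch of the three-step search for a single block, assuming the mean square difference as the similarity measure; the block size, the search resolution, and the synthetic test data are illustrative.

```python
import numpy as np

def msd(a, b):
    """Mean square difference between two equally sized blocks."""
    return np.mean((a - b) ** 2)

def three_step_search(prev, curr, x, y, size=16, r=4):
    """Match the block centered at (x, y) in `prev` against `curr`,
    shrinking the search neighborhood from +/-4r to +/-2r to +/-r."""
    half = size // 2
    block = prev[x - half:x + half, y - half:y + half]
    bx, by = x, y
    for step in (4 * r, 2 * r, r):
        best = None
        for dx in (-step, 0, step):
            for dy in (-step, 0, step):
                cx, cy = bx + dx, by + dy
                cand = curr[cx - half:cx + half, cy - half:cy + half]
                if cand.shape != block.shape:
                    continue  # candidate block falls outside the frame
                score = msd(block, cand)
                if best is None or score < best[0]:
                    best = (score, cx, cy)
        _, bx, by = best
    return bx - x, by - y  # the motion vector (row, column)

# A smooth synthetic "cloud" shifted by (4, -8) pixels between frames.
yy, xx = np.mgrid[0:128, 0:128]
prev = np.exp(-((yy - 64) ** 2 + (xx - 64) ** 2) / 200.0)
curr = np.roll(prev, (4, -8), axis=(0, 1))
print(three_step_search(prev, curr, 64, 64))  # expected: (4, -8)
```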

Thus far, we have discussed two classes of methods to derive CMVs from consecutive images. Next, how to generate forecasts from them is elaborated. Conceptually, generating forecasts based on CMVs is an extrapolation problem, that is, future clear-sky index or cloud index images (depending on which variable has been used to derive the CMVs) can be created by applying the motion vectors to the most recent image to extrapolate cloud movement. Stated differently, the extrapolation is performed by moving existing blocks of pixels along their corresponding motion vectors. Notwithstanding, since CMVs may diverge, advecting the field forward in time may result in "blind spots" or "overlapped spots," i.e., locations without any advected information or with duplicated advected information. Therefore, in practice, either all advected pixels in an area surrounding the target pixels are averaged/smoothed (Perez et al., 2010a), or the problem is reversed and a backtrajectory is followed from the forecast location backward in time (Kuehnert et al., 2013); the latter approach seems to be the more amenable. Denoting the current image by I(x, y, t) and the unit-time displacement field by (u, v), the future image I(x, y, t + 1) is constructed pixel-by-pixel from I(x − u, y − v, t). It must be noted that (u, v) can be non-integer, and in such cases the value of I(x − u, y − v, t) needs to be acquired through a sub-pixel bilinear (or some other form of) interpolation (Aicardi et al., 2022). The two-step-ahead forecast image I(x, y, t + 2) is then constructed pixel-by-pixel from I(x − u, y − v, t + 1), and so on and so forth; for the h-step-ahead forecast image:

I(x, y, t + h) = I(x − u, y − v, t + h − 1);  (7.51)

this step-by-step construction is not the same as extrapolating I(x, y, t + h) from I(x, y, t) in one step.

The above-mentioned procedure in fact reveals the fundamental assumption of satellite-based forecasting, that is, clouds travel with the motion vector of the air mass that surrounds them. In other words, clouds do not generate their own movement; they travel like neutrally buoyant balloons and float according to the variation of the wind field with location. More formally, this assumption is known as Taylor's hypothesis.

Taylor's hypothesis—also referred to as the hypothesis of frozen cloud advection—assumes that the cloud field is "frozen" or steady. In reality, Taylor's hypothesis is violated by (1) the variation of cloud motion vectors in time and (2) the variation of the cloud field in time. Cloud motion vectors often undergo a rotational component that results in a curved or nonlinear trajectory. The variation of the cloud field in time comes from cloud formation and dissipation. For example, when air is cooled by advection into cooler surroundings (e.g., from warmer land to cooler oceans), the water vapor content would reach saturation and condense. Or, convection (vertical movement) can cause air to cool and water vapor to condense. Conversely, clouds can evaporate or dissipate when the air mass containing the clouds is advected to warmer surroundings. Therefore, whereas a linear trajectory is a reasonable assumption for short forecast horizons of the order of 1 h, the cloud trajectory typically diverges increasingly from the linear trajectory for longer forecast horizons, resulting in a rapid deterioration of forecast accuracy. Aicardi et al. (2022) reported that the RMSE of satellite-based irradiance forecasting doubles as the forecast horizon increases from 1 to 5 h, irrespective of the method used to derive the CMVs.

Another minor caveat is that the assumption that clouds move with the air is violated in certain circumstances. Examples are lenticular clouds, where the air moves through the stationary cloud. The lenticular cloud is formed as moist air is advected up a mountain. As the air cools adiabatically, the water vapor condenses. But as the air descends the other side of the mountain, its temperature increases adiabatically and the cloud dissipates. Stationary clouds can also be found near ocean–land or sea–land gradients, where rapid cooling from a cold water body causes cloud formation. Another example is the dissipation of marine layer clouds in southern California in the morning, which is driven by solar heating and not affected by the wind field (Wu et al., 2018). In any case, conditions of clouds not moving with the air are rare and should not be of concern for most forecasters.

The last point of discussion before we proceed to the next part is regarding the forecast resolution. It should be understood that solar forecasts from cloud advection can be generated at any temporal resolution, even 1 s. That is because the optical flow, i.e., U and V in Eq. (7.48), can be regarded as a speed, and with an arbitrarily small unit time, the displacements u and v in Eq. (7.51) can also be arbitrarily small. However, choosing a temporal resolution that is significantly finer than the spatial resolution of the images—converted to temporal resolution using the cloud speed—does not add any value to the forecast. In other words, the variability represented by the forecast time series would not reflect the true expected variability. For example, for a 1-km satellite pixel and a cloud speed of 10 m/s, the appropriate temporal resolution is 100 s. Therefore, advecting satellite images at 1-s resolution is not sensible. In this particular case, a resolution between 1 and 5 min seems appropriate, with 1 min ensuring that all spatial granularity is maintained in the forecast and 5 min reducing the storage/processing requirements.
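Before moving on, the backtrajectory construction of Eq. (7.51) can be made concrete with a minimal sketch; a spatially uniform, non-integer displacement field (u, v) per unit time is assumed for simplicity, and scipy's map_coordinates performs the sub-pixel bilinear interpolation mentioned above.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def advect(field, u, v, h):
    """h-step-ahead image via Eq. (7.51): repeatedly sample the
    previous image at the upstream location (x - u, y - v)."""
    ny, nx = field.shape
    yy, xx = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    out = field
    for _ in range(h):
        # order=1 -> bilinear; pixels advected in from outside the
        # frame are held at the nearest edge value.
        out = map_coordinates(out, [yy - v, xx - u],
                              order=1, mode="nearest")
    return out

kappa = np.random.rand(100, 100)   # placeholder clear-sky index field
forecast = advect(kappa, u=1.7, v=-0.4, h=3)
print(forecast.shape)
```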

7.2.2.2 Cloud motion vectors from NWP

A shortcoming of the method presented in Section 7.2.2.1 is that the CMVs obtained for a particular instance using two consecutive images are static, i.e., they do not change as a function of the forecast horizon. Although such a steadiness assumption is consistent with Taylor's hypothesis, it is, as mentioned earlier, almost always violated in practice. This difficulty may be mitigated by considering NWP, which routinely provides forecasts of the dynamic 3D wind field in the atmosphere. In other words, CMVs obtained from NWP models can be applied to the clear-sky index or cloud index field to create backtrajectories, substituting for the static CMVs presented in Section 7.2.2.1. However, using NWP CMVs, though it has its advantages, ought not to be regarded as problem-free. Horizontal wind fields often change significantly in speed and direction at different heights in the atmosphere. Two particular challenges therefore emerge: (1) the NWP wind forecasts are associated with their own errors, which, as mentioned in Section 7.1, could come from different sources; and (2) since the cloud height is unknown, it is not always clear from which height the CMV should be taken. On the second challenge, several scenarios can be thought of. First, if there is only one cloud layer in the NWP model, then it is indisputable that one should take the CMV from that same layer. Second, if there are several cloud layers in the NWP model, potential choices are to use the CMV from the cloud layer with the largest optical depth, or alternatively, to use information about cloud heights derived from the satellite data to choose the matching cloud layer. Third, if the NWP model forecasts no cloud but the satellite data suggest otherwise, the cloud height derived from satellite data can be used directly, or some sort of vertical averaging can be performed.

To give an example of how NWP CMVs are used, we review very briefly the Cooperative Institute for Research in the Atmosphere Nowcast (CIRACast) model, which is one of the component models within a bigger forecasting suite called Sun4Cast (Lee et al., 2017). CIRACast produces forecasts with a resolution of 5 min over a horizon of 3 h. The forecasting step is executed on a satellite-derived cloud field, and the forecast cloud field is then converted to irradiance via the radiative transfer code of Pinker and Laszlo (1992). It should be recalled that both the Lucas–Kanade optical flow and block matching divide the consecutive images of interest into equal-size rectangles; CIRACast does not do that. Instead, CIRACast divides groups of pixels by cloud features; in other words, nearby pixels that possess similar cloud features are advected together. More specifically, cloud properties, including cloud type, cloud-top height, effective radius, and cloud optical depth, are acquired using the PATMOS-x algorithm (Heidinger et al., 2014)—recall that PSM also applies this algorithm to GOES data in constructing the NSRDB. Once the cloud groups are identified, motion vectors at the centroids of the individual groups are obtained, in that steering wind information at the cloud-top heights of those centroids is taken from an NWP model. This essentially allows height-specific, nonhomogeneous wind vectors, which should result in better forecast accuracy than using a wind vector field taken from a single height. Irradiance forecasts from CIRACast have been validated at the Surface Radiation Budget Network (SURFRAD) locations, and the mean absolute error was found to range from 8–17% for intra-hour horizons. The reader is referred to Miller et al. (2018) for a complete documentation of CIRACast.

Although CIRACast typifies state-of-the-art satellite-based solar forecasting, its main disadvantage is the assumption that cloud microphysics is invariant. Stated differently, the clouds in the model neither grow nor dissipate throughout the forecasting process, and no new clouds are formed. On this point, it should be noted that NWP data could complement the prediction of the formation and dissipation of the satellite-detected cloud field, as NWP simulates the drying or moistening of air masses. But to the best of the knowledge of the present authors, such an approach has not been attempted yet. Given that NWP cloud forecasts are challenged by their own issues, predicting formation, growth, and dissipation from NWP is also thought to be challenging.

Although obtaining CMVs from NWP is simple in its core idea, due to the aforementioned limitations, applying such a forecasting method in its most basic form is not likely to result in stellar forecasts. As such, numerous innovations, which are conceptually attractive but procedurally sophisticated, have been proposed. For instance, Harty et al. (2019) made combined use of two sets of CMVs, derived from Lucas–Kanade optical flow and an NWP model, via data assimilation. Furthermore, to handle the uncertainties originating from various sources, the idea of ensemble forecasting was employed, where each member of the ensemble consists of a randomly perturbed cloud index field and a randomly perturbed CMV field. This forecasting system proposed by Harty et al. (2019) is called the Assimilation of NWP winds and Optical flow CMVs (ANOC) system, which achieved 0.12 and 0.11 improvements in skill score as compared to forecasts based solely on optical flow CMVs and NWP CMVs, respectively, over a 39-day test period. Another innovation was presented by Arbizu-Barrena et al. (2017), who enhanced the advection-only motion of clouds by adding diffusion. The main idea therein is to assimilate the cloud index maps derived from MSG images into a version of WRF, which can account for both advection and diffusion, both horizontally and vertically.

7.2.2.3 Deep learning on satellite data

Machine learning is now the dominant approach in the majority of forecasting domains; this is true at least judging from the sheer number of related publications documented each year. Machine learning can be applied to any dataset, and examples have been or are to be mentioned throughout this book. That said, leveraging spatio-temporal information from satellite images as input to machine-learning models is perhaps the least mature area of application of machine learning in solar forecasting. The reason is rather simple: the training of many machine-learning models soon becomes infeasible as the dimensionality explodes, which is the case with satellite data, which can easily translate into hundreds of thousands of input variables. But matters are quite different nowadays. With the revolutionary development of deep-learning technology in the mid- to late-2010s, in particular the convolutional neural network (CNN), satellite data can now be handled and analyzed more cleverly than ever. In the last bit of this satellite-based forecasting section, some of the latest advances in applying deep learning to satellite data, in order to achieve spatial irradiance forecasts, are discussed.

Deep learning is a subset of machine learning. Deep-learning methods are based on artificial neural networks (ANNs) with representation learning, and they have been widely adopted in a range of applications, as one of the most promising state-of-the-art computer science technologies. Possibly the most established and popularly employed ANN in both research and application is the multilayer perceptron (MLP), which is a fully connected class of feedforward neural networks. Whereas the MLP is a general-purpose network architecture, it faces a range of limitations during training (e.g., the number of weights rapidly becomes unmanageable for large images). As such, many other networks with task-specific architectures have been proposed to resolve the limitations of the MLP. Since the task concerning this book is forecasting, in which the data exist in the form of time series, the recurrent neural network (RNN), which is designed to handle time series data, ought to be deemed appropriate. However, given that satellite-derived fields are in the form of images, the CNN is often perceived as more suitable. This echoes the discussion in the early part of this section—satellite data can be viewed either as a spatial collection of time series, or as a temporal collection of lattice processes, and the choice between these two perspectives would generally decide whether an RNN or a CNN should be selected. Our preference on this matter thus far suggests that the latter viewpoint (satellite data as a temporal collection of lattice processes) could better exploit the spatio-temporal information embedded in satellite images, just as optical flow and block matching are motivated to retrieve cloud motion; therefore, the CNN is the main architecture of focus in this section.

The neocognitron, which was inspired by the structure and working of the visual cortex, was first proposed in the early 1980s (Fukushima, 1980; Fukushima and Miyake, 1982), and it gradually evolved into the CNN as known today. Although efficiently training a CNN was challenging back then, technology enablers that emerged over the past several decades, such as the increase in computational power, the availability of big training data, and smarter network training tricks, have certainly made the CNN one of the most successful and easily implementable deep-learning tools, and there is now a gigantic body of literature associated with it. Whereas information regarding CNNs may be found everywhere, we introduce the CNN by explaining some of its most basic terminologies, such that readers with little or no background in deep learning can better follow other, more advanced texts on CNNs.

The most important building block of a CNN is the convolutional layer. There are multiple convolutional layers in a CNN; Fig. 7.17 visualizes two convolutional layers alongside a 9 × 9-pixel input image. Convolutional layers analyze input images using receptive fields, which function as feature extractors to derive image features/representations from different parts (i.e., receptive fields) of an image. In other words, cf. Fig. 7.17, the neurons (i.e., "pixels") in the first convolutional layer are not connected to every single pixel in the input image, but only to the pixels within their receptive fields. Similarly, the neurons in the second convolutional layer are not connected to every neuron in the first convolutional layer. This hierarchical nature is often contained in real-world images, which is one reason why CNNs work so well in learning data that are in the form of images.


Figure 7.17 A visualization of a CNN with two convolution layers, each with 3 × 3 local receptive fields.

In defining how the convolution is performed, three (groups of) hyperparameters are essential. Firstly, the height (fh) and width (fw) of the receptive field define the size of a unit sub-area of the image that is assumed to share similar image features or representations; in Fig. 7.17, fh = fw = 3 for both convolutional layers. In this regard, the neuron in the ith row and jth column of an upper layer is connected to the neurons in the previous layer located in rows i to i + fh − 1 and columns j to j + fw − 1. In Fig. 7.17, the two neurons located at (1, 1) and (6, 3) of the first convolutional layer are connected to the 3 × 3 groups of pixels located within the (1:3, 1:3) and (6:8, 3:5) blocks of the input image. Next, the distance between two adjacent receptive fields is called the stride, which is expressed as sh and sw, representing the displacements in the height and width directions, respectively. If sh = sw = 1, an n × n previous layer would result in an upper layer of size (n − 2) × (n − 2). But if the receptive fields are more spaced out, i.e., sh = sw > 1, the resultant upper layer would contain fewer neurons, or equivalently, would be smaller in dimension. In the case of a 9 × 9 input image, if sh = sw = 2, the first convolutional layer would have a size of 4 × 4, as shown in Fig. 7.18. Lastly, in certain situations where a specific output dimensionality is desired, one may add pixels of zeros around the input image, a procedure known as zero padding. Figure 7.19 gives an example with a zero padding of p = 2, in that, with the stride sh = sw = 2 unchanged, the size of the convolutional layer increases to 6 × 6.

Figure 7.18 Reducing dimensionality using a stride sh = sw = 2.

Figure 7.19 Increasing dimensionality using zero padding with p = 2, while the stride remains at sh = sw = 2.

As in the case of a regular neural network, each neuron of a CNN is connected to the neurons in the previous layer by weights. A neuron's weights can be represented as a small image that has the same size as the receptive field. This small image is called a convolution kernel. As the kernel slides through the image (following some particular stride), at each overlapping position, one first computes the elementwise multiplication of the kernel and the receptive field, and then computes the sum. Figure 7.20 exemplifies this process with a 9 × 9 image matrix I and a 3 × 3 kernel K, with a stride sh = sw = 1. Owing to this convolution operation, all neurons in the upper layer share the same set of weights, and the upper layer obtained thereof is called a feature map, which highlights the areas in an image that are most similar to the kernel. To give perspective on the effect of different kernels, Fig. 7.21 shows the feature maps obtained by applying four different kernels to a 200 × 100-pixel image of the FY-4A GHI retrievals at 07:00 UTC on July 1, 2018 (the image is a subset of the one displayed in Fig. 7.15). The four 3 × 3 kernels, K1, ..., K4, are:

K_1 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \quad
K_2 = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \quad
K_3 = \begin{pmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{pmatrix}, \quad
K_4 = \begin{pmatrix} 0 & -1 & 0 \\ -1 & 5 & -1 \\ 0 & -1 & 0 \end{pmatrix},  (7.52)

which correspond to vertical, horizontal, edge detection, and sharpen kernels, respectively. Whereas these four kernels are arbitrarily selected for visualization purposes, a CNN automatically finds during training the most useful kernels for its task, and it learns to combine them into more complex patterns.
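The slide-multiply-sum operation just described can be written out directly. Below is a minimal NumPy sketch of a "valid" convolution with configurable stride, applied with the sharpen kernel K4 of Eq. (7.52); note that, as in most deep-learning libraries, the kernel is not flipped, so the operation is strictly a cross-correlation.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Naive 'valid' 2D convolution: slide the kernel, take the
    elementwise product with each receptive field, and sum."""
    fh, fw = kernel.shape
    oh = (image.shape[0] - fh) // stride + 1
    ow = (image.shape[1] - fw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + fh,
                          j * stride:j * stride + fw]
            out[i, j] = np.sum(patch * kernel)
    return out

K4 = np.array([[0, -1, 0],
               [-1, 5, -1],
               [0, -1, 0]])           # sharpen kernel of Eq. (7.52)

image = np.random.rand(9, 9)          # placeholder 9 x 9 image
print(conv2d(image, K4).shape)        # (7, 7), cf. Fig. 7.20
```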


Figure 7.20 Convolution operation diagram of a 9 × 9 image matrix I and a 3 × 3 kernel K , with a stride sh = sw = 1. Up to this stage, we have deliberately kept for simplicity the representation of each convolutional layer as a thin 2D layer, but in reality, there is usually more than one kernel involved. These different convolution kernels act as different filters creating feature maps representing something different. On this point, a more realistic convolutional layer in fact has a 3D representation. Figure 7.22 shows the architecture of the LeNet-5, which is an introductory and widely known CNN architecture,

Base Methods for Solar Forecast Generation

279

(a) Vertical

(b) Horizontal

(c) Edge detection

(d) Sharpen

Figure 7.21 Applying four different kernels to get four feature maps. The original image is a 200 × 100-pixel subset of the FY-4A GHI image displayed in Fig. 7.15. Convolution 28 × 28 × 6

Convolution 10 × 10 × 16

Convolution FC layer 1 × 1 × 120 10



Input image 32 × 32

Avg. pooling 14 × 14 × 6

Avg. pooling 5 × 5 × 16

FC layer 84

Figure 7.22 The LeNet-5 architecture.

as proposed by LeCun et al. (1989). In this architecture, there are three convolutional layers and some other components that are to be discussed shortly after. In the first convolutional layer (or C1 ), the 32 × 32-pixel input are convolved into six 28 × 28 feature maps, whereas in the second convolutional layer (or C3 ), the previous stage information is convolved into sixteen 10 × 10 feature maps. Besides using more kernels, the input images are sometimes composed of information from multiple channels, e.g., a color image containing red, green, and blue channels, or a satellite snapshot contains several channels relevant to irradiance retrieval. In this case, the kernel becomes a 4D tensor that is not apt for direct visualization. The training process as to obtaining the elements of the 4D tensor is complex, but bottom-level routines and code snippets are continuously being populated in PyTorch and TensorFlow among other frameworks, and can be found online with ease.


Beyond the convolutional layer, another vital component of a CNN is the pooling layer. The main task of pooling layers is to subsample the input image so as to reduce the computational burden, save computer memory, and shrink the number of parameters to be estimated. One additional benefit of using pooling layers is that pooling allows the network to tolerate mild image shifts. Similar to the case of the convolutional layer, there can be multiple pooling layers in a CNN architecture, and they may be either local or global. Whereas a local pooling layer aggregates small clusters of neurons (or pixels), with a typical dimension of 2 × 2, the global pooling layer acts on all the neurons of the feature map. Commonly applied pooling methods are just the maximum and the average, which are comprehensible through their naming—max pooling uses the maximum value within each receptive field, and average pooling computes the average. The pooling operation is similar to how the convolution kernel works, in that it is controlled by hyperparameters including the dimension of the pooling kernel, the stride, and the padding. To give perspective, LeNet-5 (see Fig. 7.22) has two local pooling layers, both having a pooling kernel with a dimension of 2 × 2, a stride of 2 × 2, and no padding.

To close the discussion on the typical CNN architecture, it should be noted that CNNs are ANNs, and as such they require an activation function at each layer. The most commonly used activation functions for CNNs are the rectified linear unit (ReLU), sigmoid, and hyperbolic tangent (tanh). Besides the activation function, the last essential component of a CNN is the fully connected (FC) network, which is no different from a regular MLP, and which acts as a regression model or a classifier, depending on the learning task. Because the convolution and pooling layers produce feature maps, the maps at the last layer need to be flattened before being passed to the FC layer(s). More advanced CNN architectures may include additional layers, such as batch normalization layers or dropout layers, of which the former helps stabilize the learning process and prevent the problem of internal covariate shift, and the latter uses a regularization technique to prevent overfitting. Likewise, one may also replace the MLP with other regression models (or classifiers). For instance, when the data permits, it is possible to combine two deep-learning methods, namely, CNN and RNN, where the latter is typified by the long short-term memory (LSTM) network, which, by itself, has been a popular regression tool for solar forecasting.

Table 7.3 summarizes the hyperparameter settings of LeNet-5, whereas Table 7.4 provides those of AlexNet (Krizhevsky et al., 2012), another classic CNN architecture, which won the 2012 ImageNet Large-Scale Visual Recognition Challenge (ILSVRC). From these tables one should notice that the architectures are deep, as contrasted with a typical "shallow" ANN architecture with only one input layer, one hidden layer, and one output layer. However, if one is to continue going deeper, other advanced training tricks would soon have to be involved. The GoogLeNet (also known as the inception network; Szegedy et al., 2015), for instance, has 22 layers, which were made possible through sub-networks called inception modules, which contain in themselves convolution and pooling layers with different hyperparameters (and thus the name "inception"). The GoogLeNet won the 2014 ILSVRC. One year after the proposal of GoogLeNet, the residual network (ResNet) was proposed by He et al. (2016), who were the winners of the 2015 ILSVRC. ResNet consists of 152 layers, and its training depends upon a technique called skip connection, which feeds the same information both into a layer and to the output of a layer located deeper in the stack. With these examples, it is evident that CNN, or deep learning in general, is an extremely fast-advancing field, and this rapidity is likely to propagate to other applications that rely on CNNs. The matter here is one of adaptation.

Table 7.3 Hyperparameters of LeNet-5.

Layer | Type         | Maps | Size    | Kernel size | Stride | Activation
In    | Input        | 1    | 32 × 32 | —           | —      | —
C1    | Convolution  | 6    | 28 × 28 | 5 × 5       | 1      | tanh
S2    | Avg. pooling | 6    | 14 × 14 | 2 × 2       | 2      | tanh
C3    | Convolution  | 16   | 10 × 10 | 5 × 5       | 1      | tanh
S4    | Avg. pooling | 16   | 5 × 5   | 2 × 2       | 2      | tanh
C5    | Convolution  | 120  | 1 × 1   | 5 × 5       | 1      | tanh
F6    | Fully conn.  | —    | 84      | —           | —      | tanh
Out   | Fully conn.  | —    | 10      | —           | —      | RBF

Note: LeCun et al., 1989
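For readers who prefer code to tables, a LeNet-5-style network per Table 7.3 can be assembled in a few lines; the PyTorch sketch below is an assumption-laden rendition that keeps the layer sizes of the table but replaces the original RBF output layer with a plain fully connected layer.

```python
import torch
import torch.nn as nn

# Layer sizes follow Table 7.3; the RBF output layer of the original
# LeNet-5 is replaced here by a plain fully connected layer.
lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),          # C1: 6 @ 28 x 28
    nn.AvgPool2d(kernel_size=2, stride=2), nn.Tanh(),   # S2: 6 @ 14 x 14
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),         # C3: 16 @ 10 x 10
    nn.AvgPool2d(kernel_size=2, stride=2), nn.Tanh(),   # S4: 16 @ 5 x 5
    nn.Conv2d(16, 120, kernel_size=5), nn.Tanh(),       # C5: 120 @ 1 x 1
    nn.Flatten(),
    nn.Linear(120, 84), nn.Tanh(),                      # F6
    nn.Linear(84, 10),                                  # Out
)

x = torch.randn(1, 1, 32, 32)   # one 32 x 32 single-channel image
print(lenet5(x).shape)          # torch.Size([1, 10])
```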

Table 7.4 Hyperparameters of AlexNet.

Layer | Type         | Maps | Size      | Kernel size | Stride | Activation
In    | Input        | 3    | 224 × 224 | —           | —      | —
C1    | Convolution  | 96   | 55 × 55   | 11 × 11     | 4      | ReLU
S2    | Avg. pooling | 96   | 27 × 27   | 3 × 3       | 2      | ReLU
C3    | Convolution  | 256  | 27 × 27   | 5 × 5       | 1      | ReLU
S4    | Avg. pooling | 256  | 13 × 13   | 3 × 3       | 2      | ReLU
C5    | Convolution  | 384  | 13 × 13   | 3 × 3       | 1      | ReLU
S6    | Avg. pooling | 384  | 13 × 13   | 3 × 3       | 1      | ReLU
C7    | Convolution  | 256  | 13 × 13   | 3 × 3       | 1      | ReLU
F8    | Fully conn.  | —    | 4096      | —           | —      | ReLU
F9    | Fully conn.  | —    | 4096      | —           | —      | ReLU
Out   | Fully conn.  | —    | 1000      | —           | —      | Softmax

Note: Krizhevsky et al., 2012

Indeed, one difference between solar forecasting and image classification is that the latter attracts much higher attention from computer scientists, which results in standard datasets and regular competitions that are conducive to fair benchmarking. In solar forecasting, on the other hand, the disparate datasets and error metrics spanning the present literature, as has been and will be mentioned at other places in the book, make it difficult to uphold the reported superiority of forecasting models. Consequently, it does not really add value to the current discussion to quote the accuracies of various deep-learning models on solar forecasting. As mentioned earlier, applying deep learning in satellite-based solar forecasting is a much underdeveloped domain; the few available works at the time of writing include Gallo et al. (2022); Qin et al. (2022); Nielsen et al. (2021); Pérez et al. (2021); Choi et al. (2021); Yeom et al. (2020); Jiang et al. (2019). Instead of scrutinizing each of these works, we close this section by analyzing just one reference, namely, the work of Pérez et al. (2021), which is thought to be procedurally representative.

Pérez et al. (2021) adopted a CNN+MLP approach to forecast intra-day irradiance in 15-min steps out to 6 h, using spatio-temporal information derived from MSG, which has a resolution of 15 min and 3 km. The multi-step-ahead forecasting, which is not very commonly seen in machine-learning-based solar forecasting works, was realized by applying the same network architecture to different training targets that lag the forecast-issuing time by different margins. For a forecast-issuing time t, the inputs to the CNN are images holding the irradiance field from the 10 immediately past instances, i.e., at t − 1, ..., t − 10. The training targets, depending on the horizon h, can either be in situ irradiance measurements or satellite-derived irradiance at time t + h. The proposed model, in the latter case, resembles satellite-based forecasting with CMVs, in that it does not rely on ground-based data, which makes it suitable for locations with little or no historical measurements. What should be highlighted is that Pérez et al. (2021) indicated that the satellite images inserted into the CNN have a size of 35 × 35 pixels, which translates to an area of 100 × 100 km2. This area is comparable to the block of pixels used to derive a single CMV in the block-matching methods of Kuehnert et al. (2013) and Lorenz et al. (2004). This potentially indicates a deficiency in the CNN model, because too small an image would be unable to capture fast cloud motion. On this point, although Pérez et al. (2021) mentioned—without providing evidence—that increasing the size of the input images did not bring substantial accuracy improvement, this could be due to the fact that the larger images were still too small as compared to those of typical CMV-based forecasting, which involves optical flow and block matching within a domain of thousands of km. Indeed, most existing satellite-based forecasting works using deep learning do not entertain, with some exceptions (Nielsen et al., 2021), head-to-head comparisons with physics-based methods, which prompts future studies along this direction.
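To illustrate the input–target framing used by Pérez et al. (2021), namely, 10 past frames in and one irradiance value h steps ahead out, here is a minimal sketch; the array shapes, the center-pixel target, and the cropping are illustrative assumptions rather than the authors' exact pipeline.

```python
import numpy as np

def make_samples(frames, h, n_lags=10, patch=35):
    """Build (X, y) pairs: X holds `n_lags` past 35 x 35-pixel crops,
    y the center-pixel value `h` steps after the issuing time."""
    T, H, W = frames.shape
    r0, c0 = (H - patch) // 2, (W - patch) // 2
    X, y = [], []
    for t in range(n_lags, T - h):
        X.append(frames[t - n_lags:t, r0:r0 + patch, c0:c0 + patch])
        y.append(frames[t + h, H // 2, W // 2])
    return np.stack(X), np.array(y)

frames = np.random.rand(200, 64, 64)   # placeholder satellite sequence
X, y = make_samples(frames, h=4)       # e.g., 4 steps = 1 h at 15 min
print(X.shape, y.shape)                # (186, 10, 35, 35) (186,)
```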

7.3 FORECASTING WITH GROUND-BASED DATA

7.3.1 DATA-DRIVEN FORECASTING WITH INFORMATION FROM A SINGLE SPATIAL LOCATION

The chance that novices begin solar forecasting with time series and machine-learning methods is rather high. In the case of the former, the fundamental theory of time series forecasting was already established in the 1970s, when Box and Jenkins first pioneered the autoregressive integrated moving average (ARIMA) framework. Although other time series frameworks, such as exponential smoothing (ETS), arrived later than the ARIMA one, they were nevertheless mature when solar forecasting emerged as a topic of concern for engineers. In the case of the latter, machine learning, and lately deep learning, started to gain massive popularity owing to the rapid acceptance of general-purpose high-level programming languages, such as Python or R, with high code readability and open-source community support. Today, anyone with an entry-level programming background should be able to learn to operate time series and machine-learning forecasting tools within a few hours. What people often overlook is the underlying mathematics that powers the forecasting routines.

The most fundamental application of time series and machine-learning forecasting is to apply it to endogenous data, i.e., a single time series. Although these approaches to describing and thus forecasting stochastic processes have deep roots and rich mathematical properties—for instance, the latest version of the book by Box et al. (2015) stands at 712 pages—exploring those would result in nothing but poor solar forecasting performance, for reasons that have been repeatedly stressed throughout this book. Forecasting solar time series with no exogenous input can be typified by two papers, by Reikard (2009) and Pedro and Coimbra (2012), which have both contributed to the subsequent tsunami of studies on forecasting using information from a single spatial location; such methods can be called local data-driven methods. Since those works presented only univariate methods, it is undemanding, at least from a practical viewpoint, to expand the univariate methods and to include in the models more locally collected information, such as the regularly observed ground-level weather variables (i.e., 2-m temperature, 10-m wind, surface pressure, or humidity). The most representative modeling style is to use locally gathered basic weather parameters and lagged versions of the PV power measurements to explain PV power at a future time—this kind of forecasting seems practically attractive, since many PV plants are equipped with power sensors and basic weather stations.

Of course, it is true that, by now, solar forecasting with local data-driven methods has largely disappeared, owing to the lack of reward in pursuing those low-performing methods. Few journals would still publish this kind of work, regardless of how mathematically complex the methods may be, and thus there is no motivation for solar forecasters in academia to pursue local data-driven methods further. That said, there is perhaps one surviving aspect of these local data-driven methods, that is, forecast benchmarking. For instance, Voyant et al. (2022) drew the connections between several classic time series methods and some of the most popular standards of reference in meteorological forecasting. The forecasting benchmarks advocated therein mark the simplest possible models with the lowest data requirement but decent performance, and are thus commendable. Insofar as the housekeeping rules outlined in Chapter 5 are respected, such as performing quality control or forecasting on the clear-sky index, those carefully chosen benchmarks allow some degree of fairness in forecast comparison. A minimal sketch of this single-location workflow follows.
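As flagged above, here is a minimal sketch of such a single-location benchmark, which forecasts the clear-sky index rather than the irradiance itself; the use of statsmodels' ARIMA, the (1, 0, 1) order, and the simulated series are illustrative assumptions.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# kappa: hourly clear-sky index history at a single site (placeholder);
# ghi_clear_next: clear-sky GHI for the target hour, in W/m2.
rng = np.random.default_rng(0)
kappa = np.clip(0.7 + 0.2 * rng.standard_normal(500), 0.05, 1.2)
ghi_clear_next = 750.0

model = ARIMA(kappa, order=(1, 0, 1)).fit()
kappa_hat = model.forecast(steps=1)[0]

# Convert back to irradiance; persistence is the trivial alternative.
print("ARIMA:", kappa_hat * ghi_clear_next)
print("Persistence:", kappa[-1] * ghi_clear_next)
```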

7.3.2 SPATIO-TEMPORAL STATISTICAL METHODS

Although data-driven methods are not very attractive if they are confined to information from a single spatial location, their utility can be greatly enhanced when information from neighboring locations is supplemented. This type of information set—several focal locations are to be forecast, with numerous neighboring locations providing correlated measurements—attracts the application of spatial statistics, which is also known as geostatistics. In essence, spatio-temporal statistical modeling is interested in the spatio-temporal random process $\{Y(\boldsymbol{s};t) : \boldsymbol{s} \in D_s \subset \mathbb{R}^d,\ t \in D_t \subset \mathbb{R}\}$, where the process variable of interest $Y$ is a function of the $d$-dimensional spatial location and one-dimensional time. The random process $Y(\boldsymbol{s};t)$ evolves through a spatio-temporal index set $D_s \times D_t$, which can be either discrete or contiguous. Satellite data, as seen in the previous section, is discrete in both space and time; as such, the data forms a time series of lattice processes, i.e., each snapshot is a spatial lattice process. On the other hand, if the process under modeling is continuously distributed by nature, such as solar irradiance or water vapor, it may be considered as a time series of geostatistical processes.

Spatio-temporal statistics is a difficult topic on its own, but is profoundly useful for meteorology. In fact, some of the most renowned statisticians working on spatio-temporal statistics have advanced degrees in meteorology. As a subject, spatio-temporal statistics has four main purposes: (1) making predictions in space, which can be regarded as smoothing and interpolation in a $d$-dimensional space; (2) making predictions in time, that is, forecasting; (3) assimilating observations with deterministic models, which is what NWP is doing; and (4) performing inference on parameters that explain components of the process under scrutiny. Traditionally, descriptive models of spatio-temporal processes focus on characterizing the first- and second-order moments of the process, i.e., means, variances, and covariances. Under a modern take, spatio-temporal processes are often modeled in a dynamic framework, in which the time evolution is considered explicitly. One simple way to distinguish descriptive from dynamical models is that the former focuses on identifying the marginal probability distribution of the process, whereas the latter focuses on the conditional probability distribution.

A fundamental modeling philosophy of spatio-temporal modeling of random processes is to divide the process into several additive components, each representing the variation of the process at some particular scale—such a model is known as a component-of-variation model. Most generally, one may express the process $Y(\boldsymbol{s};t)$ as:
$$Y(\boldsymbol{s};t) = \mu(\boldsymbol{s};\boldsymbol{\theta}_t) + \gamma(t;\boldsymbol{\theta}_s) + \kappa(\boldsymbol{s};t;\boldsymbol{\theta}_{s;t}) + \delta(\boldsymbol{s};t). \tag{7.53}$$
In this equation, $\mu(\boldsymbol{s};\boldsymbol{\theta}_t)$ represents the large-scale variation in space, or the spatial trend component, which depends on potentially time-varying parameters $\boldsymbol{\theta}_t$. Analogously, $\gamma(t;\boldsymbol{\theta}_s)$ is the large-scale variation in time, or the temporal trend of the process, with potentially spatially varying parameters $\boldsymbol{\theta}_s$. The $\kappa(\boldsymbol{s};t;\boldsymbol{\theta}_{s;t})$ term denotes the interaction, which suggests spatio-temporal dependence. Lastly, $\delta(\boldsymbol{s};t)$ represents the small- and/or micro-scale variation in the process, which is often assumed to have zero mean. These four terms are independent of each other. Among the terms, $\kappa$ and $\delta$ are considered to be random, whereas $\mu$ and $\gamma$ are often assumed to be fixed, and can be estimated through, for example, linear regression. In that, the particular component-of-variation model in Eq. (7.53) is known as the spatio-temporal mixed-effect model.

That said, not all terms in Eq. (7.53) need to be included during modeling, and the choice solely depends on the modeler's belief. Component-of-variation models may be handled with both descriptive (i.e., solved using kriging; Section 7.3.2.1) and dynamic frameworks (Section 7.3.2.2).

7.3.2.1 Descriptive spatio-temporal models

Central to descriptive models is the covariance function,
$$C(\boldsymbol{s},\boldsymbol{r};t,q) = \mathrm{Cov}\left[Y(\boldsymbol{s};t),\, Y(\boldsymbol{r};q)\right], \quad \boldsymbol{s},\boldsymbol{r} \in \mathbb{R}^d,\ t,q \in \mathbb{R}, \tag{7.54}$$
which mathematically models how the covariance between two samples drawn from the process behaves. One should be aware that not any function can be a covariance function; rather, a covariance function must satisfy: (1) $C(\boldsymbol{s},\boldsymbol{r};t,q) = C(\boldsymbol{r},\boldsymbol{s};q,t)$, for all $\boldsymbol{s},\boldsymbol{r} \in \mathbb{R}^d$, $t,q \in \mathbb{R}$, and (2) positive definiteness. Additionally, the covariance function is also assumed to be second-order stationary, which means the expectation of $Y(\boldsymbol{s};t)$ is constant, and the covariance is only a function of the spatio-temporal lag:
$$\mathrm{Cov}\left[Y(\boldsymbol{s};t),\, Y(\boldsymbol{r};q)\right] = C(\boldsymbol{h};\tau), \tag{7.55}$$
where $\boldsymbol{h} = \boldsymbol{s}-\boldsymbol{r}$ and $\tau = t-q$ are the distances (or lags) in space and time between the two samples. When one of the lags of the stationary covariance function $C(\boldsymbol{h};\tau)$ is restricted to zero, i.e., $C(\boldsymbol{h};0)$ or $C(\boldsymbol{0};\tau)$, the covariance function is said to be purely spatial or purely temporal, respectively. When both lags are zero, $C(\boldsymbol{0};0)$ is a constant, and the ratio $C(\boldsymbol{h};\tau)/C(\boldsymbol{0};0)$ gives the correlation function. Lastly, if a spatio-temporal covariance function can be written in the form
$$C(\boldsymbol{h};\tau) = C_s(\boldsymbol{h}) \cdot C_t(\tau), \tag{7.56}$$
where $C_s(\boldsymbol{h})$ and $C_t(\tau)$ are purely spatial and purely temporal covariance functions, it is said to be separable. Similarly, if the function satisfies
$$C(\boldsymbol{h};\tau) = C(-\boldsymbol{h};\tau) = C(\boldsymbol{h};-\tau) = C(-\boldsymbol{h};-\tau), \tag{7.57}$$
it is said to be fully symmetric. Separability is a special case of full symmetry. Therefore, when the covariance matrix is not fully symmetric, it cannot be separable. The reader is referred to Montero et al. (2015); Gneiting (2002); Cressie and Huang (1999) for a detailed explanation and specific examples of covariance functions.
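Although selecting and parameterizing covariance functions is a deep subject, constructing a valid separable covariance of the form of Eq. (7.56) is mechanical. The following Python sketch, with hypothetical exponential components and arbitrary length scales, assembles the covariance matrix for a small set of space–time coordinates and numerically confirms positive definiteness.

```python
import numpy as np

def separable_cov(coords, times, ls_space=50.0, ls_time=3.0, var=1.0):
    """Separable covariance C(h; tau) = var * Cs(h) * Ct(tau), cf. Eq. (7.56),
    with exponential components exp(-|h|/ls_space) and exp(-|tau|/ls_time).
    The length scales here are illustrative placeholders."""
    h = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    tau = np.abs(times[:, None] - times[None, :])
    return var * np.exp(-h / ls_space) * np.exp(-tau / ls_time)

# five stations observed at three time stamps: n = 5 x 3 = 15
xy = np.array([[0, 0], [10, 0], [0, 20], [30, 30], [50, 10]], float)
t = np.arange(3.0)
coords = np.repeat(xy, len(t), axis=0)     # space-time coordinates
times = np.tile(t, len(xy))
C = separable_cov(coords, times)           # 15 x 15 covariance matrix
assert np.all(np.linalg.eigvalsh(C) > 0)   # positive definite, as required
```

Since the product of two positive-definite kernels is itself positive definite, separable constructions of this kind are guaranteed to be valid, which is a large part of their appeal.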

The technique for making predictions of the component-of-variation models, with a descriptive perspective, is called kriging. In a spatio-temporal setting, kriging has many variants, which differ from one another in terms of how the spatio-temporal process is decomposed. The most basic model which one may assume is the so-called additive measurement error and process model. Given some noisy observations made at $n$ space–time coordinates,14 namely,
$$\boldsymbol{Z} = \left(Z(\boldsymbol{s}_1,t_1), \dots, Z(\boldsymbol{s}_n,t_n)\right)^\top, \tag{7.58}$$
kriging seeks to predict the value of the spatio-temporal process at some unknown spatio-temporal index, denoted as $(\boldsymbol{s}_0,t_0)$. More specifically, in this model, the noisy data is written in terms of the latent spatio-temporal process of interest plus a measurement error:
$$Z(\boldsymbol{s}_i;t_i) = Y(\boldsymbol{s}_i;t_i) + \varepsilon(\boldsymbol{s}_i;t_i), \tag{7.59}$$
where $i = 1,\dots,n$, and $\varepsilon(\boldsymbol{s}_i;t_i)$ is a zero-mean measurement error term that is independent of the latent process. Additionally, the latent process is also assumed to be additive, with a mean structure $\mu$ and a zero-mean small-scale variation $\delta$, cf. Eq. (7.53):
$$Y(\boldsymbol{s};t) = \mu(\boldsymbol{s};t) + \delta(\boldsymbol{s};t). \tag{7.60}$$
In vector notation, Eqs. (7.59) and (7.60) become:
$$\boldsymbol{Z} = \boldsymbol{Y} + \boldsymbol{\varepsilon}, \tag{7.61}$$
$$\boldsymbol{Y} = \boldsymbol{\mu} + \boldsymbol{\delta}, \tag{7.62}$$
where $\boldsymbol{Y} = \left(Y(\boldsymbol{s}_1;t_1),\dots,Y(\boldsymbol{s}_n;t_n)\right)^\top$, $\boldsymbol{\varepsilon} = \left(\varepsilon(\boldsymbol{s}_1;t_1),\dots,\varepsilon(\boldsymbol{s}_n;t_n)\right)^\top$, et cetera. Depending on the information available to the modeler, the mean structure may be assumed to be: (1) known, (2) constant but unknown, or (3) modeled in terms of (functions of) known covariates, i.e., $\mu(\boldsymbol{s};t) = \boldsymbol{f}(\boldsymbol{s};t)^\top\boldsymbol{\beta}$, where $\boldsymbol{f}(\cdot;\cdot)$ is a vector of functions that maps the covariate space to $\mathbb{R}^p$, and $\boldsymbol{\beta} \equiv (\beta_1,\dots,\beta_p)^\top$ is the vector of coefficients. To give perspective, if latitude, longitude, and temperature are taken as covariates of the irradiance process, the covariate space is three-dimensional. Furthermore, suppose irradiance varies as a function of the geographical coordinates, their product, temperature, and its square; then:
$$\boldsymbol{f}(\boldsymbol{s};t) = \begin{pmatrix} \mathrm{latitude}(\boldsymbol{s}) \\ \mathrm{longitude}(\boldsymbol{s}) \\ \mathrm{latitude}(\boldsymbol{s})\times\mathrm{longitude}(\boldsymbol{s}) \\ \mathrm{temperature}(\boldsymbol{s};t) \\ \mathrm{temperature}(\boldsymbol{s};t)^2 \end{pmatrix} \in \mathbb{R}^5. \tag{7.63}$$
The coefficients $\boldsymbol{\beta}$ weigh the contributions of the (functions of) covariates in describing the mean structure. The reader is referred to Yang and Gueymard (2019) for a solar engineering example of this type of modeling of the mean structure. It is worth mentioning that the above three different assumptions about the mean structure correspond to what is usually referred to as (1) simple, (2) ordinary, and (3) universal kriging, respectively. The expression for the conditional distribution of $Y(\boldsymbol{s}_0;t_0)$ given all observations is well known (see Wikle et al., 2019; Cressie and Huang, 1999):
$$Y(\boldsymbol{s}_0;t_0)\,|\,\boldsymbol{Z} \sim \mathcal{N}\left(\boldsymbol{f}(\boldsymbol{s}_0;t_0)^\top\boldsymbol{\beta} + \boldsymbol{c}_0^\top\boldsymbol{C}_Z^{-1}(\boldsymbol{Z}-\boldsymbol{F}\boldsymbol{\beta}),\ c_{0,0} - \boldsymbol{c}_0^\top\boldsymbol{C}_Z^{-1}\boldsymbol{c}_0\right), \tag{7.64}$$
where $\boldsymbol{F} = \left(\boldsymbol{f}(\boldsymbol{s}_1;t_1),\dots,\boldsymbol{f}(\boldsymbol{s}_n;t_n)\right)^\top \in \mathbb{R}^{n\times p}$, $\boldsymbol{c}_0 = \mathrm{Cov}\left[Y(\boldsymbol{s}_0;t_0),\, \boldsymbol{Z}\right] \in \mathbb{R}^{n\times 1}$, $\boldsymbol{C}_Z = \mathrm{Cov}(\boldsymbol{Z}) \in \mathbb{R}^{n\times n}$, and $c_{0,0} = \mathbb{V}\left[Y(\boldsymbol{s}_0;t_0)\right]$.

14 Note that $n$ is the total number of space–time coordinates, which, if there is no missing data, comprises observations made on $n_s$ lattice points and over $n_t$ time instances, that is, $n = n_s \times n_t$.

If $\boldsymbol{\beta}$ is known, for instance, obtained via fitting, then the simple kriging predictor:
$$\hat{Y}(\boldsymbol{s}_0;t_0) = \mathbb{E}\left[Y(\boldsymbol{s}_0;t_0)\,|\,\boldsymbol{Z}\right] = \boldsymbol{f}(\boldsymbol{s}_0;t_0)^\top\boldsymbol{\beta} + \boldsymbol{c}_0^\top\boldsymbol{C}_Z^{-1}(\boldsymbol{Z}-\boldsymbol{F}\boldsymbol{\beta}) \tag{7.65}$$
provides the mean, whereas the variance is:
$$\sigma_Y^2(\boldsymbol{s}_0;t_0) = \mathbb{V}\left[Y(\boldsymbol{s}_0;t_0)\,|\,\boldsymbol{Z}\right] = c_{0,0} - \boldsymbol{c}_0^\top\boldsymbol{C}_Z^{-1}\boldsymbol{c}_0. \tag{7.66}$$
The square root of $\sigma_Y^2(\boldsymbol{s}_0;t_0)$ gives the spatio-temporal simple kriging standard error. If $\boldsymbol{\beta}$ is unknown, it needs to be estimated via generalized least squares. In this case, the universal kriging predictor is given by:
$$\hat{Y}(\boldsymbol{s}_0;t_0) = \boldsymbol{f}(\boldsymbol{s}_0;t_0)^\top\hat{\boldsymbol{\beta}} + \boldsymbol{c}_0^\top\boldsymbol{C}_Z^{-1}\left(\boldsymbol{Z}-\boldsymbol{F}\hat{\boldsymbol{\beta}}\right), \tag{7.67}$$
where
$$\hat{\boldsymbol{\beta}} = \left(\boldsymbol{F}^\top\boldsymbol{C}_Z^{-1}\boldsymbol{F}\right)^{-1}\boldsymbol{F}^\top\boldsymbol{C}_Z^{-1}\boldsymbol{Z}. \tag{7.68}$$
The associated universal kriging variance is:
$$\sigma_Y^2(\boldsymbol{s}_0;t_0) = c_{0,0} - \boldsymbol{c}_0^\top\boldsymbol{C}_Z^{-1}\boldsymbol{c}_0 + K, \tag{7.69}$$
where
$$K = \left(\boldsymbol{f}(\boldsymbol{s}_0;t_0) - \boldsymbol{F}^\top\boldsymbol{C}_Z^{-1}\boldsymbol{c}_0\right)^\top\left(\boldsymbol{F}^\top\boldsymbol{C}_Z^{-1}\boldsymbol{F}\right)^{-1}\left(\boldsymbol{f}(\boldsymbol{s}_0;t_0) - \boldsymbol{F}^\top\boldsymbol{C}_Z^{-1}\boldsymbol{c}_0\right), \tag{7.70}$$
which denotes the additional uncertainty introduced to the model due to the estimation of $\hat{\boldsymbol{\beta}}$.
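For readers who prefer to see Eqs. (7.67)–(7.70) in algorithmic form, a minimal Python sketch follows. It assumes that the covariates, covariances, and observations are supplied as arrays; linear solves are used in place of explicit matrix inversion for numerical stability.

```python
import numpy as np

def universal_kriging(Z, F, C_Z, f0, c0, c00):
    """Universal kriging predictor and variance, cf. Eqs. (7.67)-(7.70).

    Z   : (n,)   noisy observations
    F   : (n, p) covariates at the n space-time coordinates
    C_Z : (n, n) covariance matrix of Z
    f0  : (p,)   covariates at the prediction coordinate (s0; t0)
    c0  : (n,)   Cov[Y(s0; t0), Z]
    c00 : float  V[Y(s0; t0)]
    """
    Ci_F = np.linalg.solve(C_Z, F)       # C_Z^{-1} F
    Ci_Z = np.linalg.solve(C_Z, Z)       # C_Z^{-1} Z
    Ci_c0 = np.linalg.solve(C_Z, c0)     # C_Z^{-1} c_0
    beta = np.linalg.solve(F.T @ Ci_F, F.T @ Ci_Z)                # Eq. (7.68)
    y_hat = f0 @ beta + c0 @ np.linalg.solve(C_Z, Z - F @ beta)   # Eq. (7.67)
    d = f0 - F.T @ Ci_c0
    K = d @ np.linalg.solve(F.T @ Ci_F, d)                        # Eq. (7.70)
    var = c00 - c0 @ Ci_c0 + K                                    # Eq. (7.69)
    return y_hat, var
```

Setting $\boldsymbol{F}$ to a single column of ones recovers ordinary kriging, whereas supplying a known $\boldsymbol{\beta}$ and skipping Eq. (7.68) yields the simple kriging predictor and variance of Eqs. (7.65) and (7.66).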

Inspecting Eqs. (7.65) through (7.70), the only unattended terms thus far are $\boldsymbol{c}_0$ and $\boldsymbol{C}_Z$. Indeed, these covariances are acquired through the aforementioned spatio-temporal covariance functions. Selecting and parameterizing covariance functions are among the most difficult problems in spatial statistics, and there is a rich literature providing admissible classes of covariance functions (e.g., Stein, 2005; Ma, 2003; Gneiting, 2002; Cressie and Huang, 1999). However, regardless of the statistical properties of these covariance functions—such as separability or full symmetry—whether the final choice is able to represent reality is often not possible to justify. Besides, another major challenge in using kriging resides in the dimensionality involved. The covariance matrix $\boldsymbol{C}_Z$ is $n\times n$, and when the space–time coordinates are numerous, such as in the case of remote-sensing data, inverting the matrix becomes an issue. Various techniques and tricks have been proposed to address the computational issues during kriging (e.g., Genton, 2007; Furrer et al., 2006), among which the fixed-rank approach seems particularly useful (Cressie et al., 2010; Cressie and Johannesson, 2008). In short, the so-called fixed-rank kriging (FRK) is able to obtain the inverse of the size-$n\times n$ covariance matrix $\boldsymbol{C}_Z$ through inverting a size-$r\times r$ matrix with $r \ll n$, leveraging some adequate basis functions.15 FRK routines are available in R (Zammit-Mangion, 2020).

15 The take is conceptually similar to the Fourier or wavelet transform, where the process of concern is represented by linear combinations of basis functions. What is more tedious is that the basis functions need to work in concert within the kriging framework, which results in lengthy statistical derivations and expressions that are quite intimidating to the unfamiliar.
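The statistical derivations behind FRK are lengthy, but its computational essence, namely, obtaining the action of $\boldsymbol{C}_Z^{-1}$ while inverting only an $r\times r$ matrix, can be glimpsed through the Sherman–Morrison–Woodbury identity. The Python sketch below assumes a hypothetical low-rank-plus-diagonal covariance $\boldsymbol{C}_Z = \boldsymbol{S}\boldsymbol{B}\boldsymbol{S}^\top + \boldsymbol{D}$, which is the kind of structure FRK exploits; production-grade routines are those in the R package cited above.

```python
import numpy as np

def lowrank_solve(S, B, d, v):
    """Solve (S B S^T + diag(d)) x = v via the Woodbury identity,
    inverting only an r x r matrix (r << n).

    S : (n, r) basis-function matrix evaluated at the data coordinates
    B : (r, r) covariance of the basis-function coefficients
    d : (n,)   diagonal of the fine-scale/measurement-error covariance
    v : (n,)   right-hand side
    """
    Di_v = v / d                          # D^{-1} v
    Di_S = S / d[:, None]                 # D^{-1} S
    inner = np.linalg.inv(B) + S.T @ Di_S # r x r, the only "hard" inverse
    return Di_v - Di_S @ np.linalg.solve(inner, S.T @ Di_v)
```

With $r$ in the tens or hundreds and $n$ in the millions, as is typical for satellite-derived irradiance, the saving over a direct $n\times n$ inversion is what makes kriging on such data feasible at all.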

In solar forecasting, kriging has already been utilized to model and forecast irradiance collected by sensor networks of different scales (Jamaly and Kleissl, 2017; Aryaputera et al., 2015; Yang et al., 2014a, 2013a). Since the numbers of spatial locations are typically fewer than tens of stations, dimensionality is not really an issue. Notwithstanding, owing to the stationarity assumption of simple kriging, some modeling tricks are needed, for the irradiance field is often not stationary but possesses certain directional properties due to the cloud motion induced by wind. On this point, Yang et al. (2013a) performed space warping using the Sampson–Guttorp method (Sampson and Guttorp, 1992), to arrive at a stationary field; Aryaputera et al. (2015) assumed the cross-wind correlation function to be fully symmetric and adjusted the along-wind correlations using polynomial functions; and Jamaly and Kleissl (2017) proposed to use the Lagrangian covariance function developed by Schlather (2010), which accounts for the station separation distance in a moving frame according to cloud speed. As of the time of writing, all spatio-temporal kriging applications in the field of solar forecasting have been restricted to sensor-network data, and the literature has yet to see kriging applied to satellite-derived irradiance, possibly due to inexperience with FRK or other computational methods that can handle high-dimensional data.

7.3.2.2 Dynamical spatio-temporal models

The descriptive spatio-temporal models discussed above seek to characterize the spatio-temporal process of concern in terms of its mean function and covariance function. This modeling approach narrates the underlying probability structure of the process by defining and estimating the covariance (or equivalently, correlation) between two values at any pair of space–time indexes. Although the descriptive models are able to provide a good fit to the observational data, and have the ability to mathematically describe the spatio-temporal variability present in the data, they may be difficult to interpret physically. In contrast, the dynamic approach of spatio-temporal modeling seeks explicit representation of the evolution of the process, and thus can be easily interpreted. To start with a toy example, consider a temporal-only autoregressive (AR) process, which explicitly writes the current value of the time series as a linear combination of its $p$ most-recent past values:
$$Y_t = \sum_{i=1}^{p} \phi_i Y_{t-i} + \varepsilon_t. \tag{7.71}$$

Dynamically, the AR($p$) model says that the current value is equal to the product of some "propagated factors" with the $p$ past values, plus an innovation error. This mechanistic approach of modeling allows easy comprehension and simulation. Formally, dynamical spatio-temporal models (DSTMs) view the data as a time series of spatial processes, and the AR kind of models can be extended to a spatio-temporal context. Because the time is almost always discrete, it is customary to write the time series of a latent spatial process as $\{Y_t(\boldsymbol{s}) : \boldsymbol{s}\in D_s\subset\mathbb{R}^d,\ t = 1,2,\dots\}$, or $Y_t(\cdot)$ for short, which is the subject of concern of DSTMs. Recall the additive measurement error and process model given by Eqs. (7.59) and (7.60); when this two-level modeling approach, first of data and then of process, is generalized to a DSTM context, the data model is written as:
$$Z_t(\cdot) = \mathcal{H}_t\left(Y_t(\cdot),\, \boldsymbol{\theta}_{d,t},\, \varepsilon_t(\cdot)\right), \tag{7.72}$$
where $Z_t(\cdot)$ denotes the noisy measurements at time $t$; $\boldsymbol{\theta}_{d,t}$ is the vector of data-model parameters, which is allowed to vary in time; and $\varepsilon_t(\cdot)$ is the data-model error for time $t$. The data model in Eq. (7.72) accounts for the measurement error, and the function $\mathcal{H}_t$ connects the measurements $Z_t(\cdot)$ to the latent process $Y_t(\cdot)$. As for the process model, which accounts for the evolution of $Y_t(\cdot)$, it may be written as:
$$Y_t(\cdot) = \mathcal{M}\left(Y_{t-1}(\cdot),\dots,Y_0(\cdot),\, \boldsymbol{\theta}_{p,t},\dots,\boldsymbol{\theta}_{p,1},\, \delta_t(\cdot),\dots,\delta_1(\cdot)\right), \tag{7.73}$$
where the current state of the process depends upon all of its past states, the time-varying process-model parameters that control the process evolution, and a series of process-model errors that are independent of each other. Clearly, the level of complexity suggested by Eq. (7.73) is not at all amenable to inference and modeling, because the entire model needs to be revised each time some new information becomes available. Therefore, a fundamental assumption in DSTM is the Markov assumption, which states that only the recent past is responsible for explaining the present (Wikle et al., 2019; Cressie and Wikle, 2015). As such, under the first-order Markov assumption, Eq. (7.73) reduces to:
$$Y_t(\cdot) = \mathcal{M}\left(Y_{t-1}(\cdot),\, \boldsymbol{\theta}_{p,t},\, \delta_t(\cdot)\right). \tag{7.74}$$
In general, the process model may be linear or nonlinear, and the latent process may or may not be Gaussian, although the former cases (i.e., linear and Gaussian) are usually assumed.

Before we proceed to give specific examples of the process model, the seemingly redundant data model as shown in Eq. (7.72), in contrast to the easily comprehensible Eq. (7.59), should be clarified. First, it should be noted that there is a possibility that the set of measurement locations and the set of spatial indexes at which the process is to be inferred do not overlap. As such, it necessitates a mapping matrix to relate the latent process to the observations. Second, if the measurements contain systematic bias, or if one wishes to model $Y_t(\cdot)$ as a zero-mean process, the bias/offset has to be specified, which could be as simple as a constant term, or a combination of (functions of) covariates. In view of these, defining the $n$-dimensional process vector $\boldsymbol{Y}_t \equiv \left(Y_t(\boldsymbol{s}_1),\dots,Y_t(\boldsymbol{s}_n)\right)^\top$ and the $m$-dimensional measurement vector $\boldsymbol{Z}_t \equiv \left(Z_t(\boldsymbol{r}_1),\dots,Z_t(\boldsymbol{r}_m)\right)^\top$, a linear data model takes the form:
$$\boldsymbol{Z}_t = \boldsymbol{b}_t + \boldsymbol{H}_t\boldsymbol{Y}_t + \boldsymbol{\varepsilon}_t, \tag{7.75}$$
where $\boldsymbol{b}_t\in\mathbb{R}^{m\times 1}$ and $\boldsymbol{H}_t\in\mathbb{R}^{m\times n}$ are the additive offset term and the mapping matrix, respectively. In other circumstances beyond the linear data model, nonlinearity could be introduced as the modeler sees fit. For instance, one possible treatment is to accommodate a transformation on the process vector, i.e.,
$$\boldsymbol{Z}_t = \boldsymbol{b}_t + \boldsymbol{H}_t\boldsymbol{Y}_t^\lambda + \boldsymbol{\varepsilon}_t, \tag{7.76}$$
where $-\infty < \lambda < \infty$ performs a power transformation on the elements of $\boldsymbol{Y}_t$.

Moving on to the process model, its simplest form assumes a linear transition between the past and current states.16 Suppose inference is to be conducted over a finite set of prediction spatial locations, $D_s = \{\boldsymbol{s}_1,\dots,\boldsymbol{s}_n\}$; then, under the first-order Markov assumption, the linear process model is given by:
$$Y_t(\boldsymbol{s}_i) = \sum_{j=1}^{n} m_{ij}\,Y_{t-1}(\boldsymbol{s}_j) + \delta_t(\boldsymbol{s}_i), \tag{7.77}$$
where the $m_{ij}$'s, with $i,j = 1,\dots,n$, are the transition weights. This model indicates that the current process value at a location of interest is a linear combination of the immediate-past process values over $D_s$. In matrix notation, the model is:
$$\boldsymbol{Y}_t = \boldsymbol{M}\boldsymbol{Y}_{t-1} + \boldsymbol{\delta}_t, \tag{7.78}$$

16 Nonlinear process models exist, but their discussion is herein omitted to save space. Interested readers may visit Section 5.4 of Wikle et al. (2019) for some examples.

where the transition matrix $\boldsymbol{M}\in\mathbb{R}^{n\times n}$ has elements $m_{ij}$, and $\boldsymbol{\delta}_t = \left(\delta_t(\boldsymbol{s}_1),\dots,\delta_t(\boldsymbol{s}_n)\right)^\top$ is usually zero-mean and Gaussian with covariance $\boldsymbol{C}_\delta$. Evidently, Eq. (7.78) is a vector autoregressive model of order one, or VAR(1). The total number of parameters in Eq. (7.78) is $n^2 + n(n+1)/2$, where $n^2$ corresponds to the $n$-by-$n$ entries of $\boldsymbol{M}$, and $n(n+1)/2$ denotes the number of diagonal and half of the off-diagonal entries (with the other half being symmetrical) in the error covariance matrix $\boldsymbol{C}_\delta$. Unlike the data model, in which the mapping matrix $\boldsymbol{H}$ is usually known, the high-dimensional parameters in the process model call for attention. There are two distinct approaches to reduce the dimensionality in the process model. One of those is to reduce the parameter dimension, and the other is to reduce the state dimension. On the one hand, parameter dimension reduction seeks representation of $\boldsymbol{M}$ and $\boldsymbol{C}_\delta$ using much fewer latent parameters. For instance, entries of $\boldsymbol{M}$ may be replaced by some transition kernels with parameter $\boldsymbol{\theta}_p$, whereas $\boldsymbol{C}_\delta$ can be modeled using an appropriate spatial covariance function with parameter $\boldsymbol{\theta}_\delta$. On the other hand, state dimension reduction seeks some low-rank representation of the process itself. For example,
$$\boldsymbol{Y}_t = \boldsymbol{F}_t\boldsymbol{\beta} + \boldsymbol{\Phi}\boldsymbol{\alpha}_t + \boldsymbol{\nu}_t, \tag{7.79}$$
where $\boldsymbol{F}_t\boldsymbol{\beta}$ describes the effects of large-scale non-dynamical covariates (cf. the mean structure of universal kriging), $\boldsymbol{\nu}_t$ is an error process that could be confounded with $\boldsymbol{\delta}_t$, and $\boldsymbol{\Phi}\in\mathbb{R}^{n\times n_\alpha}$ is a matrix of basis vectors corresponding to the latent dynamic coefficient process $\boldsymbol{\alpha}_t$, with
$$\boldsymbol{\alpha}_t = \boldsymbol{M}_\alpha\boldsymbol{\alpha}_{t-1} + \boldsymbol{\delta}_t. \tag{7.80}$$

Since the number of basis functions used to define $\boldsymbol{\Phi}$ is typically small, with $n_\alpha \ll n$, it is much easier to estimate $\boldsymbol{M}_\alpha\in\mathbb{R}^{n_\alpha\times n_\alpha}$ than the original $\boldsymbol{M}$. Regardless of which variant of the above-mentioned data and process models is selected, the corresponding parameter estimation methods, such as the method of moments or the expectation–maximization algorithm, are already mature and available in R (see Wikle et al., 2019), and thus do not constitute a major problem.

The dynamic approach to spatio-temporal modeling and forecasting is evidently more popular than the descriptive approach in solar forecasting, for works that involve AR-type models with spatio-temporal exogenous inputs are quite numerous (e.g., Amaro e Silva and Brito, 2018; Bessa et al., 2015b; Dambreville et al., 2014; Yang et al., 2014a). This may be attributed, in large part, to the interpretable nature of the dynamic approach. Notwithstanding, few if any existing works properly respect the rigorous statistical procedure of constructing DSTMs. In particular, the departure (or rather, gap) of the existing spatio-temporal statistical solar forecasting literature from the formal DSTM modeling in the statistics literature is most apparent in two aspects: (1) the lack of uncertainty quantification, and (2) instead of modeling the process as a whole, the formulation restricts prediction to a single focal location and is thus reduced to a multiple regression formulation. On the first aspect, almost all spatio-temporal statistical solar forecasting works failed to consider the potential spatial correlation in the small-scale-variation error term. In other words, the $\boldsymbol{\delta}_t$ and/or $\boldsymbol{\nu}_t$ terms in the above DSTM formulation are deliberately omitted during model building, but are replaced with a homogeneous error. Additionally, most works only produce deterministic forecasts. On the second aspect, one can see from Eq. (7.74) that DSTM focuses on the evolution of the entire process, whereas spatio-temporal solar forecasting models focus on just one location $\boldsymbol{s}_0$. What this implies is that the existing spatio-temporal solar forecasting models are not efficient, because if $n$ locations are to be forecast, $n$ models have to be built, in clear contrast to DSTM, which requires just one model for all locations. In what follows, we shall gather those spatio-temporal forecasting models that do not employ the component-of-variation viewpoint under the head of "non-geostatistical models."
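To give a feel for how compact the dynamic formulation can be, the following Python sketch simulates the VAR(1) process model of Eq. (7.78) over a hypothetical five-location network and recovers the transition matrix by least squares. A formal DSTM analysis would, of course, proceed with the estimation routines accompanying Wikle et al. (2019) rather than this bare-bones recipe.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 5, 2000                            # 5 locations, 2000 time steps

# a (for this seed) stable transition matrix, plus a spatially
# correlated error covariance C_delta with exponential decay
M = 0.5 * np.eye(n) + 0.08 * rng.standard_normal((n, n))
dist = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
C_delta = 0.1 * np.exp(-dist / 2.0)

# simulate the VAR(1) process of Eq. (7.78)
L = np.linalg.cholesky(C_delta)
Y = np.zeros((T, n))
for t in range(1, T):
    Y[t] = M @ Y[t - 1] + L @ rng.standard_normal(n)

# least-squares estimate of M: lstsq solves Y[:-1] @ X = Y[1:],
# whose solution X approximates M transposed
M_hat, *_ = np.linalg.lstsq(Y[:-1], Y[1:], rcond=None)
M_hat = M_hat.T
print(np.max(np.abs(M_hat - M)))          # shrinks as T grows
```

Note that a single model governs all $n$ locations at once, which is precisely the efficiency argument made above in favor of proper DSTMs.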

7.3.2.3 Non-geostatistical models

In a nutshell, non-geostatistical models are no different from multiple regressions, in that the predictand is the variable of interest at the focal location at a future time, whereas the predictors are the lagged versions of variables of interest within a certain geographical proximity to the focal location. The variable of interest could be irradiance, clear-sky index, or normalized PV power, depending on what data is available. Benavides Cesar et al. (2022) conducted a survey of non-geostatistical spatio-temporal solar forecasting models. However, as the survey also contains broad ranges of geostatistical models as well as physics-based models, the review organization presented therein is somewhat artificial, for it bins the literature by subjects (i.e., statistics, machine learning, and physics) rather than by modeling philosophy (i.e., descriptive versus dynamic), which necessarily results in substantial overlaps. At any rate, some examples of non-geostatistical modeling concepts are reviewed here.

Foremost to be noted is that, since non-geostatistical models use regression to account for the correspondence between the variable value at the focal location and its spatio-temporal neighbors, a straightforward means to improve the predictive performance is variable selection and shrinkage.

In this regard, ridge regression and the least absolute shrinkage and selection operator (lasso) regression may be deemed the two most basic methods. Typifying the applications of ridge and lasso regressions are the works by Yang et al. (2022e); Yang (2018d); Yang et al. (2015b), who used the data from the Oahu Solar Measurement Grid (OSMG); recall Section 6.4.3. Since the stations within the OSMG network are densely populated (17 stations within 1 km²) and sampling at very high frequency (1 s), the effective selection of spatio-temporal neighbors, to be used as predictors for the irradiance at the focal location, constitutes an important concern. The prevailing wind direction has been shown useful for selection, as the irradiance measurements at upwind locations are known a priori to be more important than those at downwind locations (Yang et al., 2015b). Nonetheless, owing to the sheer number of predictors from just those upwind stations, lasso-type regressions can still be limited by the curse of dimensionality and the insufficient degree of freedom. As such, a pre-selection would have to be conducted, e.g., through comparing the similarities between the potential predictors and the target; this pre-selection-followed-by-lasso idea has led to better forecasts than the version of lasso without pre-selection (Yang, 2018d), and a sketch of the workflow is given below. Lastly, the conventional lasso is based upon least squares regression, and by replacing the squared error with the pinball loss, the model becomes a penalized quantile regression, which is able to extend the deterministic style of prediction to the probabilistic space (Yang et al., 2022e).
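A minimal Python sketch of the pre-selection-followed-by-lasso workflow, using scikit-learn, follows. The correlation-based screening rule and the number of retained predictors are illustrative placeholders; Yang (2018d) should be consulted for the exact procedure.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def preselect_and_lasso(X_lagged, y, n_keep=20):
    """Pre-select the n_keep lagged spatio-temporal predictors most
    correlated with the target, then fit a cross-validated lasso.

    X_lagged : (T, q) matrix of lagged clear-sky indexes from
               neighboring stations (q can be very large)
    y        : (T,)   clear-sky index at the focal location
    """
    # last row of the correlation matrix: corr(y, each predictor)
    corr = np.abs(np.corrcoef(X_lagged, y, rowvar=False)[-1, :-1])
    keep = np.argsort(corr)[-n_keep:]      # retained predictor indexes
    model = LassoCV(cv=5).fit(X_lagged[:, keep], y)
    return model, keep

# forecasts for new predictor rows X_new: model.predict(X_new[:, keep])
```

The screening step caps the number of candidates handed to the lasso, sidestepping the degree-of-freedom problem noted above, while the lasso's own shrinkage performs the final, data-driven selection among the survivors.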
Indeed, the ability to capture the cloud (or cloud-shadow) motion has a decisive effect on the forecasting performance of sensor-network-based approaches. On this point, innovative designs of sensor networks are thought beneficial to building non-geostatistical spatio-temporal models. One interesting design has been depicted by Chen et al. (2019a). The target to be forecast is the power from a 5-MW PV plant covering an area of 0.15 km². In order to sense the omni-directional cloud movement that may affect the plant's output, a total of 32 irradiance sensors were laid out along two 60-m-apart concentric circles centering at the plant, with 16 sensors distributed evenly on each circle. This particular design allows two kinds of predictor selection. First, it allows one to compute the correlation between each sensor along the circles with the sensor at the center of the plant, which is similar to the use case of OSMG of Yang (2018d). Second, by correlating the exterior–interior sensor pairs, this network allows one to capture the ramp events. More specifically, if Taylor's hypothesis of frozen cloud advection were to hold, then the clear-sky index time series at the center of the PV plant could be predicted exactly from the time-shifted signals at the particular station pair that has detected the ramp event.

Generally speaking, a vast majority of non-geostatistical spatio-temporal solar forecasting models are essentially investigating the role of lagged exogenous variables, in both space and time, in improving the accuracy of solar forecasting. Works that employ this philosophy are quite ample in the literature (e.g., Agoua et al., 2019; Lorenzo et al., 2015; Zagouras et al., 2015; Lonij et al., 2013). However, one thing common to these works is that the exploited spatial dependence is often observed at sites with several ground stations in the immediate vicinity. The dynamics of the atmosphere, however, cause cloud cover to change, and therefore the maximum cross-correlation decreases with time and distance. For example, Lipperheide et al. (2015) forecast by "advecting" the spatial distribution of power output at the perimeter of a PV plant over an area of 539 m × 1807 m. But despite significant efforts to deduce the underlying physics by deriving cloud motion vectors, the forecast skill over a persistence forecast peaked at only 18% for 10-s-ahead forecasts. Similarly, Amaro e Silva and Brito (2018) leveraged spatio-temporal data for forecasting using linear regression over a scale of 1 km in Hawaii and 100 km in the East Midlands, UK. Forecast skills peaked at around 30% for Hawaii for 1-min-ahead forecasts, but were only 10% to 15% for the East Midlands. To that end, there is significant potential for furthering spatio-temporal forecast research using high-resolution, high-update-rate satellite data and virtual test beds based on large eddy simulation (Kurtz et al., 2017)—physics is still absolutely key to solar forecasting.

8 Post-processing Solar Forecasts

"Each problem that I solved became a rule, which served afterwards to solve other problems." — René Descartes

The literal meaning of the word "post-processing" refers to the kind of processing that is being done after some initial result has been obtained. When we consider post-processing in a forecasting context, it should rightfully include all sorts of computation and analysis that are performed after the base forecasts are generated. It is on this account that things such as converting irradiance forecasts to photovoltaic (PV) power forecasts or estimating the economic value of the forecasts should no doubt be viewed as legitimate forms of post-processing. Nonetheless, because actions of these sorts make alterations to the forecast quantity—irradiance is changed to PV power or dollar value—they ought to be regarded as separate processes, rather than incremental ones, from the initial one. Therefore, in this chapter, we shall adopt a narrower definition of the word "post-processing" and refer strictly to processing performed to further improve the utility of forecasts, after some initial forecasts have been generated, while keeping the forecast quantity unchanged. Stated differently, global horizontal irradiance (GHI) forecasts after post-processing should still be GHI forecasts, and PV power forecasts after post-processing should still be PV power forecasts. Furthermore, the scope of post-processing to be discussed in this chapter is confined to what concerns single geographical entities. In other words, collective post-processing of all forecasts within a region using spatial statistics is not considered here, but in a later chapter.

With such clarifications on the scope, it is now possible to come up with a typology of post-processing. So long as a typology is intended to be used to differentiate one class of methods from another, different forecasters may group methods based on distinct criteria. There are those who favor distinguishing post-processing methods based on whether or not, or what kind of, exogenous information is involved. There are also those who may find a straightforward division based on the univariate or multivariate nature of post-processing methods appropriate. All these choices are valid and may present themselves as logical in a review; yet, they fail to acknowledge the practical relevance of having a typology. "How a statistical post-processing procedure contrasts with a machine-learning one in its functional ability," or "which part of the multivariate post-processing methodology overlaps with the univariate one," is the kind of question that may render these typologies deficient. The most desirable form of typology should be one in which the constituents are mutually exclusive and collectively exhaustive. In this regard, we must delve deeper into the true motive of having forecasts post-processed, by asking what we can gain from post-processing.

Utility gain by means of post-processing can be concluded from at least two aspects. One of those is that post-processing is able to reduce, if not remove, the systematic departures from the objective truth (i.e., bias, under- or over-dispersion) embedded in the initial forecasts, so as to improve the forecast quality. If we are to devise a typology based on the different characteristics of forecast quality to which post-processing could contribute, then forecast post-processing can be categorized into, for example, those improving bias and those improving dispersion. Unfortunately, this strategy is unable to result in categories that are mutually exclusive, since many post-processing methods improve several characteristics of forecast quality concurrently. The other, and arguably more important, aspect of utility gain is that post-processing is able to convert forecasts from the form which is preferred by the forecaster to one which is preferred by the forecast user. For instance, all weather forecasters must agree that the ensemble—a form of probabilistic representation of forecast—is beneficial, but currently, power system operators require by default deterministic (or point) forecasts. Under this circumstance, one needs to summarize an ensemble into a single value—a probabilistic-to-deterministic conversion. In another case, the forecast user may wish to quantify the uncertainty of a set of deterministic forecasts. Conversion from deterministic forecasts to probabilistic ones is then necessary. It is to these impulses of converting forecasts from one form to another that Yang and van der Meer (2021) responded, and thereby recommended a typology based on the direction of forecast conversion; post-processing methods can be classified into the following four types:

• deterministic-to-deterministic (D2D) post-processing;
• probabilistic-to-deterministic (P2D) post-processing;
• deterministic-to-probabilistic (D2P) post-processing;
• probabilistic-to-probabilistic (P2P) post-processing.

This chapter, hence, structures its discussion through this typology. Owing to the cardinal position of post-processing in the field of solar forecasting, this chapter is quite bulky.

8.1 AN OVERVIEW OF CONVERSION BETWEEN DETERMINISTIC AND PROBABILISTIC FORECASTS

Although deterministic forecasts are referred to as "best-guess" forecasts by weather forecasters, owing to the limited accuracy of analysis and our incomplete understanding of the atmospheric processes, even the best deterministic forecasts may contain systematic bias. Since the accuracy of the analysis is often not within the control of forecasters, such systematic bias is referred to as model-led bias. Statistical ways to remove model-led bias include regression and filtering, which constitute two major classes of D2D post-processing methods. In addition to those, forecast users are frequently challenged by the need to change the temporal resolution of forecasts—e.g., many grid operators require 5- and 15-min forecasts for intra-day load following, whereas operational numerical weather prediction (NWP) forecasts are mostly hourly. This process is formally known as downscaling, which may be regarded as the third class of D2D post-processing.

A probabilistic forecast can be expressed as a predictive distribution, a set of quantiles, a prediction interval, or an ensemble of member forecasts. The general aim of P2D post-processing is thus to summarize the probabilistic information into a single value, which can best represent the "centroid" of the forecast. This may seem, at first, a trivial task, since one can simply draw out the mean from the predictive distribution, or compute the arithmetic mean of the ensemble members. Indeed, if the underlying predictive distribution is symmetrical (e.g., Gaussian), the strategy of drawing out the mean is known a priori to be appropriate. Similarly, if the ensemble members are equally likely, averaging is associated with no obvious deficiency, and in fact should be encouraged. Nevertheless, when the actual predictive distribution is asymmetrical, which implies that its mean and median do not coincide, the optimal choice of statistical functional (i.e., a point summary) of a predictive distribution becomes conditional. In the case of a poor man's ensemble, if experience or strong evidence suggests some members are more accurate than others, one may assign heavier weights to the more accurate members. In any case, summarizing the predictive distribution and combining forecasts are two classes of P2D post-processing methods.

The idea of D2P post-processing might be counter-intuitive to some novice forecasters, because it involves "creating information out of nowhere"—generating a predictive distribution from a point value. This is especially true when physical forecasting methods, such as those based on camera or satellite imagery, are used. Indeed, from just one deterministic forecast, it would be impossible to infer the uncertainty associated with it. But when there are sufficiently many past forecasts made, forecasters are able to extract uncertainty information from past experience. For example, the analog ensemble searches for past weather patterns (i.e., past forecasts) that resemble the current one, and thus treats past observations that correspond to those matching weather patterns as probable alternative realizations of the current forecast. Similarly, by identifying the present weather condition, the method of dressing adds past errors under similar conditions onto the current forecast. Last but not least, quantile regressions of various kinds, which can be collected into the third class of D2P post-processing methods, are capable of linking forecast–observation pairs in a probabilistic fashion by involving some special loss function. One should note, however, that regression, as a statistical method, carries a notion of uncertainty by nature. In that, all regression techniques are essentially probabilistic. Nonetheless, to distinguish from D2D post-processing, (quantile) regressions applied in a D2P post-processing setting are herein termed probabilistic regression.
Lastly, P2P post-processing is useful because: (1) the initial probabilistic forecasts may be over- or under-dispersed, that is, the coverage probability of the forecasts may be larger or smaller than the nominal one, which calls for calibration; and (2) there may be multiple probabilistic forecasts present, which need to be combined into a final probabilistic forecast. Calibration can be done by converting an ensemble forecast to one with a parametric predictive distribution under some objective function, or by dressing more members or kernels onto an ensemble forecast, so as to create additional diversity in the ensemble. In probabilistic forecast combination, one can aggregate multiple predictive distributions, multiple sets of predictive quantiles, or multiple prediction intervals. Combining probabilistic forecasts is a fairly new subject area (Winkler et al., 2019), and there have not been many solar forecasting works that deal with such issues. However, as there is a growing need for better uncertainty quantification of solar forecasts, it is only a matter of time before P2P post-processing becomes mainstream. Overall, the four types of post-processing techniques, as well as the specific classes of methods associated with each type of technique, are summarized in Fig. 8.1.

[Figure 8.1 shows a tree diagram rooted at "Post-processing," with four branches: D2D — (1) regression, (2) filtering, (3) downscaling; P2D — (4) summarizing predictive distribution, (5) combining deterministic forecasts; D2P — (6) analog ensemble, (7) method of dressing, (8) probabilistic regression; P2P — (9) calibrating ensemble forecasts, (10) combining probabilistic forecasts.]

Figure 8.1 Typology of post-processing. The letters "D" and "P" refer to deterministic forecasts and probabilistic forecasts, respectively, indicating the direction of conversion.

8.2 NOTATIONAL CONVENTION

Before we proceed to review various post-processing methods, some notation issues are to be taken care of. In forecast post-processing research, the total data available to a forecaster must be split into a fitting sample and a verification sample, each containing numerous data points. Throughout the chapter, these two samples are indexed with $i = 1,\dots,n$ and $t = n+1,\dots,n'$, respectively. If generic random variables representing forecast and observation are denoted with capital letters $X$ and $Y$, particular realizations of these random variables, depending on whether they come from fitting or verification samples, are denoted with small letters $x_i$, $x_t$, $y_i$, and $y_t$. If the raw forecasts are multivariate, e.g., they contain irradiance, wind speed, temperature, et cetera, a superscript is used to differentiate each variable, i.e., $x_i^{(j)}$ or $x_t^{(j)}$, where $j = 1,\dots,m$ indexes the variables. If the raw forecasts are in the form of size-$m$ ensembles, their components can be accessed by subscripts, $x_{ij}$ or $x_{tj}$, where $j = 1,\dots,m$ indexes the ensemble members, i.e., component forecasts. (Since a multivariate ensemble has rarely been used in a solar forecasting context, the index $j$ is occupied twice.) When a post-processed forecast is issued in a deterministic fashion, it is denoted using $\hat{y}_t$, in which the hat symbol differentiates it from the actual observation $y_t$. It should be noted that the actual observation $y_t$ is but one realization of the random variable $Y_t$. Besides the $i$–$n$ and $j$–$m$ indexing pairs, we shall employ $k$–$o$ as a third pair in situations where additional indexing is required.

Greek letters $\eta$, $\xi$, and $\zeta$ are used to denote mathematical functions of various kinds. This is because we wish to reserve $f$ and $g$ strictly for denoting probability density functions (PDFs). Similarly, capital $F$ and $G$ are reserved for denoting cumulative distribution functions (CDFs). In combining distributional forecasts, the forecaster starts with $m$ component predictive density (or distribution) functions; these are written as $f_{ij}$ or $f_{tj}$ (or $F_{ij}$ or $F_{tj}$), depending on whether the components come from fitting or verification samples, and the combined forecast, which is also a predictive density (or distribution) function, is denoted with $\hat{g}_i$ or $\hat{g}_t$ (or $\hat{G}_i$ or $\hat{G}_t$). It should be noted that the "true" predictive density (or distribution) function cannot be known, and thus the "hat" notation is saved for $\hat{g}$ and $\hat{G}$, throughout this chapter. In combining quantile forecasts, the $\tau$th quantile is denoted as $q_\tau$, which, at times when combining takes place, can be further indexed as $q_{\tau,ij}$ or $q_{\tau,tj}$. The quantile after combining is denoted as $\hat{q}_{\tau,i}$ or $\hat{q}_{\tau,t}$. Again, because the "true" quantile is unknown, the "hat" notation is saved. Finally, in combining prediction intervals, which are described by a pair of upper and lower bounds written as $u$ and $l$, their indexing also follows the same convention.

Post-processing, being a set of statistical procedures, demands many other symbols to be described in a fully mathematical way, such as the mean and variance operators $\mathbb{E}$ and $\mathbb{V}$, regression coefficients $\beta_0,\beta_1,\dots,\beta_m$, or combining weights $w_1,\dots,w_m$. These symbols, which are thought unlikely to cause confusion, are to be introduced as we move along.

8.3 D2D POST-PROCESSING

A sizable part of the current literature on solar forecast post-processing falls within this category. The underlying reason may be that early attempts to produce solar forecasts often relied solely on physics-based methods, such as those based on camera, satellite, or NWP, which are mostly deterministic. Though we have clearly demonstrated throughout the book why generating deterministic forecasts is no longer the state of the art, historically, deterministic forecasting has played a dominant part in forming the current perspectives about what should be done instead, and on this account, if on no other, it deserves to be borne in mind. When different aspects of quality of these deterministic forecasts become the subject under scrutiny, a natural expectation placed on post-processing is to improve those aspects of quality, for instance, reducing the bias or lowering the squared deviation. What this implies is that the post-processed forecasts, which must be characterized by the same set of aspects of quality, would also be deterministic.

8.3.1 REGRESSION

Regression presents itself as a conspicuous strategy for performing D2D post-processing. In that, the forecaster would collect a set of past forecasts and a set of corresponding observations, establish a mapping between them, and use the established mapping to correct all forthcoming forecasts. We may state the working principle of regression as a post-processing tool in a more precise way: it establishes a correspondence between initial forecasts and observations, with the predictand being the observation and the predictor being the initial forecast; when new values of the predictors (or new raw forecasts) are fed into the regression model with fitted parameters, they are converted to post-processed forecasts. In the simplest case, one may consider a univariate linear regression model in the form of:
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \tag{8.1}$$
where $x_i$ is the realization of the covariate $X$ at time $i$, $\varepsilon_i \sim \mathcal{N}(0,\sigma^2)$ is a Gaussian error with expectation zero and variance $\sigma^2$ that does not change with time (i.e., homoscedastic), and $Y_i \sim \mathcal{N}(\beta_0 + \beta_1 x_i,\, \sigma^2)$ is the response random variable. With $n$ pairs of raw forecasts $x_1,\dots,x_n$ and the corresponding observations $y_1,\dots,y_n$, which are collectively known as the training sample, the regression parameters $\beta_0$ and $\beta_1$ can be estimated by minimizing the residual sum of squares:
$$\underset{\beta_0,\beta_1\in\mathbb{R}}{\arg\min}\ \sum_{i=1}^{n}\left(y_i - \beta_1 x_i - \beta_0\right)^2. \tag{8.2}$$

Once the unknown coefficients are estimated and denoted using $\hat{\beta}_0$ and $\hat{\beta}_1$, the forecaster gains the capability of post-processing any new $x_t$:
$$\hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 x_t, \tag{8.3}$$
in which $\hat{y}_t$ denotes the post-processed forecast.

One may subsequently explore variations of the above regression in several directions. First and foremost, it is possible to extend the univariate setting to a multivariate one by including more predictors, such as forecasts of other weather variables or some transformed versions of the forecasts. This procedure may be exemplified by the earlier work of Lorenz et al. (2009), who applied an approach called model output statistics (MOS), as first termed by Glahn and Lowry (1972), to a solar forecasting context. In that work, a fourth-degree polynomial regression was used to post-process NWP-based solar forecasts:
$$\hat{y}_t = \hat{\beta}_0 + \sum_{j=1}^{m}\hat{\beta}_j x_t^{(j)}, \tag{8.4}$$
where $x_t^{(j)} \in \left\{\cos^4 Z_t,\dots,\cos Z_t,\, \varphi_t^4,\dots,\varphi_t\right\}$ are the raw forecasts, with $Z$ being the forecast zenith angle and $\varphi$ being the forecast clear-sky index; $y_t$ in this case denotes the model-led bias; and $\hat{\beta}_0,\dots,\hat{\beta}_m$ are model coefficients fitted via least squares.
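To make the recipe concrete, the following Python sketch fits an Eq. (8.4)-style polynomial MOS by ordinary least squares; the input arrays are assumed to hold the training-sample forecasts and the corresponding model-led biases.

```python
import numpy as np

def fit_mos(cosZ, phi, bias):
    """Fit the fourth-degree polynomial MOS of Eq. (8.4) by least squares.

    cosZ : (n,) forecast cos(zenith angle)
    phi  : (n,) forecast clear-sky index
    bias : (n,) model-led bias of GHI in the training sample
    """
    # eight predictors, cos^4 Z, ..., cos Z, phi^4, ..., phi, plus intercept
    X = np.column_stack([np.ones_like(cosZ)]
                        + [cosZ ** k for k in (4, 3, 2, 1)]
                        + [phi ** k for k in (4, 3, 2, 1)])
    beta, *_ = np.linalg.lstsq(X, bias, rcond=None)
    return beta

# assuming bias is defined as forecast minus observation, the corrected
# forecast is then: ghi_corrected = ghi_forecast - X_new @ beta
```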

In this particular case, there are eight predictors (indexed by $j$), which include the transformed versions of both GHI and zenith-angle forecasts. It is also worth noting that the predictand of this regression is not GHI, but rather the bias of GHI forecasts, although the results might not be drastically different if GHI is used as the predictand directly, since the involvement of the zenith angle is able to reduce zenith dependence to some extent.

Another very popular upgrade of the linear-regression-based D2D post-processing is nonlinear regression, in which neural networks of various kinds and structures have been widely used by solar forecasters as post-processing tools. For instance, in terms of correcting deterministic NWP forecasts, Pereira et al. (2019) considered an artificial neural network (ANN), which takes several forecast variables from the European Centre for Medium-Range Weather Forecasts' (ECMWF's) Integrated Forecast System as input,1 and the bias in GHI as output. To that end, the trained ANN model acts as a regression equation, estimating the bias in the GHI forecast based on the ECMWF forecast variables at the time stamp that requires post-processing. It turns out that, if one skims through the literature, contributions of this nature—post-processing NWP with ANN—can be found in abundance (e.g., Theocharides et al., 2020; Lauret et al., 2016; Lima et al., 2016), and the adopted approaches do not differ very much from one work to another. When such a primitive combination is no longer able to provide the amount of scientific novelty required for publication, researchers could always turn to other combinations, such as NWP plus stepwise regression, as done in Verbois et al. (2018) and Verzijlbergh et al. (2015), or any other method that is capable of performing regression, including deep learning (e.g., Kumari and Toshniwal, 2021), which has hitherto been a very welcome topic in energy journals. That said, many of these post-processing models tend to ignore the salient features of solar irradiance and meteorological facts during training, by dumping raw NWP variables into machine-learning algorithms without performing exploratory analysis or variable transformation. One therefore has grounds to believe that when more domain knowledge is integrated into the models, the performance could be boosted further.

The last variant of the linear-regression-based D2D post-processing which we should wish to discuss is the regime-switching model. The performance of NWP models, or solar forecasting models in general, is often season-, weather-, and sky-condition-dependent, which can be treated as categorical conditional variables. In this regard, it is logically attractive to associate a different set of post-processing model parameters with each condition. For example, Mejia et al. (2018) argued that summertime NWP post-processing should not be mixed with the rest of the year, since summertime cloud variability over the United States Southwest (Nevada, Arizona, and New Mexico) is strongly related to and thus impacted by the North American monsoon, which is usually accompanied by a synoptic-scale wet spell called a "moisture surge," which, in itself, can be observed or predicted to a high degree of certainty. Meteorological knowledge of this sort is surely more amenable than what "black-box" machine-learning models are able to offer, but a head-to-head comparison of these two schools of thought has hitherto been circumvented by the believers of either side. In any case, aside from regime-switching models, another possible treatment of the time-varying cloud and solar variability is to employ a rolling-window setup, which refits the model coefficients every certain period of time. This results in the next class of methods in D2D post-processing.

1 These variables include clearness index, solar zenith angle, mean air temperature, relative air humidity, and total column water (i.e., vapor, cloud water, and cloud ice).

8.3.2 FILTERING

The regression approach is, by nature, a batch post-processing strategy. In other words, once the coefficients of a post-processing model are determined, the model can be used for post-processing, as long as the new predictors (i.e., raw forecasts) are available. In clear contrast to regression, another major strategy of D2D post-processing is filtering, which is sequential in nature. By "sequential" we mean the filtering procedure post-processes forecasts in an iterative fashion as observations become progressively available. The most iconic filtering procedure ought to be Kalman filtering, which has a wide range of engineering applications. In short, Kalman filtering is a recursive procedure that estimates the state of a dynamical system from a series of noisy measurements. Notwithstanding, this type of definition is unfriendly to forecast practitioners, who are primarily interested in the application itself rather than the detailed mathematical properties of the Kalman filter, which deserve a book chapter of their own. On this point, this section should restrict its discussion to a basic application of Kalman filtering as a D2D post-processing tool, whereas the reader is referred to Box et al. (2015) for an in-depth statistical viewpoint, and to Petris et al. (2009) for a programming guide, on Kalman filtering.

Central to Kalman filtering are two equations: one called the measurement equation, which describes how noisy observations are related to the unknown state of the system, and the other the state equation, which describes how the state evolves in time. The main idea behind the two-equation description of dynamical processes is that the underlying states are unobservable, and thus need to be estimated. In solar forecast post-processing, the forecast variable of concern is often a scalar quantity, e.g., GHI. Hence, only the measurement equation in a univariate setting is required, which is given by:
$$Y_t = \boldsymbol{x}_t^\top\boldsymbol{\beta}_t + \varepsilon_t, \tag{8.5}$$
where $Y_t$ is the forecast random variable at time $t$, $\boldsymbol{\beta}_t$ is a time-dependent vector with the state variables as elements, $\boldsymbol{x}_t$ is the measurement matrix, consisting of just one row of elements in this case, representing the noiseless connection between state and observation, and $\varepsilon_t \sim \mathcal{N}(0,\sigma^2)$ is a homoscedastic Gaussian noise.2 As for the state equation in solar forecast post-processing, the following reduced form of state evolution is usually sufficient:
$$\boldsymbol{\beta}_t = \boldsymbol{\beta}_{t-\Delta t} + \boldsymbol{\eta}_t, \tag{8.6}$$

2 It is thought useful to clarify things a bit at this stage. In the usual textbook notation (e.g., Makridakis et al., 2008), the state variables are denoted using $\boldsymbol{X}_t$ and the measurement matrix is denoted using $\boldsymbol{H}$. However, in the present case, writing state variables as $\boldsymbol{X}_t$ confuses those with forecasts, which is why $\boldsymbol{\beta}_t$ is used instead.

which suggests that the current state $\boldsymbol{\beta}_t$ depends only upon the previous state $\boldsymbol{\beta}_{t-\Delta t}$ and an error term $\boldsymbol{\eta}_t$ with covariance matrix $\boldsymbol{Q}_t$. Here, $\Delta t$ marks the step size of filtering, which is not necessarily 1. Comparing Eq. (8.5) to the earlier Eq. (8.1), one can notice that the measurement matrix takes the same role in Kalman filtering as predictors in regression. In other words, in solar forecast post-processing, the measurement matrix is composed of (transformed) forecast variables. For instance, Diagne et al. (2014) opted for $\boldsymbol{x}_t = (1, \varphi_t, \cos Z_t)^\top$, whereas Pelland et al. (2013) chose only a normalized version of forecast GHI3 and an intercept as input. This is why the notation in Eq. (8.5) has been modified. On the other hand, as is the case in all solar forecasting works, the clear-sky index ought to be involved during model construction, which implies that the forecast variable $Y_t$ is the forecast bias in the clear-sky index. Letting $\kappa_t$ and $\varphi_t$ denote the observed and forecast clear-sky index at time $t$, and $c_t$ denote the clear-sky expectation of the variable of interest at time $t$, which could be GHI, beam normal irradiance (BNI), or PV power, one can write:
$$y_t = \varphi_t - \kappa_t = \frac{\mathrm{bias}_t}{c_t}. \tag{8.7}$$
In other words, the "measurement" in the Kalman filtering post-processing framework is the observed difference between the forecast and actual clear-sky index, which is why $y_t$ in Eq. (8.7) is used without the "hat" symbol.

Clearly, if the post-processing model as described in Eq. (8.5) is to be of use, one must first estimate $\boldsymbol{\beta}_t$. Since the unknown state $\boldsymbol{\beta}_t$ changes with time, as indicated by the subscript $t$, its estimation procedure is recursive. First, it is useful to introduce two types of estimates, one a priori and the other a posteriori. We write $\hat{\boldsymbol{\beta}}_{t|t-\Delta t}$ to denote the a priori estimate of the state at time $t$ using information up to $t-\Delta t$, and $\hat{\boldsymbol{\beta}}_{t|t}$ as the a posteriori estimate of the state at time $t$ after the observation for time $t$ becomes available. To account for the uncertainty in the state estimates, the covariance matrix is the key notion, in that $\boldsymbol{P}_{t|t-\Delta t}$ and $\boldsymbol{P}_{t|t}$ denote the a priori and a posteriori estimates of the covariance matrix, respectively. With these preliminaries, the Kalman filtering procedure iterates between the prediction step and the update step through the following equations:
$$\hat{\boldsymbol{\beta}}_{t|t-\Delta t} = \hat{\boldsymbol{\beta}}_{t-\Delta t|t-\Delta t}, \tag{8.8}$$
$$\boldsymbol{P}_{t|t-\Delta t} = \boldsymbol{P}_{t-\Delta t|t-\Delta t} + \boldsymbol{Q}_t, \tag{8.9}$$
$$\boldsymbol{K}_t = \boldsymbol{P}_{t|t-\Delta t}\boldsymbol{x}_t\left(\boldsymbol{x}_t^\top\boldsymbol{P}_{t|t-\Delta t}\boldsymbol{x}_t + \sigma^2\right)^{-1}, \tag{8.10}$$

3 Pelland et al. (2013) used GHI divided by 1000 W/m², which is quite an awkward choice now that we think of it, but we shall not deviate too far from the main purpose.

Post-processing Solar Forecasts

303





β t|t = β t|t−Δt + K t yt − xt β t|t−Δt ,

(8.11)

Pt|t−Δt . Pt|t = (II − K t xt )P

(8.12)

The mathematical derivation of Eqs. (8.8)–(8.12) is well known and thus not reiterated. However, it is still useful to exemplify the iterative process. For a set of given initial values β_{0|0} and P_{0|0}, one is able to attain the a priori estimates through Eqs. (8.8) and (8.9)—if Δt = 1, then these estimates are β_{1|0} and P_{1|0}. Subsequently, when the measurement vector x_1 and the observation for time 1 become available, the a posteriori estimates β_{1|1} and P_{1|1} can be computed via Eqs. (8.10)–(8.12). The process then repeats and estimates β_{2|1} and P_{2|1}, so on and so forth. In each iteration, the filtered y_t (or the post-processed forecast bias) is given by:

ŷ_t = x_t β_{t|t−Δt},    (8.13)

which can be subsequently back-transformed into the post-processed variable of interest, that is, GHI, BNI, or PV power, depending on the case. From the subscripts, it is also clear that the post-processed forecast for time t is available at time t − Δt, which is ex ante. This procedure is summarized in Fig. 8.2, alongside the initial values and parameter settings as used by Zhang et al. (2022), namely, β_{0|0} = (0, 0, 0), P_{0|0} = diag(1), Q_t = diag(0.05), and σ = 0.1.

Figure 8.2 A typical recursive procedure of Kalman filtering used in solar forecasting. A future time to be filtered is stamped t. The "prediction" step makes estimations on this future state β_{t|t−Δt} and future error covariance P_{t|t−Δt}, based on the current information, via Eqs. (8.8) and (8.9). When time moves forward—i.e., the previous-round "future" time (or t) becomes the "current" time (or t − Δt, which lags the "new future" time by Δt)—the "update" stage updates the Kalman gain K_t, state β_{t|t}, and error covariance P_{t|t}, via Eqs. (8.10)–(8.12). Symbol diag(v) denotes a square diagonal matrix with v on the diagonal. Since β_{0|0} has a length of 3 in this case, the size of P_{0|0} and Q_t is 3 × 3.
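To make the recursion of Fig. 8.2 concrete, the following is a minimal sketch in R of Eqs. (8.8)–(8.13), with Δt = 1 and the initial settings of Zhang et al. (2022). The input vectors phi (forecast clear-sky index), kappa (observed clear-sky index), and cosZ (cosine of the solar zenith angle) are hypothetical, and the measurement vector follows the choice of Diagne et al. (2014); this is an illustrative sketch for offline post-processing, not a definitive implementation.

# Kalman-filter post-processing of the clear-sky-index bias, Eqs. (8.8)-(8.13).
# "phi", "kappa", and "cosZ" are hypothetical numeric vectors of equal length.
kalman_post_process <- function(phi, kappa, cosZ, Q = diag(0.05, 3), sigma2 = 0.1^2) {
  n <- length(phi)
  beta <- c(0, 0, 0)                  # a posteriori state, initialized as beta_{0|0}
  P <- diag(1, 3)                     # a posteriori covariance, P_{0|0}
  y <- phi - kappa                    # "measurement": bias in clear-sky index, Eq. (8.7)
  y_filtered <- numeric(n)
  for (t in seq_len(n)) {
    x <- c(1, phi[t], cosZ[t])        # measurement vector, as in Diagne et al. (2014)
    beta_prior <- beta                # prediction step, Eq. (8.8)
    P_prior <- P + Q                  # prediction step, Eq. (8.9)
    y_filtered[t] <- sum(x * beta_prior)                         # filtered bias, Eq. (8.13)
    # update step, carried out once y[t] materializes:
    K <- P_prior %*% x / drop(t(x) %*% P_prior %*% x + sigma2)   # Kalman gain, Eq. (8.10)
    beta <- beta_prior + drop(K) * (y[t] - y_filtered[t])        # Eq. (8.11)
    P <- (diag(3) - K %*% t(x)) %*% P_prior                      # Eq. (8.12)
  }
  y_filtered
}

Because ŷ_t estimates φ_t − κ_t, the post-processed clear-sky index is φ_t minus the filtered bias, which, multiplied by c_t, recovers the post-processed GHI.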


Careful readers may have already noticed the limitation of Kalman filtering as a solar forecast post-processing tool, that is, it is unable to perform multi-step-ahead filtering. More specifically, suppose the step size is Δt = 2; with the initial values β_{0|0} and P_{0|0}, the prediction equations would lead straight to β_{2|0} and P_{2|0}, skipping the estimates at t = 1. The ability to perform multi-step-ahead filtering is key to operational solar forecasting (see Section 5.6), in which multiple forecasts spanning the entire operating period must be post-processed in one go. Unfortunately, this issue was swept away by Occam's broom (recall Section 2.6) in the early works on solar forecast post-processing using Kalman filtering. It was not until the work of Yang et al. (2017c) that this issue was formally revealed to solar forecasters. Recently, Yang (2019c) and Zhang et al. (2022) revisited the issue of multi-step-ahead Kalman filtering, and proposed a workaround that constructs two filters, one for odd time stamps and one for even time stamps. More generally, for h-step-ahead Kalman filtering, h filters need to be constructed. However, this approach breaks one dynamical process into two or more processes, which is thought to be unnatural, and thus calls for more research (pers. comm. Pierre Pinson, Technical University of Denmark, 2021).

Although Kalman filtering can easily be regarded as the most representative method for filtering-based post-processing, other methods are available. For instance, the median filter, a method widely used in digital image processing for its ability to preserve sharp edges, has also been considered in solar forecast post-processing. Stated simply, the median filter scans a signal entry by entry, replacing each entry with the median of neighboring entries, i.e., entries within a sliding window. Blanc et al. (2017a) generated 1-min-ahead BNI forecasts over a horizon of 15 min, using a two-camera system. However, it was found that the forecast time series often contains large spikes, which are very likely to induce high errors, especially when the forecast spikes are misaligned with the actual ramps. A median filter with a window size of 5 min was applied, and was found to improve the correlation between forecast and observation from 0.65 to 0.76. In short, such filters can effectively stabilize the variance of forecasts.

8.3.3 DOWNSCALING

Downscaling refers to the process of converting a large-scale, low-resolution process, which could be temporal, spatial, or spatio-temporal, to one that is on a smaller scale and at a higher resolution. Downscaling is required in situations where the original forecast resolution is lower than that mandated by the forecast user; e.g., the State Grid Corporation of China (SGCC) requires day-ahead solar forecasts to be submitted at a 15-min resolution (Yang et al., 2021a), whereas NWP forecasts are often hourly. Processes on small scales by default contain more features, which are not directly observable at the large scale, but influence the large-scale processes in an aggregated or cumulative way. Take irradiance on a partly cloudy day for example: the variability of hourly GHI is much lower than that of 1-min GHI, since the latter contains far more high-frequency ramps (i.e., features) than the former. Furthermore, the overall integrated value of hourly GHI would not be as high as it could have been had that day been a clear-sky day, which is the cumulative effect of cloudy instances over that day.


It follows that the main challenge of performing downscaling resides in deriving those small-scale features from large-scale processes. Of course, as one can infer from Sections 6.2.2 and 6.6.2.1, downscaling can be performed using NWP itself, in which case the process is known as dynamical downscaling. In dynamical downscaling, forecasts from global or mesoscale models, such as those of the ECMWF or the North American Mesoscale (NAM) model, are used as initial and boundary conditions for high-resolution regional models, such as the Weather Research and Forecasting (WRF) or Application of Research to Operations at Mesoscale (AROME) model. However, the type of downscaling concerning the current chapter is of a statistical nature, i.e., without NWP. In that, one is tasked to reconstruct a high-frequency time series from a low-frequency one. This task, in mathematics, is known as an inverse problem, in which one starts with the effects and then calculates the causes—e.g., guessing the object from its shadow. The goodness of the downscaled forecasts can be evaluated, in the main, on three heads, each containing a set of evaluation criteria that is more stringent, and thus more difficult to reach, than the previous one. The first and easiest of these sets of criteria is concerned with the consistency in statistical properties, the second with the consistency in aggregation, and the third with the consistency in temporal and/or spatial order. We shall explain these three forms of consistency, as well as the existing ways of downscaling, in the following paragraphs.

Consistency in statistical properties refers to whether or not the downscaled forecasts are statistically indistinguishable from the actual observations. Statistical properties of two time series may be gauged using a series of measures whose values do not depend on a uniquely ordered set of data points,⁴ such as mean, variance, stability (variance of block means), lumpiness (variance of block variances), autocorrelation, or transition probability. Stated differently, if the downscaled forecasts are able to resemble the high-resolution measurements in terms of these statistical properties, then the downscaled forecasts are said to be good. One strategy that can easily be thought of to achieve this is to derive the transition probability matrix, which describes the probability of irradiance or clear-sky index evolving from one state to another, from a historical time series of high-resolution irradiance measurements. Subsequently, when initial values, that is, the original forecasts, become available, one can generate high-resolution forecasts according to the derived law governing the transition; the downscaled forecasts in this case are characterized by a Markov chain, which has hitherto been a popular choice for synthetic generation of solar irradiance (e.g., Bright et al., 2015; Shepero et al., 2019). On this subject, Bright (2021) has edited a book on synthetic solar irradiance generation, which includes almost all proposals that have been made thus far on this topic.

4 For example, the mean of x_1, x_2, and x_3 is the same as that of x_2, x_3, and x_1, where the order does not matter.

Consistency in aggregation consists of the need for the downscaled high-resolution forecasts to be averaged into the original low-resolution forecasts. The literature has shown two approaches for aggregate-consistent downscaling, one direct and the other indirect. The direct approach exploits the basic properties of the Dirichlet distribution. The PDF of the Dirichlet distribution is:

f(y_1, …, y_o; α_1, …, α_o) = [Γ(∑_{k=1}^o α_k) / ∏_{k=1}^o Γ(α_k)] ∏_{k=1}^o y_k^{α_k − 1},    (8.14)

where y_1, …, y_o are realizations of random variables Y_1, …, Y_o, which belong to the standard (o − 1)-simplex, in that ∑_{k=1}^o y_k = 1 and y_k ≥ 0, ∀k ∈ {1, …, o}; α_1, …, α_o are concentration parameters, controlling the uniformity of samples on the simplex. With this property, for any given irradiance value, the downscaled irradiance is a sample point from the (o − 1)-simplex, which satisfies aggregation consistency by default (Frimane et al., 2020). The key to this direct approach is therefore how to design the concentration parameters, such that the downscaled values comply with the physical laws of the irradiance process; the design philosophy is rather intricate, and the reader is referred to Frimane et al. (2020) for details. As compared to the direct approach, the indirect approach is much simpler. In that, one can generate the high-resolution forecasts using any sensible method, and the downscaled high-resolution and the original low-resolution forecasts can be reconciled with generalized linear regression, and thus become aggregate consistent (Yang et al., 2017b,c)—this method is known as forecast reconciliation or hierarchical forecasting, the concept of which is detailed in Chapter 12. Another notable indirect approach has been demonstrated by Yang et al. (2019), who used a fast pattern-matching algorithm to search for past low-resolution observations that resemble the current low-resolution forecasts, and then treated the corresponding past high-resolution observations as the current high-resolution forecasts. The underlying principle is one of analogs, which is discussed further in Section 8.5.1.

At any rate, neither of the above two forms of consistency is related to the temporal or spatial order of the downscaled forecasts, which motivates the third form of consistency gauging the goodness of the downscaled forecasts. This third form of consistency, it is a fact, challenges virtually all downscaling models available in the literature, including dynamical downscaling with NWP. To give perspective, Fig. 8.3 shows an hourly forecast series, a 15-min downscaled series, as well as the actual 15-min observations. Day-ahead forecasts were issued during the 12Z runs on July 14 and 15, 2021, by the ECMWF's High-Resolution model, for Bondville, Illinois (40.052°N, −88.37°W, 230 m); the method used for downscaling follows Frimane et al. (2020), who kindly provided the R code; and the ground truth comes from the grid-cell-collocated Surface Radiation Budget Network (SURFRAD) station, which can be accessed via the SolarData package in R (Yang, 2018c). It can be seen from Fig. 8.3 that the 15-min downscaled forecast time series is clearly more variable than the original hourly forecasts. Based on visual judgment, that variability seems to agree with the variability of the actual measurements. Nevertheless, the temporal transient of the downscaled series does not agree with the actual one, despite their potential consistency in statistical properties. This means that if such downscaled forecasts are put to verification, the resultant error would be quite large due to the grossly misaligned ramps.

Figure 8.3 ECMWF's 1-h forecasts (solid), downscaled 15-min forecasts (dashed), and measured 15-min GHI (dotted, in W/m²) at Bondville, Illinois (40.052°N, −88.37°W, 230 m), over a two-day period, namely, July 15–16, 2020. Code courtesy of Azeddine Frimane.
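The Dirichlet-based splitting behind the direct approach—and behind the downscaled series in Fig. 8.3—can be sketched in a few lines of R. The constant concentration parameters below are placeholders; their actual design, as mentioned, follows the intricate scheme of Frimane et al. (2020).

# Aggregate-consistent downscaling with Dirichlet samples: one hourly value is
# split into four 15-min values whose average recovers the original by design.
rdirichlet <- function(alpha) { g <- rgamma(length(alpha), shape = alpha); g / sum(g) }
set.seed(7)
ghi_hourly <- 520                                    # a hypothetical hourly GHI forecast [W/m2]
o <- 4                                               # four 15-min sub-intervals per hour
ghi_15min <- o * ghi_hourly * rdirichlet(rep(2, o))  # downscaled values
mean(ghi_15min)                                      # equals 520: aggregation consistency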

In principle, the most desirable situation for downscaling is this. For a given large-scale process, there exist some known auxiliary variables at small scales that have some explanatory power over the main process of interest. Suppose a forecaster is tasked to downscale hourly GHI forecasts: if the 1-min cloudiness, aerosol, and ground albedo over the forecast period are known exactly or to a large extent, the downscaling would be quite accurate, since these small-scale auxiliary variables have a profound influence on how GHI behaves. Unfortunately, such information is never available in reality. We must therefore reevaluate what is the most sensible way to assess the goodness of downscaling. Temporal (and/or spatial) alignment with observations is no doubt an important trait of downscaled forecasts, insofar as achieving high forecast accuracy is of interest. However, owing to the irresolvable difficulty in acquiring future auxiliary variables at small scales, a single downscaled time series would inevitably contain misaligned ramps that could exaggerate the error. Nevertheless, if several instances of the downscaled time series, based on the same original forecast, can be obtained, they jointly form a predictive distribution, which offers a probabilistic perspective on how the future small-scale process might evolve. Stated differently, the ultimate pursuit of downscaling should be calibration and sharpness, rather than accuracy. If an ensemble of downscaled forecasts has maximum sharpness and is calibrated, its merit can be confirmed at once. In such cases, should one need a single-valued best-guess forecast, one can always summarize the predictive distribution through P2D post-processing, which is discussed next.

8.4 P2D POST-PROCESSING

P2D post-processing deals with drawing out a single-valued best guess from a probabilistic representation of a forecast. This procedure is relevant to operational solar forecasting, during which grid operators often explicitly request point forecasts for legacy reasons, whereas the raw form of the forecasts is probabilistic, e.g., when ensemble NWP is used as a basis for solar forecast generation. Some may argue that there is some redundancy in generating probabilistic forecasts and then performing P2D post-processing, as opposed to directly generating deterministic forecasts. This concern can be dismissed by considering the argument of Gneiting (2011), who first set the premise that forecasts are only desirable if the future is uncertain, and then advanced to the conclusion that the most useful forecasts are necessarily probabilistic. From a performance viewpoint, many forecasters have also noticed that making probabilistic forecasts first and post-processing them into deterministic forecasts is often more advantageous than directly making deterministic forecasts, both inside (e.g., Yang, 2020f; Nikodinoska et al., 2022) and outside (e.g., Gneiting et al., 2006; Makridakis et al., 2020) the solar forecasting literature.

There are two main strategies for P2D post-processing, and the choice depends, to a considerable extent, upon the form of probabilistic representation: If the probabilistic forecast is a predictive distribution, the forecaster should choose a statistical functional (i.e., a real-valued summary), such as the mean or a quantile of the predictive distribution, according to the scoring function that is specified ex ante; if the probabilistic forecast comes as an ensemble, the forecaster should seek an objective function whose minimization can optimally combine the ensemble members.⁵ A common misconception among forecasting novices is that such tasks are trivial. If that were indeed true, then the exceedingly bulky literature on forecast combination would not even exist. Hence, it is critically important to understand the technical depth and complications pertaining to P2D post-processing.

5 The literature has been generous in presenting other forms of probabilistic representations, which are quantile forecasts (Yang et al., 2020b; Pedro et al., 2018) and prediction intervals (Voyant et al., 2020; Quan et al., 2014). Nonetheless, since a prediction interval is simply a pair of quantiles, and quantiles can be regarded as members of an ensemble, one should be convinced that the present content is also applicable to P2D post-processing of quantiles and prediction intervals.

8.4.1 SUMMARIZING PREDICTIVE DISTRIBUTION

Forecasts are evaluated using a scoring function, which is denoted with S. Well-known examples of scoring functions for deterministic forecasts are the squared error, absolute error, squared percentage error, and relative error. Given a particular deterministic forecast x and the materialized observation y, the squared error is S(x, y) = (x − y)², the absolute error is S(x, y) = |x − y|, the squared percentage error is S(x, y) = (x − y)²/y², and the relative error is S(x, y) = |(x − y)/x|. When a set of deterministic forecasts x_1, …, x_n is to be evaluated against observations y_1, …, y_n, the scores for individual forecasts are averaged, i.e., \overline{S} = (1/n) ∑_{i=1}^n S(x_i, y_i), resulting in accuracy measures (or expected scores) that are familiar to solar forecasters, e.g., the mean square error (MSE) or mean absolute error (MAE); see also Section 9.1.⁶

In many situations, the choice of scoring function is specified ex ante, i.e., it is known before the forecast is generated. For example, according to the bulletin "Technical requirements of power forecasting system for PV power stations" (NB/T 32011–2013), China Southern Power Grid (CSG) penalizes solar forecasts submitted to the operators based on RMSE, whereas Northeast China Grid (NEG) penalizes forecast submissions based on relative error (Yang et al., 2021a). Hence, when a forecaster is tasked to summarize predictive distributions under one of those directives, it is desirable to know which statistical functional is able to maximize utility, or equivalently, minimize penalty. Although earlier explorations are available (e.g., Murphy and Daan, 1985), this problem was first formalized by Gneiting (2011), and the advice was, and still is, for the forecaster to choose a statistical functional for which the scoring function is consistent.

Mathematically, let the interval I be the potential range that the forecast quantity can take—for irradiance, one can assume the valid range to be from 0 to the solar constant, i.e., I = [0, 1361.1] W/m², or any pair of bounds used by Long and Shi (2008)—and let the probability distribution F be concentrated on I. In that sense, a scoring function is a mapping S : I × I → [0, ∞),⁷ and a functional is a potentially set-valued mapping F ↦ T(F) ⊆ I.⁸ Then, the scoring function S is consistent for the functional T if:

E_F[S(x*, Y)] ≤ E_F[S(x, Y)],    (8.15)

for all F, all x* ∈ T(F), and all x ∈ I. The scoring function is strictly consistent if it is consistent and equality occurs only if x ∈ T(F). On the other hand, a functional is elicitable if there exists a scoring function that is strictly consistent for it. Inequality (8.15) implies that, for an S that is consistent for T, the expected error of the forecast generated by evaluating T is smaller than the expected error of any other forecast (the expectation is taken over all F). In the original paper by Gneiting (2011), several examples of elicitable functionals are enumerated together with the consistent scoring functions for them. Particularly notable are the facts that the squared error scoring function is consistent for the mean, the absolute error scoring function is consistent for the median, the squared percentage error and relative error scoring functions are consistent for the β-medians⁹ with β = −2 and β = 1, respectively, and the pinball loss, that is,

S_τ(x, y) = (𝟙{y ≤ x} − τ)(x − y) = { τ(y − x), for y ≥ x;  (1 − τ)(x − y), for y ≤ x },    (8.16)

where τ ∈ (0, 1), is consistent for the τth quantile. At this stage, the earlier question of how to optimally summarize predictive distributions for operational solar forecasting purposes, based on an ex ante penalty scheme, can be answered. For PV plants within CSG, it is clear that the optimal way of summarizing the forecaster's predictive distribution would be retrieving the mean, and for NEG the β-median with β = 1. Because the notions of consistency and elicitability are exceptionally vital, they are further elaborated with a case study.

6 One may subsequently take the square root of certain averaged scores \overline{S}, to allow the score to have the unit of the forecast variable; e.g., if \overline{S} is MSE, its square root gives RMSE.
7 Stated plainly, this expression says that the scoring function, which takes two inputs each with range I, outputs a value that is within the range [0, ∞), where the square bracket denotes that the end point is included and the round bracket denotes that the end point is excluded.
8 This expression says that the functional T with input F outputs a value that is within I.
9 For a random variable Y with a probability distribution F and a density function f on the positive half of the real line and a finite fractional moment of order β, its β-median is the median of another random variable whose density is proportional to y^β f(y).
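Before moving to the case study, a minimal numerical illustration in R may help fix ideas: among all candidate deterministic forecasts, the one minimizing the average pinball loss of Eq. (8.16) is (approximately) the τth quantile of the distribution from which the observations materialize. The gamma distribution here is purely hypothetical.

# The tau-th quantile minimizes the average pinball loss, Eq. (8.16).
set.seed(42)
y <- rgamma(1e5, shape = 2, scale = 200)        # hypothetical observations [W/m2]
pinball <- function(x, y, tau) mean(((y <= x) - tau) * (x - y))
tau <- 0.9
cand <- seq(0, 1361.1, by = 1)                  # candidate forecasts over I
best <- cand[which.min(sapply(cand, pinball, y = y, tau = tau))]
c(best, quantile(y, tau))                       # the two values nearly coincide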

Table 8.1 Penalty triggers and calculation methods for solar power forecast submissions in the day-ahead markets of four regional grids in China. x_t and y_t are the forecast and observed power output [MW] at time t, respectively; Cap [MW] is the rated capacity; δ [h] is a penalty coefficient; P [/MWh] is the penalty for unit energy; and E is the error calculated using the region-specific metric. Different grids adopt different δ and P values, but that should not concern the current discussion.

Grid | Penalty trigger | Penalty
CCG | (1/96) ∑_{t=1}^{96} |x_t − y_t|/Cap > 15% | (E − 15%) × Cap × δ × P
CSG | √[(1/96) ∑_{t=1}^{96} ((x_t − y_t)/Cap)²] > 15% | (E − 15%) × Cap × δ × P
ECG | √[(1/96) ∑_{t=1}^{96} ((x_t − y_t)/y_t)²] > 20% | (E − 20%) × Cap × δ × P
NEG | max_t |(x_t − y_t)/x_t| > 15% | P × ∫|1.15x_t − y_t| dt if 1.15x_t < y_t; P × ∫|0.85x_t − y_t| dt if 0.85x_t > y_t

Table 8.1 shows the penalty triggers and calculation methods for solar power submitted to the day-ahead markets of four regional grids in China (Yang et al., 2021a). The second column of the table details how penalties are triggered under the various regional-grid codes. In all four regional grids, the errors of the submitted day-ahead forecasts—15-min resolution (96 points per day)—are calculated on a daily basis, and if the forecast error exceeds the percentage specified, the penalty scheme is triggered. Given the fact that the regional grids of China are managed by different entities, who are responsible for setting the standard and code for their own grid, the penalty triggers differ quite a bit from each other: The penalty triggers for Central China Grid (CCG), CSG, the East China Grid (ECG), and NEG are based on absolute error, squared error, squared percentage error, and relative error, respectively. Once the penalty is triggered, the cost for additional error on top of the allowable threshold is calculated through the methods listed in the third column of Table 8.1. The amount of penalty incurred depends on the rated capacity of the power plant (Cap), a penalty coefficient (δ) specified by the grid operators, and a monetary penalty for unit energy (P), which is tied to the electricity price. In any case, the diversity in penalty triggers makes a perfect case study for our purpose.

In order to empirically analyze the effect of the choice of statistical functional on forecast evaluation, the 50-member perturbed GHI forecasts from ECMWF's Ensemble Prediction System (EPS) over a two-year period (2019–2020) are collected at three mid-latitude sites, at which ground-based measurements are available and quality controlled. Forecast verification is carried out using four accuracy measures: root mean square error (RMSE), MAE, mean relative error (MRE), and root mean square percentage error (RMSPE). Firstly, the average performance of the ensemble members is studied. For each site, denoting the jth member forecast at time t as x_{tj}, and the corresponding observation as y_t, the average RMSE of all member forecasts—during daylight hours—is given as:

\overline{RMSE} = (1/50) ∑_{j=1}^{50} RMSE_j = (1/50) ∑_{j=1}^{50} √[(1/n) ∑_{t=1}^n (x_{tj} − y_t)²],    (8.17)

and the other average scores can be evaluated analogously. These scores are tabulated in Table 8.2, which serves as a baseline of comparison with the performance of the P2D post-processed forecasts. One can interpret the entries of Table 8.2 as the errors of an "average" member.

Table 8.2 Evaluation of ECMWF's EPS member forecasts at the three selected sites. Each entry represents the average score for 50 members, under that scoring function, for that station. RMSE and MAE take the unit of W/m², whereas MRE and RMSPE are in %.

Stn. | RMSE | MAE | MRE | RMSPE
1 | 125.67 | 80.83 | 31.21 | 128.78
2 | 83.29 | 49.99 | 14.60 | 61.20
3 | 117.56 | 75.70 | 28.01 | 120.58

Next, deterministic forecasts elicited from the ensembles are considered. With respect to the penalty schemes outlined in Table 8.1, the mean, median, β-median (with β = 1), and β-median (with β = −2) are elicited from each predictive distribution. Denoting the ensemble predictive distribution at time t with F_t, the four functionals can be written as mean(F_t), median(F_t), med^(1)(F_t), and med^(−2)(F_t). At each site, the RMSE of the elicited forecasts (during daylight hours) using the mean functional is given as:

RMSE = √[(1/n) ∑_{t=1}^n (mean(F_t) − y_t)²],    (8.18)

and the calculation of the errors of forecasts elicited using other functionals, under other scoring functions, is analogous. Because there are four functionals and four scoring functions, the forecast–observation pairs from each site are subjected to 16 measures of performance, as listed in Table 8.3.
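The distinction between the average member score of Eq. (8.17) and the score of an elicited forecast of Eq. (8.18) is easy to operationalize; the following R sketch contrasts the two on a synthetic 50-member ensemble (all numbers are hypothetical, for illustration only).

# "ens" is an n x 50 matrix of member forecasts; "y" holds the observations.
set.seed(123)
n <- 1000
y <- pmax(rnorm(n, mean = 400, sd = 150), 0)
ens <- y + matrix(rnorm(n * 50, sd = 80), n, 50)
rmse <- function(x, y) sqrt(mean((x - y)^2))
mean(apply(ens, 2, rmse, y = y))   # Eq. (8.17): average member RMSE
rmse(rowMeans(ens), y)             # Eq. (8.18): RMSE of the elicited mean, much lower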

Table 8.3 Evaluation of four statistical functionals for eliciting deterministic forecasts from ECMWF ensemble forecasts at three sites. RMSE and MAE take the unit of W/m², whereas MRE and RMSPE are in %. For each site, the column-wise best results are in bold.

Stn. | Functional | RMSE | MAE | MRE | RMSPE
1 | mean(F_t) | 111.3 | 74.2 | 26.9 | 120.9
1 | median(F_t) | 113.1 | 73.5 | 26.9 | 120.9
1 | med^(1)(F_t) | 116.5 | 75.9 | 26.0 | 133.3
1 | med^(−2)(F_t) | 121.5 | 77.1 | 64.0 | 89.5
2 | mean(F_t) | 75.9 | 47.5 | 13.3 | 58.5
2 | median(F_t) | 76.5 | 47.1 | 13.2 | 58.4
2 | med^(1)(F_t) | 76.5 | 47.5 | 12.9 | 62.6
2 | med^(−2)(F_t) | 82.6 | 49.6 | 19.2 | 49.4
3 | mean(F_t) | 105.3 | 70.1 | 24.3 | 115.1
3 | median(F_t) | 106.8 | 69.6 | 24.5 | 116.7
3 | med^(1)(F_t) | 107.9 | 70.0 | 23.4 | 125.2
3 | med^(−2)(F_t) | 117.4 | 75.5 | 49.0 | 93.5

Two conclusions can be drawn at once. Comparing Tables 8.2 and 8.3, the first insight is that the P2D post-processed forecasts perform generally better than the average member forecast. This echoes the statement at the beginning of this section, that is, a deterministic forecast resulting from P2D post-processing is often more advantageous than a directly generated best-guess forecast. The second conclusion from Table 8.3 is that choosing the correct elicitable functional to summarize predictive distributions has a marginal but beneficial effect on verification, in that the average score based on a scoring function that is consistent for a functional is always smaller than those based on inconsistent scoring functions; e.g., the RMSEs of forecasts elicited using the mean functional are always smaller than those of forecasts elicited using the median and β-medians. Hence, the statistical theory regarding consistency and elicitability is empirically verified.

That said, a pitfall of using this statistical theory in practice must be warned of at this stage. One should bear in mind that Inequality (8.15) is defined over the expectation, for all F. Suppose that in a size-n verification set there are o distinct predictive distributions, which implies o ≤ n; the optimal result as a beneficiary of consistency and elicitability is only guaranteed if each predictive distribution F_k, k ∈ {1, …, o}, corresponds to numerous observations, i.e., o ≪ n. Stated differently, if each F_k is only materialized a few times, i.e., o is comparable to n, the optimality of forecasting the correct statistical functional may not hold. This is due to the random nature of how the observations materialize. In an extreme case with just one forecast–observation pair, e.g., F ∼ N(0, 1) and y = 1, forecasting the mean clearly results in a bigger error than forecasting x = 1. Recalling that the penalty triggers in Table 8.1 are based on just 96 samples, the chance of the optimal forecasts triggering the penalty, while some other forecasts do not trigger it, is fairly high. And, in turn, forecasting according to the consistency and elicitability theory may be susceptible to a higher penalty than forecasting with functionals for which the scoring function is inconsistent, as demonstrated by Yang and Kleissl (2023).

So far the discussion has been focusing on the case in which a directive on the choice of scoring function is specified by the forecast user. (This would also be the scenario of forecast competitions, in which the organizers are the ones who specify the scoring function.) In other circumstances, however, the forecaster may conceive her own directive that is in the form of a functional. For instance, in a research setting, a forecaster is often encouraged to report both deterministic and probabilistic forecasting results, to demonstrate on the whole the predictive performance of a model of interest. In this regard, the forecaster has freedom over the choice of the statistical functional, and choosing an elicitable functional would be wise. In parallel, it is also critical for the scoring function used during verification to be consistent for the chosen functional, since the expected score is minimized only in that way. In any case, understanding consistent scoring functions and elicitable functionals is able to alleviate much extant controversy in the current literature as to how deterministic forecasts should be made and verified (Gneiting, 2011).

8.4.2 COMBINING DETERMINISTIC FORECASTS

Combining is a general strategy that can be applied to all sorts of predictions, but the primary focus of this strategy has been on the gathering and aggregation of deterministic forecasts (Winkler et al., 2019). Combining deterministic predictions has a long history, which can be dated back to Galton (1907), who combined 787 estimates of the dressed weight of a fat ox from the competitors of a weight-judging competition at the annual show of the West of England Fat Stock and Poultry Exhibition—by finding the median of those estimates—and attained a final estimate that is very close to the actual weight. Over the past 100 years, forecast combination has led to an eclectic mix of studies and debates, for a simple reason—the combined forecast is very often found superior to the best member (or component) forecast. One recent piece of evidence for the benefits of forecast combination is the M4 Competition, the most renowned forecasting competition in modern history (Hyndman, 2020), in which all winning methods leverage forecast combination (Petropoulos and Makridakis, 2020).

Given the significance of forecast combination, there is an overwhelming literature. We shall not attempt, at any rate, to survey the literature on combining, which is now too voluminous to do so. Instead, several carefully selected milestone reviews in chronological order (Bates and Granger, 1969; Clemen, 1989; Armstrong, 2001; Timmermann, 2006; Wallis, 2011) and some most recent discussions (Atiya, 2020; Lichtendahl and Winkler, 2020; Shaub, 2020; Petropoulos and Svetunkov, 2020; Ingel et al., 2020; Fiorucci and Louzada, 2020; Pawlikowski and Chorowska, 2020; Montero-Manso et al., 2020; Jaganathan and Prakash, 2020) are listed here, which should act as a starting point for anyone who wishes to delve deeper into this bulky subject.

Before discussing forecast combination in more detail, terminology and notation should first be clarified. First and foremost, the words "combining" and "ensemble" can be used interchangeably in forecasting. There is, however, a slight difference in preference: "combining" is more popular in the statistical forecasting community, whereas "ensemble" is preferred by the weather community. Next, the individual models entering the combination process are called component models or members, and their forecasts are referred to as component forecasts or member forecasts. For a forecast time stamp t, there are m component forecasts, denoted with x_{t1}, …, x_{tm}. The general form of forecast combination starts with specifying a combination formula G_θ, which connects component forecasts to observations:

y_i = G_θ(x_{i1}, …, x_{im}),    (8.19)

where y_i is the observation at time i, θ = (w_1, …, w_m)^⊤ are potentially time-varying weights that need to be estimated, and G_θ(·) is the function which aggregates the component forecasts (in most cases, G_θ is a linear function). With the estimated weights θ̂ = (ŵ_1, …, ŵ_m)^⊤, the final or combined forecast for time t is obtained as:

ŷ_t = G_θ̂(x_{t1}, …, x_{tm}).    (8.20)

The simplest strategy for combining forecasts is to use heuristics, which means that the weights are not estimated from data, but are simply assigned by forecasters; this is discussed in Section 8.4.2.1. If the weights are estimated using linear statistical models, the combination is said to be linear; this approach is discussed in Section 8.4.2.2. Alternatively, when nonlinear models are used to regress the observation on the component forecasts, the combination is said to be nonlinear; this approach is discussed in Section 8.4.2.3. Lastly, in Section 8.4.2.4, the most important concern in combining forecasts, namely, ensuring diversity in component forecasts, is discussed in the context of solar forecasting.

Regardless, it is emphasized here again that there may be a conflict between the theory on which combining strategies are based and the ideal under which the member forecasts are generated. For instance, for a dynamical ensemble with equally likely members, selecting any set of weights other than 1/m would be quite disconcerting, because that either indicates the ignorance of the forecaster or suggests a deviation of the generation mechanism of member forecasts from its ideal. Another example concerns the hybrid ensemble, which comprises several sets of dynamical ensembles, or one dynamical ensemble and several deterministic forecasts from different models. In this scenario, a combination of equal and unequal weights is justified. More specifically, equal weights are to be assigned to interchangeable members, and unequal weights are to be assigned to members from different models.

8.4.2.1 Training-free heuristics

The question of whether equal weights should be used during forecast combination is known as the forecast combination puzzle (Claeskens et al., 2016). Statisticians are often concerned with optimality in statistical models, in that the weights of a model are determined based on some training data and a specified loss function. However, in forecast combination, the fact that optimal weights often perform worse than equal weights has been investigated both in theory and in practice (Chan and Pauwels, 2018; Smith and Wallis, 2009). The explanation resides in part in the empirical evidence that weights estimated using statistical methods may be highly dispersed in distribution (Bates and Granger, 1969) and/or unstable (Clemen, 1989). Stated more plainly, the contributions to the final forecasts from individual forecasters are dynamic, and the amount of contribution from the same member may be quite different from one instance to another. In fact, the forecast combination puzzle has also been noted by Armstrong (2001), who promoted the forecasting principle to "use equal weights unless you have strong evidence to support unequal weighting of forecasts." On this point, training-free heuristics must not be regarded as naïve or no-skill; instead, they ought to be tested against every time a more sophisticated combining approach is proposed (Nikodinoska et al., 2022).

  m−k 1 kxt,(k+1) + ∑ xt,( j) + kxt,(m−k) , y t = m j=k+1

(8.22)

where xt,( j) denotes the jth ordered component forecast,10 0 < k < m/2 is the number of component forecasts to be trimmed or winsorized from each side of 10 Sorting

the component forecasts from smallest to biggest.

316

Solar Irradiance and Photovoltaic Power Forecasting

the ordered forecasts. For example, suppose there are seven component forecasts, {0, 100, 300, 370, 380, 550, 1000}, and let k = 2, then, the trimmed mean is (300 + 370 + 380)/3 = 350, and the winsorized mean is (2 × 100 + 300 + 370 + 380 + 2 × 550)/7 = 335.71. In short, both trimming and winsorizing are able to lower the sensitivity of the mean operator to extreme values. That said, Armstrong (2001) recommended using trimmed averaging only when there are at least five component forecasts available. This fact is related to the important aspect of diversification in combining forecasts. Some works in the solar forecasting literature combine as few as two (e.g., Andr´e et al., 2019) or three (e.g., Huang et al., 2018) component forecasts, whereas others (e.g., Haupt and Kosovi´c, 2017) combine seven or more models. At the moment, very few solar forecasting studies have quantified the correspondence between forecast accuracy and the number of component models, but this is a worthwhile topic for future research. Last but not least, it has been observed that some solar forecasting papers have attempted to use ad hoc training-free algorithms for combining, e.g., Liu et al. (2019) proposed a recursive arithmetic averaging method which was used to combine three component forecasts. Nevertheless, these ad hoc algorithms lack theoretical justification and are rarely tested with extensive data; hence, they are not recommended. 8.4.2.2

Linear combination

Suppose there is some evidence pointing to the potential benefits of using unequal weighting, forecasters are then concerned with assumptions which have to be made when obtaining such weights. Generally, methods for linear combination can be categorized into variance–covariance methods and regression-based methods, which are summarized in Diebold and Lopez (1996). A typical variance–covariance combining method uses the inverse MSE of component forecasts as weights, that is, the importance of each component model is assumed to be inversely proportional to its MSE, then, for any forecast time t: / 0−1 m m 1 1 xt j , (8.23) y t = ∑ ∑   j=1 MSE j j=1 MSE j  j is the estimated MSE of the jth component model based on n samples of where MSE observations yi , i = 1, . . . , n, and the corresponding component forecasts xi1 , . . . , xim . When combining is regarded as a regression problem, and the relationship can be assumed to be linear, then m j xt j , 0 + ∑ w (8.24) y t = w j=1

1 , . . . , w m are estimated regression coefficients. In both classes of meth 0 , w where w ods, the forecaster needs to estimate the unknown weights using historical observations, which are not always available, e.g., when performing forecasting for a newly installed PV site. Owing to the simplicity and flexibility of time series and machine-learning toolboxes available nowadays, it is not uncommon for solar forecasters to collect a large pool of component forecasts. When the component models are numerous, a natural

Post-processing Solar Forecasts

317

strategy is to consider shrinkage and variable selection. Shrinkage and variable selection methods are well-suited for situations where high levels of multi-collinearity are present. The presence of multi-collinearity is frequently encountered in solar forecasting, for the different component models are often trained using the same input data. In any case, the reader is referred to the books by Hastie et al. (2009); Miller (2002) for methods and techniques available of regression with shrinkage and variable selection, which is also known as penalized regression. One excellent example of solar forecast combination with penalized regression is the work by Thorey et al. (2015), who considered a 158-member ensemble from 6 weather centers, including ECMWF, China Meteorological Administration, UK MetOffice, Korea Meteorological Administration, Centro de Previs˜ao Tempo e Estudos Clim´aticos, and M´et´eo-France. On the other hand, there are also ill-motivated solar forecasting works that leverage shrinkage and variable selection methods in situations that do not warrant the use of penalized regression. For instance, Nikodinoska et al. (2022) proposed to use a time-varying elastic net to combine just five forecasts. Though the verification result showed that the time-varying elastic net is able to outperform simple averaging, neither have the authors attempted to compare their time-varying elastic net to regressions without shrinkage (such as ordinary least squares regression), nor have they cared to show the results of non-time-varying (i.e., static) elastic net. However, those possibilities may be equally if not more effective than the proposed time-varying elastic net. This incomplete verification procedure, may it be intentional or plain oversight, limits the validity of their proposition on using a time-varying elastic net. When we are examining any (verbatim or modified) transfer of existing datascience techniques to the solar forecasting problem, we should never be diverted either by the proclaimed novelty of the application, or by what journal the research was published in. Instead, we must focus only and solely on the prerequisites pertaining to the application of a technique, and whether such prerequisites are met by the problem at hand. Circling back to penalized regression, it is known to be most useful under the presence of multi-collinearity and high dimensionality in training data. Thus, one should always explore whether or not the data possesses these properties before diving straight into implementation. For solar forecast combination, besides running a large number of time series and machine-learning-based forecasting models, or collecting a large number of poor man’s ensemble forecasts, another way to obtain a rich collection of component forecasts is to consider spatio-temporal forecasts from sensor networks (Yang, 2018d; Yang et al., 2015b) or neighboring satellite pixels (Yagli et al., 2022). 8.4.2.3

Nonlinear combination

Just like in the case of regression-based D2D post-processing, there are many variations and extensions to the regression-based forecast combination, many of which are nonlinear. For instance, the caret package in R contains over 100 machine-learning models for regression, with most of them being potentially applicable for forecast combination, whereas the scikit-learn library offers the Python version of these

318

Solar Irradiance and Photovoltaic Power Forecasting

models. Furthermore, with the toolboxes becoming more and more mature and userfriendly, the latest versions allow the models to be implemented with a few lines of code, if not just two lines (one for training and one for testing), and the automatic parameter tuning capacity of both software tools have also become quite strong. On this point, once the tools are set up properly, the straightforward call of functions makes machine learning far less impressive than it was 20 years ago. In the two-part work by Rodr´ıguez-Ben´ıtez et al. (2020); Huertas-Tato et al. (2020), a case study on intra-hour and intra-day solar forecasting at four Spanish sites located in the Iberian Peninsula was carried out. Part one of the work dealt with generating component forecasts, in which four models, namely, satellite-based forecasting using cloud motion vector, clear-sky persistence, WRF-Solar, and CIADCast (a hybrid satellite-NWP model), were employed to generate both GHI and BNI forecasts for horizons ranging from 15 min to 6 h in 15-min steps. In part two of the work, the component forecasts were combined via support vector regression (SVR), with either linear or nonlinear kernels. Two combination strategies were used: one of those fitted an SVR model for each horizon, and the other fitted a single SVR model for all horizons. While the differences between these two strategies were marginal, they were both found to be able to lower the RMSE of the best component model— up to 17% for GHI and 16% for BNI. 8.4.2.4

Combining solar forecasts with different forecast-generating mechanisms

Diversifying ensemble forecasts is a scheme that has been so extensively tested, yet it is often overlooked in solar forecasting. Theoretical discussions on why combining works so well are not rare (e.g., Atiya, 2020; Thomson et al., 2019; Davis-Stober et al., 2014), but despite the long-lasting investigation and debate, the precise condition that can guarantee the combined forecast to outperform the best component forecast is still not clear. However, some insights can be obtained mathematically— e.g., it can be shown that the combined forecast through simple averaging is able to outperform an average member, whereas the situation is less obvious but similar for unequal weights. We shall briefly explain both statements in what follows, and then suggest what can be done in solar forecast combination to increase the chance of having good combining results. Let X j and Y denote the random variables representing the jth component forecast and the observation, respectively, and let w j denote the weight associated with X j , by assuming there is no intercept and the combination is convex, i.e., w j ≥ 0 and ∑mj=1 w j = 1, one can arrive at the following result: 0 / m

MSE

∑ w j X j ,Y

j=1

weighted sum of component MSEs



=

m



∑ mw2j MSE(X j ,Y )

j=1

a positive term



!



m−1

m

∑ ∑

j=1 k= j+1

 ! 5" #2 6 E w j (X j −Y ) − wk (Xk −Y ) . (8.25)

Post-processing Solar Forecasts

319

The reader is referred to Appendix A.2 for a full derivation of Eq. (8.25). In this equation, the first term is a weighted sum of component MSEs, whereas the second term is positive, due to the sum-of-squares expression. When w j = 1/m, i.e., the combination takes equal weights, Eq. (8.25) becomes: / 0 m # " 1 m 1 m−1 m MSE ∑ w j X j ,Y = ∑ MSE(X j ,Y ) − 2 ∑ ∑ E (X j − Xk )2 . (8.26) m j=1 m j=1 k= j+1 j=1 It is obvious that the first term is "just the simple average of component MSEs, # " # whereas the second term is large if E (X j − Xk )2 is large. The term E (X j − Xk )2 is the expected squared difference between the jth and kth component forecasts, # in that, " the more diverse the component forecasts are, the larger the E (X j − Xk )2 term, and the smaller the MSE of the combined forecast. When w j ’s are unequal, things are trickier. Following the well-known bias– variance decomposition of MSE, one can write: 0 / m

∑ w j X j ,Y

MSE

j=1

/ =V

 0!  /

variance of forecast error



0

m

∑ w jXj

/

+ V(Y ) − 2Cov

j=1

m

∑ w j X j ,Y

+ E

j=1

squared bias



m

0

∑ w jXj

− E(Y )

! 2 .

j=1

(8.27) Denoting the standard deviation of X j as σ j and the correlation between X j and Xk as ρ jk , the variance of combined forecast can be further decomposed into: /

0

m

∑ w jXj

V

/

m

02

∑ w jσ j

=

j=1

−2

j=1

m

m−1

∑ ∑

w j wk (1 − ρ jk )σ j σk .

(8.28)

j=1 k= j+1

Considering the weighted power mean inequality, which states that for any r > s, and r, s = 0, (ω1 ar1 , + · · · + ωm arm )1/r ≥ (ω1 as1 , + · · · + ωm asm )1/s , (8.29) where a1 , . . . , am and ω1 , . . . , ωm are positive reals with ∑mj=1 ω j = 1. Take ω j = w j , a j = σ j , r = 2, and s = 1, one yields: 4 m

m

j=1

j=1

∑ w j σ 2j ≥ ∑ w j σ j ,

and thus

/ V

m

∑ w jXj

j=1

0 ≤

m

∑ w j σ 2j − 2

j=1

m−1

m

∑ ∑

j=1 k= j+1

w j wk (1 − ρ jk )σ j σk ,

(8.30)

(8.31)

320

Solar Irradiance and Photovoltaic Power Forecasting

which suggests that the variance of the combined forecast is no larger than the weighted averaged variance of the component forecast, and the less correlated the component forecasts are, the smaller the variance of the combined forecast is. This decrease in variance contributes positively to (i.e., reduces the size of) the MSE. V(Y ) is the variance of observation, which is irrelevant to the size comparison between combined and individual forecasts. The covariance term in Eq. (8.27) is: / 0 m

Cov

∑ w j X j ,Y

=

j=1

and the expectation term is: / E

m

∑ w jXj

j=1

0 − E(Y ) =

m

∑ w j Cov(X j ,Y ),

(8.32)

j=1

m

∑ w j [E(X j ) − E(Y )] .

(8.33)

j=1

In other words, the covariance of the combined forecast is the weighted average covariance of component forecasts, and the bias of the combined forecast is the weighted average bias of component forecasts, which is likely to be small since a good collection of component forecasts tends to bias towards opposite directions. Thus far, we have shown that, as to the individual factors of the bias–variance decomposition, the combined forecast with unequal weights does not seem to possess any disadvantage. Rather, there are obvious advantages if: (1) component forecasts are diverse, and (2) their directions of bias are different. Nonetheless, when we consider the question of what constitutes an average component forecast, the idea remains opaque. One could define an “average” component forecast to be one with all statistical properties equal to the weighted average properties of all ensemble members, such as what we have done above. However, such a definition can be argued as being unnatural. But one thing certain is that ensuring the diversity of and reducing the bias in component forecasts are beneficial. Reducing bias is a straightforward task; as long as some historical forecast– observation pairs are available, one can perform a simple linear correction to each component forecast to remove the bias. In contrast, imparting diversity is less than obvious. Clearly, the inclusion of many correlated forecasts has a small effect on diversity, which implies the limitation of using just a large collection of machinelearning and statistical forecasting models on the same data. What solar forecasters should do is to consider hybrid ensembles with forecasts from both physical and data-driven models (e.g., Zhang et al., 2022; Huertas-Tato et al., 2020; Yang and Dong, 2018), or more generally, generating forecasts using models with distinctive prediction mechanisms. Another notable trick of increasing diversity is to use component forecasts generated at different time scales. This idea exploits the differences in variability of the same process over different frequencies, and has been shown to be useful in both solar forecasting and elsewhere (Yang et al., 2017c; Kourentzes and Petropoulos, 2016; Kourentzes et al., 2014; Andrawis et al., 2011).

Post-processing Solar Forecasts

8.5

321

D2P POST-PROCESSING

Statistics is the science of uncertainty, and a major aspect of it, therefore, is placed on quantifying various uncertainties associated with parameter estimates and model predictions made using statistical methods. Regression, as a time-honored statistics subject, is no exception. In the solar forecasting literature, however, there seems to be a common false impression that forecasts from regression models are deterministic— this claim can be justified easily by the myriads of regression-based forecasting papers that do not report probabilistic results. With the recent machine learning “gold rush,” the problem of lack of uncertainty quantification has been amplified by the deterministic-only implementations in many off-the-shelf tools. As a result, only a small fraction of solar forecasting works address probabilistic forecasting— according to the review of Yang (2019a), only 18% of the surveyed papers considered probabilistic solar forecasting. On top of that, other legacy reasons in solar forecasting as discussed Chapter 3 also contribute to the deterministic-forecast-dominated literature across the 2010s. Things are no doubt turning the other way now, as probabilistic forecasting is being valued more than ever in the field of solar forecasting. Nevertheless, there are still many situations in which all one has is a single time series of deterministic forecasts, e.g., NAM forecasts downloaded from the National Centers for Environmental Prediction (NCEP) server are deterministic, and satellite-based forecasts using advection of the cloud field are also deterministic. In such situations, methods that allow uncertainty quantification based on a single deterministic forecast time series become highly relevant. We refer to such methods as D2P post-processing, which constitutes the subject of this section. All D2P post-processing strategies make the assumptions that (1) weather patterns repeat, such that past experience is able to be used to adjust the current forecast; and (2) model forecasts repeat, such that it is advantageous to issue similar forecasts given similar weather patterns. In this regard, the degree to which weather patterns can be described and the sensitivity of the forecasting model to different weather patterns jointly account for how successful a D2P post-processing strategy can be. Generally speaking, the better the weather patterns are described, the better the quality of the found matches would be; and the more sensitive the forecasting model can respond to a weather pattern, the more likely it is able to issue a tailored forecast for that weather pattern. This, of course, does not rule out the potential danger of overfitting, particularly when the size of the historical forecast–observation database is small. Hence, D2P post-processing, like almost all problems in which design and optimization are involved, is one of balance. 8.5.1

ANALOG ENSEMBLE

The term “analogs”—states of the atmosphere (or weather patterns) that resemble each other—was coined by Lorenz (1969) in his early work on atmospheric predictability. The resemblance between two analogs can be regarded as one analog being the same as the other plus a small superposed error. When several analogs are searched and found according to a given weather pattern, in order to form an ensemble, the diversity among them can be taken to account for the uncertainty associated with that weather pattern. The aim of the analog ensemble (AnEn) therefore is to identify past analogs of the current weather pattern, such that the uncertainty of the current forecast can be deduced from the historical data corresponding to those past analogs. To that end, one can view the analog ensemble as a similarity-search method, and such methods are not unique to weather forecasting, though they often go by other names elsewhere—e.g., the similar-day method in load forecasting (Hong and Fan, 2016), nearest neighbors in machine learning (Pedro and Coimbra, 2015), and motif discovery in signal processing (Yeh et al., 2016). The deep-learning-based weather models, such as Pangu-Weather (Bi et al., 2023) and GraphCast (Lam et al., 2022), which emerged in the early 2020s, are also performing analog forecasting, but in very advanced styles (Bauer et al., 2023).

The original formulation of AnEn, as it is still found in most publications today, is based on the matching of forecasts. Nevertheless, inasmuch as what constitutes a weather pattern, there is no reason to limit the application of AnEn to just forecasts. Yang and Alessandrini (2019) summarized three possible options of AnEn, each having a different definition and use of analogs:

1. Analogs are defined through observations, in that the search is conducted between the current and historical observations, and whatever happened immediately after the historical analogs is assumed to be what would happen immediately after the current analog (as in Ayet and Tandeo, 2018);
2. Analogs are defined through forecasts, in that the current forecast is matched to historical ones, and what actually happened in the past (i.e., the corresponding historical observations) is taken as the probable alternative realizations of the current forecast (as in Alessandrini et al., 2015a);
3. Analogs are defined through both observations and forecasts, in that the matching takes place between the current forecast and historical observations, and the best-matched candidates are used directly as the probable alternative realizations of the current forecast (as in Yang et al., 2019).

These three distinct options of AnEn can be chosen based on the available data. Option 1 can be used in situations where forecasts are not available, but it requires a long historical observation time series. Option 2 can be used when forecasts and observations are both available, and it requires long historical time series for both. Option 3 is adequate if the forecast time series is short but the observation time series is long. Besides these three options, other variants do exist (e.g., Wang et al., 2017c), but regardless of how analogs are defined and exploited, the principal assumptions as stated at the beginning of this section remain the same. In what follows, the most representative use of AnEn, which is when it is applied to NWP forecasts, is first discussed in Section 8.5.1.1. Then, several other novel uses of AnEn are enumerated in Section 8.5.1.2. Finally, some practical challenges as to the implementation of AnEn are revealed, and solutions proposed, in Section 8.5.1.3.

8.5.1.1 AnEn with NWP output

The first major work on AnEn in the field of solar forecasting was conducted by Alessandrini et al. (2015a), who also delivered a sister paper on wind forecasting in the same year (Alessandrini et al., 2015b). Based on weather forecasts issued by the Regional Atmospheric Modeling System (RAMS), solar power forecasting at three PV plants in Italy was performed (Alessandrini et al., 2015a). In that particular setting, the analogs are defined through four forecast variables (GHI, BNI, cloud cover, and temperature at 2 m) and two calculated variables (zenith and azimuth angles of the sun), and once the matching historical forecasts are found, the corresponding values of historical PV generation are taken directly as the PV power forecast. Mathematically, denoting the number of variables as $m$ and the forecast time stamp as $t$, Alessandrini et al. (2015a) defined the forecast weather pattern ($X_t$) as:

$$X_t = \begin{pmatrix} x_{t-\tilde{t}}^{(1)} & x_{t-\tilde{t}+1}^{(1)} & \cdots & x_t^{(1)} & \cdots & x_{t+\tilde{t}-1}^{(1)} & x_{t+\tilde{t}}^{(1)} \\ \vdots & \vdots & \ddots & \vdots & \ddots & \vdots & \vdots \\ x_{t-\tilde{t}}^{(m)} & x_{t-\tilde{t}+1}^{(m)} & \cdots & x_t^{(m)} & \cdots & x_{t+\tilde{t}-1}^{(m)} & x_{t+\tilde{t}}^{(m)} \end{pmatrix}. \quad (8.34)$$
To increase the chance of identifying meaningful analogs—recall that the quality of weather pattern description affects the forecasting performance—instead of using just the forecast at time $t$, the temporal neighbors around $t$ (controlled by the half-window size, $\tilde{t}$) are also included in the definition. From a pragmatic viewpoint, $\tilde{t} = 1$ is usually sufficient. It is then clear from the expression that the weather pattern is a 2D array, of which the dimension is based on the number of variables and the window size around the forecast time stamp. Using similar notation, an analog based on the past forecasts centered at time $i$, $A_i$, can be expressed as:

$$A_i = \begin{pmatrix} x_{i-\tilde{t}}^{(1)} & x_{i-\tilde{t}+1}^{(1)} & \cdots & x_i^{(1)} & \cdots & x_{i+\tilde{t}-1}^{(1)} & x_{i+\tilde{t}}^{(1)} \\ \vdots & \vdots & \ddots & \vdots & \ddots & \vdots & \vdots \\ x_{i-\tilde{t}}^{(m)} & x_{i-\tilde{t}+1}^{(m)} & \cdots & x_i^{(m)} & \cdots & x_{i+\tilde{t}-1}^{(m)} & x_{i+\tilde{t}}^{(m)} \end{pmatrix}. \quad (8.35)$$
By computing similarities between $X_t$ and all $A_i$'s, several candidates with the highest similarities are considered to be the analogs of $X_t$, see Fig. 8.4. Subsequently, the past observations that correspond to the top-match analogs jointly form an ensemble forecast. In the language of computer scientists, $X_t$ is called the query, i.e., the pattern to be matched, and $A_i$, for $i = 1, 2, \ldots, t$, is called the data, i.e., the database that contains all historical patterns.

In all similarity-search problems, a metric is required to gauge the proximity between two objects. On this point, the weighted Euclidean distance:

$$d(X_t, A_i) = \sum_{j=1}^{m} w_j \sqrt{\sum_{k=-\tilde{t}}^{\tilde{t}} \left( x_{t+k}^{(j)} - x_{i+k}^{(j)} \right)^2}, \quad (8.36)$$

was used by Alessandrini et al. (2015a), who optimized the weights $w_j$ from the data, e.g., by minimizing the continuous ranked probability score (CRPS).¹¹

¹¹Minimizing CRPS is one of the universally accepted approaches for parameter estimation for probabilistic forecasting models; in the rest of the book, we will encounter this minimization several times.

Figure 8.4 The current weather pattern $X_t$—represented by a matrix—is matched to all historical patterns in the database (in this case, $A_i$, $i = 1, \ldots, 30$). Top analogs (in this case, four: $A_6$, $A_{20}$, $A_{22}$, and $A_{24}$) are found. Subsequently, the corresponding historical observations (in this case, $y_6$, $y_{20}$, $y_{22}$, and $y_{24}$) are grouped and used as an ensemble forecast.

This distance metric is a popular choice in works on AnEn (Junk et al., 2015a,b), whereas other distance metrics are also available. For instance, Davò et al. (2016) used

$$d(X_t, A_i) = \sum_{k=-\tilde{t}}^{\tilde{t}} \sqrt{\sum_{j=1}^{m} w_j \left( x_{t+k}^{(j)} - x_{i+k}^{(j)} \right)^2}, \quad (8.37)$$

in which the two summations in Eq. (8.36) are swapped, whereas Yang and Alessandrini (2019) considered another alternative, in which the lagged and lead versions of the same weather variable are simply regarded as distinct variables:

$$d(X_t, A_i) = \sqrt{\sum_{j=1}^{p} w_j \left( x_t^{(j)} - x_i^{(j)} \right)^2}, \quad (8.38)$$

where $p = (2\tilde{t} + 1) \times m$ is the total number of variables. Regardless of which distance metric is opted for, it is of paramount importance that the inputs are normalized, for obvious reasons.
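To make the matching concrete, the following is a minimal Python sketch of the analog search of Eqs. (8.34)–(8.36), run on synthetic, pre-normalized arrays; all names and sizes are illustrative, and an operational implementation would add weight optimization and missing-data handling.

```python
import numpy as np

def analog_ensemble(fcst_hist, obs_hist, fcst_now, weights, t_half=1, n_analogs=20):
    """Return past observations whose forecast windows best match fcst_now.

    fcst_hist: (T, m) historical forecasts of m weather variables;
    obs_hist:  (T,) observations paired with fcst_hist;
    fcst_now:  (2*t_half+1, m) query window centered on the forecast time;
    weights:   (m,) nonnegative variable weights, as in Eq. (8.36).
    """
    T = fcst_hist.shape[0]
    dists = np.full(T, np.inf)
    for i in range(t_half, T - t_half):
        window = fcst_hist[i - t_half:i + t_half + 1, :]      # candidate analog A_i
        # Eq. (8.36): weighted sum over variables of root-sum-of-squares over time
        dists[i] = np.sum(weights * np.sqrt(((fcst_now - window) ** 2).sum(axis=0)))
    best = np.argsort(dists)[:n_analogs]                      # top-matched analogs
    return obs_hist[best]                                     # ensemble members

rng = np.random.default_rng(0)
fcst_hist = rng.normal(size=(5000, 4))     # four normalized forecast variables
obs_hist = rng.normal(size=5000)           # paired (normalized) observations
members = analog_ensemble(fcst_hist, obs_hist, fcst_hist[100:103], np.ones(4))
```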

8.5.1.2 Other novel uses of AnEn

Alessandrini et al. (2015a) define analogs to be a length-$(2\tilde{t}+1)$, $m$-variate forecast time series issued by NWP, which can be written as $(2\tilde{t}+1)$-by-$m$ matrices, as shown in Eqs. (8.34) and (8.35). If those matrices are rewritten as $p$-by-1 column vectors, they are:

$$\boldsymbol{x}_t = \left( x_{t-\tilde{t}}^{(1)}, \cdots, x_{t+\tilde{t}}^{(1)}, \cdots, x_{t-\tilde{t}}^{(m)}, \cdots, x_{t+\tilde{t}}^{(m)} \right)^\top \in \mathbb{R}^{p \times 1}, \quad (8.39)$$
$$\boldsymbol{x}_i = \left( x_{i-\tilde{t}}^{(1)}, \cdots, x_{i+\tilde{t}}^{(1)}, \cdots, x_{i-\tilde{t}}^{(m)}, \cdots, x_{i+\tilde{t}}^{(m)} \right)^\top \in \mathbb{R}^{p \times 1}. \quad (8.40)$$

Suppose $n$ samples of $\boldsymbol{x}_i$, $i = 1, \cdots, n$, can be extracted from the historical database; they stack into a training matrix:

$$\boldsymbol{X} = \begin{pmatrix} \boldsymbol{x}_1^\top \\ \boldsymbol{x}_2^\top \\ \vdots \\ \boldsymbol{x}_n^\top \end{pmatrix} = \begin{pmatrix} x_{1-\tilde{t}}^{(1)} & \cdots & x_{1+\tilde{t}}^{(1)} & \cdots & x_{1-\tilde{t}}^{(m)} & \cdots & x_{1+\tilde{t}}^{(m)} \\ x_{2-\tilde{t}}^{(1)} & \cdots & x_{2+\tilde{t}}^{(1)} & \cdots & x_{2-\tilde{t}}^{(m)} & \cdots & x_{2+\tilde{t}}^{(m)} \\ \vdots & \ddots & \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{n-\tilde{t}}^{(1)} & \cdots & x_{n+\tilde{t}}^{(1)} & \cdots & x_{n-\tilde{t}}^{(m)} & \cdots & x_{n+\tilde{t}}^{(m)} \end{pmatrix} \in \mathbb{R}^{n \times p}, \quad (8.41)$$

and the response vector, $\boldsymbol{y} = (y_1, \ldots, y_n)^\top$, contains the observations corresponding to those analogs. This setup must be recognized by all who know basic machine learning, in that, once the relationship between $\boldsymbol{X}$ and $\boldsymbol{y}$ is established, $\boldsymbol{x}_t$, $t \notin \{1, \ldots, n\}$, acts as a new sample whose response can then be predicted. The particular algorithm in the machine-learning literature that matches the principle of AnEn, as mentioned earlier, is the one called nearest neighbors.
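Cast this way, AnEn amounts to a few lines of k-nearest-neighbor code. The sketch below, on synthetic data, uses scikit-learn's NearestNeighbors purely for illustration; variable weights, if desired, can be folded in by scaling the columns of X beforehand.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 9))      # Eq. (8.41): n=1000 samples, p=9 features
y = X @ rng.normal(size=9) + 0.1 * rng.normal(size=1000)   # toy responses
x_new = rng.normal(size=(1, 9))     # Eq. (8.39): one new flattened query window

nn = NearestNeighbors(n_neighbors=20).fit(X)
_, idx = nn.kneighbors(x_new)       # indices of the 20 closest analogs
ensemble = y[idx.ravel()]           # the AnEn members for the new sample
```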


From the viewpoint of machine learning, the features in the training matrix are not at all limited to weather variables from NWP; any combination of features that is capable of differentiating weather patterns may be deemed appropriate. Features engineered from satellite imagery and sky camera snapshots are able to capture cloud dynamics, and thus have been used to good advantage by Yang et al. (2020b) for forecasting intra-day and intra-hour irradiance by means of AnEn. Features may also come from a pyranometer network, in which the readings at upwind pyranometers can act as features for irradiance forecasting at downwind stations; an AnEn-based spatio-temporal forecasting case study is available from Yang et al. (2022e). In other cases, the selected features are not necessarily observations, but rather summary statistics of the observations. For instance, Watanabe and Nohara (2019) considered a set of seven statistical features—mean, standard deviation, skewness, kurtosis, linear regression coefficient, lag-1 autocorrelation, and sample entropy—retrieved from historical satellite-derived irradiance time series, which can be mapped to cloud properties. Then, with a new set of cloud properties, which can be derived from the current satellite image, the corresponding time series features can be predicted via AnEn, which can subsequently be used to predict the irradiance time series. Speaking of time series features, one must acknowledge the contributions of statisticians (e.g., Wang et al., 2006; Kang et al., 2017; Hyndman et al., 2015), who have introduced very comprehensive lists of features that can accurately encapsulate the dynamics and characteristics of time series. When it comes to images or other high-dimensional data, features useful to AnEn can also be obtained via dimension-reduction techniques, such as principal component analysis (Yang et al., 2017a) or the quadtree (Yang et al., 2016).

In deriving features pertaining to solar applications, one often overlooked aspect is the scale of the irradiance process (Orlanski, 1975; Yang, 2020e). More specifically, given a certain temporal scale (e.g., on the order of minutes) on which the irradiance process evolves, there is a certain spatial scale (e.g., on the order of kilometers) that matches it. This correspondence between temporal scale and spatial scale is often established through the so-called “decorrelation distance.” Correlation between two time series in space decreases with their separation, and diminishes beyond a certain distance—this distance is known as the range of a variogram in spatial statistics (Cressie and Wikle, 2015) and as the compact support of a correlation function in atmospheric science (Gneiting, 1999). Consequently, including features beyond the decorrelation distance tends to deteriorate the performance of AnEn, because more noise is added to the feature space. To that end, features should only be sought within the correlation mask, see Fig. 3 of Ayet and Tandeo (2018) and Fig. 8 of Yang et al. (2014a).
One last thing that demands attention from the reader is that the features are best extracted from deseasonalized variables, such as the clear-sky index, since diurnal cycles in irradiance and PV power exaggerate correlation, and thus give a false sense of similarity. This necessity of seasonality removal before model building has been mentioned many times throughout this book, but it is a pivotal fact, and one which differentiates solar forecasting from every other forecasting task.
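For concreteness, a minimal sketch of this deseasonalization step is given below, assuming pvlib is available (one of several possible clear-sky model implementations); the site coordinates and the placeholder measurement series are made up.

```python
import pandas as pd
from pvlib.location import Location

times = pd.date_range("2020-06-01", "2020-06-02", freq="15min", tz="US/Pacific")
site = Location(32.88, -117.24)                    # San Diego, for illustration
ghi_clear = site.get_clearsky(times, model="ineichen")["ghi"]
ghi_measured = 0.8 * ghi_clear                     # placeholder for pyranometer data
# clear-sky index: measured GHI normalized by clear-sky GHI; nighttime 0/0 -> 0
kappa = (ghi_measured / ghi_clear).clip(upper=1.5).fillna(0)
```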

8.5.1.3 Practical issues of AnEn

The merit of AnEn lies in its capability to turn deterministic forecasts into probabilistic ones, which is needed when dynamical ensembles are not available. At the moment, few investigations in the field of solar forecasting compare these two types of ensembles head-to-head, so their relative performance remains largely unclear. One can, nevertheless, discuss the advantages and disadvantages of AnEn relative to dynamical ensembles from a practical point of view. The most obvious advantage of AnEn is its simplicity, which spares many forecast practitioners the highly demanding task of running ensemble NWP. This would undoubtedly increase the uptake of the probabilistic representation of the weather in solar forecasting. The disadvantage, as revealed at the beginning of this section, is that AnEn requires a “frozen” model to operate, which means that the model should be able to issue similar forecasts given similar weather patterns. Since NWP models undergo regular updates in terms of the selected physics packages and schemes—recall Section 6.6.1—the requirement of a “frozen” model cannot be met at any rate in reality, so one has no choice but to relax it. Compared to that, a more consequential disadvantage of AnEn is that it requires a long time series of historical ground-based data to operate, which is exceedingly rare. Fortunately, as techniques and accuracy for retrieving irradiance from remote-sensing images improve, it may be possible to use satellite-derived irradiance in place of ground-based observations (Yang and Perez, 2019; Yang, 2021b, 2019d)—satellite-derived irradiance databases are typically available over a few decades and cover all locations within the ±60° latitude band, which could resolve the issue of data shortage.

Another practical issue with AnEn concerns the computational resources required to perform pattern matching. As engineers of the internet era, with the computing power of a smartphone exceeding that available for the Apollo Moon landing, one would not usually expect similarity search over some time series to be an issue. It is nevertheless an actual problem. Similarity search demands computing the all-pair Euclidean distances (also known as a similarity join; Yeh et al., 2016); for a length-n database and a length-m query, the time complexity of a brute-force search is O(nm), which grows quadratically when both m and n are large. On this point, when m and n are both large and when the process needs to iterate many times to post-process forecasts for different horizons and locations, Cervone et al. (2017) had no way to perform AnEn in reasonable time but to leverage the Yellowstone supercomputer at the National Center for Atmospheric Research (NCAR), which has 141,140 cores. Since not everyone has a supercomputer at home, it is of interest to seek ways to reduce the time complexity of similarity search. In particular, Yang and Alessandrini (2019) and Yang (2019g) demonstrated a pair of ultra-fast similarity-search algorithms for AnEn applications; the choice between them depends on what form the query matrix takes. Both algorithms have been compared to a standard AnEn implementation of NCAR, and are found to reduce the computational time by two orders of magnitude.
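While those ultra-fast algorithms are beyond a short sketch, the toy example below shows the zeroth-order remedy, replacing a Python-level loop with a vectorized all-window distance computation; the speedup is a constant factor only, and the series and query are synthetic.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(1)
series = rng.normal(size=100_000)      # a long univariate forecast history
query = rng.normal(size=5)             # window of length 2*t_half + 1 = 5

windows = sliding_window_view(series, window_shape=5)   # (n-4, 5) view, no copy
dists = np.sqrt(((windows - query) ** 2).sum(axis=1))   # all-window distances at once
top = np.argsort(dists)[:20]                            # 20 best analog positions
```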

8.5.2 METHOD OF DRESSING

The philosophy of the method of dressing is to examine the behavior of a forecasting model in the past, identify those instances that can provide guidance on predicting how wrong the current forecast may be, and thus dress the observed errors of those past instances onto the current forecast, so as to quantify the uncertainty associated with the new forecast. Just like the analog ensemble, the method of dressing exploits repeated weather patterns—whatever happens now may have happened numerous times in the past. But different from the analog ensemble, the method of dressing does not leverage the past observations directly, but rather exploits the errors. The procedure of dressing multiple past errors onto the same forecast is, in principle, no different from bootstrapping, a class of commonly used resampling techniques in statistics.

To give a more concrete example of what the method of dressing does, consider the 1-step-ahead persistence on the clear-sky index, where the observation made at t−1 is simply taken as the forecast for time t. To quantify the uncertainty associated with that forecast, m past persistence forecast errors, for time stamps t−1, t−2, ..., t−m, are superposed onto the forecast for t. This seemingly naïve approach has in fact appeared in Gneiting et al. (2007), a much-celebrated work on probabilistic forecasting. It should be clarified, nevertheless, that the method of dressing using persistence is not to be mixed up with the persistence ensemble (PeEn)—a popular benchmark for AnEn—which directly takes the m past forecasts as ensemble members. The above procedure of dressing results in an empirical distribution; one can also choose to sample errors from an assumed parametric, mixture, or copula-based error distribution. These two approaches constitute the main options of the method of dressing. To give perspective on how the latter option proceeds, consider a forecast made for time t, whose error is assumed to be normally distributed with mean zero and variance σ². The forecaster then draws m samples from N(0, σ²) and dresses those onto the forecast.
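A minimal sketch of the empirical variant on persistence, with a synthetic clear-sky-index series standing in for observations, may help fix the idea:

```python
import numpy as np

rng = np.random.default_rng(7)
kappa = np.clip(0.8 + 0.15 * rng.normal(size=500), 0, 1.2)  # toy observations

fcst = kappa[:-1]                    # 1-step persistence forecasts for t = 1..T-1
errors = kappa[1:] - fcst            # the corresponding observed errors
fcst_now = kappa[-1]                 # persistence forecast for the next step
members = fcst_now + errors[-30:]    # dress the 30 most recent errors (empirical)
members = np.clip(members, 0, None)  # respect the nonnegativity of the variable
```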

8.5.2.1 Dressing via empirical distributions

The method of dressing via empirical distributions is straightforward in theory, since it only requires a set of deterministic forecasts and their respective observations to operate, whereas the particular choice of forecasting model does not matter. As bootstrapping has hitherto been a popular technique for estimating the variance and the distribution associated with a parameter or a quantity, it is often used to construct prediction intervals in the context of time series modeling; see David et al. (2018) for a solar case study, and Bühlmann (1997, 2002) for the statistical theory supporting the method used therein. Although the method of dressing is easy to implement, the quality of the empirical predictive distribution constructed using past errors is often not guaranteed, for one simple reason: the forecast error of solar irradiance, or of related variables such as the clear-sky index or PV power, is almost always nonhomogeneous, i.e., the variance of the error changes from one instance to another. This has also been noted by Pinson and Kariniotakis (2010), who argued that it is unlikely for the prediction errors from the recent past to be representative of the current uncertainty. The necessity of conditional sampling then follows.


The error of a solar forecasting model is conditioned on an eclectic mix of factors, ranging from time-of-day to cloudiness, from aerosol loading to surface albedo, from air mass to temperature to precipitation to surface pressure, and the list goes on. In the most general sense, the better the conditioning variables are able to narrate the forecast situation, the more likely the errors from similar situations in the past are able to resemble the current error. The most primitive approach of using conditioning variables is to sample errors from the same time-of-day as the forecast time stamp, to account for the diurnal variation in solar irradiance. But anyone who is remotely familiar with solar forecasting would see the immediate limitation of such a technique—time-of-day is insufficient to narrate a forecast situation.

When we try to include more conditioning variables in the formulation, there seems to be some inherent difficulty. The logical basis for selecting conditioning variables for the method of dressing is this: since the errors to be dressed need to resemble that of the forecast time stamp, which corresponds to a future time, the conditioning variables need to be, by themselves, forecasts. NWP models provide a wide range of output variables, so relevant forecasts of these conditioning variables can be acquired. But in the case of shorter-range forecasting without NWP, the forecaster must generate separate forecasts of the conditioning variables, which could be time-consuming, and the accuracy is usually not warranted if simple non-meteorological forecasting methods are used. It is to that effect that existing works in solar forecasting which use the method of dressing mostly limit their choice of conditioning variables to a set of quantities obtainable by calculation, such as the zenith angle or clear-sky irradiance. For instance, Grantham et al. (2016) binned the past errors according to elevation angle and hour angle, which are both calculable; when a forecast becomes available, the dressing errors are only sampled from the corresponding bin. Notwithstanding, since the explanatory power of these calculable variables for the forecast situation is just as limited as that of time-of-day, one should not expect the conditioning to be very effective. Alternatively, one can define the forecast situation based on the forecast quantity itself, e.g., if the forecast clear-sky index is greater than 0.9, errors from past clear-sky situations are sampled and dressed; otherwise, errors from past cloudy situations are sampled and dressed.

It is worth noting that the segregation of forecast situations may not be clean-cut—to classify clear from cloudy sky, a Boolean interpretation is usually not sufficient. In this regard, Pinson and Kariniotakis (2010, 2004) proposed a fuzzy take on prediction intervals, in which the wind power forecast condition is divided into three ranges (low, medium, and high), to which three trapezoidal fuzzy sets are associated. This idea can be readily transferred to a solar forecasting scenario. Another important notion to be considered during error dressing is the weather trajectory. The temporal evolution of weather variables is thoroughly beneficial in identifying the forecast situation, and it therefore prompts one to consider block bootstrapping. A solar power forecasting application with block bootstrapping has been demonstrated by Yang (2020f), who followed the algorithm proposed by Athanasopoulos et al. (2020).
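As shown in the sketch below (synthetic data, a crisp 0.9 threshold as in the text), conditional dressing only changes which pool the errors are drawn from; a fuzzy or multi-bin segregation would follow the same logic.

```python
import numpy as np

rng = np.random.default_rng(3)
fcst = np.clip(rng.uniform(0.2, 1.1, size=1000), 0, 1.2)     # toy forecasts
obs = np.clip(fcst + 0.05 * rng.normal(size=1000), 0, 1.2)   # toy observations
errors = obs - fcst

fcst_now = 0.95                          # the current (clear-ish) forecast
# pool errors by forecast situation, then bootstrap-dress from that pool only
pool = errors[fcst >= 0.9] if fcst_now >= 0.9 else errors[fcst < 0.9]
members = fcst_now + rng.choice(pool, size=50, replace=True)
```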

8.5.2.2 Dressing with parametric, mixture, or copula-based distributions

Much of the statistical theory of prediction relies on the assumption that the error distribution is normal. This (in)famous assumption has been proven inadequate in many situations, but when no better option is available, it is still often taken as the last resort. If the normality assumption on the forecast error is made, with the estimated standard deviation, the prediction interval is $y_t \pm c\sigma_t$, where $c$ is the coverage factor, e.g., $c = 1.96$ for a coverage probability of 95%. The estimation, and thus the expression, of $\sigma_t$ depends on the specific statistical model of choice. If the forecasting model is in the form of linear regression, the analytic expression of $\sigma_t$ is found in Chapter 13 of Wasserman (2013); as for time series models, such as the autoregressive integrated moving average (ARIMA) or the exponential smoothing (ETS) families of models, standard textbook references offer the expressions of the standard error (see Chapter 5 and Chapter 6 of Box et al., 2015; Hyndman et al., 2008, respectively). Not every standard error of a statistical model can be expressed analytically though. One example is lasso regression, for which the estimation of the standard error remains challenging (Kyung et al., 2010), and the inventor of lasso seems to agree (Tibshirani, 2011). In the solar forecasting literature, Panamtash et al. (2020) and Grantham et al. (2016) noted that $\sigma_t$ may be estimated with time-of-day as a conditioning variable; the arguments and discussions pertaining to how conditional variance estimation can be extended and carried out in a solar setting are similar to those of Section 8.5.2.1, and thus are not reiterated.

In terms of mixture distributions, Munkhammar et al. (2019) developed a so-called Markov-chain mixture (MCM) distribution model, which dresses a mixture of uniform distributions onto deterministic forecasts. In their approach, the main strategy is to construct a transition probability matrix using past observations, such that for any given forecast state, the predictive distribution can be assumed to be a mixture of uniform distributions. The detailed procedure starts by cutting the range of the forecast variable (i.e., the clear-sky index) into equal-size bins, each being a state, so that past observations can be assigned to one of these bins. Then, by counting how historical samples transit from one state to another, the transition probability matrix is obtained. Finally, for any given current state, the forecast would simply be a piecewise uniform predictive distribution. To exemplify the procedure, the clear-sky index is assumed to vary from 0 to 1.2, and is cut into three bins: [0, 0.4), [0.4, 0.8), and [0.8, 1.2]. Suppose the historical data suggest a transition matrix:

$$M = \begin{pmatrix} 0.2 & 0.5 & 0.3 \\ 0.7 & 0.1 & 0.2 \\ 0.3 & 0.3 & 0.4 \end{pmatrix}, \quad (8.42)$$

then for any current clear-sky index observation residing in the first bin, the predictive PDF of the corresponding forecast would be:

$$f(y) = \begin{cases} 0.2/0.4, & \text{for } 0 \leq y < 0.4, \\ 0.5/0.4, & \text{for } 0.4 \leq y < 0.8, \\ 0.3/0.4, & \text{for } 0.8 \leq y \leq 1.2, \end{cases} \quad (8.43)$$

where the denominator 0.4 is the bin width in this example. If the clear-sky index observation falls into one of the other two bins, the predictive distribution would follow the probabilities in the corresponding row of the transition matrix.
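The whole MCM procedure is only a few lines; the sketch below, on a synthetic clear-sky-index series, counts the transitions and converts the row of the current state into the piecewise-uniform density of Eq. (8.43).

```python
import numpy as np

rng = np.random.default_rng(11)
kappa = np.clip(0.8 + 0.2 * rng.normal(size=5000), 0, 1.2)  # toy observations

edges = np.array([0.0, 0.4, 0.8, 1.2])           # three equal-width bins (states)
states = np.clip(np.digitize(kappa, edges) - 1, 0, 2)
M = np.zeros((3, 3))
for a, b in zip(states[:-1], states[1:]):        # count state-to-state transitions
    M[a, b] += 1
M /= M.sum(axis=1, keepdims=True)                # rows sum to one, cf. Eq. (8.42)

state_now = states[-1]
pdf_heights = M[state_now] / 0.4                 # Eq. (8.43): probability / bin width
```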

8.5.3 PROBABILISTIC REGRESSION

Regression methods constitute the third class of D2P post-processing techniques. To the extent that uncertainty exists in data, all regressions are essentially probabilistic. Nonetheless, we should devise a way to distinguish regression in a D2D setting from the current D2P setting, and a clear strategy for making that distinction is based on whether conditional heteroskedasticity modeling is supported by the method. Regression methods as seen in Section 8.3.1 either assume homoscedastic error variances (such as multiple linear regression) or do not explicitly provide uncertainty information (such as an ANN with one output neuron). Given the fact that the error distributions of solar forecasts evolve according to the weather conditions, the ability to capture such evolution is key to regression-based D2P post-processing. In this regard, the generalized additive models for location, scale and shape (GAMLSS) and quantile regression hold prominent places in D2P post-processing. Whereas the former can be regarded as a class of semiparametric regression methods, the latter is nonparametric.

There is, however, a minor ambiguity in calling GAMLSS and quantile regression D2P techniques, because both of them can, in fact, be applied to post-process ensemble forecasts. Furthermore, as GAMLSS outputs predictive distributions and quantile regression outputs quantiles, they can also be regarded as P2P techniques. To avoid conflict with the typology which holds the entire chapter together, the predictors of GAMLSS and quantile regression, when they are classified as D2P techniques, are multivariate weather forecasts—following the notational convention, the input variables are indexed by superscript $(j)$, e.g., $x_i^{(j)}$; when GAMLSS and quantile regression are classified as P2P techniques, the predictors are ensemble forecasts—following the notational convention, component forecasts are indexed by subscript $j$, e.g., $x_{ij}$.

8.5.3.1 Generalized additive models for location, scale, and shape

The interpretation of the acronym GAMLSS can be broken down into four model specifications, each being more general than the previous one; they are: (1) multiple linear regression (MLR), (2) the generalized linear model (GLM), (3) the generalized additive model (GAM), and (4) GAMLSS. Hence, it is thought that the best way to introduce GAMLSS is to start with the specification of MLR, work down to that of GAMLSS, and observe how the specifications progressively become more general. The random variable to be forecast, at time $i$, is denoted as $Y_i$. Then, with some predictors, $x_i^{(1)}, \ldots, x_i^{(m)}$, which are $m$-variate observations (e.g., GHI, temperature, wind speed, or aerosol optical depth), MLR performs a linear aggregation of the observations:

$$Y_i = \beta_0 + \beta_1 x_i^{(1)} + \cdots + \beta_m x_i^{(m)} + \varepsilon_i, \quad (8.44)$$

with $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ being a zero-mean homoscedastic error term. The specification of Eq. (8.44) is equivalent to:

$$Y_i \sim \mathcal{N}\left(\mu_i, \sigma^2\right), \quad (8.45)$$
$$\mu_i = \beta_0 + \beta_1 x_i^{(1)} + \cdots + \beta_m x_i^{(m)}. \quad (8.46)$$

The estimation of the model parameters $\beta_0, \ldots, \beta_m$ requires $n$ samples. Therefore, if we let $\boldsymbol{Y} = (Y_1, \ldots, Y_n)^\top$ be the random vector denoting the response and

$$\boldsymbol{X} = \begin{pmatrix} 1 & x_1^{(1)} & x_1^{(2)} & \cdots & x_1^{(m)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n^{(1)} & x_n^{(2)} & \cdots & x_n^{(m)} \end{pmatrix} \quad (8.47)$$

be the $n \times p$ design matrix with $p = m + 1$, the equivalent MLR model in vector format can be written as:

$$\boldsymbol{Y} \sim \mathcal{N}\left(\boldsymbol{\mu}, \sigma^2\right), \quad (8.48)$$
$$\boldsymbol{\mu} = \boldsymbol{X}\boldsymbol{\beta}, \quad (8.49)$$

where $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_n)^\top$ and $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_m)^\top$. From Eq. (8.48), one can see that $\boldsymbol{Y}$ is Gaussian under the MLR specification. Hence, the first extension of the MLR specification is to relax the Gaussian assumption, and replace it with a member of the exponential family of distributions, such as the gamma or inverse Gaussian (see Chapter 9 of Wasserman, 2013, for background on the exponential family). Additionally, based on Eq. (8.49), one can see that the mean vector of $\boldsymbol{Y}$ is linear. This can be modified by considering a link function $\eta(\cdot)$, e.g., a logarithm function, to allow additional flexibility in modeling the mean vector. Combining both extensions, the GLM is specified as:

$$\boldsymbol{Y} \sim \mathrm{ExpFamily}(\boldsymbol{\mu}, \phi), \quad (8.50)$$
$$\eta(\boldsymbol{\mu}) = \boldsymbol{X}\boldsymbol{\beta}, \quad (8.51)$$

where $\phi$ is the parameter of an exponential family distribution. Moving on from GLM, GAM adds one more innovation, that is, the inclusion of some unspecified smooth functions in the modeling of the mean vector. Denoting the smooth function for the $j$th predictor as $\xi_j(\cdot)$, the specification of GAM is:

$$\boldsymbol{Y} \sim \mathrm{ExpFamily}(\boldsymbol{\mu}, \phi), \quad (8.52)$$
$$\eta(\boldsymbol{\mu}) = \boldsymbol{X}\boldsymbol{\beta} + \xi_1\left(\boldsymbol{x}^{(1)}\right) + \cdots + \xi_m\left(\boldsymbol{x}^{(m)}\right), \quad (8.53)$$

where $\boldsymbol{x}^{(j)} = \left(x_1^{(j)}, \ldots, x_n^{(j)}\right)^\top$. The rationale behind the inclusion of smooth functions is that the overarching assumption of linear models, namely, that the response is linear in the predictors, may not always hold. Therefore, when the smooth functions are applied to the predictors, the regression inherits a notion of nonlinearity. In particular, the smooth functions are obtained by a scatter-plot smoother, such as a smoothing spline or kernel smoother. The reader is referred to Chapter 9 of Hastie et al. (2019) for further information on GAM.

Finally, building upon the two previous rounds of evolution, GAMLSS makes two more changes. One is that GAMLSS considers each $Y_i$ as following a four-parameter distribution $\mathcal{D}$, with location $\mu_i$, scale $\sigma_i$, and shape parameters $\nu_i$ and $\tau_i$, or in vector form:

$$\boldsymbol{Y} \sim \mathcal{D}(\boldsymbol{\mu}, \boldsymbol{\sigma}, \boldsymbol{\nu}, \boldsymbol{\tau}). \quad (8.54)$$

Second, owing to the four parameters in the distribution, GAMLSS models each parameter in a way that is analogous to how the mean vector is modeled under GAM. That is:

$$\eta_1(\boldsymbol{\mu}) = \boldsymbol{X}\boldsymbol{\beta}_1 + \xi_{11}\left(\boldsymbol{x}^{(1)}\right) + \cdots + \xi_{m1}\left(\boldsymbol{x}^{(m)}\right), \quad (8.55)$$
$$\eta_2(\boldsymbol{\sigma}) = \boldsymbol{X}\boldsymbol{\beta}_2 + \xi_{12}\left(\boldsymbol{x}^{(1)}\right) + \cdots + \xi_{m2}\left(\boldsymbol{x}^{(m)}\right), \quad (8.56)$$
$$\eta_3(\boldsymbol{\nu}) = \boldsymbol{X}\boldsymbol{\beta}_3 + \xi_{13}\left(\boldsymbol{x}^{(1)}\right) + \cdots + \xi_{m3}\left(\boldsymbol{x}^{(m)}\right), \quad (8.57)$$
$$\eta_4(\boldsymbol{\tau}) = \boldsymbol{X}\boldsymbol{\beta}_4 + \xi_{14}\left(\boldsymbol{x}^{(1)}\right) + \cdots + \xi_{m4}\left(\boldsymbol{x}^{(m)}\right). \quad (8.58)$$

To allow more flexibility, the above four equations can allow different $\boldsymbol{X}$ and $\boldsymbol{x}^{(j)}$ for different parameters. As more than 100 distributions can be characterized by location, scale, and shape, the generality of GAMLSS as a post-processing method can be confirmed at once. Moreover, GAMLSS also supports truncated and finite mixture distributions, which are adequate for describing clear-sky index distributions (see Section 4.5.2). Although the modeling philosophy of GAMLSS is straightforward, its implementation can be quite tedious. Fortunately, GAMLSS has been coded in the gamlss package in R by its inventors (Rigby and Stasinopoulos, 2005; Stasinopoulos and Rigby, 2007), and the book by Stasinopoulos et al. (2017) provides a complete guide on GAMLSS, as well as on how to take advantage of the R package for various modeling, forecasting, and post-processing tasks.

Indeed, GAMLSS, as a regression tool, can be used for stand-alone forecasting (e.g., Brabec et al., 2015). In such a scenario, one may choose to include the lagged versions of the variable as predictors, which is analogous to how MLR-based forecasting is performed. In another case, it is also possible to directly regress weather forecasts onto PV power, which implies that GAMLSS can be used as an irradiance-to-power conversion tool. Whereas there could be many other applications of this sort, GAMLSS is not a popular method in the field of solar forecasting, as evidenced by its low usage rate. A solar forecast post-processing application of GAMLSS has been put forth by Bakker et al. (2019), who assumed gamma and truncated-normal distributions in post-processing a large collection of output variables from the high-resolution non-hydrostatic NWP model HARMONIE-AROME. In terms of deterministic performance, the two GAMLSS models performed similarly to the nonparametric quantile-regression-based post-processing methods. However, in terms of probabilistic performance, the nonparametric post-processing techniques appeared more advantageous. This result can be expected, as the advantages of nonparametric statistics are well known (Wasserman, 2006), chiefly the exceptional flexibility in describing predictive distributions.
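To convey the flavor of GAMLSS without the R machinery, the sketch below fits a deliberately simplified location-scale model in Python: a normal response whose mean uses an identity link and whose standard deviation uses a log link, both linear in the predictors, estimated by maximum likelihood. This is only Eqs. (8.55)-(8.56) with no smoothers and no shape parameters; the actual gamlss package supports far more.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(5)
x = rng.uniform(0, 1, size=(400, 2))               # two toy predictors
sigma_true = np.exp(-1.5 + 1.0 * x[:, 1])          # heteroscedastic truth
y = 0.3 + 0.5 * x[:, 0] + sigma_true * rng.normal(size=400)

X = np.column_stack([np.ones(400), x])             # design matrix with intercept

def neg_loglik(theta):
    beta_mu, beta_sig = theta[:3], theta[3:]
    mu = X @ beta_mu                               # identity link for location
    sigma = np.exp(X @ beta_sig)                   # log link keeps the scale positive
    return -norm.logpdf(y, loc=mu, scale=sigma).sum()

fit = minimize(neg_loglik, x0=np.zeros(6), method="Nelder-Mead",
               options={"maxiter": 20000})
beta_mu_hat, beta_sig_hat = fit.x[:3], fit.x[3:]
```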

8.5.3.2 Quantile regression and its variants

The history of quantile regression (or rather, median regression) can be traced back to 1760, which predates least squares regression by half a century (Koenker, 2005). Ironically, it was not until very recently that quantile regression started to gain major attention. Today, the importance of quantile regression as a nonparametric regression framework cannot be overstated. Since the basic form of quantile regression is highly analogous to least squares, with the only major dissimilarity being the loss function employed during parameter estimation, it is useful to recall that least squares, for multivariate predictors, seeks to minimize the residual sum of squares:

$$\widehat{\boldsymbol{\beta}} = \operatorname*{arg\,min}_{\boldsymbol{\beta} \in \mathbb{R}^p} \sum_{i=1}^{n} \left( y_i - \boldsymbol{x}_i^\top \boldsymbol{\beta} \right)^2, \quad (8.59)$$

where $p = m + 1$ with $m$ being the number of predictors, $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_m)^\top$ are regression weights, and $\boldsymbol{x}_i = \left(1, x_i^{(1)}, \ldots, x_i^{(m)}\right)^\top$ in a D2P post-processing context is a vector of forecast values of $m$ weather variables made at time $i$. (One should not confuse $\boldsymbol{x}_i$ with the $\boldsymbol{x}^{(j)}$ used in the GAMLSS section.) For quantile regression, instead of a squared loss function, the minimization is with respect to a piecewise linear loss function $\rho_\tau(\cdot)$ (also known as pinball loss, tick loss, check loss, or tilted absolute loss), cf. Eq. (8.16):

$$\widehat{\boldsymbol{\beta}}_\tau = \operatorname*{arg\,min}_{\boldsymbol{\beta} \in \mathbb{R}^p} \sum_{i=1}^{n} \rho_\tau\left( y_i - \boldsymbol{x}_i^\top \boldsymbol{\beta} \right), \quad (8.60)$$

where

$$\rho_\tau\left( y_i - \boldsymbol{x}_i^\top \boldsymbol{\beta} \right) = \begin{cases} \tau \left( y_i - \boldsymbol{x}_i^\top \boldsymbol{\beta} \right), & \text{for } y_i \geq \boldsymbol{x}_i^\top \boldsymbol{\beta}, \\ (\tau - 1) \left( y_i - \boldsymbol{x}_i^\top \boldsymbol{\beta} \right), & \text{for } y_i \leq \boldsymbol{x}_i^\top \boldsymbol{\beta}. \end{cases} \quad (8.61)$$

Figure 8.5 shows a graphical visualization of $\rho_\tau(u)$. From the figure, it is clear that the pinball loss, and thus the estimated regression parameters, depend on $\tau$, with $\tau \in (0, 1)$. When $\tau = 0.5$, quantile regression is no different from least absolute deviation (LAD) regression. Equation (8.60) also implies that to obtain a list of $Q$ quantiles, so as to arrive at a probabilistic representation of the post-processed forecast, $Q$ regressions need to be fitted. Being a major powerhouse of nonparametric regression, the literature on quantile regression is extensive, especially in terms of the algorithms used for parameter estimation. On this point, the book by Koenker (2005) is the “go-to” option if one wishes to understand the theoretical foundation of quantile regression. For forecast practitioners, knowing how to operate the R package quantreg (Koenker, 2020) is sufficient.

Figure 8.5 Pinball loss. For $u > 0$, $\rho_\tau(u) = u \times \tau$, and for $u < 0$, $\rho_\tau(u) = u \times (\tau - 1)$.
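A minimal fitting example in Python, using the QuantReg class from the statsmodels package on synthetic data, is sketched below; one regression is fitted per quantile level, exactly as Eq. (8.60) implies.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = rng.uniform(0, 1, size=(500, 3))           # m = 3 toy weather predictors
y = 0.2 + x @ np.array([0.5, 0.3, -0.2]) + 0.1 * (1 + x[:, 0]) * rng.normal(size=500)

X = sm.add_constant(x)                         # prepend the intercept column
model = sm.QuantReg(y, X)
# one set of coefficients per quantile level tau
betas = {tau: model.fit(q=tau).params for tau in (0.1, 0.25, 0.5, 0.75, 0.9)}
```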

Just like the case of least squares regression, quantile regression has many variants. In what follows, three important variants are described in brief. First of all, similar to how penalized regression can be performed in a least squares setting, penalized regression can be performed with quantile regression. For instance, the well-known lasso estimator is given by:

$$\widehat{\boldsymbol{\beta}}_\lambda = \operatorname*{arg\,min}_{\boldsymbol{\beta} \in \mathbb{R}^p} \sum_{i=1}^{n} \left( y_i - \boldsymbol{x}_i^\top \boldsymbol{\beta} \right)^2 + \lambda \left\lVert \boldsymbol{\beta} - \boldsymbol{\beta}^0 \right\rVert_1, \quad (8.62)$$

which aims at shrinking the unconstrained least squares estimator towards $\boldsymbol{\beta}^0$ (typically taken as zero), subject to some positive $\lambda$, which is a complexity parameter that controls the amount of shrinkage. In comparison, the lasso-penalized quantile regression (LPQR) estimator takes the form:

$$\widehat{\boldsymbol{\beta}}_{\tau,\lambda} = \operatorname*{arg\,min}_{\boldsymbol{\beta} \in \mathbb{R}^p} \sum_{i=1}^{n} \rho_\tau\left( y_i - \boldsymbol{x}_i^\top \boldsymbol{\beta} \right) + \lambda \left\lVert \boldsymbol{\beta} - \boldsymbol{\beta}^0 \right\rVert_1, \quad (8.63)$$

which can be understood at once. An implementation of LPQR is also available in the quantreg package.

Another variant of quantile regression is the quantile regression neural network (QRNN), which introduces nonlinearity to the quantile regression framework. The basic network architecture of QRNN is a multilayer perceptron (MLP), which consists of one input layer, one hidden layer, and one output layer, as shown in Fig. 8.6. Denoting the output value from the $k$th hidden neuron for the $i$th sample as $H_{ik}$, QRNN applies the hyperbolic tangent transfer function:

$$H_{ik} = \tanh\left( b_{k,[\mathrm{H}]} + \sum_{j=1}^{m} x_i^{(j)} w_{jk,[\mathrm{H}]} \right), \quad (8.64)$$

where $j = 1, \ldots, m$ indexes the input variables; $k = 1, \ldots, o$ indexes the hidden neurons; $w_{jk,[\mathrm{H}]}$ is the weight connecting $x_i^{(j)}$ to $H_{ik}$; and $b_{k,[\mathrm{H}]}$ is the bias of the $k$th hidden neuron, with the subscript “[H]” indicating “hidden.” Similarly, from the hidden layer to the output layer, the mapping is:

$$q_{\tau,i} = b_{[\mathrm{O}]} + \sum_{k=1}^{o} H_{ik} w_{k,[\mathrm{O}]}, \quad (8.65)$$

where $q_{\tau,i}$ is the $\tau$th quantile at time $i$; $w_{k,[\mathrm{O}]}$ is the weight connecting $H_{ik}$ to the output neuron; and $b_{[\mathrm{O}]}$ is the bias of the output neuron, with the subscript “[O]” indicating “output.” The parameters of a QRNN can be acquired via optimization of the pinball loss over $n$ samples. As in the case of MLP, the optimization of QRNN uses the back-propagation algorithm. A problem, however, arises owing to the fact that the pinball loss is not differentiable everywhere. In this regard, Cannon (2011) suggested using the Huber function to construct smooth approximations of $\rho_\tau(u)$, which is discussed further in Section 8.6.1.4. After this transformation, standard gradient-based algorithms are functional. An R implementation of QRNN is available from the qrnn package (Cannon, 2018). Notwithstanding, as a total of $Q$ QRNNs need to be fitted during D2P post-processing, one for each desired quantile, the computation time can be quite substantial if the training data is sizable. There is, however, an option to use multiple output neurons, one for each quantile of interest, but the discussion is omitted.
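The Huber-type smoothing just mentioned can be sketched in a few lines; the exact form used in the qrnn package may differ in detail, so the eps parameter and the shape below are indicative only.

```python
import numpy as np

def smooth_pinball(u, tau, eps=1e-3):
    """Pinball loss with a Huber-style quadratic patch near the origin."""
    # quadratic inside (-eps, eps), linear outside: differentiable everywhere
    h = np.where(np.abs(u) <= eps, u**2 / (2 * eps), np.abs(u) - eps / 2)
    return np.where(u >= 0, tau * h, (1 - tau) * h)

u = np.linspace(-0.01, 0.01, 5)
print(smooth_pinball(u, tau=0.9))   # smooth near zero, pinball-like elsewhere
```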

Figure 8.6 Architecture of a quantile regression neural network. Here, $i$ indexes the number of training samples, $i = 1, \ldots, n$; $j$ indexes the number of inputs (in this case, the number of weather variables), $j = 1, \ldots, m$; $k$ indexes the number of hidden neurons, $k = 1, \ldots, o$; and subscripts “[H]” and “[O]” index the “hidden” and “output” layers, respectively.

The last variant of quantile regression that needs to be discussed is quantile regression forest (QRF). QRF was proposed by Meinshausen (2006), and is a generalization of random forests. In a regular random forest, the conditional mean $\mathbb{E}(Y_t \mid \boldsymbol{x}_t)$ is the weighted mean of the observed response variables, i.e.,

$$\mathbb{E}(Y_t \mid \boldsymbol{x}_t) = \sum_{i=1}^{n} w_i(\boldsymbol{x}_t)\, y_i, \quad (8.66)$$

where the $w_i(\boldsymbol{x}_t)$ are a function of $\boldsymbol{x}_t$ but always sum to 1, and can be derived from the “counting leaf” type of training (Breiman, 2001). In QRF, instead of estimating just the conditional mean, the conditional distribution is inferred:

$$F(y \mid \boldsymbol{x}_t) = \sum_{i=1}^{n} w_i(\boldsymbol{x}_t)\, \mathbb{1}\{y_i \leq y\}, \quad (8.67)$$

where $\mathbb{1}\{y_i \leq y\}$ is an indicator function. In other words, the conditional distribution $F(y \mid \boldsymbol{x}_t)$ is estimated by the weighted distribution of the observed response variables. Subsequently, the $\tau$th quantile of the conditional predictive distribution is obtained via:

$$q_\tau = \inf\left\{ y : F(y \mid \boldsymbol{x}_t) \geq \tau \right\}. \quad (8.68)$$

To compute the distribution of $Y_t$, one needs to evaluate Eq. (8.67) for all $y \in \mathbb{R}$. It must be apparent by now that, differing from quantile regression and QRNN, in which the training requires optimizing the pinball loss, QRF training does not directly involve minimization. Furthermore, QRF trains one model for all $Q$ quantiles, and is thus faster than QRNN. A QRF implementation is available from the quantregForest R package (Meinshausen, 2017), which calls the random forest routine implemented in the randomForest package (Liaw and Wiener, 2002).

Similar to the case of GAMLSS in the previous section, the various aforementioned variants of quantile regression have been more popular as stand-alone probabilistic solar forecasting methods (Nagy et al., 2016; David et al., 2018) than as post-processing tools. The present authors are only aware of one group of solar forecasters who used quantile regression and QRNN as D2P post-processing tools for multivariate deterministic NWP forecasts (Bakker et al., 2019). However, when the input is ensemble predictions, both GAMLSS and quantile regression variants can act as P2P post-processing tools (Yagli et al., 2020b; Yang and Gueymard, 2021a,b). More specifically, GAMLSS and quantile regression are both capable of calibrating ensemble forecasts, which constitutes a main objective of P2P post-processing.
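Before moving on, the leaf-weight construction of Eqs. (8.66)–(8.68) can be illustrated with scikit-learn's plain random forest, whose apply() method exposes leaf memberships; this bare-bones version ignores the refinements of the actual quantregForest implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.uniform(size=(800, 4))
y = X[:, 0] + 0.3 * rng.normal(size=800)
rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=10).fit(X, y)

x_new = rng.uniform(size=(1, 4))
leaves_train = rf.apply(X)        # (n, n_trees) leaf index of each training sample
leaves_new = rf.apply(x_new)      # (1, n_trees) leaf index of the query

# w_i(x_t): average over trees of 1/(leaf size) for co-resident samples
w = np.zeros(len(y))
for tree in range(leaves_train.shape[1]):
    mask = leaves_train[:, tree] == leaves_new[0, tree]
    w[mask] += 1.0 / mask.sum()
w /= leaves_train.shape[1]

order = np.argsort(y)             # weighted empirical CDF, Eq. (8.67)
cdf = np.cumsum(w[order])
q90 = y[order][np.searchsorted(cdf, 0.90)]   # Eq. (8.68) with tau = 0.90
```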

8.6 P2P POST-PROCESSING

The uncertainty information embedded in a forecast can be represented in four styles, namely, a predictive distribution, a set of quantiles, a prediction interval, or simply an ensemble of (deterministic or probabilistic) component forecasts. Among these four options, the prediction interval is the weakest form of probabilistic representation, since it does not reveal the complete probability law according to which the forecast variable behaves; rather, it only provides a nominal coverage probability with a pair of bounds describing that probability. In contrast, a (semi)parametric predictive distribution contains all relevant information about the forecast variable.


When such a distributional assumption is not used, either because the forecaster does not possess sufficient confidence to associate a mathematical description with a physical phenomenon or because the modeling tool does not provide a sufficient degree of freedom to respond to potentially time-varying distributional assumptions, quantiles and ensembles are adequate, for they are nonparametric. In any case, aside from prediction intervals, which are just a special case (i.e., a subset) of quantiles, the other three forms of probabilistic representation can be converted from one to another, with fairly little alteration of the embedded information.

The virtue of probabilistic forecasts consists of a set of properties describing the statistical behavior of the forecasts with corresponding observations. Whereas visualizing and quantifying these properties are the main themes of Chapter 10, a brief introduction may benefit the current discussion. Calibration refers to the statistical consistency between a set of distributional forecasts $F_t$ and a set of observations $y_t$, for $t = 1, \ldots, n$. As $y_t$ is a realization of its corresponding random variable, which is distributed according to $G_t$, probabilistic calibration means $(1/n)\sum_{t=1}^{n} G_t \circ F_t^{-1}(\tau) \to \tau$, for all $\tau \in (0, 1)$. What this definition implies is that when we define $p_t \equiv F_t(y_t)$, i.e., the value of $F_t$ evaluated at $y_t$, the $p_t$ should be uniformly distributed. Here, $p_t$ is called the probability integral transform (PIT), and its histogram is called a PIT histogram, which serves as one of the most pivotal tools for visual assessment of the goodness of probabilistic forecasts. Besides calibration, another important property is sharpness, which is related to the spread of a distributional forecast: the larger the spread, the lower the sharpness; this is a rather straightforward notion. The third property is resolution, which quantifies the average divergence of the conditional distribution of the observation, given a specific forecast, from the climatology. Stated differently, when the forecasts are different, one wishes the observations to be conditionally distributed differently too, and the larger that difference is from the unconditional distribution (i.e., climatology), the better the probabilistic forecasts. All these ideas are to be revisited in Chapter 10.

In this section, we are concerned with post-processing techniques that can adjust those aforementioned properties, so as to make forecasts better. Whereas the umbrella term “calibrating probabilistic forecasts” is widely accepted as a post-processing means to improve forecast quality, which makes it seem relevant to calibration only, some researchers (e.g., Pinson et al., 2007) believe sharpness and resolution cannot be enhanced by applying simple post-processing methods. We do not entirely understand the rationale behind such a belief, for one can simply make the forecast spread wider or narrower through post-processing—the method is known as variance deficit (or surplus), see Alessandrini et al. (2015a,b)—and thereby alter the sharpness of that forecast. In any case, ensemble model output statistics (EMOS) and ensemble dressing, which are two classes of P2P post-processing techniques that can improve the statistical properties of probabilistic forecasts, are discussed in Section 8.6.1. EMOS and ensemble dressing are both designed for ensemble forecasts. Alternatively, one can also achieve calibration by means of GAMLSS and quantile regressions.
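As a brief aside, computing the PIT is a one-liner once the predictive CDFs are at hand; the sketch below assumes Gaussian predictive distributions purely for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
mu = rng.uniform(0.4, 1.0, size=2000)      # predictive means, toy values
sigma = np.full(2000, 0.08)                # predictive spreads
y = mu + 0.08 * rng.normal(size=2000)      # well-calibrated toy observations

pit = norm.cdf(y, loc=mu, scale=sigma)     # p_t = F_t(y_t)
hist, _ = np.histogram(pit, bins=10, range=(0, 1))
# roughly equal counts across bins -> calibrated; U-shape -> under-dispersed;
# hump in the middle -> over-dispersed
```
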
The second aspect of P2P post-processing is combining probabilistic forecasts, the idea of which is identical to combining deterministic forecasts, in that each of several forecasters issues a probabilistic forecast, and the forecasts are combined either linearly or nonlinearly, in order to form a final forecast that is hopefully better in quality. The underlying philosophy is one of the wisdom of the crowd. Although the idea of combining probabilistic forecasts is natural, it is likely the least developed post-processing framework in the field of forecasting (Winkler et al., 2019; Wang et al., 2019d). Combining can be applied to all forms of probabilistic representation, and the member forecasts do not necessarily take the same form. Since combining deterministic forecasts as a P2D post-processing tool has achieved unparalleled success in all domains of forecasting, the benefits of P2P combining can be expected to be consequential as well. P2P combining is discussed in Section 8.6.2.
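Ahead of the formal treatment in Section 8.6.2, the simplest instance, the equally weighted linear pool, can be sketched as follows; the member means and spreads are made up, and members of different forms could be mixed in exactly the same way.

```python
import numpy as np
from scipy.stats import norm

members = [(0.85, 0.05), (0.80, 0.08), (0.90, 0.06)]  # (mean, sd) of each member

def pooled_cdf(y):
    # combined predictive CDF = average of the member CDFs
    return np.mean([norm.cdf(y, m, s) for m, s in members], axis=0)

grid = np.linspace(0.5, 1.2, 701)
cdf = pooled_cdf(grid)
q50 = grid[np.searchsorted(cdf, 0.5)]   # read off the combined median numerically
```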

8.6.1 CALIBRATING ENSEMBLE FORECASTS

Logically speaking, the input to a calibration workflow should be ensemble forecasts. This is because the ways in which distributional forecasts and quantile forecasts are generated already inherit a sense of calibration. For instance, if one employs a Gaussian predictive distribution, and obtains the mean and variance by optimizing some scoring rule, the issued predictive distribution should be calibrated by construction; if it is not, one should attribute the source of deficiencies to the optimization routine and setup. On the other hand, calibration does not place any assumption on the form of the final calibrated forecasts; they can be (1) a parametric predictive distribution (Sections 8.6.1.1 and 8.6.1.2), (2) ensemble forecasts with more members (Section 8.6.1.3), or (3) quantiles (Section 8.6.1.4). Since prediction intervals can be obtained at no cost once predictive distributions, ensembles, or quantiles are available, calibration of prediction intervals is implied.

8.6.1.1 Ensemble model output statistics

EMOS was proposed by Gneiting et al. (2005) and was in fact originally intended to be used as a forecast post-processing tool. It takes ensemble forecasts as input, and outputs a parametric predictive distribution of choice. Since the Gaussian predictive distribution has been a very popular (or a rather lazy) choice, EMOS also goes by the name nonhomogeneous Gaussian regression (NGR). But most definitely, EMOS can be used when other distributional assumptions are placed, such as the truncated normal (Yang, 2020d) or gamma (Baran and Nemoda, 2016). In the original work, Gneiting et al. (2005) applied EMOS to 48-h forecasts of sea level pressure and surface temperature over the North American Pacific Northwest. After that work, owing to its universality and generalizability, EMOS has been applied both in its original form and in extended forms to forecasts of a wide range of atmospheric variables, including solar and wind (e.g., Baran and Lerch, 2016; Baran and Nemoda, 2016; Thorarinsdottir and Gneiting, 2010).

The original idea of EMOS is direct and compact, in that it uses linear functions to map the ensemble members and ensemble variance to the mean and variance of the Gaussian predictive distribution. Given a poor man's ensemble with $m$ members at time $t$, denoted by $x_{t1}, \ldots, x_{tm}$, EMOS assumes the observation $y_t$ to be a realization from a normal predictive distribution,

$$Y_t \sim \mathcal{N}\left( \widehat{w}_0 + \widehat{w}_1 x_{t1} + \cdots + \widehat{w}_m x_{tm},\ \widehat{\beta}_0 + \widehat{\beta}_1 S_t^2 \right), \quad (8.69)$$

i.e., its mean and variance are given by:

$$\mathbb{E}(Y_t) = \widehat{w}_0 + \widehat{w}_1 x_{t1} + \cdots + \widehat{w}_m x_{tm}, \quad (8.70)$$
$$\mathbb{V}(Y_t) = \widehat{\beta}_0 + \widehat{\beta}_1 S_t^2, \quad (8.71)$$

where $S_t^2$ is the ensemble variance, which can be directly computed using the member forecasts:

$$S_t^2 = \frac{1}{m-1}\left[ \sum_{j=1}^{m} x_{tj}^2 - \frac{1}{m}\left( \sum_{j=1}^{m} x_{tj} \right)^2 \right], \quad (8.72)$$

and $\widehat{\theta} = \left\{ \widehat{w}_0, \widehat{w}_1, \ldots, \widehat{w}_m, \widehat{\beta}_0, \widehat{\beta}_1 \right\}$ is the estimate of the unknown model parameter $\theta$. In arriving at $\widehat{\theta}$, two competing approaches are commonly used: one is maximum likelihood estimation and the other is CRPS minimization. It must be highlighted that maximizing the log-likelihood is equivalent to minimizing the ignorance score (IGN), which is also known as the logarithmic score or the predictive deviance. Minimizing IGN and minimizing CRPS are both optimization problems, which require objective functions. Gneiting et al. (2005) showed that the IGN and CRPS for $n$ fitting samples, namely, $(y_i, x_{i1}, \ldots, x_{im})$, $i = 1, \ldots, n$, can be written as:

$$\mathrm{IGN} = \frac{1}{2n} \sum_{i=1}^{n} \left[ \ln(2\pi) + \ln\left( \beta_0 + \beta_1 S_i^2 \right) + z_i^2 \right], \quad (8.73)$$

and

$$\mathrm{CRPS} = \frac{1}{n} \sum_{i=1}^{n} \left( \beta_0 + \beta_1 S_i^2 \right)^{\frac{1}{2}} \left\{ z_i \left[ 2\Phi(z_i) - 1 \right] + 2\varphi(z_i) - \frac{1}{\sqrt{\pi}} \right\}, \quad (8.74)$$

respectively, where $\Phi$ and $\varphi$ are the CDF and PDF of a standard normal distribution, and

$$z_i = \frac{y_i - \left( w_0 + w_1 x_{i1} + \cdots + w_m x_{im} \right)}{\left( \beta_0 + \beta_1 S_i^2 \right)^{\frac{1}{2}}} \quad (8.75)$$

is the $i$th standardized EMOS model error. One should note that the IGN and CRPS in Eqs. (8.73) and (8.74) in fact denote the average IGN and CRPS over $n$ samples. Since these scoring rules can be evaluated for each sample, the scores of sample $i$ are written in small letters, i.e., $\mathrm{ign}_i$ and $\mathrm{crps}_i$. That is,

$$\mathrm{IGN} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{ign}_i, \quad (8.76)$$
$$\mathrm{CRPS} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{crps}_i. \quad (8.77)$$

This convention will be used throughout the remaining part of the book.
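To ground these expressions, the following sketch fits a Gaussian EMOS by CRPS minimization on synthetic ensemble forecasts; as a simplification, members are treated as interchangeable (equally weighted mean), so only an intercept and the two variance coefficients are optimized, unlike the full formulation of Eq. (8.70). A production implementation (e.g., ensembleMOS in R) also estimates member weights and enforces parameter constraints.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(9)
m, n = 10, 1000
truth = rng.uniform(0.3, 1.0, size=n)
ens = truth[:, None] + 0.05 + 0.08 * rng.normal(size=(n, m))  # biased, noisy members

xbar = ens.mean(axis=1)              # equally weighted ensemble mean
s2 = ens.var(axis=1, ddof=1)         # ensemble variance, Eq. (8.72)
y = truth

def mean_crps(theta):
    a, b0, b1 = theta
    mu = xbar + a                    # mean: ensemble mean plus an intercept
    var = np.maximum(b0 + b1 * s2, 1e-8)   # Eq. (8.71), kept positive
    sd = np.sqrt(var)
    z = (y - mu) / sd                # standardized error, Eq. (8.75)
    # closed-form CRPS of a normal predictive distribution, Eq. (8.74)
    crps = sd * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))
    return crps.mean()

fit = minimize(mean_crps, x0=np.array([0.0, 0.001, 1.0]), method="Nelder-Mead")
a_hat, b0_hat, b1_hat = fit.x
```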


There is no a priori preference between minimizing IGN and minimizing CRPS, since both scores are aggregates of so-called strictly proper scoring rules. Strictly proper scoring rules are effective against hedging, of which the concept has been revealed in Sections 1.1.3 and 2.2, and will be reiterated in Section 10.2. To that end, the choice should depend solely on the scoring rule specified ex ante—the idea is identical to the case of consistent scoring functions and elicitable functionals discussed in Section 8.4.1. IGN-optimized forecasts perform better than CRPS-optimized forecasts under IGN, and CRPS-optimized forecasts perform better than IGN-optimized forecasts under CRPS; this has been empirically demonstrated by Yang (2020d) in a solar engineering context.

Based on Eqs. (8.70) and (8.71), it is immediately clear that the predictive distributions from EMOS are nonhomogeneous. When $S_t^2$ varies, the variance of the final predictive distribution follows such variation. This is a very amenable characteristic, since the ensemble variance of most atmospheric variables varies according to the weather condition; e.g., the ensemble variance of GHI forecasts can be expected to be small during clear-sky conditions, and large under skies with intermittent clouds. Notwithstanding, discretion has to be advised as to the EMOS mean, which is the weighted average of the ensemble members; this makes the default EMOS unfit for dynamical or analog ensemble forecasts with equally probable member forecasts (see also Section 8.4.2). Although this pitfall has trapped some solar forecasters (e.g., Schulz et al., 2021; Alessandrini et al., 2015a), it can be easily fixed by replacing $\widehat{w}_1, \ldots, \widehat{w}_m$ with $1/m$, and various post-processing software packages such as ensembleMOS in R (Yuen et al., 2018) already have this option embedded, in that users are able to specify which members are interchangeable and which are not.

8.6.1.2 Extensions of EMOS

The modeling philosophy of EMOS allows many variations and extensions. The simplest adaptation is to replace the Gaussian predictive distribution with some other distribution that may better describe the error behavior. Solar forecasting typically proceeds with forecasting the clear-sky index, which is known to have a bimodal distribution due to the cloud-induced switching between the clear and cloudy states of the atmosphere (Hollands and Suehrcke, 2013; Yang et al., 2017a). Although it is generally not the case that the forecast error also exhibits bimodality, one may expect the error to have fatter tails than Gaussian. In another circumstance, since clear-sky index, irradiance, and solar power are all non-negative quantities, it is reasonable to truncate the lower tail of the predictive distribution at zero. Similarly, upper-tail-truncated predictive distributions may be appropriate to reflect the existence of an extraterrestrial upper bound on solar irradiance or a physically possible upper bound on the clear-sky index. If we are to perform EMOS using distributions other than Gaussian, the corresponding IGN or CRPS expressions are required. Derivations of analytic expressions of IGN and CRPS, on certain occasions, are straightforward but tedious (Gneiting et al., 2006), and on other occasions, are outright difficult. The literature, nevertheless, has accumulated the analytic expressions for a good collection of distributions, for which the reader is referred to Wilks (2019) for a list of sources where these expressions can be found. Building on the idea of using more flexible predictive distributions, the GAMLSS introduced in an earlier section may be regarded as a more general form of EMOS. In contrast to its application in D2P post-processing, the inputs of GAMLSS in P2P post-processing are ensemble forecasts. Instead of modeling the variance of the predictive distribution with a linear model of the ensemble variance, GAMLSS models it in the same fashion as how EMOS deals with the mean. More generally, GAMLSS predictive distributions have (at most) four parameters, and each is modeled as a linear (or nonlinear, owing to the use of smoothers, recall Section 8.5.3.1) function of the member forecasts. The parameter estimation of GAMLSS is more versatile, and does not rely on the analytic expression of CRPS, which facilitates implementation (Stasinopoulos et al., 2017). In the field of solar forecasting, GAMLSS-based calibration has been demonstrated by Yagli et al. (2020b), who applied GAMLSS to ensemble solar forecasts generated using 20 data-driven models. Two distributions, namely, the truncated logistic distribution and the truncated skewed Student's t distribution, were considered after some initial exploration of the distributional choice; both choices led to better forecasts than those produced by the truncated normal distribution, in terms of CRPS and quantile score (Yagli et al., 2020b). EMOS can also be coupled with AnEn, which is useful if the length of historical data is short and AnEn is thus challenged to identify a sufficient number of meaningful analogs. Pairing EMOS with AnEn is particularly attractive in the sense that one can start with a set of deterministic forecasts and end with a set of calibrated probabilistic forecasts. When this strategy is used, Eq. (8.70) with unequal weights is inappropriate for the aforementioned reason. Besides using an equally weighted mean, one might directly use the observation that corresponds to the best analog as the mean of the predictive distribution. Subsequently, EMOS is only tasked to adjust the variance, as the AnEn ensemble is almost surely under-dispersed. Some authors refer to such lack of spread as variance deficit (Alessandrini et al., 2013), which occurs when the forecaster is over-confident about her forecasts. The less common scenario would be a variance surplus, in which the forecaster is under-confident about her forecasts, e.g., by issuing climatology forecasts. EMOS is evidently useful in both cases for correcting the forecaster's initial imprecise judgment.
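When the chosen distribution admits no convenient analytic CRPS, a numerical evaluation often suffices for parameter estimation. The sketch below scores a zero-truncated Gaussian predictive distribution, of the kind motivated above, by integrating the squared difference between its CDF and the observation's step function; the grid bounds and the clear-sky-index-like numbers are illustrative assumptions.

```python
# A sketch of numerically evaluating the CRPS of a lower-truncated Gaussian
# predictive distribution; grid bounds and parameter values are illustrative.
import numpy as np
from scipy.stats import truncnorm

def crps_numeric(cdf, y, lo, hi, num=2000):
    """CRPS = integral of (F(x) - 1{x >= y})^2 over a sufficiently wide grid."""
    x = np.linspace(lo, hi, num)
    return np.trapz((cdf(x) - (x >= y)) ** 2, x)

mu, sigma, y = 0.8, 0.3, 0.65            # on a clear-sky index scale, say
a = (0.0 - mu) / sigma                   # standardized lower truncation point
F = truncnorm(a, np.inf, loc=mu, scale=sigma).cdf
print(crps_numeric(F, y, lo=0.0, hi=mu + 8 * sigma))
```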

8.6.1.3 Calibration via dressing

The method of dressing has been introduced as a D2P post-processing tool. The same idea can be transferred to P2P post-processing. Dressing of ensemble forecasts was first discussed by Roulston and Smith (2003), who argued that dressing ensembles is a more complicated problem than dressing deterministic forecasts, and who raised the concern of "double counting" the uncertainty when conventional dressing techniques are used. To be more specific, as dressing errors contain all sources of uncertainty involved throughout the forecasting process, when observed errors are dressed onto ensemble forecasts, the initial-condition uncertainty is accounted for twice, because a dynamical ensemble has in fact explicitly factored in that uncertainty during the


perturbation of the initial conditions. This double counting often leads to an ensemble that is too wide (over-dispersed). To mitigate such an effect, the best-member dressing method was proposed (Roulston and Smith, 2003); it proceeds as follows. Suppose there are m ensemble members initially, denoted with x_{ij}, where i = 1, ..., n, and j = 1, ..., m. Given the corresponding observations y_i, a total of n × m errors can be computed:

\[ \boldsymbol{E} = \begin{pmatrix} e_{11} & e_{12} & \cdots & e_{1m} \\ e_{21} & e_{22} & \cdots & e_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ e_{n1} & e_{n2} & \cdots & e_{nm} \end{pmatrix}, \tag{8.78} \]

where e_{ij} = x_{ij} − y_i. Traditional dressing simply samples errors from the jth column of E and dresses them onto the jth member forecast. In contrast, the best-member dressing first determines the smallest error in each row of E,

\[ \varepsilon_i = e_{ij^*}, \tag{8.79} \]
\[ j^* = \underset{j}{\arg\min}\, |e_{ij}|, \tag{8.80} \]

where ε_i is the smallest error for row i. It follows that, for each ensemble member, the dressing errors are all drawn from the set {ε_1, ..., ε_n}. There are, nevertheless, some caveats about selecting a false best member when applying the best-member dressing on multivariate ensemble forecasts (see Roulston and Smith, 2003, for further discussion). As the best-member dressing method only uses errors of the best forecast, the final dressed ensemble is subject to a smaller spread than one dressed using the traditional method. Whereas gaining sharpness is desirable, the best-member-dressed forecast is prone to a lack of calibration, as noted by Wang and Bishop (2005). In this regard, Wang and Bishop (2005) proposed a new dressing kernel that constrains the variance of the dressing perturbations, such that it equals the difference between the error variance of the underlying ensemble mean and the ensemble variance. Figure 2 of Wang and Bishop (2005) offers an excellent graphical illustration of their method, making it immediately clear to the reader. Even though the approach of Wang and Bishop (2005) improves the best-member dressing method, it can still be subject to further refinement, as argued by Fortin et al. (2006). One of the issues is that the method proposed by Wang and Bishop (2005) can only be applied if the ensemble is under-dispersed. A less noticeable problem is that the constrained error variance used by Wang and Bishop (2005) can lead to ensemble forecasts with heavier tails than those of the observations. On both accounts, a weighted-member dressing method was proposed, which integrates the notion of ranking when selecting errors to be dressed. More specifically, instead of drawing samples from an unconditional error pool, the new method proposed by Fortin et al. (2006) draws samples from a conditional pool, e.g., for the jth-rank member in the current ensemble, its dressing errors are drawn from a pool of past errors of the jth-rank members.
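The following sketch implements best-member dressing as per Eqs. (8.78)–(8.80); the array names and the number of dressing draws are illustrative, and the convention of subtracting resampled errors from the member forecasts follows from the error definition e_{ij} = x_{ij} − y_i.

```python
# A minimal sketch of best-member dressing, Eqs. (8.78)-(8.80); names and the
# number of dressing draws are illustrative.
import numpy as np

def best_member_dressing(x_train, y_train, x_new, draws=10, seed=None):
    """x_train: (n, m) past ensembles; y_train: (n,) observations;
    x_new: (m,) ensemble to be dressed."""
    rng = np.random.default_rng(seed)
    e = x_train - y_train[:, None]                        # error matrix E, Eq. (8.78)
    eps = e[np.arange(len(e)), np.abs(e).argmin(axis=1)]  # best-member errors
    # dress each member with errors sampled from {eps_1, ..., eps_n};
    # since e = x - y, a dressed realization consistent with x is x - eps
    sampled = rng.choice(eps, size=(draws, x_new.size))
    return x_new[None, :] - sampled                       # shape (draws, m)
```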

Besides dressing errors directly, one may opt to dress a distribution or a density function, and such approaches are typified by Bayesian model averaging (BMA), which first appeared in Raftery et al. (2005). BMA dresses each ensemble member with a PDF, and subsequently linearly combines the PDFs with weights representing the posterior model probabilities. These weights reflect the skill of each component model on the training data. Needless to say, the default setup of BMA is only suitable for poor man's ensembles, but not dynamical ensembles, owing to the a priori assumption on weights. In the original work, Raftery et al. (2005) applied BMA to 48-h forecasts of surface temperature issued by the University of Washington fifth-generation Pennsylvania State University–NCAR Mesoscale Model (MM5). Although MM5 is just one model, its initial conditions are taken from five different operational centers, thereby legitimizing the weight assignment. The graphical illustration in Fig. 8.7 may help one understand how the BMA predictive distribution is constructed. Mathematically, the post-processed forecast at time t via BMA is simply:

\[ g_t(x) = \sum_{j=1}^{m} w_j f_{tj}(x \mid x_{tj}), \tag{8.81} \]

where f_{tj}(x | x_{tj}) is the PDF dressed on the jth component forecast, g_t(x) is the PDF of the combined forecast, and x is a generic variable representing the threshold of the density functions. The choice of f(·) adopted by Raftery et al. (2005) is homoscedastic Gaussian:

\[ X_{tj} \mid x_{tj} \sim \mathcal{N}\left( a_j + b_j x_{tj},\, \sigma^2 \right), \tag{8.82} \]

which is common for surface temperature and sea level pressure. Parameters a_j and b_j are the estimates of the intercept and slope that are used to correct the potentially biased ensemble member; they can be obtained by regressing the training-set y_i on x_{ij}, i = 1, ..., n. Parameters w_j and σ can be estimated using an expectation–maximization algorithm. Owing to the fact that BMA dresses PDFs instead of past errors, the post-processed forecasts from BMA take the form of a predictive distribution, in contrast to the method of dressing, as used by Roulston and Smith (2003), Wang and Bishop (2005), and Fortin et al. (2006), which returns an ensemble. However, this difference does not have much practical implication, since both BMA and the method of dressing can lead to calibrated forecasts, and one form can be easily converted to the other. An implementation of BMA can be found in the R package ensembleBMA (Fraley et al., 2021). As is the case of the ensembleMOS package, ensembleBMA also allows its users to identify interchangeable members, such that equal weights can be used in scenarios where a dynamical ensemble is to be post-processed. BMA has already been applied in solar forecasting numerous times. Its earliest appearance may be the work of Aryaputera et al. (2016), who compared the choices of using the normal and skew-normal PDFs to dress the members of a poor man's ensemble of accumulated irradiance from several NWP models.


Figure 8.7 An illustration of BMA. The post-processed predictive distribution is constructed from a 5-member ensemble. The weights for the (dressed) Gaussian components are indicated. The BMA predictive distribution is then obtained via the weighted sum.

Furthermore, BMA was also compared to EMOS in the same work. Although the work is rich in content and has universal appeal, it has not gained much attention thus far, most likely because it was published as a conference paper. Five years later, Doubleday et al. (2021) applied BMA in a solar power forecasting context for the first time, as claimed by the authors. The innovation of Doubleday et al. (2021) on top of Aryaputera et al. (2016) is twofold. First, Doubleday et al. (2021) considered an additional preprocessing step to convert NWP ensemble members to solar power forecasts at the utility scale. Second, to accommodate potential inverter clipping in their probabilistic forecasts, the authors considered the ensemble members as realizations of mixed random variables. In other words, the dressing PDF is a mixture of a discrete part and a continuous part. The discrete probability of inverter clipping is estimated using logistic regression. As for the continuous part, the beta distribution is compared to a two-sided truncated normal distribution. The results showed that BMA, largely owing to its more flexible predictive distribution, outperformed EMOS, and skill scores between 2% and 36% were achieved.
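The construction of Fig. 8.7 can be replicated in a few lines: each member is bias-corrected via Eq. (8.82), dressed with a Gaussian PDF, and the dressed PDFs are mixed with the BMA weights as in Eq. (8.81). All numbers below are illustrative; in practice, a_j and b_j come from regression, and w_j and σ from the expectation–maximization algorithm.

```python
# A sketch of evaluating the BMA predictive density, Eqs. (8.81)-(8.82);
# all parameter values are illustrative.
import numpy as np
from scipy.stats import norm

members = np.array([0.42, 0.55, 0.61])                    # x_tj
a = np.array([0.02, 0.00, -0.01])                         # intercepts a_j
b = np.array([0.95, 1.00, 1.02])                          # slopes b_j
w, sigma = np.array([0.2, 0.5, 0.3]), 0.08                # convex weights, spread

x = np.linspace(0.0, 1.2, 500)                            # evaluation grid
g = sum(wj * norm.pdf(x, aj + bj * mj, sigma)             # g_t(x), Eq. (8.81)
        for wj, aj, bj, mj in zip(w, a, b, members))
```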

8.6.1.4 Calibration via quantile regression

Thus far we have seen several P2P post-processing techniques, among which EMOS, GAMLSS, and BMA post-process ensemble forecasts into predictive distributions, whereas the method of dressing post-processes ensemble forecasts into another set of ensemble forecasts. In this subsection, quantile regressions, which convert ensemble forecasts to quantiles, are revisited. Since the fundamentals of quantile regression and its variants have been covered in Section 8.5.3.2, the present content focuses mainly on reviewing the part of the solar forecasting literature in which quantile regression is used as a P2P post-processing tool.


To the best of our knowledge, the first attempt to calibrate dynamical ensemble forecasts with quantile regression was performed by Bremnes (2004), who used local quantile regression to post-process precipitation forecasts from ECMWF's EPS, and received substantial improvements over the raw ensemble. In the field of solar forecasting, Ben Bouallègue (2017) applied quantile regression and LPQR on a set of under-dispersive forecasts from the Consortium for Small-scale Modeling (COSMO) ensemble system. Instead of using the raw ensemble members as regression input, Ben Bouallègue (2017) took a different path. Recall that in order to obtain Q quantiles, Q quantile regressions are to be fitted (except for the case of QRF). Hence, for each regression model, Ben Bouallègue (2017) used: (1) the τth initial quantile retrieved from the sorted ensemble members, (2) the square of that τth initial quantile, (3) the top-of-atmosphere radiation, (4) a collection of 49 pre-selected weather variables, and (5) the product of the initial τth quantile with each of the 49 weather variables, making a total of 101 predictors. Evidently, given the large number of predictors selected in the case study, LPQR is well justified, and has indeed shown superiority over the regular quantile regression, in terms of the pinball loss. Notwithstanding, since Ben Bouallègue (2017) did not consider the option of directly using all ensemble members as inputs (without the remaining predictors), it is unclear how much of the excess performance can be attributed to the additional predictors. The modeling philosophy of Ben Bouallègue (2017) reveals the possibility of incorporating variable transformation during quantile regression modeling—the square of the τth quantile is an example. More generally, given some predictors, x_i, the nonlinear extension of quantile regression minimizes the sum of pinball losses, cf. Eq. (8.60):

\[ \hat{\boldsymbol{\beta}}_\tau = \underset{\boldsymbol{\beta} \in \mathbb{R}^p}{\arg\min} \sum_{i=1}^{n} \rho_\tau\left( y_i - \zeta(\boldsymbol{x}_i) \right), \tag{8.83} \]

where ζ(·) represents some nonlinear function and ρ_τ represents the pinball loss function. As mentioned earlier, common options for ζ(·) include ANNs (i.e., the approach of QRNN) and gradient boosted regression trees (GBRTs). Whereas GBRTs support the pinball loss as cost function algorithmically (see the scikit-learn documentation), QRNN requires a modification of the pinball loss, as the loss is not differentiable everywhere, which hinders the use of gradient-based training algorithms. Cannon (2011) resolved the issue by invoking the Huber function (Huber, 1973, 1964), which is a hybrid L1/L2-norm. The idea of the Huber function, h(u), is rather straightforward: It circumvents the non-differentiability of the L1-norm at the origin by replacing it with the L2-norm for 0 ≤ |u| ≤ ε, where ε is a small positive real number, i.e.,

\[ h(u) = \begin{cases} u^2/(2\varepsilon), & \text{if } 0 \le |u| \le \varepsilon, \\ |u| - \varepsilon/2, & \text{if } |u| > \varepsilon, \end{cases} \tag{8.84} \]

which can be used to approximate the pinball loss, by replacing the argument of the pinball loss with h(u) (see Eq. 10 of Cannon, 2011). In the default QRNN training routine, ε is set to 2^{-8} initially, and is gradually reduced to 2^{-32} over subsequent iterations (Cannon, 2011).
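A minimal sketch of the Huber function and one common way of composing it with the pinball loss is given below; the exact composition used by Cannon (2011) may differ in detail, so this is an assumption-laden illustration rather than a reproduction of his Eq. 10.

```python
# A sketch of the Huber function, Eq. (8.84), and a huberized pinball loss
# that is differentiable everywhere; the composition is one common reading.
import numpy as np

def huber(u, eps=2.0**-8):
    au = np.abs(u)
    return np.where(au <= eps, u**2 / (2 * eps), au - eps / 2)

def smoothed_pinball(u, tau, eps=2.0**-8):
    """u = y - q(tau); smooth at the origin, hence gradient-friendly."""
    return np.where(u >= 0, tau * huber(u, eps), (1 - tau) * huber(u, eps))
```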


In the solar forecast post-processing literature, works that adopt nonlinear variants of quantile regression can be found. For instance, Massidda and Marrocu (2018) used GBRTs with the pinball loss function to calibrate ECMWF-EPS-based PV power forecasts, and compared the method to the more simplistic calibration with variance deficit. The 51 EPS members were first converted to 51 clear-sky index forecasts for PV power via a naïve approximation, which ignores most aspects of formal irradiance-to-power conversion procedures. Then, nine quantiles were extracted from the 51 clear-sky index forecasts for PV power—perhaps as a measure to shorten the training time—and fed to 51 GBRT models to produce a probabilistic forecast with the same number of quantile levels as the EPS. The verification results show that the final ensembles are well calibrated and are able to outperform the variance deficit method. Given the fact that Massidda and Marrocu (2018) used just nine naïvely approximated low-quality input variables, whereas Ben Bouallègue (2017) used 101 input variables, one has reason to believe that quantile regression is a fairly robust calibration tool, which is not so sensitive to the quality and number of inputs. This view has been confirmed by others (pers. comm. with Sebastian Lerch, Karlsruhe Institute of Technology, 2023). QRF has also made its appearance in forecast post-processing of solar and other meteorological variables. Taillardat et al. (2016) used QRF to calibrate the 35-member ensemble forecasts of surface temperature and wind speed of Météo-France, and compared their results to the raw ensemble and the EMOS-calibrated ensemble. The authors showed that both EMOS and QRF successfully calibrated the ensemble forecasts, but that QRF outperformed EMOS in terms of CRPS, since the former included additional output variables from the ensemble forecasts. In a solar context, Yagli et al. (2022) compared QRF to GAMLSS, but the results were somewhat surprising. Even though the post-processed forecasts from QRF appear to improve the calibration compared to the GAMLSS-based ones, in terms of the PIT histogram, the CRPS of the QRF-based forecasts is marginally higher than that of the GAMLSS-based ones. Besides QRNN and QRF, other nonlinear (machine learning) quantile regressions, such as constrained quantile regression splines (CQRS), the Bernstein quantile network (BQN), or monotonic composite quantile regression neural networks (MCQRNN), are available (Bremnes, 2019, 2020). Some of these options, for instance CQRS, may experience an issue known as quantile crossing, which describes the situation where forecasts from quantile regressions with larger τ's are smaller than forecasts from quantile regressions with smaller τ's. Nevertheless, quantile crossing can be easily addressed with pragmatism, by rearranging the quantiles, as done in Wang et al. (2019d) and Haben and Giasemidis (2016). Otherwise, preventive measures such as the approach of MCQRNN, which can theoretically solve the issue for some quantile regression variants, have been proposed (Bremnes, 2019, 2020). That said, the improvements of these more intricate data-driven techniques over the more naïve sorting approach have not been substantial (Bremnes, 2020; Schulz and Lerch, 2022), which echoes the earlier statement that quantile regression is a sufficiently robust calibration tool. As the single largest source of uncertainty in solar power forecasts is the inherent bias and under-dispersion in the raw ensemble irradiance forecasts,


improving the quality of base forecasts (Chapter 7) is still regarded as the principal formula for achieving better solar power forecasting. Once good base forecasts are in place, the need for overly intricate post-processing tools may well diminish to an insignificant degree.

8.6.2 COMBINING PROBABILISTIC FORECASTS

Combining probabilistic forecasts has yet to become common in solar forecast post-processing, and most available works of this sort adopt simple averaging (e.g., Yang et al., 2019), though the results often come out to be satisfactory. Combining probabilistic forecasts shares the same motive as combining deterministic forecasts, in that, probabilistic forecasts from several forecasters or several forecasting systems are consolidated into a single probabilistic forecast, such that the combined forecast can outperform the member forecasts under some verification criteria. Interestingly, some authors have considered combining several P2P post-processed forecasts (e.g., Yang et al., 2023a; Baran and Lerch, 2018)—in those works, post-processed forecasts resulting from several EMOS and QR models were combined. "Post-processing the post-processed forecasts" is an action that is not easily justified, since it suggests inefficiency and deficiency of the first-layer post-processing. Consequently, one can also argue, or just assume, that the second-layer post-processing can still be refined, and therefore apply a third-layer post-processing to it, ad infinitum, which, obviously, is neither logical nor practical. Whereas the previous results from Yang et al. (2023a) and Baran and Lerch (2018) do suggest, empirically, some degree of performance improvement after combining the post-processed versions of forecasts, the theory still requires further examination to rationalize the generality of such strategies. This section should nevertheless explain P2P combining in its most fundamental setup, in that, refined decorations such as the cascade of two or more post-processing techniques are not discussed.

8.6.2.1 Model-free heuristics

As is the case of combining deterministic forecasts, model-free heuristics can be used to combine probabilistic forecasts. The term "opinion pool" was coined by Stone (1961) to narrate the situation where m forecasters each deliver a predictive distribution to an aggregator, who is tasked to reconcile these different opinions into an overriding one. Denoting the individual predictive distributions as F_1, ..., F_m, a linear combination with convex weights is G = ∑_{j=1}^{m} w_j F_j, where w_j ≥ 0 and ∑_{j=1}^{m} w_j = 1. Stone (1961) suggested setting all weights to 1/m to allow for a "democratic" system where all experts are equally valued; this is identical to the recommendation given for combining deterministic forecasts, whenever the rationale for preference assignment is less than absolutely certain. At time t, denoting the component predictive distribution with F_{tj}(x), where x is a generic variable representing the argument (i.e., threshold) of the distribution


function, then the linear opinion pool with equal weights is:

\[ G_t(x) = \frac{1}{m} \sum_{j=1}^{m} F_{tj}(x). \tag{8.85} \]

Furthermore, writing the mean and variance of each F_{tj}(x) as μ_{tj} and σ_{tj}^2, it can be derived that the mean and variance of G_t(x) are:

\[ \mu_t = \frac{1}{m} \sum_{j=1}^{m} \mu_{tj}, \tag{8.86} \]
\[ \sigma_t^2 = \frac{1}{m} \sum_{j=1}^{m} \sigma_{tj}^2 + \frac{1}{m} \sum_{j=1}^{m} \left( \mu_{tj} - \mu_t \right)^2, \tag{8.87} \]

respectively; see Appendix A.3 for the derivation. The second term of Eq. (8.87) reveals that the variance of the opinion pool depends on the location (i.e., mean) diversity of the individual members. On the one hand, if the opinions are diverse, σ_t^2 is large, which corresponds to an under-confident forecast (a predictive distribution that is too wide). On the other hand, Winkler et al. (2019) argued that in practice it is desirable to solicit forecasts from experts, who are likely to have similar backgrounds in terms of training and experience, but that may lead to over-confident forecasts (a predictive distribution that is too narrow). To address both potential issues, Jose et al. (2014) introduced the exterior- and interior-trimmed opinion pools to improve calibration. In a subsequent paper, the idea of the trimmed opinion pool was applied in conjunction with random forests (Grushka-Cockayne et al., 2017). To trim an opinion pool, the forecasts need to be ordered. Two approaches are available for that: (1) ordering based on μ_j (the mean approach), and (2) ordering based on F_j (the CDF approach) (Jose et al., 2014). In the mean approach, suppose k forecasts are to be trimmed from each side, then the exterior-trimmed average forecast is simply:

\[ G_t(x) = \frac{1}{m-2k} \sum_{j=k+1}^{m-k} F_{t,(j)}(x), \tag{8.88} \]

where F_{t,(j)}(x) is the CDF with the jth-rank mean value among the peers. Similarly, the interior-trimmed average forecast is:

\[ G_t(x) = \frac{1}{m-2k} \left[ \sum_{j=1}^{m/2-k} F_{t,(j)}(x) + \sum_{j=m/2+k+1}^{m} F_{t,(j)}(x) \right]. \tag{8.89} \]

These simple modifications must appear very natural to any forecaster. It is, however, worth mentioning that the above notation applies when m is even, whereas if m is odd, one can modify the notation using the floor operator. An illustration of the opinion pool and its trimmed versions with m = 6 and k = 1 is shown in Fig. 8.8, and a minimal code sketch follows. In the CDF approach, instead of ranking the mean, one trims according to the CDF values over the range of the forecast variable; see Jose et al. (2014) for more details and illustrative plots.
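The sketch below implements the three pools under the mean approach; cdfs is an illustrative list of vectorized CDF callables, and means holds their location parameters.

```python
# A sketch of the linear, exterior-trimmed, and interior-trimmed opinion
# pools, Eqs. (8.85), (8.88), and (8.89), under the mean approach; m is
# assumed even, as in the text.
import numpy as np

def trimmed_pool(cdfs, means, x, k=0, interior=False):
    m = len(cdfs)
    order = np.argsort(means)                # rank members by their means
    if k == 0:
        keep = order                         # plain linear opinion pool
    elif interior:
        keep = np.r_[order[: m // 2 - k], order[m // 2 + k :]]
    else:
        keep = order[k : m - k]              # exterior trimming
    return np.mean([cdfs[j](x) for j in keep], axis=0)
```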



Figure 8.8 (a) CDFs and (b) PDFs of m = 6 component forecasts (thin dashed lines), and those of the linear, exterior-trimmed, and interior-trimmed opinion pools using the mean approach, with one forecast trimmed from each side, i.e., k = 1.

Combining can take place for quantiles. Suppose Q quantiles are issued for time t by each of the m forecasters or forecasting systems; the τth quantile from forecaster j can be denoted as q_{τ,tj}, with τ = τ_1, ..., τ_Q and j = 1, ..., m. Then, the obvious strategy for obtaining a combined quantile forecast would be:

\[ q_{\tau,t} = \frac{1}{m} \sum_{j=1}^{m} q_{\tau,tj}, \tag{8.90} \]

for each τ. Similarly, exterior- and interior-trimmed averaging could be used if the number of forecasters permits; Armstrong (2001) advised that number to be five or more. Instead of computing the mean, one can also choose to use the median as the summary statistic for {q_{τ,t1}, ..., q_{τ,tm}}. Since each group of τth quantiles is treated separately, quantile crossing may occur. To avoid quantile crossing, one can gather all forecasts for time t into a single sorted vector. Let us suppose that there are Q quantiles and m forecasters; the vector, say q, has length Q × m, i.e., q = {q_1, ..., q_{Q×m}}. Then, naïve sorting (Wang et al., 2019d) proceeds as:

\[ q_{\tau_o,t} = \frac{1}{m} \sum_{k=(o-1)\times m+1}^{o \times m} q_k, \tag{8.91} \]

where o = 1, ..., Q. To understand Eq. (8.91), we look at o = 1, which leads to q_{τ_1,t} = (1/m) ∑_{k=1}^{m} q_k, which is just the average of the m smallest quantiles in the set q. For o = 2, Eq. (8.91) becomes q_{τ_2,t} = (1/m) ∑_{k=m+1}^{2m} q_k, which is the average of the (m+1)th to (2m)th smallest quantiles in the set q. Clearly then, naïve sorting averages size-m blocks of sorted quantiles Q times in total, resulting in a new set of Q quantiles, which is non-crossing by construction.
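A sketch of the procedure follows; q is an illustrative (m, Q) array holding the member quantiles.

```python
# A sketch of naive sorting, Eq. (8.91): pool all Q*m quantiles, sort them,
# and average consecutive blocks of size m; the result cannot cross.
import numpy as np

def naive_sorting(q):
    m, Q = q.shape                           # m forecasters, Q quantile levels
    pooled = np.sort(q, axis=None)           # single sorted vector of length Q*m
    return pooled.reshape(Q, m).mean(axis=1) # block means = combined quantiles
```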

As for prediction intervals, Gaba et al. (2017) outlined six heuristics, namely, (1) simple averaging, (2) retrieving the median, (3) the minimum–maximum envelope, (4) exterior trimming, (5) interior trimming, and (6) probability averaging of endpoints and simple averaging of midpoints. These heuristics were subsequently promoted by Grushka-Cockayne and Jose (2020) through post-processing the prediction intervals submitted during the M4 forecasting competition (Taleb, 2020; Petropoulos and Makridakis, 2020; Hong, 2020). With little surprise, interval aggregation was found to be able to improve calibration. (This again confirms that a simple combination of forecasts is beneficial.) Denoting the m pairs of component prediction intervals for time t as l_t = {l_{t1}, ..., l_{tm}} and u_t = {u_{t1}, ..., u_{tm}}, simple averaging generates the final pair of intervals using:

\[ [l_t, u_t] = \left[ \frac{1}{m} \sum_{j=1}^{m} l_{tj},\; \frac{1}{m} \sum_{j=1}^{m} u_{tj} \right], \tag{8.92} \]

the median method uses

\[ [l_t, u_t] = \left[ \mathrm{median}(\boldsymbol{l}_t),\; \mathrm{median}(\boldsymbol{u}_t) \right], \tag{8.93} \]

and the minimum–maximum envelope takes

\[ [l_t, u_t] = \left[ \min(\boldsymbol{l}_t),\; \max(\boldsymbol{u}_t) \right], \tag{8.94} \]

as the final prediction intervals. The exterior and interior trimming approaches are identical to Eqs. (8.88) and (8.89), respectively, except that they are applied to quantiles:

\[ [l_t, u_t] = \left[ \frac{1}{m-k} \sum_{j=k+1}^{m} l_{t,(j)},\; \frac{1}{m-k} \sum_{j=1}^{m-k} u_{t,(j)} \right], \tag{8.95} \]
\[ [l_t, u_t] = \left[ \frac{1}{m-k} \sum_{j=1}^{m-k} l_{t,(j)},\; \frac{1}{m-k} \sum_{j=k+1}^{m} u_{t,(j)} \right], \tag{8.96} \]

where k is the number of forecasts to be trimmed from either side. It should be noted that whenever trimming is used, the member forecasts need to be sorted, as is reflected through the subscript (j), per the current notational convention. Last but not least, probability averaging of endpoints assumes that the individual prediction intervals were drawn based on Gaussian CDFs, i.e., F_{tj} ∼ N(μ_{tj}, σ_{tj}^2), for j = 1, ..., m. Then the probability-averaged lower and upper bounds, denoted using l_t^* and u_t^*, satisfy:

\[ \frac{1}{m} \sum_{j=1}^{m} F_{tj}(l_t^*) = 1 - \frac{1}{m} \sum_{j=1}^{m} F_{tj}(u_t^*) = \frac{\alpha}{2}. \tag{8.97} \]

In words, l_t^* and u_t^* are the thresholds at which the averages of the cumulative probabilities are α/2 and 1 − α/2. Subsequently, the interval [l_t^*, u_t^*] is shifted towards the midpoint of the interval of simple averaging. This procedure is exemplified in Fig. 8.9, and explained as follows.


1. Assume three forecasters each issued a prediction interval [l_{tj}, u_{tj}], namely, [0.1, 0.5], [0.15, 0.75], and [0.5, 1.1], each assumed to represent the 80% prediction interval, i.e., α = 0.2.
2. Convert these bounds to F_{tj} ∼ N(μ_{tj}, σ_{tj}^2). The method to retrieve μ_{tj} and σ_{tj}^2 is detailed in Appendix A.4.
3. Compute G_t(x) = m^{-1} ∑_{j=1}^{m} F_{tj}(x) for all x, and evaluate G_t^{-1}(α/2) and G_t^{-1}(1 − α/2), which return l_t^* and u_t^*, respectively.
4. Compute the midpoint of the interval [m^{-1} ∑_{j=1}^{m} l_{tj}, m^{-1} ∑_{j=1}^{m} u_{tj}], and compute the midpoint of [l_t^*, u_t^*].
5. Shift [l_t^*, u_t^*] towards the midpoint of [m^{-1} ∑_{j=1}^{m} l_{tj}, m^{-1} ∑_{j=1}^{m} u_{tj}]; a numerical sketch of these steps is given below.
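The following sketch reproduces the five steps numerically, assuming the Gaussian parameters have already been retrieved from the member intervals as per Appendix A.4; here they are simply computed under the assumption of symmetric 80% Gaussian intervals.

```python
# A numerical sketch of probability averaging of endpoints with shifting of
# midpoints; the Gaussian parameter retrieval shown here is an illustrative
# assumption (Appendix A.4 gives the formal method).
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

l_mem = np.array([0.10, 0.15, 0.50])           # member lower bounds
u_mem = np.array([0.50, 0.75, 1.10])           # member upper bounds
mu = (l_mem + u_mem) / 2                       # Gaussian means
sd = (u_mem - l_mem) / (2 * norm.ppf(0.9))     # 80% interval => z ~ 1.2816
alpha = 0.2

G = lambda x: norm.cdf(x, mu, sd).mean()       # averaged CDF, cf. Eq. (8.97)
l_star = brentq(lambda x: G(x) - alpha / 2, -5, 5)
u_star = brentq(lambda x: G(x) - (1 - alpha / 2), -5, 5)

shift = (l_mem.mean() + u_mem.mean()) / 2 - (l_star + u_star) / 2
l, u = l_star + shift, u_star + shift          # final interval [l_t, u_t]
```

Running this sketch yields l_t^* ≈ 0.16 and u_t^* ≈ 0.935, consistent with the values annotated in Fig. 8.9.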


Figure 8.9 Illustration of probability averaging of endpoints and simple averaging of midpoints. Based on three l_{tj} = {0.1, 0.15, 0.5} and u_{tj} = {0.5, 0.75, 1.1}, three Gaussian CDFs are derived (thin dotted curves). After averaging the three CDFs (thick solid curve), l_t^* and u_t^* are computed from Eq. (8.97) (thin arrows starting from α/2 and 1 − α/2, and ending at 0.16 and 0.935). Then the interval [l_t^*, u_t^*] is shifted towards the midpoint from simple averaging (small black arrows), which results in [l_t, u_t].

8.6.2.2 Linear combination methods

Building on top of the idea of the opinion pool, Winkler (1968) outlined three methods for deciding the weights w_j: (1) based on some kind of ranking, (2) based on a self-rating, and (3) based on scoring rules under which past predictive distributions are compared to observations. Whereas the first two methods depend on the forecaster's judgment, which does not really fall within the scope of solar forecasting but could be useful in business and economics forecasting, it is the third method that is of concern here. In this method, the combining weights are optimized based on the performance of past forecasts against observations, under some scoring rule. For instance, one can optimize the combining weights by minimizing the IGN or CRPS of n historical forecasts. To give perspective on such optimization procedures, consider the traditional linear pool (TLP). The combined predictive distribution of the ith training sample is expressed as:

\[ G_i(x) = \sum_{j=1}^{m} w_j F_{ij}(x), \tag{8.98} \]

and the combined density for that sample is:

\[ g_i(x) = \sum_{j=1}^{m} w_j f_{ij}(x). \tag{8.99} \]

Given n samples, the likelihood function is defined as (see Eq. 9.5 of Wasserman, 2013):

\[ L_n(w_1, \ldots, w_m) = \prod_{i=1}^{n} g_i(y_i), \tag{8.100} \]

where g_i(y_i) is the value of g_i evaluated at the ith observation, y_i, whereas the log-likelihood is:

\[ \ell_n(w_1, \ldots, w_m) = \sum_{i=1}^{n} \log g_i(y_i) = \sum_{i=1}^{n} \log \left[ \sum_{j=1}^{m} w_j f_{ij}(y_i) \right], \tag{8.101} \]

where f_{ij}(y_i) is the value of f_{ij} evaluated at the ith observation, y_i. On the other hand, the average logarithmic score, i.e., IGN, over n samples is defined as:

\[ \mathrm{IGN} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{ign}_i = \frac{1}{n} \sum_{i=1}^{n} \left[ -\log g_i(y_i) \right] = -\frac{1}{n} \sum_{i=1}^{n} \log \left[ \sum_{j=1}^{m} w_j f_{ij}(y_i) \right]. \tag{8.102} \]

It is instantly clear that IGN = −n^{-1} ℓ_n(w_1, ..., w_m). (This scoring rule is to be reiterated in Chapter 10. For now, it should be reminded that the capital-letter IGN denotes the average score over n samples, whereas the small-letter ign denotes the score of a single sample.) Since n is fixed, and thus does not contribute to the optimization, maximizing the log-likelihood is equivalent to minimizing the averaged IGN. In both R and Python, the evaluation of PDFs and nonlinear optimization can be done with built-in functions. Hence, the optimization task is reduced to simply writing out the cost function, as exemplified below. Once the weights are determined after minimizing Eq. (8.102), they can be used to combine all subsequent f_{tj}'s, for t = n + 1, ..., n′. TLP has been shown to be a highly capable calibration tool for predictive distributions (e.g., Mitchell and Hall, 2005; Garratt et al., 2011; Li et al., 2020). In solar forecasting, however, there has not been much uptake of TLP, most likely owing to the fact that multiple predictive distributions in a meteorological context default to multiple sets of dynamical ensembles, which are rare.
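Writing out the cost function is indeed all there is to it, as the following sketch shows; dens is an illustrative (n, m) array holding f_{ij}(y_i), i.e., each member density evaluated at the verifying observation.

```python
# A sketch of TLP weight estimation by minimizing the IGN of Eq. (8.102)
# under the convexity constraints; `dens` is an illustrative input.
import numpy as np
from scipy.optimize import minimize

def fit_tlp_weights(dens):
    n, m = dens.shape
    ign = lambda w: -np.mean(np.log(dens @ w))           # Eq. (8.102)
    res = minimize(ign, x0=np.full(m, 1 / m),
                   bounds=[(0, 1)] * m,
                   constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
    return res.x
```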


One interesting exception is the study by Thorey et al. (2018), who combined ensemble forecasts from the 50-member ECMWF ensemble and the 34-member Météo-France ensemble, as well as quantile forecasts derived from the residuals of the ECMWF and Météo-France deterministic forecasts, into PV power forecasts out to 6 days, at a 30-min temporal resolution. The combining weights were learned with an online algorithm that minimizes the CRPS. With little surprise, combining was found to be able to improve the under-dispersed raw ensembles, and thereby boost the CRPS skill. In cases where multiple sets of dynamical ensembles are unavailable, to demonstrate novel applications of TLP in this domain, solar forecasters naturally turn to statistical solar forecasting models, which are available in bulk quantities. For instance, Bracale et al. (2017) combined a Bayesian model, a Markov-chain model, and a quantile regression model, and the weights were again obtained through CRPS minimization. The overall improvement in CRPS with, as compared to without, TLP was reported to be around 3% for 1-h-ahead forecasting. Theoretically, because the member predictive distributions are more likely to be under- than over-dispersed, linear pooling works well as it increases the spread (Gneiting and Ranjan, 2013). Notwithstanding, there has also been evidence that linear combinations of individually calibrated forecasts are over-dispersed, which is caused by the linear pool not being flexibly dispersive (Hora, 2004; Ranjan and Gneiting, 2010). A flexibly dispersive aggregation method is a combination formula G_θ(F_1, ..., F_m), with θ = (w_1, ..., w_m), that results in a neutrally dispersed combined predictive distribution G, i.e., the variance of its PIT is 1/12, which is the variance of a uniform random variable over the range 0 to 1 (Gneiting and Ranjan, 2013). Furthermore, Gneiting and Ranjan (2013) defined an aggregation method to be exchangeably flexibly dispersive if G_θ is anonymous. An example of an anonymous combination formula would be the opinion pool proposed by Stone (1961), where the predictive distributions are equally weighted. With these definitions, Gneiting and Ranjan (2013) proved that the linear pool is not a flexibly dispersive aggregation method—cf. Theorems 3.1 and 3.2 therein. Based on the linear pool, Gneiting and Ranjan (2013) introduced the notion of generalized linear combination formulas for predictive CDFs (see also Dawid et al., 1995, who studied generalized linear combination formulas for probability forecasts of binary events). The modification from the original to the generalized linear combination formulas is done through a strictly monotone link function (recall the case of GAMLSS), that is,

\[ \eta\left( G_i(x) \right) = \sum_{j=1}^{m} w_j \eta\left( F_{ij}(x) \right). \tag{8.103} \]

When η(·) is the identity function, Eq. (8.103) reduces to the linear pool. Table 4 of Gneiting and Ranjan (2013) provides a list of link functions and the restrictions on the weights. However, generalized linear pools with link functions of certain types may still fail to be flexibly dispersive. Thus far, we have discussed the linear pooling of predictive distributions. The remaining part of this section discusses the linear combination of quantiles and prediction intervals, whereas nonlinear pooling of predictive distributions, which can often outperform the linear one (Ranjan and Gneiting, 2010), is the topic of Section 8.6.2.3. The methodology of linearly combining quantile forecasts is similar to that of predictive distributions. Recalling Eq. (8.90), which uses equal weights for aggregation, the linear combination of quantile forecasts extends the equation to:

\[ q_{\tau,i} = \sum_{j=1}^{m} w_{\tau,j}\, q_{\tau,ij}, \tag{8.104} \]

by including convex weights, i.e., w_{τ,j} ≥ 0 and ∑_{j=1}^{m} w_{τ,j} = 1. As evidenced by the above equation, the major difference between linearly combining predictive distributions and combining quantiles is that the number of weights to be estimated for the latter is larger, scaling with the number of nominal probability levels (i.e., τ's) that are considered. The typical objective function for weight optimization is the pinball loss, in that, pinball loss is to combining quantiles as CRPS is to combining predictive distributions. A solar forecasting example of the linear combination of quantiles has been presented by Bracale et al. (2019), whose work is largely based on Wang et al. (2019d), who combined quantiles in a load forecasting context. As compared to their previous work on combining predictive distributions (Bracale et al., 2017), a more elaborate approach that includes periodicity and regularization was used during weight estimation. Instead of using the Bayesian model, Markov chain, and quantile regression as done previously, Bracale et al. (2019) used quantile k-nearest neighbors, quantile regression forests, and quantile regression. It can be readily seen how many variants one can have, as to generating publications of this sort—one can (1) choose different component models, (2) combine predictive distributions, quantiles, or prediction intervals, (3) employ different cost functions and optimization routines, and (4) combine over different forecast horizons and different locations. Obviously, piling up works based on such strategies is not very useful; as noted by Yang and van der Meer (2021), Bracale et al. (2019) neither included extensive analyses of their results via reliability diagrams or PIT histograms, nor compared their results to their previous works, which makes it challenging to determine the generality of the proposal. Moving on to the linear combination of prediction intervals, it is again very similar, in both theory and form, to combining predictive distributions or quantiles. The general expression for combining prediction intervals at a (1 − α) nominal coverage probability, with weights, is:

\[ [l_i, u_i] = \left[ q_{\alpha/2,i},\; q_{1-\alpha/2,i} \right] = \left[ \sum_{j=1}^{m} w_{\alpha/2,j}\, l_{ij},\; \sum_{j=1}^{m} w_{1-\alpha/2,j}\, u_{ij} \right]. \tag{8.105} \]

Since prediction intervals are just a pair of quantiles, the pinball loss is suitable to be used as the objective function for determining the weights. Alternatively, the interval score, which is also a proper score (Gneiting and Raftery, 2007), can be applied to find the weights. The interval score given an observation y and a pair of bounds [l, u] is

\[ s_\alpha(l, u, y) = (u - l) + \frac{2}{\alpha}(l - y) \cdot \mathbb{1}\{y < l\} + \frac{2}{\alpha}(y - u) \cdot \mathbb{1}\{y > u\}, \tag{8.106} \]


where 1{·} is the indicator function, which takes the value 1 if the condition that follows is true and 0 otherwise. The interval score is closely related to the Winkler score (Winkler, 1972). Since the interval score is negatively oriented, it can be seen from Eq. (8.106) that it rewards sharp prediction intervals and penalizes poor reliability. Ni et al. (2017) applied an ensemble lower-upper bound estimation (ELUBE) method to estimate prediction intervals. The authors used the extreme learning machine (ELM), which is a single-layer feed-forward neural network, as the base model for producing prediction intervals. The member forecasts were generated using three ELMs with different activation functions, namely, the sigmoid function, the radial basis function, and the sine function. Subsequently, the coverage-width-based criterion (CWC) was used to determine the weights. However, since CWC is not a proper scoring rule, it can yield misleading results (Pinson and Tastu, 2014; Lauret et al., 2019). Despite that, the empirical results of Ni et al. (2017) showed that the combined interval forecasts were able to outperform the individual member forecasts in terms of the Winkler score.
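For reference, the interval score of Eq. (8.106) can be written in a few vectorized lines; the function below is a straightforward transcription.

```python
# A sketch of the interval score, Eq. (8.106); negatively oriented, it
# penalizes both width and coverage violations.
import numpy as np

def interval_score(l, u, y, alpha):
    return ((u - l)
            + (2 / alpha) * (l - y) * (y < l)    # penalty when y falls below l
            + (2 / alpha) * (y - u) * (y > u))   # penalty when y exceeds u
```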

8.6.2.3 Nonlinear combination methods

In order to overcome the flexible dispersivity problem of TLP, Gneiting and Ranjan (2013) proposed the spread-adjusted linear pool (SLP). SLP adjusts the spread of the individual predictive distributions through a parameter c > 0:

\[ G_i(x) = \sum_{j=1}^{m} w_j F_{ij}^0\left( \frac{x - \mathrm{med}(F_{ij})}{c} \right), \tag{8.107} \]

where med(F_{ij}) is the median of F_{ij} and F_{ij}(x) = F_{ij}^0(x − med(F_{ij})). If F_{ij} has its mean equal to its median, i.e., μ_{ij} = med(F_{ij}), the mean and variance of G_i(x) are:

\[ \mu_i = \sum_{j=1}^{m} w_j \mu_{ij}, \tag{8.108} \]
\[ \sigma_i^2 = c^2 \sum_{j=1}^{m} w_j \sigma_{ij}^2 + \sum_{j=1}^{m} w_j \left( \mu_{ij} - \mu_i \right)^2. \tag{8.109} \]

These results are immediate, because F_{ij}^0((x − μ_{ij})/c) in Eq. (8.107) is in fact the CDF of the spread-adjusted predictand with mean μ_{ij} and variance c²σ_{ij}², and with that, the mean and variance equations of TLP, of which the derivation is shown in Appendix A.3, can be applied at once. One should, however, note that the combined density of SLP, unlike that of TLP, does not take the same form as the combined predictive distribution; instead, it is:

\[ g_i(x) = \frac{1}{c} \sum_{j=1}^{m} w_j f_{ij}^0\left( \frac{x - \mathrm{med}(F_{ij})}{c} \right). \tag{8.110} \]

In practice, the SLP weights w_1, ..., w_m and the distributional parameter c can be jointly estimated by maximizing the likelihood (or equivalently, minimizing the IGN),


which is compatible with the weight estimation technique of TLP as seen in Section 8.6.2.2. The log-likelihood of the n-sample SLP density is:

\[ \ell_n(w_1, \ldots, w_m, c) = \sum_{i=1}^{n} \log \left[ \frac{1}{c} \sum_{j=1}^{m} w_j f_{ij}^0\left( \frac{y_i - \mathrm{med}(F_{ij})}{c} \right) \right], \tag{8.111} \]

where y_i is the ith observation. Gneiting and Ranjan (2013) stated that under-dispersed predictive distributions can benefit from c ≥ 1, whereas over-dispersed or neutrally dispersed predictive distributions can benefit from c < 1. Particularly, when c = 1, SLP reduces to TLP, following Eq. (8.107). Logically speaking, if SLP reduces to TLP, the flexible dispersivity issue in TLP would also be seen in SLP. Indeed, as shown in Theorem 3.4 of Gneiting and Ranjan (2013), SLP also fails to be flexibly dispersive. From a modeling viewpoint, Eq. (8.107) allows one to set distinct values of c for each individual component; however, that extension does not help with gaining flexible dispersivity either. In this regard, SLP is only, in a sense, "more flexible" than TLP, but should never be regarded as a complete solution to the flexible dispersivity problem. At the moment, there have not been applications of SLP in solar forecasting, but works that post-process forecasts of other meteorological variables with SLP are available. For instance, using ECMWF's temperature forecasts over locations in Germany, Möller and Groß (2016) considered two post-processing methods, EMOS and an extended version of EMOS, which additionally considers the temporal dependencies in forecasts through an autoregressive adjustment (the method is coined "AR-EMOS"). Subsequently, the post-processed forecasts from EMOS and AR-EMOS are combined via SLP, especially after observing that the EMOS-based forecasts are under-dispersed whereas the AR-EMOS-based forecasts are over-dispersed. Combining EMOS and AR-EMOS with weights w_1 = w_2 = 0.5 and c = 0.9 was found to be able to improve the calibration and sharpness of the predictive distributions substantially. A few years later, Möller and Groß (2020) extended the AR-EMOS framework by including a heteroscedastic model of the variance that depends on a linear combination of the variance of the autoregressive errors and the current empirical variance of the corrected ensemble members. Major improvements over EMOS were again observed, particularly as the forecast horizon increased. Aside from SLP, another nonlinear combination method is the beta-transformed linear pool (BLP), which was first introduced by Ranjan and Gneiting (2010) and then generalized by Gneiting and Ranjan (2013). The combined predictive distribution of the BLP is:

\[ G_i(x) = B_{\alpha,\beta}\left( \sum_{j=1}^{m} w_j F_{ij}(x) \right), \tag{8.112} \]

whereas the combined density of the BLP is:

\[ g_i(x) = \left[ \sum_{j=1}^{m} w_j f_{ij}(x) \right] b_{\alpha,\beta}\left( \sum_{j=1}^{m} w_j F_{ij}(x) \right), \tag{8.113} \]

where B_{α,β} and b_{α,β} represent the beta CDF and PDF with parameters α, β > 0. The weights in Eqs. (8.112) and (8.113) are again convex, as in the cases of TLP and SLP.


The motivation for BLP emerges from the fact that ∑_{j=1}^{m} w_j F_{ij}(x) is a CDF, which also takes values in the unit interval, whereas the beta distribution is defined on the interval [0, 1], and can assume very flexible shapes ranging from a uniform distribution to a Student's t or χ² distribution (Graham, 1996). When α = β = 1, the beta density becomes uniform, and BLP reduces to TLP. In terms of the statistical properties of BLP, Gneiting and Ranjan (2013) proved that when the weights w_1, ..., w_m are fixed, the variance of the PIT can attain any value in the open interval (0, 1/4). Furthermore, Gneiting and Ranjan (2013) proved that BLP is exchangeably flexibly dispersive with weights w_1 = ⋯ = w_m = m^{-1}. In practice, the BLP weights w_1, ..., w_m and distributional parameters α and β can again be estimated using maximum likelihood; the log-likelihood of the n-sample BLP density is:

\[
\begin{aligned}
\ell_n(w_1, \ldots, w_m, \alpha, \beta) &= \sum_{i=1}^{n} \log g_i(y_i) \\
&= \sum_{i=1}^{n} \log \left[ \sum_{j=1}^{m} w_j f_{ij}(y_i) \right] + \sum_{i=1}^{n} \log b_{\alpha,\beta}\left( \sum_{j=1}^{m} w_j F_{ij}(y_i) \right) \\
&= \sum_{i=1}^{n} \log \left[ \sum_{j=1}^{m} w_j f_{ij}(y_i) \right] + \sum_{i=1}^{n} (\alpha - 1) \log \left[ \sum_{j=1}^{m} w_j F_{ij}(y_i) \right] \\
&\quad + \sum_{i=1}^{n} (\beta - 1) \log \left[ 1 - \sum_{j=1}^{m} w_j F_{ij}(y_i) \right] - n \log B(\alpha, \beta),
\end{aligned} \tag{8.114}
\]

where B denotes the classical beta function, which should not be confused with B_{α,β}, which denotes the beta CDF. BLP has been applied in a solar irradiance forecasting context by Fatemi et al. (2018), who generated the component forecasts in two steps. In the first step, based on historical data, the mean and variance parameters of the 1-h-ahead irradiance forecast were separately regressed on lagged versions of irradiance. In the subsequent step, each pair of distributional parameters was converted to two predictive distributions, namely, a beta distribution and a two-sided power distribution, through fitting under the paradigm of maximum likelihood. Finally, BLP was used to combine these two distributions. Although the application of BLP is novel, this work suffers from two severe defects in component model design: (1) no exogenous weather information was utilized, and (2) the authors failed to apply the widely accepted clear-sky models in solar forecasting. Both defects limit the interpretation of how well BLP is able to perform under state-of-the-art component models. Hence, some follow-up investigation is welcome.
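For completeness, the BLP density of Eq. (8.113) can be evaluated as below; pdfs and cdfs are illustrative lists of member density and distribution callables.

```python
# A sketch of the BLP density, Eq. (8.113): evaluate the linear pool first,
# then reweight by the beta density; inputs are illustrative callables.
import numpy as np
from scipy.stats import beta

def blp_pdf(x, w, pdfs, cdfs, a, b):
    lin_pdf = sum(wj * f(x) for wj, f in zip(w, pdfs))
    lin_cdf = sum(wj * F(x) for wj, F in zip(w, cdfs))
    return lin_pdf * beta.pdf(lin_cdf, a, b)   # reduces to TLP when a = b = 1
```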

8.6.2.4 A simulation study of various pooling techniques

A simulation study is presented in this section, following the original idea of Gneiting and Ranjan (2013), to elaborate on the efficacy of the various aforementioned pooling techniques, and on how pooled forecasts compare to the component ones. Pooling is to take place among m predictive distributions, i.e., m forecasters. The design of the simulation of component forecasts considers a frequently encountered real-life scenario, in which the information accessible to each forecaster is composed of a publicly available part, e.g., issued by a weather service, and a proprietary part, e.g., based on the local measurements owned by the forecaster. However, the m forecasters do not share the proprietary part of the information with each other. Without loss of generality, m = 3 is used. The data-generating process, which can be thought of as Nature's model according to which the observations materialize, is assumed to be a combination of standard normal random variables, X_0, X_1, X_2, X_3, ε ∼ N(0, 1):

\[ Y = X_0 + a_1 X_1 + a_2 X_2 + a_3 X_3 + \varepsilon, \tag{8.115} \]

where ε represents an error term. Echoing the above simulation design, X_0 represents the public information, whereas X_1, X_2, and X_3 represent the proprietary information sets of the three forecasters. Furthermore, as it is assumed that each forecaster generates forecasts using both the public and proprietary information, their predictive densities take the form:

\[ f_1 = \mathcal{N}\left( X_0 + a_1 X_1,\; 1 + a_2^2 + a_3^2 \right), \tag{8.116} \]
\[ f_2 = \mathcal{N}\left( X_0 + a_2 X_2,\; 1 + a_1^2 + a_3^2 \right), \tag{8.117} \]
\[ f_3 = \mathcal{N}\left( X_0 + a_3 X_3,\; 1 + a_1^2 + a_2^2 \right), \tag{8.118} \]

where a_1 = a_2 = 1 and a_3 = 1.4 are set strategically, such that the effect of pooling can be better analyzed. The density functions f_1, f_2, and f_3 are the component forecasts to be combined. Based on the above data-generating process, a total of 3000 data points are generated, among which half are used as training samples and half as testing samples. Put more plainly, 3000 copies of {X_0, X_1, X_2, X_3, ε} are acquired by randomly sampling from five independent standard normal distributions, with which Eqs. (8.115)–(8.118) can be evaluated 3000 times. The outcome of this stage is to arrive at, using the notational convention of this chapter, {y_i, f_{ij}}, where i = 1, ..., 1500 and j = 1, 2, 3, as well as {y_t, f_{tj}}, where t = 1501, ..., 3000. Based on {y_i, f_{ij}}, the combining parameters w_1, w_2, w_3, c, α, β are found by maximizing the log-likelihood functions of TLP, SLP, and BLP, which can be calculated using Eqs. (8.101)–(8.114). Optimization is conducted using the Rsolnp package in R (Ghalanos and Theussl, 2015), which implements the general nonlinear augmented Lagrange multiplier method solver of Ye (1987). The optimized parameters are listed in Table 8.4. With the trained combining weights and parameters, the remaining 1500 sets of predictive distributions are combined. For each individual forecast and each pooling technique, three statistics are computed: (1) the variance of the PIT, which quantifies calibration, since a uniform PIT would correspond to a variance of 1/12; (2) the root mean variance (RMV) of the density forecast, which is a measure of sharpness; and (3) the averaged IGN, which is the strictly proper scoring rule under which the goodness of forecasts is to be assessed. These statistics, alongside the same set of statistics over the training set, are depicted in Table 8.5. In terms of visualization, Fig. 8.10 (a) presents the PIT histograms of the component and pooled forecasts. Additionally, the density functions of the component and pooled forecasts, over four instances in the test set (t = 1501, 1502, 1503, 1504), are shown in Fig. 8.10 (b).

α — — 1.438

β — — 1.469

c — 0.807 —

w1 0.067 0.147 0.165

w2 0.211 0.274 0.277

w3 0.722 0.579 0.557

which the goodness of forecasts is to be assessed. These statistics, alongside the same set of statistics over the training set, are depicted in Table 8.5. In terms of visualization, Fig. 8.10 (a) presents the PIT histograms of component and pooled forecasts. Additionally, the density functions of component and pooled forecasts, over four instances in the test set (t = 1501, 1502, 1503, 1504), are shown in Fig. 8.10 (b).

Table 8.5 Results of the individual and combined forecasts in terms of the variance of the PIT, the RMV of the forecasts, and the averaged IGN over the training and test sets for the case study in Section 8.6.2.4. Note that the PIT variance of neutrally dispersed forecasts is 1/12, or approximately 0.083.

             Training                    Testing
        V(PIT)   RMV     IGN       V(PIT)   RMV     IGN
f1      0.085    1.990   2.125     0.083    1.990   2.094
f2      0.082    1.990   2.101     0.083    1.990   2.113
f3      0.086    1.732   1.980     0.084    1.732   1.980
TLP     0.069    1.974   1.949     0.068    1.973   1.949
SLP     0.080    1.739   1.923     0.078    1.738   1.927
BLP     0.084    1.648   1.913     0.082    1.647   1.915

As verification of probabilistic forecasts is to be detailed separately in a subsequent chapter, only a very brief interpretation of the results is provided. First of all, the relative sizes of w_1, w_2, and w_3 shown in Table 8.4 reveal that the third forecaster is valued the most, since w_3 > w_1, w_2 under all three combining strategies. This may be due to the fact that the spread of f_3 is smaller than that of f_1 and f_2, which is a desirable property in probabilistic forecasting. The second observation in regard to the simulation results, which can be seen from Table 8.5 and Fig. 8.10 (a), is that the predictive distributions f_1, f_2, and f_3 are fairly neutrally dispersed, and have therefore led to an over-dispersed TLP, as the theory suggests. The variance of the SLP's PIT (0.078) is closer to 1/12 than that of the TLP's PIT (0.068)—both suffering from the flexible dispersivity problem—but the BLP's PIT variance (0.082) is evidently better. On top of being more calibrated, the sharpness of BLP is also significantly better than that of its peers (an RMV of 1.647, versus 1.973 for TLP and 1.738 for SLP).


Figure 8.10 (a) The PIT histograms of the individual (top row) and pooled (bottom row) probabilistic forecasts for the case study in Section 8.6.2.4. As the PIT histograms of the individual forecasts are close to uniform, the resulting PIT histogram of the TLP is over-dispersed. (b) Probability density functions for the component forecasts (thin gray lines) and the three pooling techniques, over four instances. The corresponding observation is marked as the vertical line in each subplot.

The calibration and sharpness, or the overall goodness of the density forecasts, can be assessed jointly through IGN, which further confirms that BLP, insofar as this simulation suggests, is the superior pooling strategy.

8.7 SUMMARY

Insofar as the scope of forecast post-processing is defined at the beginning of this chapter, the typology outlined herein can be said to be mutually exclusive and collectively exhaustive. Since forecasts are either deterministic or probabilistic, a total of four types of post-processing, based on the direction of conversion between deterministic and probabilistic forecasts, covers all possibilities. Moreover, under each type of post-processing, several general post-processing strategies are reviewed, which forms the second layer of the typology. Indeed, this typology is able to help forecasters crystallize what various post-processing methods are essentially doing, without being carried away by the tortuous and lengthy method descriptions commonly seen in today's forecasting literature. Each post-processing strategy outlined in this chapter is associated with innumerable details on both theory and implementation, and each piece of detail is subject to further refinement, as typified by the regime-switching model output statistics (Mejia et al., 2018) seen in Section 8.3.1, or the concept of P2P post-processing of the P2P post-processed forecasts (Yang et al., 2023a; Baran and Lerch, 2018) seen in Section 8.6.2. It is true that these refinement procedures are more often than not empirically verified to be advantageous, which subsequently justifies publications, but the quantified benefits may not necessarily arise from the proposals themselves; they may instead be due to some other latent elements embedded in the comparative experiment design, such as weak benchmarks or suboptimal component models. More critically, one should be aware of the fact that these additional measures and procedures generally cease to be serviceable at some point—there would always be an irreducible amount of uncertainty in forecasts. Hence, it is recommended to always focus on the typology rather than on the form of post-processing. Simulation, as exemplified in Section 8.6.2.4, is an effective yet under-utilized way of quantifying the benefits of post-processing. The scope of post-processing of weather forecasts goes beyond what has been surveyed in this chapter, and many of those variations have yet to receive wide uptake in solar forecasting. For instance, in Chapter 12 of this book, hierarchical forecast reconciliation is covered, which is essentially a post-processing technique that can handle spatially distributed forecasts within a geographical region in a joint fashion. Since grid integration pays only a very marginal amount of attention to forecasts from any individual PV plant, spatial post-processing ought to be regarded as cardinal when forecasts submitted by different plant owners are produced using different base forecasting methods and post-processed using different strategies. Besides methodological concerns, there are also practical concerns relevant to post-processing, such as those pertaining to training data length, time-varying dynamical model configuration, or the lack of ideal predictors for developing post-processing models. All these call for an indefinite amount of further investigation and adaptation. With that, we conclude this very long chapter.

9 Deterministic Forecast Verification

"Science depends upon perception and inference. Its credibility is due to the fact that the perceptions are such as any observer can test." — Bertrand Russell

Forecast verification is the process of determining the goodness of forecasts, and it constitutes an essential component of any scientific forecasting system. Owing to its importance and complexity, forecast verification truly deserves to be treated in a separate book (see Jolliffe and Stephenson, 2012, for instance). To that end, this part of the book concerning verification serves mainly as an overview and presents only a theoretical minimum of the subject. Whereas the topic of deterministic forecast verification is covered in this chapter, verification of probabilistic forecasts is dealt with in the next.

Many have contributed to the science of weather forecast verification, but none parallels the late Allan H. Murphy, who was a Fellow of the American Meteorological Society and devoted his professional life to the verification of weather forecasts. Indeed, in the introductory chapter of the book entitled Forecast Verification: A Practitioner's Guide in Atmospheric Science, by Jolliffe and Stephenson (2012), Murphy's name appears most frequently. The content of this chapter, on that account, is based chiefly on those contributions of Murphy. Particularly, our discussion revolves around three general papers of his: In the first, Murphy and Winkler (1987) established a general forecast verification framework; the theory of making reference forecasts is outlined in the second (Murphy, 1992); and in the third, Murphy (1993) provided an answer to the ultimate question concerning verification—"What is a good forecast?"

Murphy (1991) categorized forecast verification into absolute verification and comparative verification.1 The first of the two is concerned with the performance of a single forecasting system, which is inconsequential to decision-making, as the forecast users are not presented with choices at any rate. When there are two or more forecasting systems, one turns to comparative verification. In academia, all papers proposing new forecasting methods require comparative verification, and the favorable outcome (i.e., acceptance) of a forecasting paper is determined, in large part, based on whether the comparative verification procedure is well thought out and how well the results turn out to be.

1 Theoretically, separating verification into absolute and comparative makes the classification definitive, in that all verification tasks must belong to one or the other of these two categories. In reality, such categorization resembles dividing systems into linear and nonlinear, or dividing animals into elephant and non-elephant. As almost all real-world systems are nonlinear and almost all animals are non-elephant, almost all verification tasks are comparative.


If one can demonstrate, in an intelligible way and under an identical forecast verification setup (also known as matched comparative verification), that a new approach outperforms the old one as reported in a published paper, the proposal containing the new approach must be acceptable to the same journal, insofar as the editor is rational and ethical. A guaranteed acceptance is desired by all; hence, making direct comparisons with published works is logically attractive. However, the consequence of doing so is that the effort spent might not always translate to success—such research is said to be falsifiable—because one may spend much effort but attain only poor results. For that reason, only very few forecasters opt for making head-to-head comparisons with published works; most would settle for new datasets with which fair comparison is difficult for others to carry out. This is not just limited to solar forecasting but applies to energy forecasting as a whole.

It then follows that whatever recommendations are made about forecast verification ought to take into consideration the state of affairs in the current literature, that is, researchers tend to select specific datasets and deliver non-reproducible research. This troubling pitfall, which pertains to what is known as unmatched comparative verification, is highly likely to persist into the foreseeable future. One remedy to the pitfall is introducing reference methods that are operable with just a minimal requirement on the available data, such that they can be implemented and assessed with respect to all forecasting setups. Furthermore, these reference methods should be able to reflect, at least to a large extent, the difficulty of a forecasting situation. Then, by comparing forecasts made using different methods, at different locations, and over different time periods with their own respective reference forecasts, one is able to arrive at a quantity known as the skill score, which quantifies the forecasting skill of a system (or forecaster) and is able to offer some comparability across different forecasting situations. The first mission of this chapter is to deliver the necessary technical details related to the use of the skill score in solar forecasting; see Section 9.1.

The skill score is a measure of forecast quality. Being a general notion, quality can be evaluated in numerous other ways, in that any statistic that gauges the similarity or dissimilarity between a set of forecasts and its corresponding set of observations has the potential to affect the verification result, as well as the level of confidence with which those forecasts are used. Some well-known statistics of this kind include mean bias error (MBE), root mean square error (RMSE), mean absolute error (MAE), correlation, and the Kolmogorov–Smirnov test statistic. One common trait of these statistics is that all of them can be computed from the joint distribution of forecast and observation. Indeed, according to Murphy and Winkler (1987), if the time order of forecast–observation pairs does not matter, their joint distribution contains all necessary information relevant to the verification of forecast quality. Therefore, the second mission of this chapter is to discuss the measure-oriented verification framework and its limitations (Section 9.2) and how to leverage the joint distribution of forecast and observation during verification to overcome those limitations (Section 9.3)—this framework is known as the Murphy–Winkler distribution-oriented forecast verification framework.
The third mission of this chapter revolves around three types of goodness of forecasts, namely, consistency, quality, and value, as first debated by Murphy (1993). The notion of consistency has appeared in several earlier parts of the book, more specifically, in Sections 3.2.3, 6.2.2, 8.3.3, and 8.4.1. Although the scenarios that motivated those discussions differ, on all previous occasions consistency generally refers to the correspondence between forecasts and the forecaster's judgment. Stated differently, consistency is a class of properties that certifies the agreement between what should be done in theory and what is done in reality. Quality can be defined analogously to consistency, in that it denotes the correspondence between forecasts and observations. Each aspect of forecast quality has a widely accepted qualitative term associated with it, and each term can be quantitatively gauged in one or more ways. Finally, value denotes the correspondence between forecasts and the user's incremental economic benefits. That is to say, a good forecast should possess a good value, which can only be assigned by forecast users. All three types of goodness are discussed jointly with a case study in Section 9.4, which demonstrates a complete process of verifying deterministic forecasts.

Before we proceed, some clarifications on notation are thought helpful. Chapter 8 adopted the convention of using capital letters X and Y to denote respectively the random variables representing forecast and observation, and using lower-case letters x and y to denote particular realizations of these random variables. We shall continue to use this notational convention in this chapter, instead of the choice of symbols used in the original paper (Murphy and Winkler, 1987), to which some readers may be more accustomed. Probability density functions are denoted with f, so f(x), f(y), f(x|y), f(y|x), and f(x, y) would be how marginal, conditional, and joint densities are written. The expectation of X is E(X) ≡ μ_X = ∫ x f(x) dx, and the variance of X is V(X) ≡ σ_X² = ∫ (x − μ_X)² f(x) dx. The conditional expectation of X given Y is E(X|Y), which is a random variable in itself, for Y can take different values. When Y takes a particular value, say y, the conditional expectation E(X|Y = y) becomes a number. What this implies is that we can further evaluate the mean of E(X|Y) with respect to the sample space of Y, that is, E_Y[E(X|Y)], where the subscript indicates that the expectation is taken with respect to Y.2 Moreover, it is noted that E_Y[E(X|Y)] = E(X), which is known as the rule of iterated expectations; see Appendix A.1 for proof. Last but not least, small letters φ, ρ, κ, c, r are used to denote realizations of the clear-sky index of forecast, clear-sky index of reference forecast, clear-sky index of observation, clear-sky irradiance (or PV power), and reference irradiance (or PV power) forecast, respectively. Then, it is understood that for any time t, φ_t ≡ x_t/c_t, ρ_t ≡ r_t/c_t, and κ_t ≡ y_t/c_t. Occasionally, when random variables of these quantities are needed, capital letters are used, e.g., K and R denote the random variables representing the clear-sky index of observation and the reference irradiance (or PV power) forecast, respectively.

2 There is no such thing as E_Y[E(Y|X)] or E_X[E(X|Y)], obviously.

9.1 MAKING REFERENCE SOLAR FORECASTS

Skill score reflects the quality of some forecasts of interest relative to some forecasts produced by a naïve method. The naïve method acts as a standard of reference and carries a notion of forecast-context awareness, allowing comparison of forecasts generated under different forecasting situations. By forecasting situation we mean the condition under which the forecasts are issued, including but not limited to location, time period, sky condition, Köppen–Geiger climate class, amount of exogenous information, and number of forecasters in the opinion pool. Skill scores are popularly used, therefore, by weather forecasters, who are confronted by the most diverse forecasting situations.

To define skill score, one first needs an accuracy measure. Also known as an error metric, an accuracy measure is usually an aggregate of a scoring function, which is denoted with S. Most scoring functions are negatively oriented, that is to say, the smaller the better. One should note that the MBE is not an accuracy measure in that sense, since two large errors in opposite directions may cancel out and thus yield a small bias—MBE does not reflect forecast accuracy, but only the overall (i.e., unconditional) bias in the forecasts. Table 9.1 lists some commonly used scoring functions in statistics. In solar forecasting, only MSE, RMSE, and MAE are permissible; those accuracy measures averaged using the median or based on absolute (or squared) percentage error are inadequate. Averaging individual scores by the median is inappropriate because it wholly ignores those large errors, which are arguably most harmful to power system operations (Yang et al., 2020a). On the other hand, accuracy measures based on percentage error are not used, because a few misidentifications of sky condition can result in an unrealistically large percentage error (Yang et al., 2020a).

Table 9.1 Some commonly used scoring functions, and their commonly used aggregate forms.

Name                        Scoring function          Aggregate form
Squared error               S(x, y) = (x − y)²        Mean square error (MSE), root mean square error (RMSE)
Absolute error              S(x, y) = |x − y|         Mean absolute error (MAE), median absolute error (MdAE)
Squared percentage error    S(x, y) = (x − y)²/y²     Root mean square percentage error (RMSPE), root median square percentage error (RMdSPE)
Absolute percentage error   S(x, y) = |(x − y)/y|     Mean absolute percentage error (MAPE), median absolute percentage error (MdAPE)

9.1.1 SKILL SCORE

The last column of Table 9.1 suggests the different possibilities of how scores of individual forecasts can be aggregated. For instance, given n samples of forecasts of interest x_t, reference forecasts r_t, and their verifications y_t, the mean scores of x_t and r_t are:

$$ S_\mathrm{fcst} = \frac{1}{n}\sum_{t=1}^{n} S(x_t, y_t), \quad\text{and}\quad S_\mathrm{ref} = \frac{1}{n}\sum_{t=1}^{n} S(r_t, y_t), \tag{9.1} $$

and the root mean scores are:

$$ S_\mathrm{fcst} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} S(x_t, y_t)}, \quad\text{and}\quad S_\mathrm{ref} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} S(r_t, y_t)}. \tag{9.2} $$

Since the computation procedure of the aggregated scores is self-evident from the names of the aggregate forms, e.g., MSE computes the mean of squared errors and RMSE computes the root mean of squared errors, it is not critical to distinguish the two situations as to notation—for simplicity, all aggregated scores are denoted with S. Suppose that, besides x_t and r_t, there is also a set of optimal (or perfect) forecasts x_t*; then, the skill score of x_t, with respect to reference forecasts r_t, optimal forecasts x_t*, and observations y_t, is

$$ \mathscr{S} = \frac{S_\mathrm{fcst} - S_\mathrm{ref}}{S_\mathrm{opt} - S_\mathrm{ref}}. \tag{9.3} $$

It is clear, then, that the skill score represents the fractional improvement in the accuracy of the forecasts of interest over the reference forecasts, relative to the total possible improvement. During verification, x and y are fixed, and the degrees of freedom that affect the value of 𝒮, therefore, reside in: (1) what constitutes the perfect forecasts x*, (2) the choice of scoring function S, and (3) how reference forecasts r are produced.

It is often assumed that a perfect forecast is one which corresponds to no error. This thought is practically challenged due to the irreducible measurement uncertainty, which, as we have discussed earlier, is about 5% for irradiance. This 5% is non-negligible as compared to the size of a typical forecast error. Even if the measurements are assumed, for convenience, to be perfectly accurate, questions may still arise in regard to the intrinsic limitations in predicting weather events—recall the discussion in Section 3.1 about how a perfect projection of a deterministic dynamical system is foiled by limited accuracy in analysis and incomplete understanding of weather dynamics. Until now, the literature still lacks an authoritative definition of a perfect forecast, although several attempts have been made. For instance, the aggregated score of perfect forecasts is defined by Yang (2022a) through a concept known as the predictability error growth, which is the error between a control forecast and a perturbed forecast. Although this procedure can be dated back at least to Baumhefner (1984) and holds a strong theoretical foundation, its implementation requires ensemble numerical weather prediction (NWP), which is not always available to solar forecasters. In most situations, therefore, one has no choice but to assume that the statement "a no-error forecast is perfect" is admissible. Consequently, Eq. (9.3) reduces to:

$$ \mathscr{S}^* = 1 - \frac{S_\mathrm{fcst}}{S_\mathrm{ref}}, \tag{9.4} $$

which is a more familiar skill score expression known to solar forecasters. One should, however, bear in mind that 𝒮* is always lower than 𝒮 for the same forecasting situation, and may therefore incur a false sense of having bad forecasts (Liu et al., 2023).

Next, one is tasked to select an S among MSE, RMSE, and MAE. Since the MSE skill score can be transformed into the RMSE skill score with ease:

$$ \mathscr{S}^*_\mathrm{RMSE} = 1 - \frac{\mathrm{RMSE}_\mathrm{fcst}}{\mathrm{RMSE}_\mathrm{ref}} = 1 - \sqrt{\frac{\mathrm{MSE}_\mathrm{fcst}}{\mathrm{MSE}_\mathrm{ref}}} = 1 - \sqrt{1 - \mathscr{S}^*_\mathrm{MSE}}, \tag{9.5} $$

the question is narrowed down to whether squared or absolute error should be favored in solar forecasting. Two überreviews have shown that, between the two options, RMSE has heretofore received better uptake than MAE (Yang et al., 2018a; Blaga et al., 2019). One plausible explanation could be that RMSE penalizes large errors, which are more undesirable during grid integration, as compared to those smaller errors which only induce fluctuations that could potentially be absorbed by power systems. That being the case, the popular appeal of RMSE seems to sufficiently justify the choice of using it in skill score computation. The only remaining issue, at this stage, is how to generate reference forecasts, which is the focus of the next section.
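To make the computation concrete, the following sketch (in Python, with hypothetical array inputs of our own devising) evaluates the RMSE-based skill score of Eq. (9.4) from paired forecasts, reference forecasts, and observations; it merely illustrates the formulas above and is not code from any particular verification package.

```python
import numpy as np

def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return np.sqrt(np.mean((pred - obs) ** 2))

def rmse_skill_score(fcst, ref, obs):
    """RMSE skill score of Eq. (9.4): 1 - RMSE_fcst / RMSE_ref.

    Positive values indicate the forecasts of interest beat the
    reference; zero means no improvement; negative means worse.
    """
    return 1.0 - rmse(fcst, obs) / rmse(ref, obs)

# Hypothetical example: x (forecast), r (reference), y (observation), in W/m2
y = np.array([520.0, 610.0, 450.0, 300.0])
x = np.array([500.0, 640.0, 430.0, 310.0])
r = np.array([480.0, 700.0, 380.0, 350.0])
print(rmse_skill_score(x, r, y))  # > 0, so x improves upon r
```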

9.1.2 REQUIREMENTS ON THE STANDARD OF REFERENCE

From Eq. (9.4), it is straightforward to see that if S_fcst > S_ref, the skill score 𝒮* < 0; if S_fcst < S_ref, the skill score 𝒮* > 0; and if S_fcst = S_ref, the skill score 𝒮* = 0. To that end, a positive skill score marks the minimum level of confidence for any forecaster to choose the forecasts of interest over the reference forecasts. Because practically all progress in forecasting can only be claimed relative to the current best methods, one may think that choosing a set of reference forecasts that leads to the smallest possible S_ref is desirable. However, generating reference forecasts through state-of-the-art methods, though useful in some sense, is not recommended in general.

If a reference forecasting method is to be considered universal, it must satisfy two conditions. The first is that the method is compelled to be naïve, which is a synonym for "no skill." This further implies that the reference method can be implemented in those forecasting situations with only minimal data—univariate time series, to be more precise. The reason behind this is straightforward: If a reference method places too much emphasis on the forecasting technique or depends on data that is too specific, it prevents weak forecasters, who do not have the skill or the specific data, from making reference forecasts. Hence, it is not universal in that respect. The second condition for a reference method to attain universality is that it must carry a notion of forecast-context awareness, such that the performance of the reference method can adequately reflect the inherent difficulty of a forecasting situation. Though the difficulty of a forecasting situation is hard to quantify, it is known a priori to be related to the variability and uncertainty of the process being forecast.


It is thus reasonable to expect a reference forecasting method to produce high errors in situations that are harder to forecast (e.g., forecasting in the tropics), and low errors in situations that are easier to forecast (e.g., forecasting in the deserts). Further, when the reference forecasting method is applied to processes with seasonal components, it is necessary to remove the seasonal components before forecasting, for the reason that such seasonal components can be estimated to a high degree of accuracy, and therefore should not contribute to the account of predictability.

As far as the two conditions are concerned, both persistence and climatology can be deemed suitable. Indeed, they are the most popular reference methods in weather forecasting. Persistence uses the most recent observation as the forecast, while climatology issues the long-term mean value as the forecast. There are, nevertheless, important ambiguities in these definitions that need to be addressed. When forecasts at multiple horizons are sought, such as generating hourly forecasts for the next 48 h, it is unclear whether one should use the most recent block of continuous observations as forecasts and assign one observation to each respective horizon (this is called block persistence), or simply take just the most recent observation and assign it as the forecast for all horizons (this is called single-valued persistence); the two variants are contrasted in the sketch below. At first sight, the choice may appear trivial—95% of solar forecasters, during an informal survey we conducted in 2019, chose to use block persistence—but the reality has been counterintuitive: as demonstrated by Yang (2019f), the single-valued persistence outperformed block persistence in 26 out of 32 test cases under diverse climate types.

On the other hand, the ambiguity in the definition of climatology owes to those samples on which the calculation is based. If the mean is calculated based on the observations in the verification set, the climatology is said to be internal single-valued climatology, as it is internal to the verification samples. If the mean is calculated based on a larger set of samples, such as those including the observations used in training, the climatology is said to be external single-valued climatology. Beyond that, there are also internal multiple-valued climatology and external multiple-valued climatology, which both issue situation-dependent climatology forecasts. For example, in a solar forecasting context, the forecaster can issue r_1 during summer, and issue r_2 during winter. (The choices of conditional variables are ample, with the only prerequisite being that their values at the forecast time stamps are known ex ante.) The reader is referred to Murphy (1988) for the distinctive features of these variants of climatology, as well as their implications on skill score.

As stated by Murphy (1992), when choosing a reference method, it is appropriate to adopt this general rule: The naïve method that produces the most accurate forecasts should be selected. By nature, persistence is able to produce better forecasts than climatology on short horizons, but such an advantage diminishes as the forecast horizon gets longer. This simple fact adds resistance when the skill scores of forecasts for different horizons are being compared. Furthermore, it is not immediately clear at which horizon exactly the transition from persistence to climatology should be made.
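The two persistence variants can be stated compactly in code. The sketch below (Python; function names and the toy history are our own) emphasizes that block persistence maps the most recent block of observations onto the horizons one-to-one, whereas single-valued persistence repeats the latest observation:

```python
import numpy as np

def block_persistence(obs, horizons):
    """Assign the most recent block of observations, one per horizon."""
    return np.asarray(obs[-horizons:], dtype=float)

def single_valued_persistence(obs, horizons):
    """Repeat the most recent observation across all horizons."""
    return np.full(horizons, float(obs[-1]))

hourly_kappa = [0.91, 0.88, 0.95, 0.97, 0.73, 0.85]  # hypothetical history
print(block_persistence(hourly_kappa, 3))            # [0.97 0.73 0.85]
print(single_valued_persistence(hourly_kappa, 3))    # [0.85 0.85 0.85]
```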
For this reason, Murphy (1992) advocated a third reference forecasting method, that is, the optimal convex combination of climatology and persistence (CLIPER), which is presented next.

9.1.3 OPTIMAL CONVEX COMBINATION OF CLIMATOLOGY AND PERSISTENCE

The benefits of CLIPER, though they seem quite fundamental to meteorologists, are not commonly discussed among solar forecasters; indeed, CLIPER remained unfamiliar to a vast majority of solar forecasters until recently. Even after CLIPER was formally introduced to the solar community with its benefits elaborated in full in 2019 (Yang, 2019b,f), the momentum of the old and suboptimal practices of making reference forecasts still shows little to no sign of declining. This momentum must be wholly attributed to the several repetitive and assertive recommendations made by a very influential forecaster, namely, Carlos Coimbra, in the early 2010s (Marquez and Coimbra, 2011, 2013; Coimbra et al., 2013). In those works, the so-called "smart persistence" or "clear-sky persistence," which is just applying persistence on the clear-sky index and back-transforming it to irradiance or PV power terms, was taken and recommended as the standard of reference due to its superior performance over the simple persistence. In retrospect, since the clear-sky index ought to be a variable on which solar forecasting models operate, simple persistence on irradiance or PV power should never have been regarded as a reasonable option in the first place. Be that as it may, this section reiterates the core concepts of CLIPER, and theoretically proves its guaranteed superiority over both persistence and climatology.

Using the notation reviewed at the beginning of this chapter, the h-step-ahead internal single-valued climatology forecast of the clear-sky index made at time t − h is ρ_t^c = μ_K = n⁻¹ Σ_{t=1}^{n} κ_t, and the MSE over n samples is:

$$ \mathrm{MSE}_\mathrm{CLIM} = \frac{1}{n}\sum_{t=1}^{n}\left(\rho_t^\mathrm{c} - \kappa_t\right)^2 = \frac{1}{n}\sum_{t=1}^{n}\left(\mu_K - \kappa_t\right)^2 = \sigma_K^2, \tag{9.6} $$

that is, the variance of observation. On the other hand, the h-step-ahead persistence forecast of the clear-sky index made at time t − h is ρ_t^p = κ_{t−h}, and the MSE over n samples is:

$$ \mathrm{MSE}_\mathrm{PERS} = \frac{1}{n}\sum_{t=1}^{n}\left(\rho_t^\mathrm{p} - \kappa_t\right)^2 = \frac{1}{n}\sum_{t=1}^{n}\left(\kappa_{t-h} - \kappa_t\right)^2 = \frac{1}{n}\sum_{t=1}^{n}\left[(\kappa_{t-h} - \mu_K) - (\kappa_t - \mu_K)\right]^2 \approx 2(1 - \gamma_h)\sigma_K^2, \tag{9.7} $$

where γ_h is the lag-h autocorrelation of the {κ_t : t = 1, . . . , n} time series. If one combines ρ_t^c and ρ_t^p linearly, with the weight on ρ_t^c being 1 − α and the weight on ρ_t^p being α, the combined forecast takes the form ρ_t^cp = (1 − α)μ_K + ακ_{t−h}. The MSE of ρ_t^cp over n samples is:

$$ \begin{aligned} \mathrm{MSE}_\mathrm{CLIPER} &= \frac{1}{n}\sum_{t=1}^{n}\left(\rho_t^\mathrm{cp} - \kappa_t\right)^2 = \frac{1}{n}\sum_{t=1}^{n}\left[(1-\alpha)\mu_K + \alpha\kappa_{t-h} - \kappa_t\right]^2 \\ &= \frac{1}{n}\sum_{t=1}^{n}\left[\alpha(\kappa_{t-h} - \mu_K) - (\kappa_t - \mu_K)\right]^2 \approx \left(\alpha^2 + 1\right)\sigma_K^2 - 2\alpha\,\mathrm{Cov}(K_{t-h}, K_t), \end{aligned} \tag{9.8} $$


where K_{t−h} and K_t are lagged copies of the same random variable. To obtain the optimal choice of α, the standard approach of evaluating the derivative can be used. Differentiating the expression of MSE_CLIPER with respect to α and setting it to zero yields, after straightforward calculation, α_optimal = γ_h. It follows that the MSE of the optimal CLIPER reduces to:

$$ \mathrm{MSE}_\mathrm{CLIPER} = \left(1 - \gamma_h^2\right)\sigma_K^2. \tag{9.9} $$

The above result is significant as to the understanding of the relative sizes of the MSEs of the three reference methods. Combining Eqs. (9.9) and (9.6) leads to MSE_CLIPER = (1 − γ_h²) MSE_CLIM, and combining Eqs. (9.9) and (9.7) leads to MSE_CLIPER = [(1 + γ_h)/2] MSE_PERS. Due to the fact that correlation is bounded between −1 and 1, MSE_CLIPER ≤ MSE_CLIM and MSE_CLIPER ≤ MSE_PERS, regardless of the value of γ_h. In other words, the MSE of CLIPER is guaranteed to be no worse than that of either persistence or climatology. While this is true for clear-sky index forecasts, the same can be expected if the clear-sky index forecasts are back-transformed to irradiance. Nevertheless, one should not consider as identical the MSEs of the back-transformed variables (i.e., the MSEs of the clear-sky persistence, climatology, and CLIPER) with those calculated directly using simple persistence, climatology, and CLIPER.
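As a worked illustration, the sketch below (Python; the synthetic series and all variable names are our own assumptions, not tied to any dataset) constructs the three reference forecasts on a clear-sky index series and checks empirically that the MSE ordering derived above holds:

```python
import numpy as np

def reference_mses(kappa, h):
    """MSEs of climatology, persistence, and CLIPER references for
    h-step-ahead clear-sky index forecasts (cf. Eqs. (9.6)-(9.9))."""
    kappa = np.asarray(kappa, dtype=float)
    lag, obs = kappa[:-h], kappa[h:]          # kappa_{t-h} and kappa_t
    mu = kappa.mean()                         # internal single-valued climatology
    gamma_h = np.corrcoef(lag, obs)[0, 1]     # lag-h autocorrelation = optimal weight
    mse = lambda fcst: np.mean((fcst - obs) ** 2)
    return {
        "CLIM": mse(np.full_like(obs, mu)),
        "PERS": mse(lag),
        "CLIPER": mse((1 - gamma_h) * mu + gamma_h * lag),
    }

# Hypothetical autocorrelated clear-sky index series
rng = np.random.default_rng(42)
k = 0.8 + 0.15 * np.sin(np.arange(500) / 7) + 0.05 * rng.standard_normal(500)
print(reference_mses(k, h=1))  # CLIPER should be no worse than either reference
```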

Figure 9.1 Root mean square errors of three naïve references (RMSE_CLIM, RMSE_PERS, and RMSE_CLIPER, plotted against forecast horizon) applied on: (left) irradiance and (right) clear-sky index, for h = 1, . . . , 25, computed and evaluated based on 15-min GHI data at Table Mountain, Colorado (40.12498°N, 105.23680°W, 1689 m), over 2015–2018.

The relationship among the three MSEs can be presented graphically, as shown in Fig. 9.1. Using 15-min global horizontal irradiance (GHI) data from Table Mountain, Colorado, 2015–2018, reference forecasts are computed and evaluated based on climatology, persistence, as well as their optimal convex combination, for h = 1, . . . , 25. It should be highlighted that the three reference methods are applied to both irradiance and clear-sky index, in order to demonstrate the inadequacy of the former. In the case of the latter, the clear-sky index is first retrieved via the McClear model, forecast using the reference methods, and then back-transformed to irradiance for RMSE calculation. On the one hand, when the reference methods are directly applied to irradiance, the errors are much higher than the ones resulting from applying the reference methods to the clear-sky index; the former can subsequently lead to overly exaggerated skill scores, which can be grossly misleading. On the other hand, both sub-figures echo the fact that persistence is preferred over climatology for short horizons, and climatology is more accurate than persistence for long horizons, but CLIPER strictly dominates the other two for all horizons. Further results for intra-hour, intra-day, and day-ahead forecasting using these three reference methods have been tabulated by Yang (2019b,f) using ample data from the Baseline Surface Radiation Network.

Though the advantage of using the climatology–persistence combination as the standard of reference has been demonstrated both analytically and empirically, this new reference method, as mentioned earlier, is still confronted by the issue of public acceptance. After all, (smart) persistence has been used as the reference method in solar forecasting for a decade, and the present change in traditional practice might be viewed as inappropriate and unnecessary, for the reason that it may render all formerly published, persistence-based skill scores hard to interpret. Furthermore, as it is much easier to discover a paper that uses a persistence-based skill score than one of Yang (2019b,f), novices entering the field are very likely to add inertia to the propagation of the suboptimal standard of reference—this whiplash is thought to be strong and long-lasting—without being aware of the better alternative. That said, all innovators, including scientists, have to fight for recognition. We therefore organized perhaps the largest collaboration in the history of solar forecasting to advocate this new standard of reference, as well as other useful recommendations in regard to deterministic solar forecast verification, with the hope of kick-starting the reform from the coauthor networks of our collaborators. The reader is referred to Yang et al. (2020a) for more information.

9.2 PROBLEM OF THE MEASURE-ORIENTED FORECAST VERIFICATION FRAMEWORK

Though the skill score is possibly the best resort for comparing forecasts generated under different conditions, it is not perfect, for the notion of quality goes beyond skill. For instance, some forecasters may wish to assess the bias in the forecasts through MBE, some may quantify association through the correlation between forecasts and observations, and others may be interested in accuracy, for which MSE, RMSE, or MAE can be used. These statistics reflect different aspects of forecast quality; therefore, a more general framework is desired to collect these individual verification approaches under the same umbrella. In fact, such a framework has received a name: Murphy and Winkler (1987) referred to it as the measure-oriented forecast verification framework.

Ever since the beginning of solar forecasting, measure-oriented forecast verification has been accepted as the default verification framework and a sufficient means to support various superiority claims made. One obvious drawback of this framework has been the sheer amount of available measures and statistics in it, which has introduced high heterogeneity and inconsistency when it comes to sub-setting. Because


there is not any mandatory directive in regard to which measures should or should not be used, the choice of accuracy measures is at the full disposal of those individuals who verify forecasts. Some researchers realized this problem in the early days (Zhang et al., 2015a), and attempted to remedy the situation by examining the pros and cons of various accuracy measures and making recommendations. Unfortunately, after a lengthy discussion process that involved stakeholders from both the meteorological and power systems communities, the final recommendation was still far from satisfactory. To give perspective, we quote the conclusion of Zhang et al. (2015a):

"The results showed that (i) all proposed metrics were sensitive to solar forecasts with uniform forecasting improvements; (ii) the metrics of skewness, kurtosis, and Rényi entropy were also sensitive to solar forecasts with ramp forecasting improvements and ramp forecasting threshold; and (iii) the differences among the metrics of OVERPer, RMSE, and the remaining metrics were statistically significant. In addition, a small suite of metrics were recommended based on the sensitivity analysis and nonparametric statistical testing results, including MBE, standard deviation, skewness, kurtosis, distribution of forecast errors, Rényi entropy, RMSE, and OVERPer." — Jie Zhang et al., A suite of metrics for assessing the performance of solar power forecasting

This conclusion contains three defects: one technical, one practical, and the other logical. The technical flaw is related to what those authors referred to as the "distribution of forecast errors." While acknowledging that the error distribution is no doubt relevant to verification, it can hardly be viewed as a metric, as those authors did. The practical flaw, on the other hand, is that the resultant suite of metrics is not lean and therefore is taxing for its users. Moreover, the suite contains some rather unpopular metrics, such as OVERPer, which is an ad hoc metric without statistical rigor. Hence, there have not been many papers that strictly followed the recommendation therein made, as evidenced by the succeeding literature. Last but not least, the logical flaw owes to the fact that the entities in the suite are not at an equal level of generality; for instance, one can easily derive MBE and RMSE from the error distribution, so the former must be regarded as specifics of the latter, but not as its parallels.

That being said, there are two points brought forward by Zhang et al. (2015a) that must be commended. First, the authors provided great empirical evidence on the sensitivity of accuracy measures to solar forecasts. This confirmation is important to the extent that a majority of solar forecasters are now aware of the limitation of measure-oriented forecast verification. Next, the authors touched on, though imperfectly, the concept of distribution in verification that ought to be emphasized. We shall take this opportunity to highlight the first point through a simulation example and discuss the second in the next section.

The pragmatic relevance of statistical simulation consists in the fact that statisticians have control over the data-generating process.


Table 9.2 Simulated global horizontal irradiance (GHI) clear-sky index (κ) for 55 hours, or 5 days with 11 daytime hours each. The clear-sky GHI (c) is obtained from the McClear model, at an arbitrary location and over an arbitrary time window. (The goal of these clear-sky values is simply to add diurnal cycles to the clear-sky index, such that the verification can be performed on GHI.)

Day 1  Index  1      2      3      4      5      6      7      8      9      10     11
       κ      1.000  0.959  0.998  0.868  0.867  0.999  0.999  0.974  0.996  0.792  0.997
       c      169    389    587    739    832    859    817    710    546    340    121

Day 2  Index  12     13     14     15     16     17     18     19     20     21     22
       κ      0.991  0.947  0.989  0.939  0.760  0.211  0.717  1.000  0.947  0.986  0.751
       c      170    389    589    743    836    864    822    716    552    346    124

Day 3  Index  23     24     25     26     27     28     29     30     31     32     33
       κ      0.783  0.650  0.268  1.000  0.690  0.960  0.976  0.962  0.994  0.971  0.994
       c      176    397    597    752    846    874    832    724    558    351    130

Day 4  Index  34     35     36     37     38     39     40     41     42     43     44
       κ      0.937  0.994  0.967  0.977  0.819  0.899  0.971  0.996  0.992  0.790  0.904
       c      182    403    601    753    845    871    827    719    553    345    125

Day 5  Index  45     46     47     48     49     50     51     52     53     54     55
       κ      0.674  0.930  0.684  0.983  1.000  0.998  0.925  0.943  0.682  0.944  0.879
       c      168    380    576    728    820    845    800    687    516    308    105


The question of whether a model sufficiently describes the data is therefore no longer of concern. In this simulation, the hourly GHI clear-sky index is assumed to follow:

$$ \kappa_t = 1 - z_t^2, \tag{9.10} $$

where z_t is a particular realization of a conditionally heteroscedastic time series Z_t ∼ N(0, σ_t²), which is normally distributed with zero mean and a time-varying variance that behaves as:

$$ \sigma_t^2 = 0.4 z_{t-1}^2 + 0.15 \sigma_{t-1}^2 + 0.05. \tag{9.11} $$

The initial values are set to be z_0 = 0 and σ_0² = 0.01. Based on such a data-generating process, 55 clear-sky index values are simulated; all simulated values are assumed to correspond to daytime hours. These 55 simulated κ_t values are subsequently transformed to GHI values (y_t), by multiplying the clear-sky GHI estimates (c_t) obtained from the McClear model at an arbitrary location and over an arbitrary time window, so that the irradiance time series possesses diurnal cycles as the actual one does. The simulation results are tabulated in Table 9.2 and plotted in Fig. 9.2.
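A minimal sketch of this data-generating process is given below (Python; the seed and variable names are our own choices, so the realization differs from Table 9.2); it directly transcribes Eqs. (9.10) and (9.11):

```python
import numpy as np

def simulate_clear_sky_index(n=55, seed=1):
    """Simulate kappa_t = 1 - z_t^2, with z_t conditionally heteroscedastic:
    sigma_t^2 = 0.4 z_{t-1}^2 + 0.15 sigma_{t-1}^2 + 0.05 (Eqs. 9.10-9.11)."""
    rng = np.random.default_rng(seed)
    z, s2 = 0.0, 0.01                        # initial values z_0 and sigma_0^2
    kappa, sigma2 = np.empty(n), np.empty(n)
    for t in range(n):
        s2 = 0.4 * z**2 + 0.15 * s2 + 0.05   # Eq. (9.11)
        z = rng.normal(0.0, np.sqrt(s2))     # z_t ~ N(0, sigma_t^2)
        kappa[t], sigma2[t] = 1.0 - z**2, s2  # Eq. (9.10)
    return kappa, sigma2

kappa, sigma2 = simulate_clear_sky_index()
# Multiplying kappa by a clear-sky profile c_t then superimposes the diurnal cycle.
```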

Figure 9.2 A window of 55 simulated daytime hourly GHI (y) and clear-sky GHI data points (c). Forecasts generated by three forecasters, Novice, Optimist, and Statistician, over the same window are overlaid.

With this simulated data, three forecasters are tasked to generate 1-h-ahead forecasts, in a rolling fashion, starting from hour 0. The Novice, who knows little about forecasting, issues forecasts using persistence, that is, φ_t^Novice = κ_{t−1}. The Optimist believes that the weather is sunny throughout the forecasting period, and hence issues a constant forecast of φ_t^Optimist = 0.95 for all time stamps. The Statistician knows the data-generating process and the initial values, and therefore issues the true conditional mean as forecasts:

$$ \phi_t^\mathrm{Statistician} = \mathrm{E}\left(K_t \,\middle|\, \sigma_t^2\right) = 1 - \mathrm{E}\left(Z_t^2 \,\middle|\, \sigma_t^2\right) = 1 - \left\{\mathrm{V}\left(Z_t \,\middle|\, \sigma_t^2\right) + \left[\mathrm{E}\left(Z_t \,\middle|\, \sigma_t^2\right)\right]^2\right\} = 1 - \sigma_t^2, $$

where K_t = 1 − Z_t² is the random variable from which κ_t materializes. Subsequently, φ_t^Novice, φ_t^Optimist, and φ_t^Statistician are transformed to GHI forecasts and verified through MBE, MAE, and RMSE.

Table 9.3 Verification results of forecasts submitted by three forecasters, generated based on the simulated data. All units are in W/m².

Forecaster      MBE      MAE      RMSE
Novice          −2.85    79.63    142.36
Optimist        36.19    57.68    119.81
Statistician    8.72     63.02    111.77

Table 9.3 shows some very interesting verification results, in that each forecaster performs best under one error metric: the Novice has the smallest MBE, the Optimist has the smallest MAE, and the Statistician has the smallest RMSE. This toy example thereby perfectly illustrates the limitation of the measure-oriented verification framework. The "best" forecasts as perceived in one aspect of quality may be suboptimal in another aspect, preventing one from concluding the general superiority of one forecasting method over another. On this point, all published methods in the literature that claimed superiority solely based on measure-oriented verification are subject to the same criticism. The simple remedy recommended here, as well as by Yang et al. (2020a), is to not only tabulate accuracy measures but also report forecast–observation pairs, such that the performance, and thus the credibility, of any proposed forecasting method becomes transparent to those readers who wish to conduct their own verification. This recommendation, in turn, calls for formal methods to analyze the forecast–observation pairs.
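The whole experiment can be reproduced qualitatively with the hypothetical snippet below (Python); since the seed is arbitrary and the clear-sky profile reuses the day-1 values of Table 9.2 for all five days (a simplification), the exact numbers differ from Table 9.3, but the disagreement among the three metrics is readily observed:

```python
import numpy as np

# Re-simulate the process of Eqs. (9.10)-(9.11); the seed is an arbitrary choice
rng = np.random.default_rng(1)
n, z, s2 = 55, 0.0, 0.01
kappa, sigma2 = np.empty(n), np.empty(n)
for t in range(n):
    s2 = 0.4 * z**2 + 0.15 * s2 + 0.05
    z = rng.normal(0.0, np.sqrt(s2))
    kappa[t], sigma2[t] = 1.0 - z**2, s2

c = np.tile([169, 389, 587, 739, 832, 859, 817, 710, 546, 340, 121], 5).astype(float)
y = kappa * c  # simulated GHI observations

forecasts = {
    "Novice": np.concatenate(([kappa[0]], kappa[:-1])),  # persistence; first value padded
    "Optimist": np.full(n, 0.95),                        # constant sunny forecast
    "Statistician": 1.0 - sigma2,                        # true conditional mean
}
for name, phi in forecasts.items():
    e = phi * c - y  # forecast errors in GHI terms
    print(f"{name:>12s}  MBE={e.mean():7.2f}  MAE={np.abs(e).mean():6.2f}"
          f"  RMSE={np.sqrt(np.mean(e**2)):7.2f}")
```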

9.3 MURPHY–WINKLER DISTRIBUTION-ORIENTED FORECAST VERIFICATION FRAMEWORK

What the forecast–observation pairs are essentially describing is the joint distribution of forecast and observation. Murphy and Winkler (1987) envisioned that a forecast verification framework, in order to be useful, ought to possess the following characteristics: It shall (1) have a comprehensive representation, (2) provide insights in regard to the relationship among verification measures, (3) establish a scientific basis for developing and selecting specific verification measures in specific contexts, and most importantly, (4) minimize the number of situations that must be considered. A verification framework constructed on the basis of the joint distribution possesses all of the above characteristics.

A joint distribution of forecast and observation contains all necessary information relevant to quantification of forecast quality, insofar as the time order of the verification samples does not matter.3 It is, therefore, apparent that the joint distribution is an ex ante distribution, on which decisions in regard to choosing forecasting systems (or forecasters) can be based. The notion of ex ante is in contrast to that of ex post, where decisions are based on verification results, which are often rigid in the form of presentation and liable to the subjectivity of the person who made the verification. For instance, in academia, it would be transparent and much easier for editors, reviewers, and readers to judge the performance of a proposed method and the value of the proposal through the forecast–observation pairs, rather than through a few reported measures, which, as we have just shown, can be misleading or even deceiving. In industry, similar arguments can be advanced, in that power system operators or PV plant owners select forecast providers based on the forecasts submitted in the past and the materialized observations, but never based on those verification results reported by the forecast providers.

As far as verification of deterministic forecasts is concerned, the joint distribution between X and Y, or f(x, y), is simply the empirical relative frequency distribution of the forecast–verification pairs. In other words, one can visualize f(x, y) using a scatter plot, a three-dimensional histogram, a heat map, or a contour plot. Whereas x and y span two dimensions, the density of points, the height of the histogram, the color of the heat map, or the value of the equipotential line represents the relative frequency of the verification samples. (Each forecast–verification pair is a sample.)

3 There are some statistics, such as those based on dynamic time warping, that can only be calculated for sequential samples.


Figure 9.3 Joint and marginal distributions of 24-h-ahead hourly NAM GHI forecasts and SURFRAD GHI observations at (a) Desert Rock, Nevada (36.624°N, 116.019°W), and (b) Penn. State Univ., Pennsylvania (40.720°N, 77.931°W), over 2015–2016. The contour lines show the 2D kernel densities.

To give perspective, Fig. 9.3 shows the scatter plots of x versus y at two locations from the Surface Radiation Budget Network (SURFRAD), namely, Desert Rock (DRA) and Pennsylvania State University (PSU), over the years 2015 and 2016. The forecasts that are being verified come from the 00Z runs of the North American Mesoscale (NAM) model, with the time specification being {S^{24h}, R^{1h}, L_f^{0h}, U^{24h}} (recall Section 5.6.1). To assist visualization, contour plots are overlaid onto the scatter plots, and the histograms representing f(x) and f(y)—the marginal densities—are drawn on the right and top of each figure. One can readily see that the NAM forecasts at DRA and PSU behave quite distinctively, in that the scatter and contour for the DRA station are densely concentrated around the x = y identity line, whereas those for the PSU station are more spread out and with visually identifiable over-predictions, i.e., x > y, because the contour lines are wider in the upper triangular region of the plot than in the lower triangular region.

Although the joint distribution contains all the information, there are, however, more convenient channels to access such information. For instance, one can examine, by comparing the histograms in Fig. 9.3, the similarity between the marginal distributions of forecast and observation, and conduct statistical tests if need be. On this point, it is well known that any joint distribution can be factored into two distributions, one conditional and the other marginal. When the random variables are forecast and observation, one has:

(9.12) (9.13)

The factorizations in Eqs. (9.12) and (9.13) are known as the calibration–refinement factorization and likelihood–base rate factorization, respectively. The naming convention of these factorizations follows the verification-related implications carried by each term, of which the details are elaborated in what follows. 9.3.1

CALIBRATION–REFINEMENT FACTORIZATION

The first term in the calibration–refinement factorization is f (y|x), which is the conditional distribution of observation given forecast. It describes how the observation have materialized when a particular forecast was given. Needless to say, each time a forecaster issues forecast of a specific value, he or she wishes the observation to occur in the near neighborhood of that value. Additional to that, when the same forecast value is issued multiple times on different occasions, the forecaster ought to wish the observations to be distributed evenly around that forecast value. Stated differently, for any given forecast, the f (y|x) term describes how reliable that forecast is, judging based on how well the observations have materialized according to the two above-mentioned ideals. In forecast verification, reliability is also known as calibration. Mathematically, if E(Y |X = x) = x for all x, we say that the set of forecasts is perfectly reliable or calibrated. This definition is intuitive. Suppose for some particular forecast value x, such that E(Y |X = x) = x − b, where b is a constant, it is natural to view b as a bias, and the forecast quality may be improved once b is removed. It is, then, clear that to quantify such bias for x, one would simply compute [x − E(Y |X = x)]2 , where the square is set to ignore the directional effect in

Deterministic Forecast Verification

379

the bias and only to focus on its magnitude.4 Furthermore, the total bias of this type embedded in the forecast–observation pairs can be quantified by averaging [x − E(Y |X = x)]2 for all x, that is, EX [X − E(Y |X)]2 , where the subscript of the first expectation operator indicates that the expectation is taken with respect to X—this is just written for clarity. This last expression is known as the type 1 conditional bias, which provides quantification of reliability. (a)

(b) 950 Forecast, x [W/m2]

Forecast, x [W/m2]

950

650

350

650

350

50

50 50

350 650 950 Observation, y [W/m2]

50

350 650 950 Observation, y [W/m2]

Figure 9.4 Conditional distributions of 24-h-ahead hourly SURFRAD GHI observations given NAM GHI forecasts, f (y|x), at (a) Desert Rock, Nevada (36.624◦ N, 116.019◦ W), and (b) Penn. State Univ., Pennsylvania (40.720◦ N, 77.931◦ W), over 2015–2016. Figure 9.4 shows a visualization of f (y|x), using the same data as that used in Fig. 9.3. To facilitate plotting, all forecasts are binned to ten specific forecast values, namely, {50, 150, . . . , 950} W/m2 . After binning, each specific forecast value would correspond to multiple observations, thus allowing kernel density estimation. The bottom-most densities in Fig. 9.4 depict f (y|x) at x = 50 W/m2 , and the topmost densities depict f (y|x) at x = 950 W/m2 . Also annotated on the figure are the conditional expectations, E(Y |X = x), for each x ∈ {50, 150, . . . , 950} W/m2 , which are marked as crosses. It should be noted that the horizontal distance between the identity line and the crosses is the quantity x − E(Y |X = x). The smaller the average squared distance is, the more calibrated the forecasts are. In this regard, comparing the verification results at the two locations, it can be readily seen that forecasts at DRA are more calibrated than those at PSU.5 The second term in the calibration–refinement factorization is f (x), the marginal density of forecast. What the marginal density of forecast describes is how frequently each forecast value was used. It is helpful to consider a binary deterministic forecasting case, in which the forecaster has the freedom to issue, for each time stamp, a 4 One can no doubt use the absolute operator instead. Nevertheless, squaring the term has further implications, and it shall be discussed in Section 9.3.3. 5 This comparison is made for illustration purposes only. Due to the distinct weather regimes at these two locations, nor is there any practical relevance in comparing their forecasts directly. One must leverage skill score instead.

380

Solar Irradiance and Photovoltaic Power Forecasting

sky-condition forecast that is either clear or cloudy (but not a mix of those), and after some time, the past forecasts are tallied. If the relative frequency of that forecaster issuing “clear” is P(clear) = p, then the relative frequency of that forecaster issuing “cloudy” must be P(cloudy) = 1 − p, together, they give the probability mass function (PMF) of the forecast. In the case of forecasting a continuous random variable, instead of having a PMF, the probability law according to which each forecast is issued is described by a PDF, which is f (x). If the same forecast is always issued, such as in the case of forecasting using climatology, the forecasts are said to be not at all refined. This is equivalent to saying that a set of identical forecasts has no resolving power in understanding how the observations would have materialized had the forecasts been refined. This conception of refinement can be generalized by considering f (y|x) and f (x). If the forecast and observation are independent, f (x) has no effect on f (y|x), and the latter reduces to f (y). Since this scenario corresponds to the least useful situation where forecasts are totally uninformative, one naturally wishes to maximize the difference between f (y|x) and f (y). In other words, the conditional distribution should be as distinctive to the unconditional distribution as possible, such that the forecast has resolving power, or simply resolution. Resolution can be quantified through computing the value of EX [E(Y |X) − E(Y )]2 , and the larger the difference between the conditional and unconditional mean values of observation is, the better the forecasts are. 9.3.2

LIKELIHOOD–BASE RATE FACTORIZATION

The first term in the likelihood–base rate factorization is f (x|y), the conditional distribution of forecast given a particular observation value. It indicates how often different forecasts have had been given before a particular observation value was materialized. Murphy and Winkler (1987) referred to this term as likelihood. Nevertheless, this terminology is thought to be confusing to the extent that likelihood is a general concept in Bayesian inference, which is applicable not only to forecasts but also to observations. In other words, one may also view f (y|x) as likelihood, if f (x|y) is taken as posterior. To that end, it seems more appropriate to view f (x|y) as a representation of forecast consistency. For the same value of observation, a forecaster is expected to produce the same forecast, as long as his judgment (or forecasting methodology) is consistent. However, due to various sources of uncertainty involved in the forecasting process, such consistency is generally not attainable. A relaxed expectation is hoping that those repeated forecasts under the same forecasting situation can all land in the near neighborhood of the specific observation value which corresponds to the forecasting situation, and are distributed around that observation value evenly. Our interpretation of the f (x|y) term aligns with the interpretation of f (y|x). As a set of calibrated forecasts satisfies E(Y |X = x) = x, a set of consistent forecasts should satisfy E(X|Y = y) = y. Suppose for a particular observation y, one observes E(X|Y = y) = y − b , we can say that the forecaster’s judgment is biased towards an additive constant of b , which can be quantified by calculating [y − E(X|Y = y)]2 , where the square is again employed to let go the directional effect in the bias and

Deterministic Forecast Verification

381

only to focus on its magnitude. It follows that in order to gauge the overall size of inconsistency in a set of forecasts, the expression EY [Y − E(X|Y )]2 can be deemed suitable. This last expression is known as the type 2 conditional bias. It is useful to think of the type 2 conditional bias to be originating from the forecaster’s judgmental process, whereas the type 1 conditional bias describes the bias in the forecast itself. (b)

950

Forecast, x [W/m2]

Forecast, x [W/m2]

(a)

650 350 50

950 650 350 50

50

350 650 950 Observation, y [W/m2]

50

350 650 950 Observation, y [W/m2]

Figure 9.5 Conditional distributions of 24-h-ahead hourly NAM GHI forecasts given SURFRAD GHI observations at (a) Desert Rock, Nevada (36.624◦ N, 116.019◦ W), and (b) Penn. State Univ., Pennsylvania (40.720◦ N, 77.931◦ W), over 2015–2016. Analogous to Fig. 9.4, Fig. 9.5 shows the conditional distributions of forecast given observation, that is, f (x|y). The observations are binned into ten specific values also, such that the conditional densities can be estimated based on multiple forecasts that correspond to each specific observation value. The conditional mean forecasts, or E(X|Y = y), with y ∈ {50, 100, . . . , 950} W/m2 are marked with crosses, and their vertical distances from the identity line depict y − E(X|Y = y). The smaller the average squared distance is, the more consistent the forecasts are. The second term in the likelihood–base rate factorization is f (y), the marginal density of observation. It describes how often different values of y have occurred, which is a property of the observation only. Once again, it is helpful to consider a case of binary sky condition, where the sky can be either clear or cloudy. By counting the time intervals (minutes, hours, or days depending on the time scale of interest) that are clear and cloudy, the relative frequency of occurrence and thus a PMF can be obtained, e.g., P(clear) = p and P(cloudy) = 1 − p . When the random variable is continuous, PMF is replaced analogously by a PDF, that is, f (y). The term f (y) is known as the base rate. If one were to assume again that it is the case in which forecast and observation are independent, f (x|y) = f (x) is implied, which marks the condition of another worst-case scenario. In this scenario, regardless of what the observation turns out to be, the forecaster always behaves the same. It thus can be said that the forecaster is unable to discern different forecasting situations, or lack of discrimination.

382

Solar Irradiance and Photovoltaic Power Forecasting

It is most certain that we wish the forecasts to discriminate forecasting situations as thoroughly as possible. To quantify discrimination, one can compute the value of EY [E(X|Y ) − E(X)]2 —the mean square difference between conditional and unconditional mean forecasts—the larger the difference, the better. At this stage, it has been made clear that calibration–refinement and likelihood–base rate factorizations are closely analogous in forms, but their substances, or interpretations, are profoundly different. 9.3.3

DECOMPOSITION OF MEAN SQUARE ERROR

Four statistics written in functions of the conditional and unconditional mean forecast and mean observation, as derived from the Murphy–Winkler factorizations, have been discussed in the last two sections, they are: EX [X − E(Y |X)]2 , EX [E(Y |X) − E(Y )]2 , EY [Y − E(X|Y )]2 , and EY [E(X|Y ) − E(X)]2 . The significance of these four statistics in quantifying aspects of forecast quality can be recognized not only individually, but also collectively, for the reason that all four statistics are linked to MSE. Put it differently, the MSE of a set of forecasts can be decomposed into those four terms. Before we examine that, the general concept of MSE decomposition is elaborated through the bias–variance decomposition, which is thought to be more fundamental. The bias–variance decomposition of MSE is given in Eq. (9.14) and its mathematical details are given in Appendix A.5. unconditional bias

marginal dist. association  !   !   !  MSE(X,Y ) = V(X) + V(Y ) − 2Cov(X,Y ) + [E(X) − E(Y )]2 .

(9.14)

The MSE between X and Y can be split into four components. The first two are related to the marginal densities of X and Y , as expressed in terms of their variances. Since the variance of observation is fixed, as forecasters have no control over how the process behaves, it seems that if the variance of forecast is reduced, the MSE can, in turn, be reduced. However, this thinking is invalid, insofar as reducing the variance of forecast also decreases the association between forecast and observation, which is embodied in correlation or covariance. Indeed, the first three terms in Eq. (9.14) sum up to V(X −Y ), or the variance of the forecast error. Whereas it is desirable to reduce the variance of forecast error, reducing the variance of forecast alone has no practical relevance. (Otherwise, every logical person would choose to simply issue forecasts that are constant and therefore have no variance.) The fourth term in Eq. (9.14) is the squared difference between mean forecast and mean observation, or simply the squared unconditional bias, or MBE squared. For unbiased forecasts, this term is zero, and any non-zero unconditional bias would contribute to the size of MSE. The idea which has been advanced through studying the bias–variance decomposition of MSE is that the MSE, though being a single statistic, carries different aspects of forecast quality. In this respect, two other decompositions of MSE are


available. One of those,

$$\mathrm{MSE}(X,Y) = \mathbb{V}(Y) + \underbrace{\mathbb{E}_X\left[X-\mathbb{E}(Y|X)\right]^2}_{\text{type 1 conditional bias}} - \underbrace{\mathbb{E}_X\left[\mathbb{E}(Y|X)-\mathbb{E}(Y)\right]^2}_{\text{resolution}}, \tag{9.15}$$

is linked to the calibration–refinement factorization, and the other,

$$\mathrm{MSE}(X,Y) = \mathbb{V}(X) + \underbrace{\mathbb{E}_Y\left[Y-\mathbb{E}(X|Y)\right]^2}_{\text{type 2 conditional bias}} - \underbrace{\mathbb{E}_Y\left[\mathbb{E}(X|Y)-\mathbb{E}(X)\right]^2}_{\text{discrimination}}, \tag{9.16}$$

is related to the likelihood–base rate factorization. Since the expectations in Eq. (9.15) are conditioned on the forecast (or "cof"), one may refer to the decomposition as the cof decomposition. Similarly, Eq. (9.16) is called the conditioning-on-observation (or "cox") decomposition. The mathematical derivations of these two equations can be found in Appendix A.5.

MSE can be decomposed into three terms as per Eq. (9.15). The first term is the variance of observation, and there is little that can be said about that. The second term quantifies the calibration of the forecasts. Recall that a set of forecasts is said to be calibrated if E(Y|X = x) = x for all x; it follows that the E_X[X − E(Y|X)]² term should be as small as possible. On the other hand, we wish a set of forecasts to be informative and to possess high resolving power, which is reflected through how much the conditional distribution of Y differs from its marginal. Therefore, one ought to desire the E_X[E(Y|X) − E(Y)]² term to be as large as possible, which is also evidenced by the negative sign in front of the term in Eq. (9.15).

Another three-term decomposition of MSE is given in Eq. (9.16), which corresponds to the likelihood–base rate factorization. The first term is the variance of forecast. In the extreme, the variance of a set of constant forecasts is zero. But such forecasts drastically increase the type 2 conditional bias and have no discrimination, which is not at all beneficial. The type 2 conditional bias is the mean square difference between Y and E(X|Y). If the forecaster's judgment is unbiased for all forecasting situations, the E_Y[Y − E(X|Y)]² term is zero. The discrimination term E_Y[E(X|Y) − E(X)]² quantifies the ability of a forecaster in differentiating forecasting situations. It should be maximized, as also indicated by the negative sign in front of it.

9.3.4 COMPUTATION OF VERIFICATION STATISTICS

Suppose there are n forecast–observation pairs, denoted by (x1, y1), (x2, y2), . . . , (xn, yn); one technical issue that remains is how to compute the various statistics, or accuracy measures, from these samples. For most readers of this book, computing conventional statistics, such as MBE, RMSE, or correlation, should pose no problem. The complication may stem, however, from computing the conditional means, as needed by E_X[X − E(Y|X)]², E_X[E(Y|X) − E(Y)]², E_Y[Y − E(X|Y)]², and E_Y[E(X|Y) − E(X)]².


One possible approach has been revealed earlier through Figs. 9.4 and 9.5, in which forecasts (or observations) are discretized, such that there are multiple observations (or forecasts) corresponding to each discretized value. For instance, in Fig. 9.4, ten discretized values of forecast are used, and there are thus ten values of E(Y|X = x). By averaging the squared differences between the discretized values and their corresponding E(Y|X = x) estimates, E_X[X − E(Y|X)]² can be estimated. The computation for the remaining three statistics can proceed in a similar manner. Statistics evaluated using this approach are affected by the number of discretized values, which may be considered a rather subjective choice. To circumvent the need to select the number of discretized values, it is possible to consider a nonparametric approach, leveraging kernel conditional density estimation (KCDE). The concept of kernel has been introduced in Section 3.2.4.3. For the current purpose, the conditional distributions are of interest, and the kernel conditional density estimator of f(y|x) is:

$$\hat{f}(y|x) = \sum_{t=1}^{n} w_t(x) \cdot \frac{1}{h_y} K\!\left(\frac{y-y_t}{h_y}\right), \tag{9.17}$$

where K(·) is the kernel function of choice, h_y is the bandwidth to be estimated, and the weight w_t(x) is given by:

$$w_t(x) = \frac{K\!\left(\dfrac{x-x_t}{h_x}\right)}{\displaystyle\sum_{t=1}^{n} K\!\left(\dfrac{x-x_t}{h_x}\right)}, \tag{9.18}$$

where h_x is another bandwidth to be estimated (Lee et al., 2015). Hyndman et al. (1996) showed that the conditional mean estimator is:

$$\hat{\mathbb{E}}(Y|X) = \sum_{t=1}^{n} w_t(x)\, y_t. \tag{9.19}$$

It should be noted that this conditional mean estimator is identical to the Nadaraya–Watson kernel estimator, as used in local regression (see Section 5.4 of Wasserman, 2006). The exact procedure for obtaining Ê(Y|X) can be used for the estimation of Ê(X|Y), with a change of variable. The reader is referred to Yang and Perez (2019) and Appendix A.6 for more mathematical details regarding KCDE. Although KCDE is a nonparametric approach, it requires the bandwidths h_x and h_y to operate. For that matter, automated bandwidth selection algorithms have been proposed in the literature (e.g., Ruppert et al., 1995). However, for the present application—MSE decomposition—the goal is to estimate E_X[X − E(Y|X)]², E_X[E(Y|X) − E(Y)]², E_Y[Y − E(X|Y)]², and E_Y[E(X|Y) − E(X)]², such that they can reconstruct MSE through Eqs. (9.15) and (9.16). It is, therefore, much easier to perform a simple trial-and-error, where several choices of h_x and h_y are tested, and the pair that gives the best MSE reconstruction is deemed most appropriate.
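The following is a minimal R sketch of this trial-and-error procedure under the cof decomposition of Eq. (9.15); a Gaussian kernel is assumed for illustration, and x and y denote the vectors of forecasts and observations:

```r
# Nadaraya-Watson conditional mean of Eq. (9.19), with weights per Eq. (9.18)
nw_cond_mean <- function(x0, x, y, h) {
  w <- dnorm((x0 - x) / h)   # Gaussian kernel evaluated at (x0 - x_t)/h_x
  sum(w * y) / sum(w)
}

# reconstruct MSE through Eq. (9.15) for a trial bandwidth h
cof_check <- function(x, y, h) {
  eyx  <- vapply(x, nw_cond_mean, numeric(1), x = x, y = y, h = h)
  cali <- mean((x - eyx)^2)        # E_X[X - E(Y|X)]^2
  reso <- mean((eyx - mean(y))^2)  # E_X[E(Y|X) - E(Y)]^2
  vy   <- mean((y - mean(y))^2)    # V(Y), with 1/n denominator
  c(reconstructed = vy + cali - reso, actual = mean((x - y)^2))
}

# try several bandwidths and keep the one whose reconstruction is closest
# to the actual MSE, e.g.:
# sapply(c(10, 20, 50, 100), function(h) cof_check(x, y, h))
```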

9.4 GOODNESS OF FORECASTS: A CASE STUDY ON VERIFICATION OF OPERATIONAL DAY-AHEAD SOLAR FORECASTS

To reach a consensus on what constitutes "a good forecast" is inherently difficult, since what each forecaster conceives is, necessarily, dependent upon her own individual experience: She knows what she has read, what she has been told, and what she has been able to infer from past information. Unlike describing a physical fact, where subjectivity is a vice, the individual conception of the abstract notion of goodness contains an element which cannot be refuted by others. A forecast may be very valuable to one forecast user but not so much to another. Hence, instead of defining the goodness of a forecast exactly, with rigid rules of assessment obtained by applying reduction indefinitely (recall Section 2.3), one only has to take the definition to a stage where the testimony can be accepted by the majority. On this account, the three types of goodness outlined by Murphy (1993) are what we believe to be both necessary and sufficient. Forecast verification that goes beyond assessing these three types of goodness would be tedious and inefficient, whereas ignoring any of them would lead to inconclusive verification results. Hence, the central aim of this section is to deliver a case study that exemplifies the typical procedure expected during deterministic forecast verification. We shall first provide a brief account of the setup used for the case study in the immediate subsection below, and then devise several subsections to depict the results of different stages of forecast verification.

9.4.1 CASE STUDY SETUP

In the following case study, we shall take forecasting and forecast verification at the day-ahead horizon as the subject. Insofar as the actual operational context is to be respected in academic forecasting exercises, Section 5.6.1 has introduced the quadruplet of time parameters for the precise specification of a forecasting task. The quadruplet for the California Independent System Operator's (CAISO's) day-ahead market (DAM), for example, is (S, R_f, L, U) = (72 h, 1 h, 14 h, 24 h) (Makarov et al., 2011). The forecast span (S) for CAISO's DAM is 72 h, which, as explained in Section 5.6.3, complicates forecast verification. Hence, for simplicity, this case study reduces S to 24 h, such that there is no overlap between forecasts submitted in different cycles. In regard to forecast resolution (R_f), many ISOs demand just hourly values for the DAM,⁶ and that is taken to be the case here. Lead time (L) does not pose much challenge to operational solar forecasting, because operational NWP models usually issue forecasts a few times a day, out to several days ahead; as such, forecasters just need to choose the nearest possible run that can meet the lead time requirement. The same argument applies to the forecast update rate (U), which is often less frequent than the refresh rate of NWP models. In summary, the present case study assumes (S, R_f, L, U) = (24 h, 1 h, 12 h, 24 h).

⁶Some grid operators such as those of China or Hungary require 15-min values. In this case, solar forecasters would have to either downscale hourly forecasts to 15-min resolution using statistical methods, or place their faith in the local weather services to provide 15-min weather forecasts, which, fortunately, is indeed the case for China and Hungary (pers. comm. with Yanbo Shen, China Meteorological Administration, and Martin János Mayer, Budapest University of Technology and Economics, 2022).


The general framework for producing day-ahead PV generation forecasts can be divided into three main steps: (1) acquiring forecasts of the relevant atmospheric variables, such as GHI or ambient temperature, from NWP; (2) performing data-driven post-processing on the raw NWP forecasts; and (3) converting the post-processed weather forecasts to power output forecasts, with which the value of forecasts can be materialized (Mayer and Yang, 2022). In consideration of the aims and focus of each individual step, the present case study is set up in a way that facilitates the understanding of how the three principal types of goodness—quality, consistency, and value of forecasts—can be verified. We shall attempt a one-to-one correspondence between the steps of the forecasting framework and the types of goodness: in Section 9.4.2, NWP forecasts are verified in terms of quality; in Section 9.4.3, how the statistical theory of consistency descends to and affects NWP forecast post-processing is discussed; and in Section 9.4.4, the value of PV power forecasts converted from weather forecasts is placed in perspective against the typical penalty schemes that system operators employ.

First of all, it is our intention to take two sets of NWP forecasts to demonstrate comparative forecast verification through gauging various aspects of forecast quality. On this point, the European Centre for Medium-Range Weather Forecasts (ECMWF) High-Resolution (HRES) model and the National Centers for Environmental Prediction (NCEP) NAM model are considered. The 12–35-step-ahead deterministic forecasts from the 12Z runs of both NWP models, which cover 00:00 to 23:00 of the next day, are selected, so as to satisfy the aforementioned quadruplet of time parameters.

Next, to efficiently elaborate why consistency matters in producing and verifying forecasts, we perform post-processing through three simple linear correction methods, each optimizing the raw forecasts based on a distinct objective function, i.e., aggregate score. In theory, forecasts optimized for one aggregate score should perform favorably under that scoring function against forecasts optimized by other means. Here, MAE, RMSE, and MAPE are chosen as the objective functions, with which the parameters of the linear correction models are optimized.

Third, to exemplify how the value of forecasts can be materialized and ascertained, the post-processed forecasts are converted to PV power. For that, three PV plants, each located within the balancing area of a different ISO, are devised. Each ISO is assumed to have set a unique penalty scheme for penalizing bad forecasts. The value of forecasts is accounted for through the amount of penalty incurred: the smaller the amount, the higher the value.

The three PV plants of concern are assumed to be located at Bondville (BON), Desert Rock (DRA), and Pennsylvania State University (PSU), such that the ground-based data collected by the corresponding SURFRAD stations can be taken as the in situ measurements of the sites. Here, one year (2020) of quality-controlled data is used, and missing records are filled with the National Solar Radiation Data Base (NSRDB) satellite-derived data, such that a serially complete dataset is obtained. This dataset is used as the basis for simulating the PV output, which is taken as observations during the verification of PV power forecasts.
All three PV plants are assumed to have a DC nominal capacity of 10 MW with a typical DC–AC ratio of 1.2, and glass/polymer open-rack PV arrays tilted at 25° facing south.


A simple model chain—recall Section 4.5.5, and see Chapter 11 for more details on model chains—is used, which consists of: (1) the solar position algorithm (SPA) of Reda and Andreas (2008); (2) the DISC separation model (Maxwell, 1987); (3) the 1990 version of the Perez transposition model (Perez et al., 1990); (4) the cell temperature model that is part of the Sandia Array Performance Model (SAPM; King et al., 2004); and (5) the PVWatts DC and AC models (Dobos, 2014). Three versions of post-processed forecasts (i.e., MAE-minimized, MSE-minimized, and MAPE-minimized) are passed through the same model chain, in order to arrive at the PV power forecasts. Finally, Table 9.4 lists the penalty triggers and calculation methods set by the ISOs. Worth mentioning is that the penalty triggers are specifically designed in correspondence with the scoring functions, and the penalty calculation methods are formulated to resemble real-world ones (cf. Table 8.1). With the penalty for unit energy P set to 200 $/MWh, the forecast penalties of the three ISOs are evaluated on a monthly basis, based on the aggregate scores of all daylight hours, which are herein defined as those time stamps with zenith angles Z < 85°, incidence angles θ < 90°, and power output exceeding 3% of the rated capacity of the PV plant. In that, the total penalty incurred is the sum of the penalty amounts of the 12 months.
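For a rough sense of what model-chain steps (4) and (5) involve, the R sketch below implements the SAPM cell temperature and PVWatts DC models, with crude inverter clipping standing in for the full PVWatts AC model. The coefficients (a = −3.56, b = −0.075, ΔT = 3 for open-rack glass/polymer arrays, and γ = −0.004/°C) are typical published values assumed for illustration, not necessarily those of the case study:

```r
pv_power <- function(g_poa, t_amb, ws, pdc0 = 10e6, dcac = 1.2,
                     gamma = -0.004, a = -3.56, b = -0.075, dT = 3) {
  t_mod  <- g_poa * exp(a + b * ws) + t_amb   # SAPM module temperature [degC]
  t_cell <- t_mod + (g_poa / 1000) * dT       # SAPM cell temperature [degC]
  p_dc   <- (g_poa / 1000) * pdc0 *           # PVWatts DC power [W]
            (1 + gamma * (t_cell - 25))
  pmin(p_dc, pdc0 / dcac)                     # clip at the AC rating
}
```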

Table 9.4 Penalty triggers and calculation methods for the case study. x_t and y_t are forecast and observed power output [MW] at time t, respectively; Cap = 10 MW is the rated capacity; P = 200 $/MWh is the penalty for unit energy; n is the number of daylight hours during the evaluation period; and E is the error calculated using the ISO-specific metric.

Stn.   Penalty trigger                                        Penalty [$]
BON    E = (1/n) ∑_{t=1}^{n} |x_t − y_t| / Cap > 10%          (E − 10%) × Cap × P × n
DRA    E = (1/Cap) √[(1/n) ∑_{t=1}^{n} (x_t − y_t)²] > 15%    (E − 15%) × Cap × P × n
PSU    E = (1/n) ∑_{t=1}^{n} |(x_t − y_t)/y_t| > 15%          0.1 × P × ∫ |1.15x_t − y_t| dt, if 1.15x_t < y_t;
                                                              0.1 × P × ∫ |0.85x_t − y_t| dt, if 0.85x_t > y_t
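A minimal R sketch of these penalty schemes is given below, assuming hourly time steps so that the PSU integrals reduce to sums; x and y are the forecast and observed power [MW] over the daylight hours of one evaluation month:

```r
penalty_bon <- function(x, y, cap = 10, P = 200) {
  E <- mean(abs(x - y) / cap)              # MAE-based trigger
  if (E > 0.10) (E - 0.10) * cap * P * length(x) else 0
}

penalty_dra <- function(x, y, cap = 10, P = 200) {
  E <- sqrt(mean((x - y)^2)) / cap         # RMSE-based trigger
  if (E > 0.15) (E - 0.15) * cap * P * length(x) else 0
}

penalty_psu <- function(x, y, P = 200) {
  E <- mean(abs((x - y) / y))              # MAPE-based trigger
  if (E <= 0.15) return(0)
  under <- sum(abs(1.15 * x - y)[1.15 * x < y])  # severe under-forecasts
  over  <- sum(abs(0.85 * x - y)[0.85 * x > y])  # severe over-forecasts
  0.1 * P * (under + over)
}
```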

9.4.2 VERIFYING RAW NWP FORECASTS WITH MURPHY–WINKLER FRAMEWORK

The step that precedes all deterministic forecast verification procedures ought to be data inspection and visualization. In particular, various housekeeping practices outlined in Chapter 5 must be strictly adhered to, such that the quality of measurements and forecasts, as well as the alignment between the two, can be ensured. As the data selected for this case study has been studied comprehensively by the authors in a


series of published papers (Yang et al., 2022a,c, 2020a, 2019; Yang, 2021b, 2019d, 2018a,c), we do not reiterate the housekeeping practices in much detail; the reader can either refer to those published papers for more information, or simply place faith in our integrity of data handling. That said, potential time alignment issues that may stem from these two datasets should be highlighted. HRES forecasts are time-averaged values stamped at the end of the hour, whereas NAM forecasts are instantaneous values. On this account, even if two forecasts, one from each model, have the same time stamp, the values are not directly comparable. To allow a fair comparison of these two models, the ground-based data must be averaged differently; see Fig. 5.13. Following this and other housekeeping rules, the datasets are prepared.
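A small sketch of the two averaging conventions, in base R, assuming 1-min ground data in vectors `time` (POSIXct) and `ghi` (names are illustrative):

```r
# HRES-style: hourly means labeled at the end of the averaging hour
stamp <- as.POSIXct(ceiling(as.numeric(time) / 3600) * 3600,
                    origin = "1970-01-01", tz = "UTC")
hres_style <- tapply(ghi, stamp, mean, na.rm = TRUE)

# NAM-style: instantaneous on-the-hour samples
nam_style <- ghi[format(time, "%M") == "00"]
```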

9.4.2.1 Assessing the joint distribution

The scatter plot is a profoundly informative yet often overlooked visualization tool for assessing the association between forecasts and observations. The information embedded in the scatter of forecasts and observations is no different from that contained in their joint density. It allows the forecaster to identify obvious outliers and time alignment issues, as well as the overall degree of disagreement between forecasts and observations. Figure 9.6 depicts the forecast–observation pairs from two NWP models at three locations. Since the points in each subplot concentrate near the identity line, one can conclude the correctness of the time alignment of both datasets, for misalignment leads to an absence of concentration on the identity line. Physically, points near the identity lines likely correspond to clear-sky situations, which are easy to predict as long as a reasonable radiation scheme is selected for NWP. Also revealed in Fig. 9.6 is that the joint distributions of forecast and observation from both models show good symmetry pivoting around the identity line, which implies small bias in those forecasts—if there were significantly more points above or below the identity line, the over- or under-predictions would translate to large bias. The points above the identity lines correspond to high-forecast–low-observation events, which occur when the forecasting model predicts no (or thin) clouds while cloud cover is actually observed. Conversely, low-forecast–high-observation events, which render points below the identity lines, correspond to clouds in the model compared to no or thinner clouds in the observations. If either case is persistent and systematic, it warrants bias removal, which should preferably be executed in clear-sky index terms. Although both HRES and NAM forecasts do not seem to suffer from severe bias, the variance of NAM forecast errors appears larger than that of HRES forecast errors. Having assessed it visually through scatter plots, the joint distribution is assessed quantitatively next.

In regard to the various performance indicators used during deterministic forecast verification, such as MBE, MAE, or RMSE, solar forecasters must be familiar with their calculation based on a given size-n sample, e.g., MBE = n⁻¹ ∑_{t=1}^{n} (x_t − y_t). A less accustomed view of these performance measures, however, is that they are specific ways of summarizing the joint distribution. To give


Figure 9.6 Scatter plots of forecast–observation pairs, from two NWP models, at three locations, over the year 2020. Whereas scatters from both models show good symmetry around the identity lines, NAM forecasts have larger overall deviations.

perspective, MBE, MAE, and RMSE can be written as:

$$\mathrm{MBE} = \mathbb{E}(X-Y) = \iint (x-y)\, f(x,y)\, \mathrm{d}x\, \mathrm{d}y, \tag{9.20}$$

$$\mathrm{MAE} = \mathbb{E}(|X-Y|) = \iint |x-y|\, f(x,y)\, \mathrm{d}x\, \mathrm{d}y, \tag{9.21}$$

$$\mathrm{RMSE} = \sqrt{\mathbb{E}\left[(X-Y)^2\right]} = \left[\iint (x-y)^2 f(x,y)\, \mathrm{d}x\, \mathrm{d}y\right]^{\frac{1}{2}}. \tag{9.22}$$

One can see that the performance measures are simply the integrals of the product of the respective scoring functions and the joint distribution.⁷ Table 9.5 tabulates the MBEs, MAEs, and RMSEs of HRES and NAM forecasts, based on all forecast–observation pairs with a zenith angle smaller than 85°. To facilitate interpretation, percentage versions of those error metrics are reported in the parentheses following the scale-dependent metrics; the percentage versions are normalized with respect to the mean of the observations. An interesting first observation on MBE is that the forecasts from the two models are biased in opposite directions. Although this is more likely than not a coincidence, such an observation motivates the strategy of combining, which may benefit the combined forecasts through bias cancellation. In terms of absolute and squared errors, which gauge accuracy, their sizes correspond closely with the climatology of cloudiness at the sites.

⁷It should be emphasized again that MBE is not an accuracy measure, since it does not depend on a scoring function—i.e., (x − y) is not a scoring function, but the error itself.

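In practice, of course, the joint density is unknown, and the sample versions of Eqs. (9.20)–(9.22) are used instead; a minimal sketch in R, with x and y being the forecast and observation vectors over daylight hours:

```r
mbe  <- mean(x - y)
mae  <- mean(abs(x - y))
rmse <- sqrt(mean((x - y)^2))
c(mbe, mae, rmse) / mean(y) * 100   # percentage versions, normalized by mean(y)
```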

Table 9.5 MBE, MAE, and RMSE in W/m² (%) of raw HRES and NAM forecasts, at three locations.

        MBE                    MAE                    RMSE
Stn.    HRES       NAM         HRES       NAM         HRES        NAM
BON     21 (6%)    −20 (−5%)   77 (22%)   95 (26%)    121 (34%)   153 (42%)
DRA     −7 (−1%)   12 (2%)     45 (9%)    47 (9%)     77 (15%)    101 (19%)
PSU     9 (3%)     −13 (−4%)   84 (25%)   110 (33%)   128 (38%)   170 (51%)


Figure 9.7 Annual mean low cloud cover [0–1] over the contiguous United States, for the year 2020.

Figure 9.7 shows the annual mean low cloud cover, which is a quantity between 0 and 1, over the contiguous United States. Upon calculating the mean clear-sky index at the three stations, which are 0.678, 0.907, and 0.635 for BON, DRA, and PSU, respectively, a negative correlation between it and the cloudiness depicted in Fig. 9.7 is evident. As the forecasting of clouds presents one of the most challenging aspects of modern NWP (Bauer et al., 2015), forecasting irradiance at locations with high cloudiness is likely to incur larger errors than doing so at sunny locations, which explains the lower accuracy at BON and PSU than at DRA. Regardless, it is found that HRES forecasts yield lower errors than NAM forecasts, under both MAE and RMSE, at all three locations, which suggests their superiority in terms of accuracy. Other aspects of quality, which can be assessed through marginal and conditional distributions, are examined next.

9.4.2.2 Assessing marginal and conditional distributions

Following Eqs. (9.12) and (9.13), the joint distribution can be decomposed in two ways: (1) calibration–refinement factorization, and (2) likelihood–base rate factorization, thus allowing assessment of forecast quality through marginal and conditional distributions. Figure 9.8 shows the marginal distributions of HRES and NAM forecasts and observations at the three locations of concern, and the numeric value written in each subplot denotes the Wasserstein distance between the two distribution functions. The smaller this distance is, the more similar the two distributions are. It is on this account that we may argue that NAM produces better forecasts than HRES, which is somewhat unexpected, for Table 9.5 in the preceding section has just concluded the opposite. The marginal densities in Fig. 9.8 further reveal that HRES forecasts tend to over-predict the low-irradiance situations but under-predict the high-irradiance situations. To be more specific, both the over-predictions of the low-irradiance situations and the under-predictions of the high-irradiance situations tend to shift the density of forecast towards the center, as evidenced by the figure. To shrink the forecasts in low-irradiance situations and boost the forecasts in high-irradiance situations, a simple linear correction comes in handy. We shall proceed to such an investigation in a later section.

Figure 9.8 Marginal distributions of the forecast (dashed line) and the observations (solid line). [Panels: ECMWF-HRES and NCEP-NAM rows at BON, DRA, and PSU; probability density versus GHI [W/m²]. The annotated Wasserstein distances are 31.0, 14.9, and 25.1 for HRES, and 19.8, 13.7, and 16.1 for NAM, at BON, DRA, and PSU, respectively.]
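For reference, the one-dimensional Wasserstein-1 distance between two equal-length samples reduces to the mean absolute difference between the sorted samples; a minimal sketch in R (variable names are illustrative):

```r
w1_dist <- function(x, y) mean(abs(sort(x) - sort(y)))
# e.g., w1_dist(forecast_ghi, observed_ghi)
```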

Figures 9.9 and 9.10 show the conditional distributions of observation given forecast and of forecast given observation, respectively. Stated in another way, f(y|x) for each NWP model at each location is depicted in Fig. 9.9, and f(x|y) in Fig. 9.10. Restating the definition that forecasts are calibrated if E(Y|X = x) = x for all x, it is desirable for the "centroids" of the conditional distributions, or the means of the conditional variables, in Fig. 9.9 to lie on the identity line. Inspecting the plot, both sets of forecasts appear to be well calibrated with no conspicuous deviations, and if any conclusion about superiority in terms of calibration is to be made, one has to resort to quantitative assessment (see below).



Figure 9.9 Conditional distributions of the observation given the forecasts, f(y|x).

Figure 9.10 Conditional distributions of the forecasts given the observations, f (x|y).

Similar to the concept of calibration is the type 2 conditional bias, which is the mean square of y − E(X|Y = y) over a set of y values. To attain a small type 2 conditional bias, for any given value of y, one should endeavor to have the "centroid" of the corresponding conditional distribution as close to that value as possible. Visually, as shown in Fig. 9.10, the f(x|y) of HRES appear to have larger systematic departures from the diagonal than those of NAM; the departures are particularly noticeable for the f(x|y) of HRES at high y values. It is this observation that we may regard as a piece of evidence of the superiority of NAM forecasts.


The quantitative assessment of marginal and conditional distributions is facilitated by MSE decompositions. In that, the various terms in Eqs. (9.15) and (9.16) are calculated using the method outlined in Section 9.3.4, and the results are tabulated in Table 9.6. Quite a number of conclusions in regard to the forecast quality of the two NWP models can be made from this table. Through observing the variance terms, it is found that the V(X) of HRES is systematically lower than the corresponding V(Y), whereas the V(X) of NAM is similar to V(Y). This leads to the conclusion that the HRES forecasts are under-dispersed. Under-dispersion is commonplace in modeled irradiance data, be it satellite-derived, NWP, or reanalysis; its remedy, namely, variance correction, constitutes a form of post-processing. Since variance is a summary statistic of the marginal distribution, the quantification results based on V(X) and V(Y) in Table 9.6 align with Fig. 9.8, i.e., the marginal distributions of the NAM forecasts better resemble those of the observations.

Table 9.6 Evaluation results of cof and cox decompositions. Whereas the third column shows the mean square error (MSE), the next two columns depict the variances of observations and forecasts. The last four columns correspond to E_X[X − E(Y|X)]² (calibration), E_X[E(Y|X) − E(Y)]² (resolution), E_Y[Y − E(X|Y)]² (type 2 conditional bias), and E_Y[E(X|Y) − E(X)]² (discrimination), respectively. The metrics are written as exponentiations, such that all bases have the unit of W/m².

Stn.   NWP    MSE    V(Y)   V(X)   Cali.   Res.   Bias²   Dis.
BON    HRES   121²   274²   255²   31²     248²   54²     230²
BON    NAM    153²   274²   277²   55²     234²   48²     237²
DRA    HRES   77²    294²   277²   16²     284²   30²     267²
DRA    NAM    101²   293²   294²   30²     277²   25²     277²
PSU    HRES   128²   265²   242²   23²     233²   60²     214²
PSU    NAM    170²   265²   272²   67²     214²   57²     220²
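As a quick arithmetic check of the cof identity, Eq. (9.15), against the BON–HRES row (the tabulated bases are rounded to integers, so the identity holds only approximately):

$$\mathbb{V}(Y) + \text{Cali.} - \text{Res.} = 274^2 + 31^2 - 248^2 = 14533 \approx 14641 = 121^2 = \mathrm{MSE}.$$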

In terms of calibration, i.e., E_X[X − E(Y|X)]², HRES turns out to be more attractive than NAM, at all stations. In contrast, HRES is less appealing than NAM as to the type 2 conditional bias, that is, E_Y[Y − E(X|Y)]², which echoes the earlier observations made from Fig. 9.10. For the resolution term E_X[E(Y|X) − E(Y)]², which should be maximized, HRES attains better results, whereas for the discrimination term E_Y[E(X|Y) − E(X)]², which should also be maximized, NAM turns out to be favorable. With these results, it is not possible to conclude, based upon absolute dominance, which set of forecasts is better, since both models are preferable in some aspects but not others. This kind of inconclusive verification outcome—no model outperforms its peers in all aspects—is rather common (e.g., Yang and Perez, 2019; Yang et al., 2022a), which is why examining multiple aspects of quality is essential. On the other hand, since each of calibration, resolution, type 2 conditional bias, and


discrimination denotes one aspect of quality, in an impartial, distributive manner with no preference for one metric over another, making a choice between HRES and NAM solely based on forecast quality assessment is inherently difficult.

9.4.3 VERIFYING THE POST-PROCESSED NWP FORECASTS: THE NOTION OF CONSISTENCY

The discussions of the preceding section exemplify the assessment of forecast quality. In this part of the case study, we demonstrate the practical relevance of the notion of consistency, which is the most distinctively mathematical of all notions of goodness; it is certainly also the most neglected notion of goodness in the solar forecasting literature thus far. Section 8.4.1 introduced the statistical theory on consistency and elicitability. There, the theory was elaborated in terms of probabilistic-to-deterministic (P2D) post-processing, or more specifically, eliciting functionals according to a scoring function specified ex ante. In this section, the theory is viewed from another perspective, which is one of deterministic-to-deterministic (D2D) post-processing.

One may recall that regression is a major strategy for D2D post-processing. Ranging from simple linear regression to intricate deep learning, all regression-based D2D post-processing techniques seek to train a mapping between the raw forecasts (independent variable) and the corresponding observations (dependent variable), such that when a new raw forecast is available, the established mapping can post-process it to a (hopefully) better forecast. Since it has been argued at the beginning of this section that more intricate post-processing techniques such as deep learning are unsuitable if the historical forecast–observation pairs are few, a univariate linear regression model of the form:

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \tag{9.23}$$

is considered. The default approach for estimating the model coefficients β₀ and β₁ is to minimize the sum of squared errors over the fitting samples:

$$\operatorname*{arg\,min}_{\beta_0,\beta_1\in\mathbb{R}} \sum_{i=1}^{n} \left(\beta_1 x_i + \beta_0 - y_i\right)^2. \tag{9.24}$$

In consideration of the setup of this case study, in which each grid operator penalizes forecasts based on a distinct scoring function, two other ways of estimating β₀ and β₁ are considered; one minimizes the sum of absolute errors:

$$\operatorname*{arg\,min}_{\beta_0,\beta_1\in\mathbb{R}} \sum_{i=1}^{n} \left|\beta_1 x_i + \beta_0 - y_i\right|, \tag{9.25}$$

and the other minimizes the sum of absolute percentage errors:

$$\operatorname*{arg\,min}_{\beta_0,\beta_1\in\mathbb{R}} \sum_{i=1}^{n} \left|\frac{\beta_1 x_i + \beta_0 - y_i}{y_i}\right|, \tag{9.26}$$


cf. Table 9.1. In all three cases, once the unknown coefficients are estimated and denoted with β̂₀ and β̂₁, for any new x_t, the post-processed forecast is

$$\hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 x_t. \tag{9.27}$$

Regarding Eqs. (9.23)–(9.27), the reader is encouraged to read the original paper by Mayer and Yang (2023a), who first discussed the concept of calibration of deterministic NWP forecasts and its impact on verification. In what follows, we follow Mayer and Yang (2023a) and use “MAE-, MSE-, and MAPE-minimized forecasts” to denote the three sets of post-processed forecasts, respectively. Indeed, another implication of consistency resides in how a forecasting model is fitted or trained. Theoretically, if a forecasting model is trained by minimizing the averaged score based on one scoring function, but is evaluated under another, it could lead to suboptimal performance. This directly echoes the main argument in Kolassa’s article—Why the “best” point forecast depends on the error or accuracy measure (Kolassa, 2020).
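A minimal R sketch of the three estimation schemes of Eqs. (9.24)–(9.26) is given below, using the general-purpose optim routine for the MAE and MAPE cases, which have no analytic solution; x and y are the raw forecasts and observations (observations are assumed strictly positive, as is the case for daylight hours):

```r
fit_linear <- function(x, y, loss = c("mse", "mae", "mape")) {
  loss <- match.arg(loss)
  obj <- function(b) {               # b = c(beta0, beta1)
    e <- b[1] + b[2] * x - y
    switch(loss,
           mse  = sum(e^2),
           mae  = sum(abs(e)),
           mape = sum(abs(e / y)))
  }
  start <- coef(lm(y ~ x))           # least-squares starting values
  optim(start, obj)$par
}

# post-processing a new raw forecast x_new, per Eq. (9.27):
# b <- fit_linear(x, y, "mae"); y_hat <- b[1] + b[2] * x_new
```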

Table 9.7 MAE, RMSE in W/m² (%), and MAPE in % of linearly corrected HRES and NAM forecasts, using three different correction schemes, at three locations.

                MAE-minimized          MSE-minimized          MAPE-minimized
Stn.            HRES        NAM        HRES        NAM        HRES        NAM

MAE
BON             77 (21%)    94 (26%)   80 (22%)    101 (28%)  111 (31%)   108 (30%)
DRA             38 (7%)     47 (9%)    41 (8%)     52 (10%)   39 (8%)     47 (9%)
PSU             83 (25%)    108 (32%)  85 (25%)    114 (34%)  106 (32%)   132 (39%)

RMSE
BON             123 (34%)   150 (41%)  119 (33%)   145 (40%)  146 (41%)   157 (43%)
DRA             78 (15%)    100 (19%)  77 (15%)    98 (19%)   77 (15%)    100 (19%)
PSU             129 (38%)   164 (49%)  127 (38%)   159 (47%)  149 (44%)   184 (54%)

MAPE
BON             53%         53%        51%         82%        48%         48%
DRA             15%         17%        15%         20%        14%         17%
PSU             49%         68%        50%         100%       44%         54%

With that, both sets of raw forecasts, namely, the one from HRES and the one from NAM, are linearly corrected via Eqs. (9.23) and (9.27), under all three optimization schemes. It should be noted, however, that for the schemes of minimizing MAE and MAPE, numeric methods are required to solve for β0 and β1 , for there is no analytic solution. The general-purpose optimization routine, as provided by the optim function in R, is used here. The verification results of the corrected forecasts are listed in Table 9.7, in which forecasts from each correction scheme are evaluated under three accuracy measures, namely, MAE, RMSE, and MAPE. It can be seen that the MAE-minimized forecasts attain the best results under MAE, MSE-minimized


forecasts attain the best results under RMSE, and MAPE-minimized forecasts attain the best results under MAPE; the benefits of opting for the correction scheme that is consistent with the given scoring rule are evident. Generally speaking, state-of-the-art solar forecasting procedures almost always involve some form of optimization, so that the model parameters (e.g., the coefficients of a regression model, or the weights of a neural network) can be obtained in an optimal way, subject to some objective function. In this regard, the notion of consistency extends to all such situations.

The above results empirically show why it is problematic to report a single point forecast and evaluate it using different accuracy measures. On the one hand, Kolassa (2020) holds the opinion that evaluating a single "best" forecast, "from different angles," with a suite of accuracy measures is logically unattractive. On the other hand, Yang et al. (2020a), in their conclusion, advocate using as many accuracy measures as possible. These two views, though conflicting, are in fact based on different premises. In essence, the question which one ought to ask first during the evaluation of deterministic forecasts should be: Is there a scoring rule specified ex ante? If the answer is "yes," then there is absolutely no motivation to consider any alternative scoring rule, or any other forecasting or post-processing technique that would favor that alternative scoring rule. If the answer is "no," then one may wish to consider diverse measures of quality, such that its different aspects can be gauged holistically. In most grid integration scenarios, the scoring rule, or equivalently, the penalty scheme, is almost always known, which calls for a targeted strategy of making and post-processing forecasts. Notwithstanding, whether forecasts of high quality necessarily mean forecasts of high value is the question that we should investigate next.

9.4.4 ASSESSING NWP FORECASTS THROUGH THEIR VALUE

Under the existing remuneration frameworks, which include, in the main, feed-in tariffs and forecast submission penalties, the value of solar forecasts is often quantified in monetary terms, although there could be other forms of value, such as environmental benefit or societal recognition, for which the forecast user may strive. Using the simple model chain described in Section 9.4.1, the respective GHI measurements, alongside other auxiliary measured weather variables, are converted to the AC power output of the three 10-MW (DC) power plants. Similarly, each post-processed set of GHI forecasts from each NWP model, alongside the forecast auxiliary variables, is passed through the same model chain, so as to acquire three versions of forecast AC power. With these, we may first tabulate the PV power forecast errors in the same fashion as we tabulated the errors of GHI forecasts, in order to check whether irradiance-to-power conversion has any noticeable effect on the forecast verification outcome. Echoing Table 9.7 is Table 9.8, which displays the MAEs, RMSEs, and MAPEs of the three sets of post-processed forecasts, from two NWP models, for the three PV plants of concern. Whereas MAE and RMSE are scale-dependent and can be expressed in both MW and percent, MAPE is scale-independent and thus is only expressed in percent. The conclusions that can be drawn from Table 9.8 are highly similar to those drawn from Table 9.7.


Table 9.8 MAE, RMSE in MW (%), and MAPE in % of linearly corrected PV power forecasts based on HRES and NAM, using three correction schemes, at three identically designed 10-MW PV plants.

                MAE-minimized            MSE-minimized            MAPE-minimized
Stn.            HRES         NAM         HRES         NAM         HRES         NAM

MAE
BON             0.92 (23%)   1.13 (27%)  1.00 (25%)   1.24 (30%)  1.30 (32%)   1.20 (29%)
DRA             0.42 (8%)    0.49 (9%)   0.46 (9%)    0.58 (11%)  0.42 (8%)    0.50 (9%)
PSU             0.98 (26%)   1.33 (34%)  1.03 (27%)   1.39 (35%)  1.27 (34%)   1.73 (44%)

RMSE
BON             1.48 (36%)   1.82 (44%)  1.43 (35%)   1.68 (40%)  1.63 (40%)   1.81 (43%)
DRA             0.88 (16%)   1.09 (20%)  0.87 (16%)   1.05 (20%)  0.87 (16%)   1.07 (20%)
PSU             1.49 (40%)   1.99 (51%)  1.46 (39%)   1.81 (46%)  1.67 (44%)   2.23 (57%)

MAPE
BON             52%          44%         50%          63%         47%          43%
DRA             15%          17%         16%          21%         15%          17%
PSU             50%          59%         52%          75%         45%          52%

Firstly, comparing the HRES- and NAM-based PV power forecasts, the former sees lower errors in most cases and thus possesses higher overall accuracy. Secondly, PV power forecasts obtained from GHI forecasts post-processed using a particular objective function still perform favorably against PV power forecasts obtained by other means, under the aggregate score that is consistent with that objective. For example, the PV power forecasts obtained from the MAE-minimized GHI forecasts perform best under MAE. Hence, one may argue that, if the irradiance-to-power conversion is performed adequately, the first two types of goodness (quality and consistency) of irradiance forecasts generally transfer to PV power forecasts. This conclusion has also been arrived at by Mayer et al. (2023), who, instead of using simulated PV power data, presented case studies with real PV data from Hungary.

In the next part, the three sets of PV power forecasts originating from the two NWP models are evaluated under the forecast penalty calculation methods listed in Table 9.4, and the results are tabulated in Table 9.9. Several points should be discussed, and we shall proceed by analyzing the results row by row. For the BON plant, the penalty is triggered whenever the MAE over the evaluation period (i.e., a month in this case) exceeds 10%, and the penalty is calculated as the arithmetic product of the MAE exceedance (i.e., percentage MAE minus the 10% threshold), the capacity (i.e., 10 MW), the penalty for unit energy (i.e., 200 $/MWh), and the total daylight hours that enter the evaluation. Whereas the MAE-minimized forecasts and MSE-minimized forecasts receive comparable penalties, the penalty for MAPE-minimized forecasts is significantly higher. Referring back to the first row of Table 9.8, in which the MAEs themselves are displayed, one may see that the best (23%) and worst


(32%) MAEs only differ by a few percent. Nonetheless, in monetary terms, this 9% difference in MAE translates into an approximately 5x multiplier in cost. Evidently, the correspondence between forecast quality and value is nonlinear.

Table 9.9 Penalty [$] incurred over the course of one year at three 10-MW PV plants. Forecasts are based on HRES and NAM, and three linear correction methods are used to post-process the raw PV power forecasts.

        MAE-minimized          MSE-minimized          MAPE-minimized
Stn.    HRES       NAM         HRES       NAM         HRES       NAM
BON     48691      115795      49364      175935      239919     160589
DRA     19         37284       0          30316       0          34929
PSU     44854      66777       44404      61082       65898      86491

For the DRA plant, the penalty is triggered if the RMSE over an evaluation period exceeds 15%, and the penalty amount is computed as the arithmetic product of the RMSE exceedance (i.e., percentage RMSE minus the 15% threshold), the capacity (i.e., 10 MW), the penalty for unit energy (i.e., 200 $/MWh), and the total daylight hours that enter the evaluation. Knowing that the plant is located in a desert, one can expect a low penalty, since the forecast error of a good NWP model during clear-sky days rarely exceeds 15%. Here, the HRES-based PV power forecasts are almost never penalized, whereas the NAM-based PV power forecasts are penalized lightly owing to their relative inaccuracy; in that sense, the penalty at DRA appears too loose as compared to the other two cases. The same concern has been raised previously by Yang et al. (2021a), who debated the inappropriateness of the current Chinese grid code, which sets the penalty by the administrative areas of the various inter-provincial grid operators rather than by weather regimes, causing highly unfair treatment of some plant owners, who are then likely to be discouraged from participating in new project development.

Lastly, for the PSU plant, the penalty is triggered if the MAPE exceeds 15%, and the penalty amount is calculated as the arithmetic product of the integral of the adjusted errors (i.e., using the scaled-up or -down forecasts), the penalty for unit energy (i.e., 200 $/MWh), and a 0.1 penalty relief, without which the penalty would probably be too high for any plant owner to bear. Indeed, even with the 0.1 penalty relief, the penalties for the MAE- and MSE-minimized forecasts at PSU are already comparable to those at BON. Another somewhat surprising observation is that the penalties for the MAPE-minimized forecasts are higher than in the other two cases, which contradicts the earlier result in Table 9.8. One may thus conclude that the correspondence between forecast quality and value is not monotone, for forecasts of better quality may have a lower value. In any case, both observations above provide further evidence of why MAPE, which is extremely sensitive to variables with near-zero values, should not be used to gauge solar forecasts at any rate.
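The sensitivity of MAPE to near-zero observations is easy to demonstrate; in the tiny R example below, a fixed 0.5-MW error contributes modestly at high observed power but explodes as the observed power approaches zero:

```r
y <- c(8, 4, 1, 0.2)          # observed power [MW]
x <- y + 0.5                  # forecasts, all off by exactly 0.5 MW
abs((x - y) / y) * 100        # 6.25, 12.5, 50, 250 (percent)
```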

10 Probabilistic Forecast Verification

"Well, forecasts are always wrong. The trick is to do forecasting in a way that you understand how wrong they can be and what the probabilities are about them coming out in different ways."
— Rob Hyndman

In the preceding chapter, the verification of deterministic solar forecasts has been described, and the three types of goodness of forecasts, namely, consistency, quality, and value, have been explained with respect to deterministic forecasts. While these three types of goodness of forecasts also apply to the verification of probabilistic forecasts, there are some notable differences in interpretation. This should be understood a priori, since deterministic and probabilistic forecasts differ not only in form (i.e., single values versus distributions) but also in the judgment of the forecaster (i.e., best guess versus likelihood, although the former may sometimes be derived from the latter). For instance, the different aspects of deterministic forecast quality, such as accuracy as gauged by absolute or squared error, are not calculable when the forecasts are probabilistic, and thus do not carry much significance—we rarely say a probabilistic forecast is accurate. On this point, this chapter orients its discussion on the following questions: (1) what does consistency mean for a probabilistic forecast, (2) what aspects of quality should one expect from a probabilistic forecast, and (3) what are the ways to assess the quality of a probabilistic forecast? In a nutshell, the evaluation of consistency in probabilistic forecast verification needs to be carried out between forecasts and observations, and is different from the evaluation of consistency in deterministic forecast verification, which assesses the correspondence between forecasts and judgment. In terms of quality, or predictive performance, the paradigm of maximizing sharpness without sacrificing calibration is commonly regarded as the overarching principle of making good probabilistic forecasts. Last but not least, the different aspects of quality of probabilistic forecasts can be verified through both scoring rules and visualizations.

10.1 DEFINITIONS OF PROPERTIES IN PROBABILISTIC FORECAST VERIFICATION

In forecast verification, forecasters are concerned with statistical properties, which describe either the forecasts or the observations, or the two simultaneously, such that when certain forecasts and/or observations possess these properties, an evaluator would be able to conclude with high confidence whether or not the forecasts are good, and to what degree the forecasts are good. Notwithstanding, the


terminology for these statistical properties, as we have seen in Chapter 9, is not standardized, and many terms are used interchangeably. For example, the terms "reliability," "calibration," and "type 1 conditional bias" all mean the same thing in deterministic forecast verification, whereas the word "consistency" means something else. In fact, when we examine the articles on forecast verification in the literature, such inconsistency in terminology can be found everywhere, even in the papers of Murphy and Winkler (1987) and Murphy (1993), who are responsible for defining many of these terms, let alone works written by different authors, living in different eras many decades apart. Hence, the opening section of this chapter approaches the issue with several well-accepted definitions of some important statistical properties that are relevant to probabilistic forecast verification.

10.1.1 GNEITING'S DEFINITIONS OF CALIBRATION AND SHARPNESS

For a start, a pair of complementary properties, namely, calibration and sharpness, is defined:

"Calibration refers to the statistical consistency between the distributional forecasts and the observations and is a joint property of the predictions and the events that materialize. Sharpness refers to the concentration of the predictive distributions and is a property of the forecasts only."
— Gneiting et al. (2007), Probabilistic forecasts, calibration and sharpness

The definition of calibration above entails both forecast and observation, and if their distributions are statistically consistent, the forecast is said to be calibrated. On the other hand, sharpness is inversely related to the spread of the predictive distributions: the wider the spread is, the lower the sharpness is. Gneiting et al. (2007) began their explanation of the phrase "statistical consistency" by considering two sets of predictive distributions, {G_t} and {F_t}, for t = 1, 2, . . . , n, with n denoting the number of verification samples. Whereas both G_t and F_t are distributions of the quantity to be forecast at time t, the former is assumed to be chosen by Nature, and the latter is chosen by a forecaster. The assumption which has to be placed here is that Nature is omniscient, and its chosen distributions represent the true data-generating process. On the other hand, the information possessed by a forecaster cannot, in amount, exceed that of Nature. Consequently, a perfect set of probabilistic forecasts would have

$$F_t = G_t, \quad \text{for all } t. \tag{10.1}$$

Based on two sets of distributions {F_t} and {G_t}, with t = 1, 2, . . . , n and n → ∞, Gneiting et al. (2007) defined several forms of calibration. The sequence {F_t} is probabilistically calibrated with respect to {G_t} if

$$\frac{1}{n}\sum_{t=1}^{n} G_t \circ F_t^{-1}(\tau) \xrightarrow{\ a.s.\ } \tau, \quad \text{for all } \tau \in (0,1), \tag{10.2}$$


where the "◦" symbol denotes function composition, which in this case reads: function F_t^{-1} is evaluated at τ, and the result is where function G_t is evaluated. The sequence {F_t} is exceedance calibrated with respect to {G_t} if

$$\frac{1}{n}\sum_{t=1}^{n} G_t^{-1} \circ F_t(x) \xrightarrow{\ a.s.\ } x, \quad \text{for all } x \in \mathbb{R}. \tag{10.3}$$

The sequence {F_t} is marginally calibrated with respect to {G_t} if

$$\bar{F}(x) = \lim_{n\to\infty}\left\{\frac{1}{n}\sum_{t=1}^{n} F_t(x)\right\}, \tag{10.4}$$

$$\bar{G}(x) = \lim_{n\to\infty}\left\{\frac{1}{n}\sum_{t=1}^{n} G_t(x)\right\}, \tag{10.5}$$

exist and equal each other for all x ∈ ℝ. And the sequence {F_t} is strongly calibrated with respect to {G_t} if all three forms of calibration hold. Condition (10.2) states that the asymptotic mean of Nature's distribution evaluated at the τth quantile of the forecaster's distribution¹ converges almost surely to τ. To understand this intuitively, an illustration is given in Fig. 10.1, with three pairs of arbitrarily generated distributions, {F_t} and {G_t} with t = 1, 2, 3. Starting from a probability τ, each quantile function F_t^{-1} evaluates τ; then the evaluated threshold is taken as input by the respective G_t, and the evaluation of G_t ∘ F_t^{-1}(τ) gives τ̃_t, of which the mean can be computed (cf. directions of arrows in Fig. 10.1). With this example, it can be understood that probabilistic calibration is essentially saying: If the forecasts are probabilistically calibrated, when the number of distribution pairs being evaluated is large (i.e., n → ∞), the mean of τ̃₁, τ̃₂, . . . , and τ̃ₙ converges to τ, and this is true for all τ.

Similarly, the definition of exceedance calibration states that the asymptotic mean of Nature's quantile function, evaluated at the probability which results from the evaluation of the forecaster's distribution at threshold x, converges almost surely to x. An illustration of this process, with the same three pairs of distributions as appeared in Fig. 10.1, is shown in Fig. 10.2. The interpretation of exceedance calibration can thus be understood from the figure: if the forecasts are exceedance calibrated, when the number of distribution pairs being evaluated is large (i.e., n → ∞), the mean of x̃₁, x̃₂, . . . , and x̃ₙ converges to x, and this is true for all x.

As for marginally calibrated forecasts, Eqs. (10.4) and (10.5) express the limiting distributions of {F_t} and {G_t}, which are denoted by F̄ and Ḡ, respectively. F̄ and Ḡ are simply the averages of {F_t} and {G_t} evaluated at x, as n → ∞. And by definition, if F̄(x) = Ḡ(x) for all x, {F_t} is marginally calibrated. Figure 10.3 shows Ḡ(x) and F̄(x) computed based on three F_t–G_t pairs; when the number of distribution pairs increases as n → ∞, if {F_t} is marginally calibrated with respect to {G_t}, the solid black line and the dashed black line would coincide.

¹Recall the content of Chapter 3. Here, G_t and F_t are cumulative distribution functions (CDFs), which evaluate a threshold x and output the probability of the random variable falling at or below that threshold, i.e., P(X ≤ x). On the other hand, G_t^{-1} and F_t^{-1} are quantile functions, which evaluate a probability τ and output the threshold, which is known as the τth quantile.
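The probabilistic-calibration condition of Eq. (10.2) can also be checked numerically; the minimal R sketch below uses Gaussian toy distributions, comparing a forecaster identical to Nature with an over-confident (too sharp) one:

```r
set.seed(1)
n   <- 1e5
mu  <- rnorm(n)    # each time t has its own Nature distribution G_t = N(mu_t, 1)
tau <- 0.75

# calibrated forecaster, F_t = G_t: G_t(F_t^{-1}(tau)) averages to tau
mean(pnorm(qnorm(tau, mu, 1), mu, 1))     # -> 0.75

# over-confident forecaster, F_t = N(mu_t, 0.5): the average is not tau
mean(pnorm(qnorm(tau, mu, 0.5), mu, 1))   # -> about 0.63
```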


Figure 10.1 An illustration of the evaluation of G_t ∘ F_t^{-1}(τ), t = 1, 2, 3, at τ = 0.75. The evaluated values are noted with τ̃₁, τ̃₂, and τ̃₃. The arrows indicate the direction of evaluation.

[Plot for Fig. 10.2: CDFs F_t and G_t (t = 1, 2, 3), probability [0–1] versus clear-sky index [dimensionless], with the evaluated thresholds x̃₁, x̃₂, and x̃₃ marked.]

Figure 10.2 An illustration of the evaluation of G_t^{-1} ∘ F_t(x), t = 1, 2, 3, at x = 0.6. The evaluated values are noted with x̃₁, x̃₂, and x̃₃. The arrows indicate the direction of evaluation.

The difficulty of assessing the conditions under which the various forms of calibration hold is that the true distribution G_t is perpetually unknown to the forecaster. Instead, at each time t, only a single observation y_t is available, which is just a random sample from G_t. So in probabilistic forecast verification, one works with F_t–y_t pairs, rather than F_t–G_t pairs. This contrasts with the situation of deterministic forecast verification, in which the full information in regard to forecast–observation pairs is known. Nevertheless, if the forecasts are perfect, when the predictive distribution F_t is evaluated at y_t, the resultant quantity, which is known as the probability integral transform (PIT),

$$p_t = F_t(y_t), \tag{10.6}$$

must come from a uniform distribution (Dawid, 1984; Diebold et al., 1998). This particular fact can be leveraged to assess probabilistic calibration (Gneiting et al., 2007).

[Plot for Fig. 10.3: limit distributions F̄ (dashed) and Ḡ (solid), probability [0–1] versus clear-sky index [dimensionless].]

Figure 10.3 An illustration of Ḡ(x) and F̄(x) computed based on three F_t–G_t pairs.

Exceedance calibration is defined analogously to probabilistic calibration. Instead of defining calibration in terms of probability, it defines calibration in terms of threshold. That said, although the definitions are analogous, there is no obvious strategy to assess exceedance calibration in a way analogous to assessing probabilistic calibration (Gneiting et al., 2007). As for marginal calibration, Nature's limiting distribution Ḡ represents the climatology. To estimate Ḡ, one can use the empirical CDF (ECDF) of the observations, i.e.,

$$\hat{G}_n(x) = \frac{1}{n}\sum_{t=1}^{n} \mathbb{1}\{y_t \le x\}, \quad \text{where } \mathbb{1}\{y_t \le x\} = \begin{cases} 1 & \text{if } y_t \le x, \\ 0 & \text{if } y_t > x. \end{cases} \tag{10.7}$$

The hat above G denotes the fact that it is an empirical estimate, and the subscript n suggests that it is estimated based on n samples. Then, with the averaged predictive distributions:

$$\bar{F}_n(x) = \frac{1}{n}\sum_{t=1}^{n} F_t(x), \tag{10.8}$$

one can check the equivalence of F̄_n(x) and Ĝ_n(x), thereby assessing marginal calibration. Here, a subscript n is again added to denote that the average is computed over n distributions, and to distinguish F̄_n from the limiting distribution F̄.

The definition of sharpness is discussed next. First of all, based on the definition of marginal calibration, any rational forecaster can spot the pitfall in relying solely on calibration in the verification of probabilistic forecasts. For instance, if one issues Ĝ_n(x), which can be obtained from a long-enough sequence of historical observations, each time a forecast is needed, then the averaged predictive distribution


would also be Ĝ_n(x), which immediately makes the forecasts calibrated. On the other hand, forecasting just the climatology is not at all useful to decision-making, since such a strategy is unable to discern the forecasting situations. Stated differently, if one issues the same forecast for all situations, the scenario is equivalent to a "no forecast" one. In view of this pitfall of the calibration-only verification paradigm, the involvement of sharpness must be compelled.

Sharpness describes the degree of confidence of a forecaster. As per its definition, sharpness is maximized if the spread is zero, i.e., when the predictive distribution is fully concentrated on a point. Notwithstanding, if maximizing sharpness alone is taken as the virtue of probabilistic forecasts, one can simply issue point forecasts (or issue very narrow predictive distributions, to force a sense of uncertainty). Obviously, this strategy carries no practical relevance, as narrow predictive distributions may miss the observations completely. Combining the two aspects, probabilistic forecasts ought to be generated by attempting to maximize sharpness while maintaining calibration.
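Both the PIT-based check of probabilistic calibration and the ECDF-based check of marginal calibration are straightforward to code; a minimal sketch in R, assuming Gaussian predictive distributions with per-time means `mu`, a common standard deviation `sig`, and observations `y` (all names are illustrative):

```r
# probabilistic calibration, Eq. (10.6): PIT values should look uniform
pit <- pnorm(y, mean = mu, sd = sig)   # p_t = F_t(y_t)
hist(pit, breaks = 10, freq = FALSE)
abline(h = 1, lty = 2)   # a flat histogram at density 1 indicates calibration

# marginal calibration, Eqs. (10.7)-(10.8): ECDF of y versus averaged CDFs
xg    <- seq(min(y), max(y), length.out = 100)
G_hat <- ecdf(y)(xg)                   # empirical CDF of observations
F_bar <- sapply(xg, function(x0) mean(pnorm(x0, mu, sig)))
max(abs(G_hat - F_bar))  # small values suggest marginal calibration
```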

10.1.2 BRÖCKER'S DEFINITIONS OF RELIABILITY AND RESOLUTION

Besides calibration and sharpness, other terms that are used to describe useful properties in probabilistic forecast verification also exist. For instance, Bröcker (2012) defined reliability and resolution as follows:

"The first property to be discussed here is reliability. On the condition that the forecasting scheme is equal to, say, the probability assignment p, the observation Y should be distributed according to p. [. . . ] The resolution term describes, roughly speaking, the average deviation of π^(Γ) from its expectation π; it can therefore be interpreted as a form of 'variance' of π^(Γ)."
— Bröcker (2012), Forecast Verification: A Practitioner's Guide in Atmospheric Science

The mathematical symbols in the above definitions are written according to the nomenclature of Bröcker (2012); in what follows, another set of symbols, more consistent with the current book's nomenclature, is used instead. Let Y be the quantity to be forecast, i.e., the random variable representing the observation. In the book chapter by Bröcker (2012), Y is assumed to be a discrete random variable which takes values from a finite set with m elements (i.e., categories), Ω = {1, 2, . . . , m}. These elements could either represent natural categories (such as snow, rain, sunshine, and hail), or be used to partition a continuous random variable in a mutually exclusive and collectively exhaustive fashion. One example of the latter, in a solar context, could be to partition the clear-sky index into [0, 0.2), [0.2, 0.9), [0.9, 1.1), and [1.1, ∞), which correspond to overcast, cloudy, clear, and over-irradiance skies, respectively. If m = 2, Y is a binary variable, and if m > 2, Y is a multi-categorical variable. Next, a forecaster Γ is tasked to provide a length-m vector of probability assignments over Ω, according to the forecasting situation. Denoting a generic probability assignment

Probabilistic Forecast Verification

405

as p = (p1 , . . . , pm ) , Γ = p reads “Γ issues p ,” and by the definition of probability distribution, the assignment needs to satisfy p j ≥ 0, for j ∈ Ω, and ∑mj=1 p j = 1. Since the probability assignments p over Ω form a continuum, Γ may be regarded as a vector of random variables with continuous range, i.e., Γ = (Γ1 , . . . , Γm ) , with Γ j ≥ 0, for j ∈ Ω, and ∑mj=1 Γ j = 1. Based on the above notation, one can use P(Y = j|Γ = p ) to denote the conditional probability of the observation materializing as category j, given a forecasting situation for which the forecaster issues p . Then, the condition for reliability, as defined by Br¨ocker (2012), is simply: P(Y = j|Γ = p ) = p j ,

for all j.

(10.9)

This suggests an equivalence between the probability law of how Y materializes given a forecasting situation and the distribution of the forecast of that situation. Hence, Eq. (10.9) can be rewritten in the form of CDF, that is, P(Y ≤ y|Γ = F) = F.

(10.10)

Since the forecaster may issue the same forecast distribution (or probability assignment in the case of categorical variables) more than once, P(Y ≤ y|Γ = F) represents the limiting distribution (or limiting observed frequencies in the case of categorical variables) of the corresponding observation Y (Br¨ocker, 2008). Furthermore, if the forecaster issues a set of forecasts over time, i.e., Ft , t = 1, 2, . . . , the reliability condition implies that P(Y ≤ y|Ft ) = Ft holds. A careful reader would notice the resemblance of this expression with the condition for a set of deterministic forecasts to be calibrated, as presented in the last chapter—a set of deterministic forecasts is calibrated if E(Y |X = x) = x for all x, i.e., the conditional expectation of observations given a specific forecast should be equal to that forecast. Indeed, P(Y ≤ y|F)—or P(Y = j|pp) in the case of discrete random variables—is the conditional distribution of observation given a specific forecast F (or a specific probability assignment p ), and its equivalence to F (or p j ) suggests reliability. What distinguishes the two cases is that the calibration for deterministic forecasts is defined over all forecast–verification pairs, with “observation” and “forecast” being random variables, whereas the reliability of probabilistic forecasts can be defined with just one pair of the forecaster’s and Nature’s distributions, and the random variable is the quantity to be forecast. By now, it should be very obvious that the two words “calibration” and “reliability” mean the exact same property in probabilistic forecast verification; there is very little dispute over this viewpoint in the literature (e.g., Br¨ocker, 2008; Pinson et al., 2010; Wilks, 2019). Another property that is of concern is resolution. To formally define it, Br¨ocker (Γ) (2012) first introduces a new notation, which makes the definition that π j ≡ P(Y =  

(Γ) (Γ) j|Γ). Since j = 1, . . . , m, we can collect π1 , . . . , πm as π (Γ) . Clearly then, π (Γ) is also a probability assignment, and if forecasts are reliable, i.e., Eq. (10.9), the following holds: (10.11) π (Γ) = Γ.

406

Solar Irradiance and Photovoltaic Power Forecasting

It should be recalled that Γ is a forecaster, and she may issue different forecasts under different situations, and as such, π (Γ) which is conditional on Γ also has different probability assignments. When Γ makes infinitely many forecasts, the limit distribution of π (Γ) is denoted as π. In meteorology, π is known as the climatology of Y . Now, we are ready to interpret the definition of resolution made at the beginning of this subsection. To be consistent with the notation convention of this book, we should go back and denote the climatology as G (or y in terms of discrete random variable). Hence, following the definition, the resolution is the average divergence of the conditional distribution of observation given a specific forecast from the climatology, that is,  " # E d P(Y ≤ y|F), G , (10.12) or E {d [P(Y = j|pp), y ]} .

(10.13)

The larger the expectation is, the higher the resolution of the forecasts is. The quantification function for the divergence, namely, d(·, ·), does not need to be associated with any particular choice. For instance, under quadratic score, the divergence between two vectors p = (p1 , . . . , pm ) and q = (q1 , . . . , qm ) is: d(pp, q ) =

m

∑ (p j − q j )2 ≡ (pp − q )2 = (pp − q ) (pp − q ),

(10.14)

j=1

in which several equivalent forms are noted. (The third form will be used fairly frequently in Section 10.2.1. The “square” here means the dot product of two identical vectors.) It should be noted that divergence is not the score itself (see Section 7.3.2 of Jolliffe and Stephenson, 2012), although in the case of quadratic score, the score and the divergence take the same form, see Eq. (10.21) below. There are many other explanations of resolution in the literature, but one should take very special care when interpreting such textual descriptions. For instance, some regard resolution as a measure of the impact obtained by issuing case-dependent probability forecasts (Hersbach, 2000). Under this view, forecasts with high resolution are expected to differ from each other. However, from Eqs. (10.12) and (10.13), one can see that a group of identical forecasts very far from climatology would also lead to a high resolution, which contradicts the explanation of Hersbach (2000). Clearly, this problem is not of right or wrong, but rather of definition. The same can be said for the sharpness introduced earlier. That said, some energy forecasters have attempted to discriminate the usage of the two words based on subject fields—resolution and sharpness refer to so-and-so in meteorology and so-and-so in statistics (Lauret et al., 2019; Pinson et al., 2007). However, we do not believe such a boundary should be drawn, as counter-examples can be found easily. What ought to be done instead is to return to the mathematical expressions (i.e., the definition) when it comes to interpreting these properties. To understand why mathematical expressions are more essential as compared to textual descriptions, we show an example of how the term “resolution” can be associated

Probabilistic Forecast Verification

407

with two profoundly different interpretations, even though both interpretations come from the same subject field and the same (very celebrated) author, in Section 10.2.1. In addition, the mathematical expression for sharpness, in the same section, is also provided to contrast those of resolution. One thing is, however, certain—sharpness is not resolution. A characteristic difference which the majority seems to agree is that sharpness is a property of forecasts only, whereas resolution is a joint property of forecasts and observations (Potts, 2012; Murphy, 1993; Gneiting et al., 2007; Lauret et al., 2019).

10.2

SCORING RULES FOR PROBABILISTIC FORECASTS

A scoring rule assigns a numerical value based on a probabilistic forecast and an observation. The convention of terminology is worth noting: Scoring rules are more used to refer to those scores that evaluate probabilistic forecasts, whereas scoring functions are more frequent to appear in describing those scores that evaluate deterministic forecasts.2 Like the scoring functions shown in Table 9.1, various aspects of quality of probabilistic forecasts can be assessed with various scoring rules, such as the continuous ranked probability score (CRPS) or ignorance score (IGN). Scores for deterministic forecasts customarily have a negative orientation, which is to say, the smaller the scores are, the better the forecasts are reckoned to be. This is also intuitive for probabilistic scoring rules, for forecaster issues a sharper predictive distribution without sacrificing calibration should be charged with a smaller penalty. However, using the convention of positively oriented scores is mathematically no problem, since one just needs to place a negative sign in front of the scoring rule, as did in Gneiting and Raftery (2007). At any rate, if a scoring rule between a forecast F and a verification y is denoted as S(F, y), one is interested in the average score over n samples: 1 n S = ∑ S(Ft , yt ). (10.15) n t=1 The average score gives rise to the skill score, which can be defined in the same fashion as that of deterministic forecasts: S =

Sfcst − Sref , Sopt − Sref

(10.16)

where Sfcst , Sref , and Sopt are the average scores computed from n samples of forecasts of interest, reference forecasts, and perfect (optimal) forecasts, respectively. The skill score takes a value of 1 when Sfcst = Sopt , and 0 when Sfcst = Sref . Owing to its popularity in deterministic forecast verification, skill score is increasingly accepted as a verification tool for probabilistic forecast (e.g., Wilks, 2019; Hyndman and Athanasopoulos, 2018). However, the form of skill score in Eq. (10.16) is difficult to compute, for evaluating Sopt in a probabilistic setting would require Nature’s 2 Not everyone deliberately distinguishes the usage (e.g., Pinson et al., 2022), but in this book, we shall stick to that convention for clarity.

408

Solar Irradiance and Photovoltaic Power Forecasting

predictive distribution for each sample of concern, which is unknown and cannot be inferred from one-per-scenario observations. Hence, another definition of skill score, in line with that used in deterministic forecast verification, is often advised, for instance, by the European Centre for Medium-Range Weather Forecasts (ECMWF):3 S ∗ = 1−

Sfcst . Sref

(10.17)

The skill score S ∗ should not, in general, be regarded as a reduced form of S , because the optimal values of many scoring rules are not zero (even if they are, the conception of a “no uncertainty” probabilistic forecast is challenged and likely fallacious). Regardless, the choices of scoring rule employed by ECMWF in computing S ∗ include Brier score, ranked probability score (RPS), and CRPS, which are all inspected closely later in this section. A critically important question to ask when selecting scoring rules is whether or not a scoring rule is strictly proper. Since achieving a good score is a major desire of forecasters, some forecasters can choose to game the system by issuing forecasts that are not to their best judgment but are able to result in a better score; this is known as hedging (Brier, 1950), which is allowed by certain scoring rules. For instance, the somewhat popular coverage-width-based criterion (CWC; Khosravi et al., 2011; Khosravi et al., 2013; Khosravi et al., 2013) has been shown to be susceptible to hedging (Pinson and Tastu, 2014), see Section 2.2. The rigorous definition of what constitutes a strictly proper scoring rule requires knowledge of measure theory: “We consider probabilistic forecasts on a general sample space Ω. Let A be a σ -algebra of subsets of Ω, and let P be a convex class of probability measures on (Ω, A ). A function defined on Ω and taking values in the extended real line, R = [−∞, ∞], is P-quasi-integrable if it is measurable with respect to A and is quasi-integrable with respect to all P ∈ P. A probabilistic forecast is any probability measure P ∈ P. A scoring rule is any extended real-valued function S : P × Ω → R such that S(P, ·) is P-quasi-integrable for all P ∈ P. Thus if the forecast is P and ω materializes, the forecaster’s reward is S(P, ω).” Gneiting and Raftery (2007) Strictly Proper Scoring Rules, Prediction, and Estimation With the above background information, we may define the expected score of some forecast F under the true-belief forecast Q as S(F, Q) = EQ [S(F, y)] =



S(F, y)dQ(y) =



S(F, y)q(y)dy.

(10.18)

Then a negatively oriented scoring rule S is said to be proper if for all F, Q ∈ P, we have S(Q, Q) ≤ S(F, Q), (10.19) 3 https://confluence.ecmwf.int/display/FUG/12.B+Statistical+Concepts+-+ Probabilistic+Data

Probabilistic Forecast Verification

409

and S is said to be strictly proper if the equality is only true when F = Q. On this point, if a scoring rule S is proper, the forecaster is not incentivized to issue any F = Q. We may define the score divergence associated with the scoring rule S as d(F, Q) = S(F, Q) − S(Q, Q),

(10.20)

and propriety—the quality of being proper—necessary means d(F, Q) ≥ 0, for F, Q ∈ P. The mathematical characterizations of proper scoring rules are beyond the scope of this book, and interested readers are referred to Gneiting and Raftery (2007). However, Br¨ocker and Smith (2007b) provided an intuitive explanation of Eq. (10.19), which we wish to reiterate here. Of course, it is not possible to know in advance what score a probabilistic forecast is able to achieve, because the observation has yet materialized. However, it is possible to compute based on Eq. (10.18) the expected score of a forecast. On the one hand, S(Q, Q) = S(Q, y)q(y)dy is the average score of forecast Q, should the observation materialize infinitely many times according to distribution Q. On the other hand, S(F, Q) is the average score of forecast F, should the observation materialize infinitely many times according to distribution Q. Propriety implies that S(F, Q) is always larger than S(Q, Q), or stated differently that we should expect F to be less skillful than Q when the expectation is evaluated using Q. If this is not the case, it would lead to the contradiction that: F is better than Q despite all observations are in fact drawn from Q. At this stage, one may already realize that Eq. (10.19) is a property of the score itself, not of the particular forecasts F and Q—i.e., we can likewise write S(F, F) ≤ S(Q, F), which also stands. In the remaining pages of this section, several strictly proper scoring rules are discussed. Unfortunately, the skill score in Eq. (10.16) is not proper, even if a proper scoring rule S is used for its computation (Gneiting and Raftery, 2007). 10.2.1

BRIER SCORE

Brier score, also known as the probability score, is used to gauge the quality of forecasts of categorical variables. If the random variable to be forecast has m categories, the original Brier score as defined by Brier (1950) is: bs(pp, y ) =

m

∑ (p j − y j )2 ≡ (pp − y )2 = (pp − y ) (pp − y ),

(10.21)

j=1

where p = (p1 , . . . , pm ) and y = (y1 , . . . , ym ) are m-dimensional vectors of forecast probability assignment and outcome, respectively. Elements of y are binary, that is, if category j materializes, the respective entry takes the value of 1, and the remaining entries take the value of 0. For example, two forecasts (0.1, 0.1, 0.7, 0.1) and (0.25, 0.25, 0.25, 0.25) are issued by different forecasters for a four-category variable (such as the above-mentioned partitioned clear-sky index). If the observation falls in the first category, i.e., y = (1, 0, 0, 0) , the two forecasters receive scores of 1.32 and 0.75; if it falls in the third category, i.e., y = (0, 0, 1, 0) the received scores

410

Solar Irradiance and Photovoltaic Power Forecasting

are 0.12 and 0.75, respectively. Clearly, the Brier score as defined in Eq. (10.21) has a negative orientation, that is, the smaller the score is, the better the forecast is. In the case of binary events, one is only interested in whether or not the event occurs, and the Brier score is defined as: bs(p, y) = (p − y)2 ,

(10.22)

where p = P(Y = 1), which is the forecast probability that the binary event occurs, and y is 1 if the event occurs, and 0 if the event does not occur. For example, if p = 0.7 and the rain occurs, the forecaster receives a score of 0.09; if p = 0.8 and the rain does not occur, the received score is 0.64. It should be noted that the score in Eq. (10.22) is the currently more common usage of the Brier score, and has inherited the name from the original Brier score. Moreover, for binary events, the original Brier score is twice as large as the Brier score. On this point, the score in Eq. (10.22) is occasionally referred to as the half Brier score (Hersbach, 2000; Wilks, 2019). The original Brier score is strictly proper. Notwithstanding, the Brier score is proper only for binary events, but not for multi-categorical ones. Stated differently, though Eq. (10.22) allows for the calculation of a binary “one against all” Brier score, mathematically, such results can be misleading. 10.2.1.1

Brier score decomposition

An attractive feature of the Brier score is that it allows decomposition into two or three terms, which correspond to various aspects of forecast quality. Firstly, from Eqs. (10.15) and (10.21), the average Brier score over n forecast–observation pairs is given by: BS =

2 1 l nk m  1 n m 2 pki j − yki j . (p − y ) = tj tj ∑ ∑ ∑ ∑ ∑ n t=1 j=1 n k=1 i=1 j=1

(10.23)

Here, the change in indexing is to be taken note of, which is a result of the fact that observations may materialize differently for the same (or very similar) forecast (see Section 2b of Murphy, 1973b). To avoid notation overload, vector representation is used, with which the average Brier score in Eq. (10.23) is written as: BS =

1 l nk ∑ ∑ (ppki − y ki )2 , n k=1 i=1

(10.24)

cf. Eq. (1) of Stephenson et al. (2008). In Eqs. (10.23) and (10.24), the number of distinct probability assignments in the verification dataset is l, and the number of forecasts which assumed the kth assignment is nk , which implies n = ∑lk=1 nk . Therefore p ki and y ki are the m-dimensional vectors of forecast probability assignment and outcome, following the new indexing. Tables 10.1 and 10.2 present a numerical example of how the change of indexing is performed. The index i in p ki may seem redundant, since forecasts that assumed the kth assignment are all the same. Nonetheless, in practice, the extra index i allows grouping

Probabilistic Forecast Verification

411

Table 10.1 A sample collection of forecasts and observations in a four-category situation. Sample index, t

Fcst., pt

Obs. yt

1 2 3 4 5 6 7 8 9 10

(0.1, 0.1, 0.7, 0.1) (0.25, 0.25, 0.25, 0.25) (0.25, 0.25, 0.25, 0.25) (0.1, 0.1, 0.7, 0.1) (0.3, 0.5, 0.1, 0.1) (0.2, 0.2, 0.4, 0) (0.1, 0.1, 0.7, 0.1) (0.1, 0.1, 0.7, 0.1) (0.2, 0.2, 0.4, 0) (0.8, 0, 0.2, 0)

(0, 0, 1, 0) (1, 0, 0, 0) (1, 0, 0, 0) (0, 1, 0, 0) (0, 1, 0, 0) (0, 0, 1, 0) (0, 0, 1, 0) (0, 0, 0, 1) (0, 1, 0, 0) (1, 0, 0, 0)

Table 10.2 Brier score decomposition using samples listed in Table 10.1. Bin, k

Fcst., p

k

nk

Rel. freq. y k , y

Reliability

Resolution 2

1 2 3 4 5

(0.1, 0.1, 0.7, 0.1) (0.25, 0.25, 0.25, 0.25) (0.3, 0.5, 0.1, 0.1) (0.2, 0.2, 0.4, 0) (0.8, 0, 0.2, 0)

4 2 1 2 1

(0, 0.25, 0.5, 0.25) (1, 0, 0, 0) (0, 1, 0, 0) (0, 0.5, 0.5, 0) (1, 0, 0, 0)

0.38 1.5 0.36 0.28 0.08

0.62 1.36 0.68 0.36 0.68

(0.3, 0.3, 0.3, 0.1)

2.6 0.26

3.7 0.37

Total Average

10

of very similar forecasts into a bin-wise average: pk =

1 nk ∑ p ki , nk i=1

(10.25)

despite that such assumption gives rise to two extra components in the Brier score decomposition (Stephenson et al., 2008), with which this book is not concerned, and treat pk = pki , for all i. In contrast, since there may be multiple observations for the same forecast, the relative frequency that the observed events occurred at instances when the probability assignment is pk is obtained by averaging: yk =

1 nk ∑ y ki . nk i=1

(10.26)

The relative frequencies for different k can be further averaged into the climatological

412

Solar Irradiance and Photovoltaic Power Forecasting

base rate (mean probability) for the event to occur in various categories, i.e., y=

1 l ∑ nk y k . n k=1

(10.27)

With the notation all explained, the two-component decomposition of the Brier score can be written as (Murphy, 1972a,b): reliability

resolution 1

  !   ! 1 l 1 l 2 BS = ∑ nk (ppk − y k ) + ∑ nk [yyk (11 − y k )], n k=1 n k=1

(10.28)

whereas the three-component decomposition is (Murphy, 1973b): reliability

resolution 2

  !   ! uncertainty l l   ! 1 1 BS = ∑ nk (ppk − y k )2 − ∑ nk (yyk − y )2 + y (11 − y ), n k=1 n k=1

(10.29)

where 1 is a length-m vector of ones.4 To exemplify the procedure for calculating these terms, the decomposition results following the numerical values in Table 10.1 are shown in the last two columns of Table 10.2. One should note the resemblance of Eq. (10.29) with the calibration–refinement decomposition of mean square error (MSE) in Eq. (9.15), in which the MSE is also decomposed into uncertainty, reliability and resolution terms. In Eq. (9.15), the uncertainty is represented by the variance of observations, whereas in Eq. (10.29), uncertainty is represented by y (11 − y ), which is the Brier score that would be obtained if all p k ’s were replaced with y . In Eq. (9.15), the reliability term, i.e., EX [X − E(Y |X)]2 , is the average squared difference between a deterministic forecast and the mean observation given that forecast. Analogously, the reliability term in Eq. (10.29) is also the (weighted) average squared difference between a probability assignment and the relative frequency of events given that assignment. Similarly, the analogy between the EX [E(Y |X) − E(Y )]2 term in Eq. (9.15) and the (1/n) ∑lk=1 nk (yyk − y)2 term in Eq. (10.29) is immediately found. Both Eqs. (10.28) and (10.29) provide quantification on reliability, which is concerned with the calibration (or conditional bias) of the probability assignments, and resolution, which summarizes the ability of the forecasts to discern forecasting situations. However, as the two measures of resolution differ by an uncertainty term, hence, they are to be interpreted separately. On one hand, the resolution measure in Eq. (10.29) describes the extent to which the sample relative frequencies for the l divisions of forecasts deviate from the all-sample relative frequency. Hence, the resolution 2 term quantifies how well the forecaster is able to divide forecasts into 4 Again, it should be reminded that y (1 1 − y k ), and y (11 − y ) k 1 − y k ) denotes the dot product of y k and (1 is that of y and (11 − y ). Same goes for the squared terms in Eqs. (10.28) and (10.29).

Probabilistic Forecast Verification

413

subcollections of which the sample relative frequencies differ from the climatological base rate. Clearly, the bigger the difference between the sample and the climatology relative frequencies is, the better the forecasts are, as also evidenced by the negative sign in front of this term. On the other hand, from Eq. (10.28), the deviation is measured with respect to a 0 or 1 vector. The best scenario is that each subcollection of events only materializes in one way, i.e., all y ki are equal, which implies y k (11 − y k ) = 0. Hence, the resolution 1 term in Eq. (10.28) gauges the forecaster’s ability to separate the occasions of concern into subcollections of which the sample relative frequencies approach 1 for a particular category. Stated differently, resolution 1 measures the refinement of forecasts. In form, resolution 1 is similar to the measure of sharpness: 1 l (10.30) ∑ nk [ppk (11 − p k )] , n k=1 but the former is concerned mainly with observations, whereas sharpness is only concerned with forecasts. 10.2.2

CONTINUOUS RANKED PROBABILITY SCORE

Besides the Brier score, the ranked probability score (RPS) is another strictly proper score that is highly amenable for the verification of forecasts of multi-categorical events. Particularly when the events are ordinal, which means the outcomes have a notion of size attached to them, as opposed to events that are nominal, which do not incur a natural ordering, the verification measures need to be sensitive to distance, and RPS is adequate in that sense. The RPS between a forecast probability assignment, p = (p1 , . . . , pm ) , and a vector of binary numbers representing the outcome, y = (y1 , . . . , ym ) , is given as: rps(pp, y ) =

m

/



0

k

∑ pj

/ −

j=1

k=1

k

02

∑ yj

,

(10.31)

j=1

see Murphy (1971). The inner two summations represent the cumulative forecast and observation, which allow some aspects of the magnitudes of forecast errors (Wilks, 2019). For n forecast–verification pairs, following Eq. (10.15), the average RPS is: 1 n m 1 n RPS = ∑ rps(ppt , yt ) = ∑ ∑ n t=1 n t=1 k=1

/

k

∑ pt j

j=1

0

/ −

k

∑ yt j

02 ,

(10.32)

j=1

cf. Eq. (1) of Murphy (1972c). A natural extension of RPS, as to moving from multi-categorical to continuous cases, is to replace the summations in Eq. (10.31) with integral, which gives rise to the very popular CRPS. As is the case of RPS, CRPS is also strictly proper. It is given by:

crps(F, y) =



−∞

[F(x) − {x ≥ y}]2 dx,

(10.33)

414

Solar Irradiance and Photovoltaic Power Forecasting

where x is an integration variable, to avoid notation overload with y in the equation, which is the observation; {x ≥ y} is an indicator function which takes the value 1 if x ≥ y, 0 otherwise. CRPS can be interpreted as a measure of the distance between the forecast CDF, F, and a vertical line at x representing the observation, which denotes a cumulative-probability step function that jumps from 0 to 1 at the point where x = y, i.e., the CDF of observation. In this regard, CRPS achieves a minimum value of zero when F(x) = {x ≥ y}, which suggests a perfect deterministic forecast. 10.2.2.1

Computation of CRPS

The integral in Eq. (10.33) can be hard to evaluate, except for some known parametric distributions, of which the CRPS has closed-form expressions. For instance, if F is a step function, i.e., a deterministic forecast, CRPS reduces to the absolute deviation between the forecast and observation, and n-sample CRPS reduces to mean absolute error (MAE). If the forecast distribution is normal, i.e., F ∼ N (μ, σ ), CRPS is:

        y−μ y−μ y−μ 1 N crps F , y = σ , (10.34) 2Φ − 1 + 2ϕ −√ σ σ σ π where Φ(·) and ϕ(·) are the CDF and probability density function (PDF) of the standard normal distribution. If the forecast distribution is normal with left-truncating at zero, i.e., F ∼ N + (μ, σ , 0, ∞), which could be particularly useful since many meteorological variables, such as wind speed or solar irradiance, are non-negative. The CRPS expression for truncated normal is:     μ −2 y − μ  μ    y − μ  μ  + crps F N , y =σ Φ Φ −2 2Φ +Φ σ σ σ σ σ    

  √ y−μ μ μ 1 + 2ϕ 2 Φ −√ Φ . (10.35) σ σ σ π Equation (10.35) was derived by Gneiting et al. (2006), and similarly, some other authors have derived CRPS expressions for some other known parametric distributions. On this point, the reader is referred to Table 9.7 of Wilks (2019), in which a rich list of references on analytic CRPS formulas for various continuous distributions is offered. In addition, most if not all of these formulas are implemented in the R package scoringRules (Jordan et al., 2019). If F has a finite first moment, an alternative expression of CRPS is offered by Gneiting and Raftery (2007):  1  crps(F, y) = EF |X − y| − EF X − X   , 2

(10.36)

where X and X  are independent copies of a random variable with distribution function F. A derivation is given in Appendix A.7. Equation (10.36) is particularly useful when F is represented by a sample, such as members of an ensemble or simulation output of a Markov chain Monte Carlo method (Gneiting et al., 2007). In contrast to Eq. (10.33), the computation of CRPS through Eq. (10.36) is more straightforward.

Probabilistic Forecast Verification

415

For a size-m sample x1 , . . . , xm drawn from m identical distributions F, of which the ECDF is F m , the CRPS is:   1 m    1 m m  crps F m , y = ∑ x j − y − 2 ∑ ∑ x j − xk  . (10.37) m j=1 2m j=1 k=1 Notwithstanding, as noted by Jordan et al. (2019), the programming of Eq. (10.37) is inefficient, with a computational complexity of O(m2 ). In this regard, in the scoringRules package in R (Jordan et al., 2019), a formula with sorted samples is used:      2 m  1 crps F m , y = 2 ∑ x( j) − y m {y < x( j) } − j + , (10.38) m j=1 2 where x( j) is the jth sorted member forecast. This formula has a computation complexity of O(m log m). Besides the ability to calculate CRPS of ensemble forecasts, the scoringRules package also offers an ultra-wide range of functions for CRPS calculation with parametric distributions. All parametric distributions that are of particular concern in solar forecasting, such as the mixtures of normal (Hollands and Suehrcke, 2013), truncated normal (Yang, 2020d), gamma (Bakker et al., 2019), truncate logistic, or truncated skewed Student’s t (Yagli et al., 2020b), are available in that package. There is a direct and well-known relationship between CRPS and Brier score. Brier score in Eq. (10.22) is used to gauge forecast of a specific event. For the current purpose, an event is said to have occurred if the observation y does not exceed some threshold x of F. For a threshold x, the forecast probability of the event “x ≥ y” is p = F(x), and the outcome of this event can be described by an indicator function z = {x ≥ y}, i.e., z = 1 if the event occurs, and z = 0 if it does not occur. It follows that the Brier score between forecast probability p and the outcome z is: bs(p, z) = (p − z)2 = [F(x) − {x ≥ y}]2 .

(10.39)

Since the Brier score is a function of the threshold value, it is customary to denote the Brier score in Eq. (10.39) as bs(x), with that, crps(F, y) =



−∞

bs(x)dx.

(10.40)

Figure 10.4 shows a visualization of the Brier score decomposition of CRPS, in which three identical forecast distributions F ∼ N (0, 1) are plotted, and the respective observations are −0.4, 0, and 0.9. For any x < y, {x ≥ y} = 0, and the Brier score is given by F 2 (x). For any x ≥ y, {x ≥ y} = 1, and the Brier score is given by [1 − F(x)]2 . Consequently, the pair of shaded areas in each subplot of Fig. 10.4 sum up to the CRPS value. It is interesting to note that when the forecast is deterministic, i.e., F(x) is a cumulative-probability step function, the integrand of Eq. (10.33) is either zero or one. The integrand takes the value one in the region where the forecast and observed CDFs differ, and the region has the area |x − y|, where x is the forecast and y is the observation. In other words, CRPS reduces to absolute error when the forecaster issues a deterministic forecast.

416

Solar Irradiance and Photovoltaic Power Forecasting

y = − 0.4

Probability

1.0

y=0

y = 0.9

0.5

0.0

−2

−1

0

1

2

−2

F 2(x )

−1

0 x

1

F (x )

2

−2

−1

0

1

2

1 − (1 − F (x ))2

Figure 10.4 A visualization of CRPS and its relationship with the threshold-specific Brier score. The areas of the two shaded regions in each plot add up to CRPS.

10.2.2.2

CRPS decomposition

Following Eq. (10.15), the average CRPS over n forecasts is CRPS =

1 n ∑ crps(Ft , yt ) = n t=1

where BS(x) =

∞ −∞

BS(x)dx,

1 n ∑ [Ft (x) − {x ≥ yt }]2 . n t=1

(10.41)

(10.42)

Seeing that CRPS is an integral of the Brier score over the range of all possible thresholds, and the Brier score can be decomposed into three terms, it would be natural to explore whether or not the idea of decomposition is, or at least to some degree, transferable to the case of CRPS. This indeed turns out to be possible, but understanding the procedure requires a few pages, which we should wish to spend next. Although a good forecasting system would issue case-dependent probabilistic forecasts, let us consider for now a toy case that, regardless of the forecasting situation, the system only issues climatology forecast, Fc . In this case, CRPS is:

1 n ∞ ∑ −∞ [Fc (x) − {x ≥ yt }]2 dx n t=1  

∞ 2 n 1 n 2 1 n 2 = ∑ Fc (x) − n ∑ Fc {x ≥ yt } + n ∑ {x ≥ yt } dx. −∞ n t=1 t=1 t=1

CRPS =

(10.43)

n n−1 = 1 and 2 {x ≥ yt } = {x ≥ yt }, and deKnowing Fc is independent of t, ∑t=1 n −1 noting Fn (x) = n ∑t=1 {x ≥ yt }, which is the ECDF constructed with verification

Probabilistic Forecast Verification

417

samples, Eq. (10.43) becomes:

∞  CRPS = Fc2 (x) − 2Fc (x)F n (x) + F n (x) dx −∞

=

∞ −∞

Fc (x) − F n (x)

2

dx +



uncertainty



−∞

  ! Fn (x) 1 − Fn (x) dx .

(10.44)

This is a special case of CRPS decomposition, where the first term marks the disagreement between climatology forecast and observed climatology, and the second term depends wholly on the verification samples, in that, it is a measure of uncertainty (Hersbach, 2000). Hence, if we are to produce a climatology forecast, the CRPS is minimized when Fc = F n , which is the internal climatology probabilistic forecast. That said, when case-dependent forecasts are issued instead of Fc , they can no longer be taken out of the summation, as such a new approach is needed. CRPS decomposition is proposed by Hersbach (2000) in a weather forecasting context. In that, the notation and mathematical expressions in that paper are suited for ensemble forecasts. However, as noted by Candille and Talagrand (2005), the original approach of Hersbach (2000) is essentially performing a change in variable to the CRPS expression—the threshold of the CRPS integral is replaced by its PIT, i.e., x → p with p = F(x). Mathematically,

∞ [F(x) − {x ≥ yt }]2 dx CRPS =E −∞

1 dx (p)d p , (10.45) =E [p − {p ≥ pt }]2 dp 0 where pt = F(yt ) is the PIT of the verification. Then, by defining

dx (p) , g(p) =E dp

9 dx o(p) = E {p ≥ pt } (p) g(p) , dp

(10.46) (10.47)

it is straightforward to derive from Eqs. (10.45)–(10.47) after algebraic substitution that: reliability

 CRPS =

1 0



[p − o(p)] g(p)d p + 2

CRPSpot

! 

1 0



!

o(p)g(p)[1 − o(p)]d p,

(10.48)

where the second term is referred to as potential CRPS by Hersbach (2000), as it indicates the CRPS of a perfectly reliable system with p = o(p) and thus a zero first term. Given the expression of the uncertainty term in Eq. (10.44), from which CRPSpot is to be subtracted, the resolution term obtains. CRPS = REL + CRPSpot ≡ REL − RES + UNC.

(10.49)

418

Solar Irradiance and Photovoltaic Power Forecasting

Although the reliability–resolution–uncertainty decomposition in Eq. (10.48) is elegant in form, its evaluation still demands some effort. To do so, let us first consider an m-member ensemble forecast x1 , . . . , xm , and its verification y. The ECDF of the forecast is F m (x) = m−1 ∑mj=1 {x ≥ x j }. Recall that F m (x) is a piecewise constant function that take transition at each x( j) , which is the jth sorted member forecast, i.e., x( j) ≤ x( j+1) , for j = 1, . . . , m − 1. Stated differently, j F m (x) = p j = , m

for x( j) < x < x( j+1) .

(10.50)

Here, it is noted that p j is only related to the number of ensemble members m, but not any particular F m . For convenience, we let x(0) = −∞ and x(m+1) = ∞. Figure 10.5 shows the ECDF of a 5-member ensemble forecast (black solid line) and its verification (dashed vertical line). 1.00

F 5(x )

0.75

0.50

0.25

0.00 x (1) x (2)

x (3) y

x (4)

x (5)

Figure 10.5 The ECDF of a 5-member ensemble forecast with sorted members x(1) , . . . , x(5) , alongside their verification y. The CRPS is represented by the shaded area.

Following Eq. (10.33),   crps F m , y =

m



x ( j+1)

j=0 x( j)

[p j − {x ≥ y}]2 dx ≡

m

∑ c j.

(10.51)

j=0

Since {x ≥ y} only takes the value of zero or one, depending on the relative position of the threshold and the verification, CRPS can be computed by summing: c j = α j p2j + β j (1 − p j )2 ,

for j = 0, . . . , m,

(10.52)

where the values of α j and β j are given in Table 10.3. The c j values are represented # " as gray rectangles in Fig. 10.5. It should be noted that, in cases where y ∈ x(1) , x(m) , one needs not to compute c0 and" cm for their # values are zero. But in some cases, the verification may fall outside the x(1) , x(m) range—the ensemble forecast misses the

Probabilistic Forecast Verification

419

observation completely—as such one needs to additional compute c0 , when y < x(1) , or cm , when y > x(m) , with corresponding α j and β j given in the last two rows of Table 10.3.

Table 10.3 The values of α j and β j for different relative positions of the sorted ensemble members and the verification.   y ∈ x(1) , x(m) 

y∈ / x(1) , x(m)



Condition

αj

βj

y > x( j+1)

x( j+1) − x( j)

0

x( j+1) > y > x( j)

y − x( j)

x( j+1) − y

y < x( j)

0

x( j+1) − x( j)

y < x(1) y > x(m)

0 y − x(m)

x(1) − y 0

For n-sample CRPS, one proceeds from Eqs. (10.51) and (10.52), and yields:  1 n m  1 n m ct j = ∑ ∑ αt j p2j + βt j (1 − p j )2 ∑ ∑ n t=1 j=0 n t=1 j=0   m p2j n (1 − p j )2 n =∑ ∑ αt j + n ∑ βt j . j=0 n t=1 t=1

CRPS =

(10.53)

Assign

then, we obtain CRPS =

αj =

1 n ∑ αt j , n t=1

(10.54)

βj=

1 n ∑ βt j , n t=1

(10.55)

  2 2 α p + β (1 − p ) . j j ∑ j j m

(10.56)

j=0

At this stage, we introduce two sets of quantities, namely, g j and o j , j = 0, 1, . . . , m, with which the CRPS decomposition is performed. Firstly, for 0 < j < m, g j =α j + β j , oj =

βj α j +β j

(10.57) =

βj . gj

(10.58)

From Table 10.3, one can see that g j = x( j+1) − x( j) ,

for 0 < j < m,

(10.59)

420

Solar Irradiance and Photovoltaic Power Forecasting

which is the average width of jth bin. Then,  to the average frequency  o j is related that the verification is found to be below x( j+1) + x( j) /2. For j = 0, o0 is the frequency that the verification falls below x(1) , whereas g0 is the average distance of the verification away from x(1) : o0 =

1 n ∑ {xt(1) − yt }, n t=1

(10.60)

g0 =

β0 . o0

(10.61)

om =

1 n ∑ {xt(m) − yt }, n t=1

(10.62)

gm =

αm . 1 − om

(10.63)

Similarly, for j = m,

It is then obvious that, for each j = 0, . . . , m, the following is true: α j p2j =g j (1 − o j ) p2j ,

(10.64)

β j (1 − p j )2 =g j o j (1 − p j )2

(10.65)

Substitute these into Eq. (10.56), the CRPS expression becomes:  m  CRPS = ∑ g j (1 − o j ) p2j + g j o j (1 − p j )2 j=0



m

reliability



! 

m

CRPSpot



!

= ∑ g j (p j − o j ) + ∑ g j o j (1 − o j ) . j=0

2

(10.66)

j=0

The analogy between Eqs. (10.66) and (10.48) is plain to see. In practice, CRPS decomposition can be conducted via the verification package in R (NCAR, 2015), which outputs CRPS, potential CRPS, and the reliability term, but neither the resolution nor the uncertainty term. Nonetheless, the computation of the uncertainty term is direct, as noted in Eq. (10.44); one just needs to numerically integrate the second term therein. Equivalently, one can compute the uncertainty term through: UNC =

1 n t ∑ ∑ |yt − yk | , n2 t=1 k=1

(10.67)

which, after subtracting the potential CRPS, yields the resolution term. CRPS, being one of (if not) the most celebrated scoring rules, has been studied extensively in the literature. Instead of going through all the details, some important results regarding the CRPS decomposition are listed, and interested readers can explore them on their own.

Probabilistic Forecast Verification

421

• CRPS is additive, in that, for the CRPS of the union of two sets of forecast– verification data is the weighted (arithmetic) average of the CRPSs of the two datasets with the weights proportional to the respective sample sizes. • The components of the CRPS are not additive. • Although CRPS is the integral of the Brier score, the reliability and resolution components of the CRPS are not the integrals of the reliability and resolution components of the Brier score. • The reliability component of the CRPS is related to the rank histogram (see Section 10.3) but not identical. 10.2.3

IGNORANCE SCORE

The ignorance score (IGN), which also frequently goes by the name logarithmic score, is commonly attributed to Good (1952), who investigated the relationship between the theory of probability and the theory of rational behavior. By assigning the logarithm of a probability estimate, as a measure of the merit for predicting a dichotomous event, the amount of information lost can be quantified. Although it is unclear whether the idea of the strictly proper scoring rule was known and therefore applied during the proposal of IGN, it turned out to be one anyway. IGN assigns a score to a PDF forecast in the following form: ign( f , y) = − log f (y),

(10.68)

where f is the forecast PDF, and y is the verification; the notation f (y) reads, in this context, “PDF f evaluated at y.” The unit in which IGN is measured depends on the base of the logarithm used to calculate the score. For the natural logarithm, the unit is nats. Based on Eq. (10.68), one may immediately conclude that IGN reflects only the local behavior of the forecast, since the score is related only to the value of the density function at the verification. In other words, this property implies that IGN ignores the forecast probability of events/values that could have happened but did not (Kr¨uger et al., 2021). This contrasts the CRPS, which evaluates the CDF over the entire threshold range, although it rewards predictive distributions that place centers of mass close to the verification—Matheson and Winkler (1976) described this property as “sensitivity to distance.” As far as multiple choices present, especially when no choice strictly dominates the rest, there would be debates as to which option should be favored and in what ways it should be favored. Since both IGN and CRPS are strictly proper scoring rules, whether to choose one over the other appears ultimately subjective (Kr¨uger et al., 2021). Since IGN is the negative of the logarithm of the value of the forecast PDF at the observation, its evaluation is trivial for parametric predictive distributions. If the forecast distribution is normal, i.e., F ∼ N (μ, σ ), IGN is:   ln(2πσ 2 ) (y − μ)2 ign f N , y = + . 2 2σ 2

(10.69)

422

Solar Irradiance and Photovoltaic Power Forecasting

The gamlss.dist package in R (Stasinopoulos and Rigby, 2022) contains more than 100 distributions, each of which has its PDF implemented.5 However, for ensemble forecasts, unlike CRPS which can be computed using Eq. (10.37) or (10.38), the computation of IGN requires a predictive density. An estimator can be obtained with classical nonparametric kernel density estimation (KDE), as how it is implemented in the scoringRules package.6 However, Jordan et al. (2019) noted that this estimator is “fragile” in practice: If the materialized observation is at the tail of the estimated predictive density, the score may be highly sensitive to the choice of the bandwidth, which is the tuning parameter needed for kernel density estimation. Figure 10.6 provides a visualization of this undesirable effect, in which the KDE-based IGN estimate as a function of bandwidth is plotted. On this point, some authors have discussed an alternative approach using Markov chain Monte Carlo (Kr¨uger et al., 2021).

IGN [nats]

10 9 8 7 6 0

100

200 Bandwidth [W/m2]

300

Figure 10.6 KDE-based IGN estimate as a function of bandwidth. The forecasts used are the ECMWF 50-member ensemble GHI forecasts at Table Mountain, Boulder, Colorado (40.125◦ N, −105.237◦ W, 1689 m), averaged over all daytime samples from the year 2020, whereas the observations used are NSRDB’s satellite-derived irradiances.

10.2.4

QUANTILE SCORE

The quantile score (QS), which is also known as pinball loss, differs from CRPS and IGN, in that, it evaluates individual quantiles. We have seen several expressions of quantile score, depending on the context, in previous sections of the book (Sections 8.4.1 and 8.5.3.2). Here, letting qτ = F −1 (τ) be the τ th quantile of the predictive 5 All distributions in R have four related functions: (1) the PDF, (2) the CDF, (3) the quantile function, and (4) a function which generates random numbers from the distribution. These four functions are prefixed with “d,” “p,” “q,” and “r” respectively. For instance, the PDF of a normal distribution is dnorm, whereas the CDF of a beta distribution is pbeta. The gamlss.dist package is designed in the same way, and allows its users to input location, scale, and shape parameters. 6 A less appealing way is given by Roulston and Smith (2002), who approximated the forecast PDF by a uniform distribution between each ensemble member.

Probabilistic Forecast Verification

423

CDF, we consolidate these expressions of quantile score: qsτ (F, y) =ρτ (y − qτ ) = ( {qτ ≥ y} − τ) (qτ − y)  for y ≥ qτ , (y − qτ )τ, = (qτ − y)(1 − τ), for y ≤ qτ .

(10.70)

An interesting relation between the quantile score and CRPS is that the former can be expressed as an integrand of the latter. More specifically, recall that CRPS is the integral of the Brier score over the range of threshold—see Eqs. (10.39) and (10.40). Analogously, CRPS is (two times of) the integral of the quantile score over the range of probability (Gneiting and Ranjan, 2011): crps(F, y) = 2

1 0

qsτ (F, y) dτ.

(10.71)

To understand this, we perform a change of variable—the probability is replaced by threshold, i.e., τ → x with x = F −1 (τ). Mathematically, crps(F, y) =2 =2 =

1

0 ∞

( {qτ ≥ y} − τ) (qτ − y) dτ

−∞



−∞

[ {x ≥ y} − F(x)] (x − y) f (x)dx

[F(x) − {x ≥ y}]2 dx,

(10.72)

which returns to the familiar CRPS expression. The last step of Eq. (10.72) is due to integration by parts (Laio and Tamea, 2007),7 with u(x) = − (x − y),

(10.73)

v(x) = [F(x) − {x ≥ y}]2

(10.74)

and noting that F(x) − {x ≥ y} = 0 for x → ±∞. Quantile score was chosen as the scoring rule for the Global Energy Forecasting Competition 2014, during which the contestants were instructed to submit 0.01 to 0.99 quantiles with a step size of 0.01 (Hong et al., 2016). The final score then took the aggregated (averaged) form. Knowing that CRPS integrates quantile score over all probabilities, the contestants’ performance ranked by aggregated quantile score would be the same as that ranked by CRPS; the ranking is the same, but the score for each contestant in the former case would be approximately (considering numeric errors) half of that in the latter case. Hence, in most circumstances where CRPS is reported, nor is there any motivation for reporting the aggregated quantile score, and vice versa. On the other hand, since CRPS evaluates the entire predictive distribution, it is unable to capture the deficiencies (if any) in different parts of the distribution, e.g., the tails. Bentzien and Friederichs (2014), for that reason, advised to report the quantile score for numerous τ’s, separately. 7 Recall





integration by parts goes: u dv = uv + v du.

424

Solar Irradiance and Photovoltaic Power Forecasting

Figure 10.7 The CRPS, IGN, and pinball loss for τ = 0.3 as functions of the threshold of three generalized extreme value distributions with zero location (μ = 0), standard scale (σ = 1), and shape parameters of ξ = −0.5, 0, 0.5.

To visualize how different scoring rules work, Fig. 10.7 depicts how the CRPS, IGN, and pinball loss vary according to the threshold values of three different generalized extreme value (GEV) distributions (selected without loss of generality) with zero location (μ = 0), standard scale (σ = 1), and shape parameters of ξ = −0.5, 0, and 0.5. It can be seen that the CRPS has sensitivity to distance, for it gives best scores at thresholds where the centers of mass of the PDFs maximize (i.e., the medians). IGN is evidently local, in that, it gives the best scores at thresholds that correspond to the peaks (i.e., the modes) of the PDFs. Lastly, pinball loss is asymmetrical (except for τ = 0.5, which is not drawn here), and gives best scores at the τ th quantiles.

10.3

TOOLS FOR VISUAL ASSESSMENT

Generally speaking, any means of graphically displaying some quantification of quality can aid visual assessment of forecast performance. For instance, one may plot the CRPSs of forecasts from several competitive models, alongside that from a standard of reference, to learn about the relative performance of various models with respect to the benchmark. However, it is a widely accepted fact that whenever visual assessment of probabilistic forecasts of continuous random variables is mentioned, it always points to one of the two classes of tools, namely, those assessing calibration and those assessing sharpness, upon which the discussion of this section is based. 10.3.1

ASSESSING CALIBRATION

There are three conceptually equivalent tools for assessing calibration graphically, namely, the rank histogram, PIT histogram, and reliability diagram. Their usage choice often depends on the form of probabilistic forecasts under verification, that

Probabilistic Forecast Verification

425

is, the rank histogram is used for ensemble forecasts, PIT histogram for distributional forecasts, and reliability diagram for quantile forecasts. Because the conversion among various forms of probabilistic forecasts is possible and convenient, the exact choice of using which one of these tools becomes only marginally relevant. That said, both rank histogram and PIT histogram are straightforward to generate, whereas the development of and variations in reliability diagram require some effort to explain. Being a slightly more sophisticated idea to grasp, reliability diagram also gains certain flexibility in its construction, which is not possessed by the other two options. In other words, reliability diagram contains and thus is able to reveal more information than rank and PIT histograms. It is perhaps for that reason, that reliability diagram has been recommended by Lauret et al. (2019), as the “go-to” visual verification tool for probabilistic solar forecasts. 10.3.1.1

Rank histogram

The (verification) rank histogram, which is a verification tool for assessing the calibration (i.e., reliability) of ensemble forecasts, offers a visual justification of the evaluation criterion of “uniformity of verification rank.” In theory, if an ensemble forecast comes from a perfect model, with its members being equally likely, and the verification is a plausible member of the ensemble (i.e., statistically indistinguishable from the ensemble members), the histogram of the rank distribution of the verification should be nearly uniform. Stated differently, if the verification is sorted together with the ensemble members, it has an equal chance of appearing anywhere in the ranking, and thus after sufficient repetitions (with many samples), the relative frequency of the ranks of verifications should be close to 1/(m + 1), where m is the size of the ensemble. The idea of rank histogram (Hamill and Colucci, 1997) was developed in the 1990s by several authors, contemporaneously, it is thus also known as the Talagrand diagram (Talagrand et al., 1997) or the binned probability ensemble (Anderson, 1996). This might be due to the fact that the rank histogram is very similar to the PIT histogram that was established at a much earlier time (Dawid, 1984), which verifies the consistency of distributional forecasts (see below). In any case, as noted by Hamill (2001), a flat rank histogram is a necessary but not a sufficient condition for calibration. Making a rank histogram out of n forecast–observation pairs is straightforward. For each forecast–observation pair, one proceeds to find the rank of the observation within the combined set of the observation and the ensemble member forecasts—for example, give a three-member ensemble forecast {300, 100, 200}, and the observation 150, the rank of the observation in the set {100, 150, 200, 300} is 2. That is, a verification rank of 1 indicates that the observation is the lowest value in the combined set and thus smaller than all ensemble members, a rank of 2 indicates that the observation is greater than only one of the ensemble members, and so on and so forth until a rank of m + 1, which indicates that the observation is greater than all ensemble members. Then, the histogram plotted using the verification ranks is the rank histogram. For m-member ensemble forecasts, the number of bars in the histogram

426

Solar Irradiance and Photovoltaic Power Forecasting

is m + 1.

Relative frequency

10+0 10−1 10−2

10

20

30

40

50

Rank

Figure 10.8 Rank histogram of ECMWF 50-member ensemble GHI forecasts at Table Mountain, Boulder, Colorado (40.125◦ N, −105.237◦ W, 1689 m). Figure 10.8 shows the rank histogram of ECMWF 50-member ensemble global horizontal irradiance (GHI) forecast at Table Mountain, Boulder, Colorado (40.125◦ N, −105.237◦ W, 1689 m), over the year 2020 with nighttime (i.e., zenith angle greater than 85◦ ) data removed. If the forecasts are reliable, each bar in the plot should approximately have a relative frequency of 1/51 (as represented by the horizontal line in the figure). However, this is evidently not the case here. A U-shaped rank histogram indicates under-dispersed forecasts. In fact, under-dispersion is the most commonly encountered form of lack of calibration in numerical weather prediction (NWP) ensemble forecasts of not just irradiance but also wind speed (Sloughter et al., 2010), precipitation (Eckel and Walters, 1998) among other meteorological variables. As such, probabilistic-to-probabilistic (P2P) post-processing has hitherto been a necessary practice. 10.3.1.2

PIT histogram

The PIT is the value that the predictive cumulative distribution function attained at the observation, i.e., pt = Ft (yt ). It is a continuous analog of the verification rank. For calibrated forecasts, the observations behave as random samples of the predictive distributions, which suggests the uniformity of PIT. Naturally, if we are to plot the histogram of the pt ’s derived from some distributional forecasts and their corresponding observations, it can be used to assess reliability in the same way as rank histogram does for ensemble forecasts. In that, a more uniform PIT histogram indicates better calibration, and a flat PIT histogram is a necessary but not a sufficient condition for calibration (see the simulation study in the introduction section of Gneiting et al., 2007, for a counter example). Figure 10.9 provides several examples of PIT histograms, from which various deficiencies in the forecasts may be concluded. There is a caveat with using PIT, especially when the forecasts are generated using those classical time series frameworks (Gneiting et al., 2007). Assessing calibration through PIT is established upon the theory that the PIT series {pt }, t = 1, . . . , n,

Probabilistic Forecast Verification

427

(a)

(b)

(c)

(d)

(e)

(f)

Density [dimensionless]

4 2 0

4 2 0 0

0.5

1 0

0.5 1 0 PIT, p t = F t (x ) [0−1]

0.5

1

Figure 10.9 Shapes of probability integral transform (PIT) for 10,000 simulated X ∼ N (0, 1) observations under different forecasts. Forecast distributions are: (a) positively biased, F ∼ N (0.5, 1); (b) negatively biased, F ∼ N (−0.5, 1); (c) under-dispersed, F ∼ N (0, 0.5); (d) under-dispersed and positively biased, F ∼ N (0.25, 0.5); (e) over-dispersed, F ∼ N (0, 1.5); and (f) generalized extreme value distribution, F ∼ GEV(0, 1, −0.5).

is independent and identically distributed (iid) U (0, 1). We may conduct simple hypothesis tests, such as the Kolmogorov–Smirnov test or the Cram´er–von Mises test, for iid U (0, 1). However, the tests do not have much practical relevance, for they are not constructive. If a test rejects the null hypothesis that the samples are drawn from the U (0, 1) distribution, it provides no information on why the null is rejected. The rejection can be due to either a violation of uniformity or a violation of the iid assumption. In this regard, Diebold et al. (1998) proposed to supplement the PIT histogram with a series of correlograms, as to check whether pt is iid. A correlogram is a plot of correlation statistics, i.e., the autocorrelation of a time series as a function of time lag. Serial correlation in {pt } indicates that conditional mean dynamics have been inadequately captured by the forecasts. Similarly, nonlinear forms   2 , p) − of dependence can be checked through visualizing the correlograms of (p t     (pt − p)3 , and (pt − p)4 series, which reveal dependence operative through the conditional variance, conditional skewness, or conditional kurtosis (Diebold et al., 1998).

428

10.3.1.3

Solar Irradiance and Photovoltaic Power Forecasting

Reliability diagram

Besides the rank histogram and the PIT histogram, another popular visualization tool for assessing probabilistic forecasts is the so-called reliability diagram. As its name suggests, this tool also assesses, in the main, reliability, although it additionally contains information pertaining to resolution and sharpness. The history of the reliability diagram can be traced at least back to 1963, when Sanders (1963) plotted the conditional observed relative frequency against the forecast probability for some binary events. Conceptually, the reliability diagram is the counterpart of the conditional distribution plot under the calibration–refinement factorization for deterministic forecasts of continuous variables, see Fig. 9.4—if the conditional distributions are (equivalently) represented as conditional quantiles, such a plot is known as the conditional quantile plot, as described in Section 9.3.3 of Wilks (2019). Recall that in deterministic forecast verification, the calibration–refinement factorization is f(x, y) = f(y|x)f(x), where X and Y are random variables representing "forecast" and "observation," and calibration means E(Y|X = x) = x, ∀x. In probability forecast (for binary events) verification, the observation variable Y is binary and the forecast variable P takes value in the [0, 1] range; calibration in this case implies P(Y = 1|P = p) = p, ∀p, that is, when a forecast probability p is issued, the probability of the event Y occurring is equal to p. The procedure of producing a reliability diagram for probability forecasts of binary events is similar in spirit to that of producing Fig. 9.4. In Fig. 9.4, the forecasts are allocated into several mutually exclusive and collectively exhaustive bins—Fig. 9.4 selects {50, 150, . . . , 950} W/m2 as bin centers—and the observations corresponding to forecasts in each bin form a conditional distribution f(y|x). Analogously, for the reliability diagram of probability forecasts of binary events, the forecast probabilities also need to be allocated into several bins, and the observed relative frequency of occurrence conditional on each bin-center probability can be computed. In both cases, the binning essentially divides the all-sample forecast–observation pairs into sub-samples, each consisting of a bin center (i.e., a forecast value) and numerous observations. To give perspective, Fig. 10.10 shows the reliability diagram for probability forecasts of the "clear-sky event," which is said to have occurred if the clear-sky index κ > 0.9, at Table Mountain, Boulder, Colorado (40.125°N, −105.237°W, 1689 m), over the year 2020. The probability forecasts and the binary observations are derived from ECMWF 50-member ensemble GHI forecasts, REST2 clear-sky irradiance, and National Solar Radiation Data Base (NSRDB) GHI. In this figure, the probability forecasts are allocated into 10 bins, with centers {0.05, 0.15, . . . , 0.95}, which form the abscissa. For each sub-sample, the observed relative frequency is computed by counting the fraction of event occurrences, which forms the ordinate. Since calibration implies that the observed relative frequency should, at large samples, approach the corresponding forecast probability, the dots of a calibrated forecasting system would lie tightly around the identity line. In terms of the algebraic decomposition of the Brier score—see Eq. (10.75) below—such forecasts exhibit excellent reliability, because the squared differences in the reliability term correspond to the squared vertical distances between the dots and the identity line in the reliability diagram. The reader is referred to Wilks (2019) for detailed instructions on how to interpret a reliability diagram.
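To make the binning construction concrete, a minimal Python sketch is given below; the array names prob_fcst and event_obs are hypothetical stand-ins for the probability forecasts and binary event observations described above.

```python
import numpy as np

def reliability_curve(prob_fcst, event_obs, n_bins=10):
    """Allocate probability forecasts into bins and compute, for each bin,
    the conditional observed relative frequency and the sub-sample size."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2              # 0.05, 0.15, ..., 0.95
    idx = np.clip(np.digitize(prob_fcst, edges) - 1, 0, n_bins - 1)
    n_k = np.bincount(idx, minlength=n_bins)            # instances per bin
    y_k = np.full(n_bins, np.nan)                       # observed rel. freq.
    for k in range(n_bins):
        if n_k[k] > 0:
            y_k[k] = event_obs[idx == k].mean()
    return centers, y_k, n_k

# Hypothetical usage: ens is an (n, 50) ensemble of clear-sky index forecasts,
# and kappa the verifying clear-sky index; the event is kappa > 0.9.
# prob_fcst = (ens > 0.9).mean(axis=1)
# event_obs = (kappa > 0.9).astype(float)
```

Plotting y_k against the bin centers, with a histogram of n_k underneath, reproduces the layout of Fig. 10.10.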

Figure 10.10 Reliability diagram for probability forecasts of "clear-sky event," which is said to have occurred if clear-sky index κ > 0.9, at Table Mountain, Boulder, Colorado (40.125°N, −105.237°W, 1689 m), over the year 2020. ECMWF 50-member ensemble GHI forecasts, REST2 clear-sky irradiance, and NSRDB GHI are used to derive the probability forecasts and binary observations of clear-sky events. The histogram at the bottom illustrates the distribution of the n = 4075 forecast values.

Since its initial proposal, the reliability diagram has undergone numerous major upgrades. One of those, for instance, is that Hsu and Murphy (1986) included in the diagram two reference lines related to the Brier score decomposition and the Brier skill score. The three-component decomposition of the n-sample averaged Brier score, for binary observation variable Y and probability forecast variable P, is:

$$\mathrm{BS} = \underbrace{\frac{1}{n}\sum_{k=1}^{l} n_k \left(p_k - \bar{y}_k\right)^2}_{\text{reliability}} - \underbrace{\frac{1}{n}\sum_{k=1}^{l} n_k \left(\bar{y}_k - \bar{y}\right)^2}_{\text{resolution}} + \underbrace{\bar{y}\left(1 - \bar{y}\right)}_{\text{uncertainty}}, \qquad (10.75)$$

where p_k, k = 1, . . . , l, are the l possible forecast values (i.e., the bin centers); n_k is the number of instances when p_k is issued; ȳ_k is the observed relative frequency computed from those n_k observations in question; and ȳ is the overall observed relative frequency, with ȳ = (1/n) Σ_{k=1}^{l} n_k ȳ_k. The physical interpretation of ȳ is the long-term base rate estimated from the sample, i.e., the climatology. Hence, if we plot the horizontal line ȳ_k = ȳ in a reliability diagram, it represents the no-resolution scenario. Next, recall Eq. (10.17), which states S* = 1 − S_fcst/S_ref; the Brier skill score can thus be expressed as:

$$S^* = \frac{\dfrac{1}{n}\sum_{k=1}^{l} n_k \left(\bar{y}_k - \bar{y}\right)^2 - \dfrac{1}{n}\sum_{k=1}^{l} n_k \left(p_k - \bar{y}_k\right)^2}{\bar{y}\left(1 - \bar{y}\right)}, \qquad (10.76)$$

where the overall observed relative frequency ȳ, which can be interpreted as the long-term base rate estimated from the sample, constitutes a natural choice of naïve standard of reference. Since both components in the numerator of the right-hand side of Eq. (10.76) compute the weighted average of their respective arguments, they can be combined, and the Brier skill score becomes:

$$S^* = \frac{1}{n}\sum_{k=1}^{l} n_k S_k^*, \qquad (10.77)$$

where S_k^* is the Brier skill score of the kth sub-sample of forecasts, and

$$S_k^* = \frac{\left(\bar{y}_k - \bar{y}\right)^2 - \left(p_k - \bar{y}_k\right)^2}{\bar{y}\left(1 - \bar{y}\right)}. \qquad (10.78)$$

From this equation, it is clear that for no-skill forecasts, S_k^* = 0, and ȳ_k = (p_k + ȳ)/2, which can be displayed as a straight line in the reliability diagram. Figure 10.11 depicts the same reliability diagram as Fig. 10.10, with the no-resolution and no-skill lines added. It is noted that points close to the no-resolution line indicate that forecasts in those sub-samples lack resolution, whereas points in the shaded region contribute positively to the Brier skill score, see Eqs. (10.77) and (10.78).

Figure 10.11 Same as Fig. 10.10, but in the style of Hsu and Murphy (1986).

Moving beyond binary events, Hamill (1997) extended the utility of the reliability diagram to cases with multi-categorical probabilistic forecasts. The idea, simply put, is to convert a vector of forecast probability assignments to a vector of category numbers at a discrete set of nominal probabilities, from which the probability that the observed category is less than the forecast category at each nominal probability can be computed. Instead of plotting the observed relative frequency against the forecast probability, the reliability diagram for multi-categorical probabilistic forecasts plots the probability of observations below the nominal probability against the nominal probability. Extending this idea further, the approach for plotting reliability diagrams for distributional forecasts is also available (Pinson et al., 2010). Given n forecast–verification pairs, namely, (F_t, y_t), t = 1, . . . , n, we can define a series of indicator variables:

$$z_{\tau,t} = \mathbb{1}\{y_t < q_{\tau,t}\} = \mathbb{1}\{p_t < \tau\}, \qquad (10.79)$$

where q_{τ,t} is the τth forecast quantile for time t, p_t = F_t(y_t) is the PIT of y_t, and τ is a particular nominal probability of interest. Then, the observed proportion, z̄_τ = (1/n) Σ_{t=1}^{n} z_{τ,t}, can be used as the basis for drawing reliability diagrams for density forecasts of continuous variables. In other words, we plot z̄_τ against τ. Figure 10.12 illustrates the reliability diagram for some ensemble GHI forecasts, in which 10 nominal probabilities, τ ∈ {0.05, 0.15, . . . , 0.95}, are considered. The shape of the reliability diagram reveals that the forecasts are under-dispersed, which is typical for NWP-based ensembles.

Figure 10.12 Reliability diagram for ensemble GHI forecasts at Table Mountain, Boulder, Colorado (40.125°N, −105.237°W, 1689 m), over the year 2020. ECMWF 50-member ensemble GHI forecasts are verified against NSRDB GHI. A total of 10 nominal probabilities, τ ∈ {0.05, 0.15, . . . , 0.95}, are considered.

Regardless of whether the reliability diagram is used for binary, multi-categorical, or distributional forecasts, it has been known that ambiguities may arise from a lack of stability under unavoidable subjective implementation decisions (such as the binning choice) and limited counting statistics (Bröcker and Smith, 2007a). As such, even the reliability diagram of a collection of perfectly reliable forecasts may deviate from the exact diagonal. It follows that the evaluation of probability (or probabilistic) forecasts requires some idea as to how far the observed relative frequencies (or observed proportions) are expected to deviate from the diagonal if the forecasts were indeed reliable. Methods of increasing the reliability of reliability diagrams therefore find relevance. The method of consistency resampling was first proposed by Bröcker and Smith (2007a) for the case of binary forecasts, and was later extended to the case of distributional forecasts by Pinson et al. (2010). Since solar irradiance is a continuous random variable, we discuss the generation of consistency bars (or a consistency band, if interpolated) for the reliability diagram of distributional forecasts in what follows.

Consistency resampling is established upon the fact that if the forecasts are perfectly reliable, the PITs are uniformly distributed. As such, if random numbers are drawn from a uniform distribution U(0, 1), the surrogate observations back-transformed through the quantile functions can be considered as random realizations of those forecast distributions of concern. Given a set of distributional forecasts {F_t}, t = 1, . . . , n, a single resampling cycle consists of the following steps:

1. Draw n samples with replacement from the set {F_t}, which are taken to be the surrogate forecasts, denoted as {F_t*}.
2. Draw n samples from a uniform distribution U(0, 1), namely, {p_t*}, which are then taken into their respective quantile functions, from which a set of surrogate observations {y_t*}, where each y_t* = (F_t*)^{-1}(p_t*), obtains.
3. Compute the surrogate indicator variables for the τth nominal probability, i.e., for each t, z*_{τ,t} = 𝟙{F_t*(y_t*) < τ}, cf. Eq. (10.79).
4. Aggregate z*_{τ,1}, . . . , z*_{τ,n} into z̄*_τ.
5. Repeat steps 3 and 4 for all τ at which the consistency bars are needed.

When the above resampling procedure is repeated numerous times (typically a few hundred), one obtains a collection of z̄*_τ for each τ, which can be used to plot the consistency bars. It should be noted that, in the above procedure, owing to the consecutive use of the quantile functions and CDFs, which cancel, one can directly use {p_t*} to calculate {z*_{τ,t}}.

The above procedure follows the one suggested by Bröcker and Smith (2007a), but modified to suit a continuous forecast variable. However, as noted by Pinson et al. (2010), obtaining the consistency bars for reliability diagrams of distributional forecasts does not actually require such a sampling approach. From its definition in Eq. (10.79), z_{τ,t} may be regarded as a realization of a Bernoulli random variable with parameter τ. Consequently, the observed proportion z̄_τ, which is an outcome of summing n Bernoulli trials, is a realization of a binomial distribution with parameters n and τ, but scaled by the number of trials n. Mathematically, denoting the random variable representing the sum of the z_{τ,t}'s as Z_τ, we have Z_τ ∼ B(n, τ), with

$$\mathrm{E}\left(\frac{Z_\tau}{n}\right) = \tau, \qquad (10.80)$$

$$\mathrm{V}\left(\frac{Z_\tau}{n}\right) = \frac{\tau(1-\tau)}{n}. \qquad (10.81)$$

What this implies is that instead of sampling, one may directly elicit quantiles from B(n, τ) and divide by n to obtain the z̄*_τ's. Figure 10.13 compares the two approaches of acquiring consistency bands. Consistency bands are made under two cases: one uses all forecast–observation pairs from the year 2020, as in Fig. 10.12, which contains n = 4075 samples, and the other uses a sub-sample of n = 259 data points from January of that year. Clearly, the width of the consistency band varies according to both n and τ.
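Both routes can be sketched in a few lines of Python; in the snippet below, the PITs are uniform stand-ins (i.e., those of a perfectly reliable forecaster), the 75% band level follows the legend of Fig. 10.13, and scipy's binomial quantile function supplies the Pinson et al. (2010) band directly.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 4075                                   # sample size, as in Fig. 10.12
taus = np.linspace(0.05, 0.95, 10)         # nominal probabilities

# Route 1: consistency resampling; the quantile-function/CDF pair cancels,
# so surrogate PITs can be used directly, cf. the remark after step 5
z_star = np.empty((500, taus.size))        # a few hundred resampling cycles
for i in range(z_star.shape[0]):
    p_star = rng.uniform(size=n)           # surrogate PITs
    z_star[i] = [(p_star < tau).mean() for tau in taus]
band_resample = np.quantile(z_star, [0.125, 0.875], axis=0)   # central 75%

# Route 2: elicit quantiles of B(n, tau) and divide by n,
# cf. Eqs. (10.80) and (10.81)
band_binomial = np.array(
    [stats.binom.ppf([0.125, 0.875], n, tau) / n for tau in taus]).T

print(np.round(band_resample, 3))
print(np.round(band_binomial, 3))
```

Running the sketch confirms that the two bands nearly coincide, and that their width shrinks as n grows, in line with Eq. (10.81).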

Figure 10.13 (Top) Reliability diagrams using the same data as Fig. 10.12, but with consistency bands computed with the approaches of Bröcker and Smith (2007a) and Pinson et al. (2010). (Bottom) Same as the top row, but only using a subset of forecast–observation pairs, namely, those from January 2020.

10.3.2 ASSESSING SHARPNESS

In comparison to assessing calibration, the assessment of sharpness requires much less effort and is much more straightforward. Sharpness is a property of the forecasts alone, which means that its visualization does not involve observations at all. Sharpness can be gauged in several ways. For instance, one may quantify sharpness using the root mean variance of density forecasts, that is, the square root of the average of the predictive variances over the verification set (Möller and Groß, 2016).


To integrate more information into the diagram, so as to show how sharpness varies with probability coverage, the average prediction interval size can be plotted against the nominal coverage ratio, which is known as the δ-diagram (Pinson et al., 2007). Extending the idea thereof, Gneiting et al. (2007) and Bremnes (2004) argued that the average interval width is still insufficient to characterize sharpness, and the box plot (or violin plot, for that matter), which reveals several summary statistics (or the density) of the distribution of interval widths, is thought more informative. Moving beyond the idea of the box plot, the strategy of conditioning is available, that is, plotting a sharpness diagram for each binned range of observations, from which the quantification of resolution obtains (see Pinson et al., 2007, for an example).

Figure 10.14 Sharpness diagram, in the style of Tukey’s box and whisker plot, using the same data as Fig. 10.12, for nominal coverage rates of central prediction intervals ranging from 10% to 90%.
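The quantities underlying such a sharpness diagram are simple to compute; the sketch below uses a synthetic sorted ensemble as a stand-in for actual predictive distributions.

```python
import numpy as np

rng = np.random.default_rng(7)
# stand-in sorted 50-member ensemble: rows are forecast instances
ens = np.sort(rng.normal(500.0, 120.0, size=(4075, 50)), axis=1)

# Width of the equal-tailed central prediction interval at each coverage rate;
# e.g., 90% coverage spans the predictive quantiles at levels 0.05 and 0.95
for cov in np.linspace(0.1, 0.9, 9):
    lo, hi = (1.0 - cov) / 2.0, (1.0 + cov) / 2.0
    width = np.quantile(ens, hi, axis=1) - np.quantile(ens, lo, axis=1)
    print(f"{cov:.0%}: median interval width = {np.median(width):6.1f} W/m2")
```

Box-plotting the per-instance widths at each coverage rate, rather than printing only their medians, reproduces the style of Fig. 10.14.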

In conclusion, the literature does not seem to have a consensual definition of what constitutes a sharpness diagram, but in one way or another, inasmuch as some quantifier of sharpness can be displayed graphically, the visual assessment can be carried out. Figure 10.14 shows one version of the sharpness diagram using the same data as Fig. 10.12. One can see that the interval width gradually spreads out with the nominal coverage rate. Moreover, the box plots evidence positive skews, with outliers (points outside 1.5 times the inter-quartile range above the upper quartile) being abundant; this echoes the argument that depicting just the mean or median interval width is not sufficient. Nonetheless, the sharpness diagram of forecasts from a single forecasting system or forecaster alone is hardly conducive to learning about the performance, because whether or not forecasts are sharp is a relative notion, and sharper forecasts are only rewarding when they are calibrated. A last point to note in regard to the sharpness diagram is that the nominal coverage rate, by default, corresponds to the central prediction interval. More specifically, when a 90% nominal coverage is mentioned, for example, it refers by default to the equal-tailed form, which ranges from the predictive quantile at level 0.05 to that at level 0.95. However, the same coverage can also be obtained from predictive quantiles at levels 0.04 and 0.94, 0.06 and 0.96, 0 and 0.9, or 0.1 and 1, and more pairs may be drawn ad infinitum, from which ambiguity results.

10.4 CALIBRATION AND SHARPNESS: A CASE STUDY ON VERIFICATION OF ENSEMBLE IRRADIANCE FORECASTS

To enhance the understanding of the content of this chapter, a case study on ensemble irradiance forecast verification is put forward. GHI forecasts from the ECMWF's Ensemble Prediction System (EPS), over a two-year (2019–2020) period, are taken as the subject of investigation, alongside the NSRDB's satellite-derived irradiance as verifications. To echo the case study in Chapter 9, forecasts and verifications from the same three locations are used, namely, (1) Bondville (BON, 40.052°N, −88.373°W, 230 m), (2) Desert Rock (DRA, 36.624°N, −116.019°W, 1007 m), and (3) Pennsylvania State University (PSU, 40.720°N, −77.931°W, 376 m). Since the time-averaging convention of EPS is at the end of the hour, instantaneous NSRDB irradiance from the middle-hour time stamps is used to ensure the time alignment between forecasts and observations. The overarching goal of probabilistic verification is to examine the calibration and sharpness of the forecasts of interest. Knowing that dynamical ensemble forecasts from NWP models have the tendency to be under-dispersed and therefore lack calibration, P2P post-processing is motivated. The two-year dataset at each location is split into two halves according to the calendar year: data from 2019 is used for training the post-processing models, and data from 2020 is retained as hold-out samples for verification. All three strategies for calibrating ensemble forecasts, as introduced in Section 8.6.1, are considered: (1) calibration via ensemble model output statistics (EMOS), (2) calibration via dressing, and (3) calibration via quantile regression (QR).

Table 10.4 CRPS [W/m2] and IGN [nats] of raw EPS and post-processed forecasts, at three locations. Row-wise best results under each scoring rule are marked with an asterisk.

                       CRPS                          IGN
Stn.    Raw     NGR1     NGR2    BMA     QR       NGR1    NGR2
BON     55.5    50.6     51.7    56.3    49.1*    5.80*   5.80*
DRA     34.2    24.2*    26.0    28.5    25.0     5.28    5.04*
PSU     60.0    54.4*    55.3    59.9    54.6     5.87    5.85*

Although EMOS can be carried out under different assumed predictive distributions, such as normal, truncated normal, lognormal, censored GEV, or censored and shifted gamma, it is not our immediate interest to inter-compare these possibilities. Therefore, EMOS with a normal predictive distribution, which is otherwise known as nonhomogeneous Gaussian regression (NGR), is used without loss of generality. That said, there are two distinct strategies of parameter estimation for NGR models, one of which is minimizing the CRPS, and the other of which is maximizing the log-likelihood, or equivalently, minimizing the IGN. Since these two strategies are parallel and neither dominates the other, for their corresponding scoring rules are both strictly proper, they are both considered. As for calibration via dressing, the classic method of Bayesian model averaging (BMA) with Gaussian kernel functions is employed. Lastly, the third calibration approach adopts the original QR. Although QR neural network or QR forest may have superior performance to QR, it is again not our primary goal here to contrast these different variants. In using these P2P post-processing methods on dynamical ensemble forecasts, it should be recalled that the member forecasts are equally likely. This implies that, in the case of NGR and BMA, the models should adopt equal weights, and in the case of QR, the members are to be sorted before entering the model.
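The case study itself is implemented with R packages (mentioned below); purely for illustration, a minimal Python sketch of the two NGR estimation strategies follows (not the ensembleMOS implementation), where ens is a hypothetical (n, 50) array of member forecasts and y the verifying observations.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def crps_normal(mu, sigma, y):
    """Closed-form CRPS of a normal predictive distribution N(mu, sigma^2)."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z)
                    - 1 / np.sqrt(np.pi))

def fit_ngr(ens, y, score="crps"):
    """NGR: mu = a + b*mean(ens), sigma^2 = c + d*var(ens), equal weights."""
    m, v = ens.mean(axis=1), ens.var(axis=1)

    def loss(theta):
        a, b, c, d = theta
        mu = a + b * m
        sigma = np.sqrt(np.maximum(c + d * v, 1e-6))  # keep variance positive
        if score == "crps":                           # NGR1: minimum CRPS
            return crps_normal(mu, sigma, y).mean()
        return -norm.logpdf(y, mu, sigma).mean()      # NGR2: minimum IGN

    return minimize(loss, x0=[0.0, 1.0, 100.0, 1.0], method="Nelder-Mead").x
```

Calling fit_ngr with score="crps" or score="ign" yields the NGR1 and NGR2 parameter sets, respectively.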


Figure 10.15 (a) Rank histograms of raw ensemble forecasts and (b) PIT histograms of the post-processed forecasts, at three locations.

The post-processing methods are implemented in the R programming language. In particular, three packages are found useful, namely, (1) ensembleMOS (Yuen et al., 2018), (2) ensembleBMA (Fraley et al., 2021), and (3) quantreg (Koenker, 2020), which offer swift workflows and efficient algorithms for EMOS, BMA, and QR, respectively. It is remarked that all three packages are written by authors of the original papers in which the techniques were first proposed, and the credibility of these packages can be assumed at once. Before the data are fed into the models, all nighttime time stamps (herein defined as those points with zenith angle > 85°) are removed from both the training and validation sets. It should also be noted that post-processing models built on probabilistic GHI forecasts and those built on clear-sky index forecasts often only result in negligible differences (Schulz et al., 2021; Yang, 2020d), so all models are herein built on GHI.


Figure 10.16 Sharpness diagrams of raw ensemble forecasts and four versions of post-processed forecasts, at three locations.

To begin with the verification sequence, the scores of the raw and post-processed forecasts are displayed in Table 10.4. The main score of interest is CRPS, adhering to the recommendation of Lauret et al. (2019). However, we also use IGN to compare the two versions of EMOS, and check whether or not the statistical theory on propriety can be empirically validated. To distinguish the two versions, the EMOS optimized by minimizing CRPS is referred to as NGR1, and that optimized by minimizing IGN is referred to as NGR2. Comparing NGR1 and NGR2, the former dominates the latter under CRPS at all three stations, whereas the situation reverses when the forecasts are scored under IGN; this offers empirical evidence supporting the aforementioned theory on propriety. Next, given that CRPS is a composite score that assesses both calibration and sharpness simultaneously, the forecasts with the lowest CRPS are the most favorable. In this regard, NGR1 and QR may be concluded as more advantageous than NGR2 and BMA. It should be noted that the goodness of post-processing necessarily depends on the implementation and the dataset in question, so the current results should not be regarded as general, for opposite findings (i.e., BMA being better than EMOS) have been reported in the literature (Doubleday et al., 2020); this is yet one more warning sign to those who are apt to quote a single study and draw opinionated conclusions about the superiority of one method over another.

Figure 10.15 shows the rank and PIT histograms of the raw and post-processed forecasts at the three locations of interest. Based on the histograms, one may instantly presume the inadequacy of EMOS and BMA in calibrating the forecasts at hand, as non-uniformity in the histograms is evident; histograms with similar shapes have also been reported by others (e.g., Yagli et al., 2020b). In contrast, the PIT histograms for QR-post-processed forecasts appear fairly flat, which echoes the somewhat well-tested advantages of nonparametric predictive distributions over the parametric or semiparametric ones (Mayer and Yang, 2022; Bakker et al., 2019), but again, oppositions to that view are not rare (Le Gal La Salle et al., 2020). Finally, we inspect the sharpness diagrams of the various versions of forecasts, as shown in Fig. 10.16. As in the case of Fig. 10.14, outliers of the box plots are numerous and positive skews are common. Both phenomena can be explained by the zenith-dependent variance of GHI, which leads to larger spreads in predictive distributions during solar noon than during sunrise or sunset. In terms of the median interval width (i.e., the middle bar in a box plot), aside from the raw ensemble, NGR1 and QR seem preferable over NGR2 and BMA. Moreover, as evidenced by the small inter-quartile range of the interval widths of BMA forecasts, one can deduce that the predictive distributions from BMA are similar in terms of spread, and thus are not resolved.

11 Irradiance-to-power Conversion with Physical Model Chain

"First, it should be understood that forecasts possess no intrinsic value. They acquire value through their ability to influence the decisions made by users of the forecasts."
— Allan Murphy

The first step towards photovoltaic (PV) power forecasting consists in becoming conscious of the limitations of pairing statistical and machine-learning algorithms with power data alone, not because the algorithms themselves are deficient, but because the kind of problem that we are dealing with has a physical nature. It is for this reason that the preceding chapters of this book have been devoted primarily to the forecasting theory and practices for irradiance, which is the principal meteorological variable governing PV power generation. Indeed, while much more energy meteorology is known now than was known then, deriving PV power forecasts from irradiance forecasts has always been the default procedure, at least among learned solar energy meteorologists and engineers, since the day solar forecasting became needed (Lorenz et al., 2009). The two-stage PV power forecasting procedure, first of irradiance forecasting, and second of irradiance-to-power conversion, resembles in strategy that of wind power forecasting, the difference being that the latter proceeds from wind speed forecasting (Sweeney et al., 2020). There are, of course, many who have hitherto been showing idealistic hopes and obstinate support for the single-stage extrapolative PV power forecasting procedure through statistical and machine learning on historical data (e.g., Qu et al., 2021; Wang et al., 2020a; Liu et al., 2019; Lin et al., 2018). A shared characteristic of studies of this sort is that the learning algorithms are either mathematically or procedurally sophisticated, typically taking up several pages to explain, as if the recency and complexity of the learning algorithms were able to reflect scientific superiority and novelty; in parallel, the adaptation to the various salient characteristics of solar forecasting is only minimally described. While acknowledging that in the distant future, and after a whole lot more data are collected, this single-stage extrapolative forecasting procedure might have a possibility of attaining a grade of quality comparable to that of the two-stage procedure, it still would not be exempted from the limitations on scalability, which constitutes the premise of forecasting for grid integration purposes. There are, in the main, two limitations pertaining to the scalability of the single-stage extrapolative solar forecasting procedures: one relates to data, and the other to technique. Data-wise, it is a universally admitted fact that statistical and machine-learning methods depend for their success upon the quality and amount of input data. A vast majority of data-driven solar forecasting methods that have been acclaimed triumphant require some very specific form of in situ data to operate, but it is hardly to be expected that the sort of data details will ever be available on a global or regional basis, or for newly constructed PV plants. Technique-wise, the plethora of model selection, training, tuning, and programming particularities of the statistical and machine-learning algorithms remains a pursuit of the learned, which are unlikely to become the tools or habits of ordinary forecast practitioners, who are to be regarded as the eventual beneficiaries of any progress made in forecasting science. Indeed, excessive data requirements and technical complexity can cause those forecast practitioners (e.g., PV plant owners), who are in fact the ones who are tasked to produce forecasts and submit them to grid operators, to lose all zest and interest in participating in grid integration. These two limitations on scalability, alongside those current limitations on forecast quality, make the single-stage extrapolative PV power forecasting procedure far less attractive than the two-stage one, inasmuch as solar forecasting is intended to be of practical support to grid integration. As the need for increasing penetration of renewable energy becomes more and more desperate, under the pursuit of carbon neutrality by the middle of this century, many countries have come to the realization that the two-stage forecasting procedure is an inescapable means to an end for maintaining stable and reliable grid operations under high penetration of renewables. For instance, soon after China's leadership promulgated the directive of constructing a modern power system predominated by renewable energy, which is closely tied to China's "3060" project—reaching carbon peak by 2030 and neutrality by 2060—the China Meteorological Administration (CMA) announced its 5-year (2021–2025) action plan for enhancing meteorological service capability for wind and solar resourcing and forecasting (Yang et al., 2022b). Just weeks after that, the collaboration agreement between CMA and the National Energy Administration (NEA) was signed. On the CMA's part, the 5-year action plan touched upon many critically important aspects of forecasting, including augmentation of numerical weather prediction (NWP) models, formal forecast verification, and early warning and mitigation of extreme weather events. But central to these aspects is a gridded operational forecasting product, which is to be provided to provincial weather departments nationwide, who are then tasked to downscale and disseminate the forecasts to individual plant owners. On the NEA's part, the two sets of grid codes "Specification of operation and control for photovoltaic power station" (GB/T 33599-2017) and "Technical requirements for dispatching side forecasting system of wind or photovoltaic power" (GB/T 40607-2021), in which it is specified how solar power forecasts should be submitted and penalized, are to be continually enforced, but with modifications and moderations in accord with the actual quality of the CMA product. This top-down–bottom-up flow of forecast information, therefore, illustrates solar forecasting for grid integration in a form that reflects its current ideal.
From forecasts of meteorological variables issued by the national weather centers, PV power forecasts are obtained via a conversion, and by submitting these power forecasts to the grid operators the subsequent scheduling and dispatch strategies are devised. Because how weather forecasts are issued and how power forecasts are penalized are not within the control of individual plant owners, the accuracy and skill of irradiance-to-power conversion become dominating factors in determining the eventual value of the PV power forecasts, which plant owners seek to maximize. Throughout this section, the phrase "irradiance-to-power conversion" is to be used. However, one should be aware that such terminology simply suggests solar power curve modeling. The concept of the solar power curve has been previewed in Section 4.5.5. Terminology-wise, the mapping from wind speed (and other auxiliary variables) to wind power is known as a wind power curve, so, analogously, the term "solar power curve" must be deemed appropriate for describing the process of irradiance-to-power conversion. There are in fact some authors (e.g., Nuño et al., 2018) who used this exact term in their papers. The word "curve" suggests as its literal meaning a line. However, it must be well understood by now that neither the mapping from wind speed to wind power nor that from solar irradiance to solar power is injective. Hence, when more dimensions are introduced as inputs, the term "power response surface" or "power hyperplane" seems more expressive. Inspecting the wind energy meteorology literature, wind engineers often view the obtaining of the wind power curve as a prediction problem (Wang et al., 2019c; Lee et al., 2015). But in the solar energy meteorology literature, this was not the case historically, so the term "solar power curve" never gained popularity. Be that as it may, this section examines the state-of-the-art irradiance-to-power conversion (i.e., solar power curve modeling), with a focus on the model chain, which takes the design and physical construct of solar energy systems into consideration. Besides the model chain, one may also use regression, or a hybrid of regression and model chain, to convert irradiance to power. These three classes of techniques are contrasted next.

11.1 AN OVERVIEW OF THE THREE CLASSES OF TECHNIQUES FOR IRRADIANCE-TO-POWER CONVERSION

Is a PV plant a mere system, of which, with known input, the output is governed wholly by the principles of physics, chemistry, and engineering? The principle governing the intensity of irradiance reaching an inclined surface is one of physics, that governing the composition and accumulation of dust on panel surfaces is one of chemistry, and that governing DC and AC voltage and current is one of engineering. The answer to that question, wherever it is understood, seems to be so, but there are still mechanisms that are yet to be completely understood. More importantly, despite the fact that some principles are known in theory, they are scarcely applicable in practice, due to the demanding information on which the working of the principles is based. Two schools of thought naturally emerge, which divide techniques for irradiance-to-power conversion into two classes, first of regression, and second of the physical model chain. Some authors seek a distinction between the two classes of techniques through the words "direct" and "indirect," or through "statistical" and "physical" (e.g., Yang and van der Meer, 2021; Markovics and Mayer, 2022). One should note that even though the procedure of concern is referred to as irradiance-to-power conversion, it depends for its execution upon other information, such as geographical location, the design of the PV plant, and auxiliary meteorological variables. The regression class of techniques creates, via statistical or machine learning, a mapping between all available input information and the power output, and therefore constitutes a single-stage (or direct) conversion. In contrast, model chains leverage multiple models, each outputting some intermediate quantities that are used as inputs to the subsequent stage, until the PV plant's AC power injection into the grid is eventually arrived at; conceptually, the cascade of component models resembles a chain, which constitutes a multiple-stage (or indirect) conversion. In terms of execution and implementation, as we shall closely examine below, the model chain approach evidently entails more energy meteorology knowledge than the regression approach. Hence, advancing from the premise that an increase in energy meteorology knowledge is accompanied by an increase in wisdom, one should logically expect a better performance from the model chain. The difficulty, however, is that the model chain requires design information and operating conditions of the PV plant, such as soiling, wiring diagram, row spacing, or choice of inverter, to be known with quantitative exactitude, in order to achieve its best possible accuracy. Yet, such detailed information is rarely available to researchers and grid operators, or even the plant owners themselves. Consequently, insofar as the present small amount of research can show, the regression approach is not necessarily worse than the model chain approach (Markovics and Mayer, 2022), which might be somewhat discouraging to those who have devoted significant amounts of time to investigating the physics, chemistry, and engineering principles of PV power generation. Notwithstanding, the course of scientific research has not been directed by mechanistic forces, but by a forward-looking purpose, in that it is wise not to rush to a conclusion, but to become conscious that both the direct and indirect approaches are still open to much refinement.

The clash between statistical and physical modeling of the PV plant gives an opportunity to a third class of techniques, which takes a hybrid form of both. Indeed, irradiance-to-power conversion has been the battleground between those who regard all phenomena as subject to physical methods, and those who hope that, among vital phenomena, there are some, at least, which demand the "black-box" treatment of statistical and machine learning. The hybrid approach must, for its purpose, capture the advantages of both, by employing only the very mature part of the model chain that can be supported by the information available at hand, while leaving the remaining portion of the conversion to regression (Mayer, 2022). In this way, the hybrid approach lends itself to versatility, and should, in principle, predict PV power in a more favorable fashion than either of the two former classes alone. One caveat to the hybrid approach, which is also shared by the regression approach, is that it only applies to plants with long historical data, such that proper training and evaluation of the learning algorithms can be carried out. (Besides data length, there are other difficulties confronting model training when the seasonal period is long; see the blog of Rob Hyndman1 for example.) For the regression approach, because PV power, just like irradiance, has a double-seasonal pattern, at least three years (or three complete cycles) of training data should be in place for any learning algorithms to detect the seasonal effects. The data-length requirement on the hybrid approach might be slightly relaxed, since the double-seasonal pattern can be handled by solar positioning and clear-sky modeling, among other well-known physical models.

1 https://robjhyndman.com/hyndsight/longseasonality/

11.2 IRRADIANCE-TO-POWER CONVERSION BY REGRESSION

Numerous PV power forecasting papers have adopted the regression-based irradiance-to-power conversion, and the strategy existed before 2010 (Bacher et al., 2009; Huang et al., 2010). Among the various reported forecasting experiment setups, the most representative ought to be the one designed for the Global Energy Forecasting Competition 2014 (GEFCom2014; Hong et al., 2016). GEFCom2014 is definitely not the earliest to endorse the two-stage PV power forecasting involving first NWP and second regression of power, but it is likely the most influential to date. In the solar track of GEFCom2014, a total of 12 forecast fields, such as surface solar radiation downward, 2-m temperature, or total cloud cover, from the European Centre for Medium-range Weather Forecasts (ECMWF) were provided, alongside solar power generation profiles from three Australian sites. Solar power generation was to be forecast on a rolling basis in hourly resolution for the next day, based on the ECMWF forecasts at the same resolution over the same horizon issued every day at midnight. The contestants were tasked to perform variable selection and feature engineering on the provided weather variables, and to construct a regression of PV power on the final features using the training dataset, such that when new ECMWF forecasts were made progressively available during the competition, power forecasts could be estimated via the fitted regression. GEFCom2014 brought to light several strategies for enhancing the performance of irradiance-to-power conversion using regression, which, also in view of other evidence from the literature, are thought to be quite general. They are, in the order of importance: utilization of clear-sky information, feature selection and engineering, probabilistic and ensemble modeling, and other known general guidelines for regression applications in solar engineering, such as opting for nonparametric and/or tree-based methods. The main reason that the utilization of clear-sky information is ranked with the highest importance is this: the winning team of the solar track of GEFCom2014 (Huang and Perry, 2016) was the only team that integrated clear-sky information into its forecasting process, which largely explains the substantial leading margin between the forecast performance of the winning team and that of the other teams, confirming the necessity of proper handling of seasonal components during PV power forecasting, which has been recognized since at least Chowdhury and Rahman (1987). The second most important is proper feature selection and engineering, which is evidenced by the fact that all top-five teams in GEFCom2014 used such a strategy in one form or another. Additionally, Markovics and Mayer (2022) have also highlighted, through their extensive comparison of 24 machine-learning-based irradiance-to-power conversion models, that the selection of the predictors might have an even greater effect than the model selection. On the third order of importance is probabilistic and ensemble modeling of PV power output, which serves as an uncertainty quantification tool. As the competition requested the PV power forecasts to be submitted in the form of quantiles, all top-five teams chose nonparametric approaches, among which variants of quantile regression and gradient boosting were most popular. These strategies of different orders of importance are elaborated further in the next three subsections.
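In the spirit of those winning entries, the sketch below illustrates nonparametric quantile forecasting of PV power via gradient-boosted quantile regression; the feature matrix and power series are synthetic stand-ins for engineered NWP features and measured PV power.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 3))     # e.g., clear-sky index, temperature, wind
y = 5.0 * X[:, 0] + rng.normal(scale=0.5, size=2000)  # stand-in power [MW]

# one gradient-boosting model per requested quantile level
models = {q: GradientBoostingRegressor(loss="quantile", alpha=q).fit(X, y)
          for q in (0.1, 0.5, 0.9)}

X_new = rng.uniform(size=(5, 3))    # features from fresh NWP forecasts
quantile_fcst = {q: m.predict(X_new) for q, m in models.items()}
print(np.round(quantile_fcst[0.5], 2))
```

Fitting one model per quantile is the simplest route; shared-tree variants such as quantile regression forests trade this redundancy for a single fit.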

11.2.1 INTEGRATING CLEAR-SKY INFORMATION DURING PV POWER FORECASTING

In a purely regressive setting, clear-sky information, to be included as a predictor for PV power, can be represented in either irradiance terms or power terms. Sections 4.5.1 and 5.4 have introduced three popular clear-sky irradiance models that appeal to solar forecasting applications, namely, the Ineichen–Perez model (Ineichen and Perez, 2002), the McClear model (Lefèvre et al., 2013; Gschwind et al., 2019), and the REST2 model (Gueymard, 2008). The Ineichen–Perez model, which only relies on calculable and climatological inputs, does not require forecasts of any other atmospheric variable, and thus can be computed over any future time period. What this implies is that, for any time period, the ex ante clear-sky irradiance and the ex post clear-sky irradiance calculated using the Ineichen–Perez model are identical. As for the McClear model, its output is provided as a web service, but with a 2-day delay in its dissemination. In order to access the forecast version of McClear, one has to know the "secret door," which is another web service called "HelioClim-3 Real Time and Forecast" from the SoDa website,2 and the clear-sky column in the data corresponds to forecast McClear clear-sky irradiance out to two days (pers. comm. with Philippe Blanc, MINES ParisTech, 2022).

In contrast, the highest-performance REST2 model demands nine inputs to operate, namely, extraterrestrial beam normal irradiance (E0n), zenith angle (Z), ground albedo (ρg), surface pressure (p), aerosol optical depth at 550 nm (τ550), Ångström exponent (α), total column ozone (uO3), total nitrogen dioxide amount (uNO2), and total precipitable water vapor (uH2O). In that, if REST2 is to be employed during operational solar forecasting, forecasters need to first obtain forecasts of the input atmospheric variables. Because the ability to forecast some of these input atmospheric variables is limited, one should expect some difference between the ex ante and the ex post REST2 clear-sky irradiance.3 (At the time of writing, the literature is short of a dedicated study on this topic.) Among the nine inputs to REST2, E0n and Z can be calculated via solar positioning, whereas ρg, p, and uH2O are common output fields of NWP models. Information related to aerosols (i.e., τ550 and α) and other chemical species (e.g., uO3 and uNO2), however, is usually not available in regular NWP models, but in atmospheric composition models. Indeed, neither ECMWF's High Resolution (HRES) model nor the National Centers for Environmental Prediction's (NCEP's) North American Mesoscale (NAM) model offers forecasts of aerosols and other chemical species. On this point, using REST2 operationally means that the forecaster has to solicit forecasts of different input variables from more than one data source. One possible source of acquiring aerosol, ozone, and nitrogen dioxide forecasts is the CAMS Global Atmospheric Composition Forecasts from ECMWF, which produces forecasts twice daily on an approximately 40-km spatial grid with 137 vertical levels, and has been operational since July 2015.

Besides acquiring clear-sky irradiance, to be used as a predictor for regression-based PV power forecasting models, one may opt to get the clear-sky PV power output directly, which can be achieved in three ways. The most intuitive way is to pass, instead of the all-sky irradiance, the clear-sky irradiance and other auxiliary variables through a model chain (e.g., Engerer and Mills, 2014). This approach, however, should no longer be viewed as purely regressive, owing to the involvement of the model chain. Hence, its mention is deferred to Section 11.4, which deals with hybrid irradiance-to-power conversion methods. The second approach starts by identifying the clear-sky situations in the PV power time series, and constructs thence a separate regression for clear-sky situations only, which can subsequently be used to estimate the clear-sky power output expectation for arbitrary instances. Arguably, the identification of clear-sky situations in terms of PV power is not as straightforward a task as one may perceive—the reader should refer to Peratikou and Charalambides (2022) for an example of such methods. The last and simplest method is to apply time series decomposition methods to the PV power time series, so as to obtain the seasonal components of the time series by statistical means. This was in fact the method of Huang and Perry (2016), which had led to the team's success in GEFCom2014. More particularly, the seasonality was approximated by a Fourier series with lowpass frequency components determined from the data. Although the Fourier-based and other time series decomposition methods can be traced to the early 2010s (Yang et al., 2012; Dong et al., 2013), they are known to be inferior to proper clear-sky models. The reason why Huang and Perry (2016) chose the Fourier-based method is that the site locations were not revealed during the competition, so solar positioning was not possible. However, in hindsight, the choice must be deemed fruitful and hence offers some penetrating insights into circumstances of this sort—the design parameters of PV plants are not always available, or may be too inhomogeneous for a model chain to be effective (e.g., sites over complex terrain or mixed use of panels and inverters).

2 https://www.soda-pro.com/web-services#radiation
3 The current recommendation for using REST2 is to power it with reanalysis data, such as the Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2), or the Copernicus Atmosphere Monitoring Service (CAMS) (Fu et al., 2022). Since both MERRA-2 and CAMS are retrospective weather models with an accuracy that is not dramatically different from the operational ones, one can expect whatever has been formerly concluded based on REST2 powered by MERRA-2 or CAMS to be transferable to circumstances with REST2 powered by operational forecasts.
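As a minimal illustration of sourcing clear-sky irradiance for use as a predictor, the sketch below evaluates the Ineichen–Perez model through pvlib, which requires nothing beyond time and location (the climatological Linke turbidity is bundled with the package); the site and date range are arbitrary.

```python
import pandas as pd
from pvlib.location import Location

# Table Mountain coordinates, with altitude in meters
site = Location(40.125, -105.237, tz="Etc/GMT+7", altitude=1689)
times = pd.date_range("2020-06-01", "2020-06-02", freq="1h", tz=site.tz)

# Ineichen-Perez clear-sky model; returns ghi, dni, and dhi columns
cs = site.get_clearsky(times, model="ineichen")

# clear-sky index for a hypothetical GHI series `ghi`:
# kappa = ghi / cs["ghi"].clip(lower=1.0)
print(cs["ghi"].round(1).head())
```

The resulting clear-sky GHI, or the clear-sky index derived from it, can then enter the regression directly as a feature.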

11.2.2 FEATURE SELECTION AND ENGINEERING

The next important consideration in regression-based irradiance-to-power conversion is feature selection and engineering. The weather system is complex in the sense that everything is related to everything. That said, although a typical NWP model outputs hundreds of variables, one can never expect all of these variables to be statistically significant predictors for PV power. Inspecting the literature, statistical methods for feature selection and dimension reduction have been widely applied to weather variables, for identifying the most relevant ones that can contribute to the explanatory power of a regression (e.g., Persson et al., 2017; Nagy et al., 2016; Juban et al., 2016). This strategy is instinctive and therefore trivial to anyone with an introductory level of understanding of data science, and it has gained major acceptance among solar forecasters since the birth of the subject. Nonetheless, a comparatively small amount of effort has been paid to feature engineering, especially as to how meteorological knowledge can be best integrated. Indeed, the feature engineering approaches found in the literature have been largely limited to general-purpose ones, such as the inclusion of lagged versions of predictors (e.g., Persson et al., 2017), statistical aggregation and smoothing of predictors (e.g., Pedro et al., 2019), or automatic feature generation and extraction via machine learning (e.g., Acikgoz, 2022). Since there are infinite ways of doing feature selection and engineering, it has hitherto been impossible to conclude from reviewing the literature whether one method is more effective than another, especially when the test datasets presented in different works are very rarely identical. Thus, instead of conducting any further the "who did what" kind of roster review, which has been done numerous times with high repetitiveness (e.g., Ahmed et al., 2020; Sobri et al., 2018; Voyant et al., 2017), this section presents in the most concise manner those features that are absolutely essential for PV power forecasting. The rationale for selecting each below-mentioned feature is as follows:

• Irradiance: Whereas it must be universally admitted that global horizontal irradiance (GHI) is the most decisive factor, other shortwave and longwave irradiance components have also been argued to be more useful than not in explaining the power generation of PV. In particular, the proportions according to which the diffuse and beam components are split from GHI have been known to be related to the amount of global tilted irradiance (GTI) received by an inclined surface, since at least the 1960s (Kamphuis et al., 2020). On the other hand, research has shown that longwave radiation has a profound impact on the energy budget and temperature dynamics of PV (Yang et al., 2021b; Heusinger et al., 2020; Barry et al., 2020). Last but not least, extraterrestrial GHI, which is also output by most NWP models, can replace clear-sky irradiance as the multiplicative seasonal component, although the ratio of GHI to extraterrestrial GHI is called the clearness index, of which the zenith dependency is stronger than that of the clear-sky index.

• Temperature: The effect on PV power output due to temperature, often quantified through the temperature coefficient, is well recognized. An increase in temperature reduces the bandgap of a semiconductor, which corresponds to an increase in the energy of the electrons in the material. Solar cells under higher temperatures have a slightly elevated short-circuit current but a much lower open-circuit voltage, which translates to an overall 0.2–0.45%/°C decrease in cell efficiency. This temperature coefficient is with respect to cell temperature, which can be estimated through module temperature, which can further be related to the ambient temperature (King et al., 2004).

• Wind: Near-surface wind speed, and to a lesser extent wind direction, which are commonly output at a height of 10 m by NWP models, have a noticeable effect on the module temperature. Owing to the aforementioned negative correlation between module temperature and PV power output, the cooling effect of wind speed is but secondary in affecting PV power. Heat transfer through convection from PV panels by wind has been studied fairly thoroughly (e.g., Ceylan et al., 2019; Wu et al., 2017), and in-depth scientific details, such as how the 10-m wind translates to rear-side wind, or how the convection Nusselt number varies with tilt angle, wind direction, and velocity, have been understood to a great extent. That said, to what degree the effect on PV power due to the intricate heat transfer to fluids flowing across a flat plate can be captured by regression models remains largely mysterious. Regardless, the inclusion of wind speed, at least, is thought necessary, if such an effect is to be explained in part.

• Albedo: As PV panels are almost always installed with an inclination, the portion of GTI due to ground reflection and backscattering is not to be neglected at any rate, and albedo determines the ratio of global upwelling to downwelling irradiance (see Gueymard et al., 2019, for a review on albedo in the context of solar applications). Whereas ground reflection is a process that can be easily conceptualized, strong backscattering can also affect the incident diffuse radiation—the phenomenon is known as albedo enhancement (Gueymard, 2017a,b). During irradiance-to-power conversion, the broadband albedo has hitherto been used, since it can be acquired via remote sensing for worldwide locations. The more recent wave of investigations on the power production of bifacial PV, which is more sensitive to albedo, has further spawned interest in examining spectral albedo effects (Riedel-Lyngskær et al., 2022).

• Cloud cover: Almost all clear-sky index variability can be attributed to cloud modulation, and cloud cover, either on the okta scale or as a percentage, was early presented as a useful predictor in regression-based forecasting (Yang et al., 2015a, 2012). In both the GEFCom2014 dataset (Hong et al., 2016) and the ECMWF HRES dataset (Yang et al., 2022c), which are two open-access datasets published to facilitate NWP-based solar forecasting research, the data curators included cloud cover. What is different, however, is that the former listed total cloud cover whereas the latter gave low cloud cover. In most NWP systems, the total cloud cover integrates cloud layers from the top of the atmosphere down to the surface, using assumptions about the overlap between the subgrid clouds in the vertical. In that, the total cloud cover is smaller than or equal to the sum of the low, medium, and high cloud cover. Nonetheless, since high and medium clouds have marginal effects on surface irradiance, e.g., thin cirrus clouds are highly transparent to shortwave radiation, the total cloud cover will inevitably exaggerate the cloud modulation. Hence, using just the low cloud cover may be deemed more appropriate. One should bear in mind that forecasting clouds is a main scientific challenge for modern NWP (Bauer et al., 2015); therefore, the inaccuracy in cloud forecasts would render cloud cover a less effective predictor than it is often supposed to be.


As for feature engineering, energy meteorology knowledge should play a part. Let us suppose the response variable is PV power; it is then obvious that cloud cover information should be transformed into irradiance or power terms, by multiplying it with the clear-sky information. According to the Sandia Array Performance Model (SAPM; King et al., 2004), the rear-side module temperature relates to wind speed through scaled exponents. Hence, linearly scaling the wind speed and then taking the exponential is likely to be more effective than letting the regression figure out such a relationship on its own; a sketch of this transformation is given below. One may notice from these two simple examples that the kind of feature engineering which we discuss here may be viewed as hybrid irradiance-to-power conversion, since it integrates regression with those components of a model chain. Section 11.4 elaborates such possibilities further. Another highly effective feature engineering tactic is to consider spatial information. Although spatio-temporal information is already embedded in irradiance forecasts from physics-based methods, it has been reported that using forecasts from pixels or lattice points neighboring the focal location is able to drive the forecast accuracy even higher (Yagli et al., 2022; Pedro et al., 2019; Mazorra Aguiar et al., 2016).
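To make the wind-speed transformation concrete, the sketch below forms a SAPM-style module-temperature feature; the coefficients are the open-rack glass/polymer values tabulated by King et al. (2004), GHI is used as a crude stand-in for plane-of-array irradiance, and all input series are hypothetical.

```python
import numpy as np

# SAPM-style module temperature: T_mod = E * exp(a + b * WS) + T_amb,
# with open-rack glass/cell/polymer-sheet coefficients (King et al., 2004)
A, B = -3.56, -0.075

def sapm_module_temperature(irradiance, temp_air, wind_speed):
    return irradiance * np.exp(A + B * wind_speed) + temp_air

ghi = np.array([200.0, 600.0, 950.0])      # stand-in irradiance [W/m2]
temp_air = np.array([10.0, 20.0, 30.0])    # ambient temperature [degC]
wind_speed = np.array([5.0, 2.0, 0.5])     # 10-m wind speed [m/s]
print(sapm_module_temperature(ghi, temp_air, wind_speed).round(1))
```

The resulting series can be supplied to the regression in place of, or alongside, the raw wind speed.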

11.2.3 OTHER CONSIDERATIONS FOR REGRESSION-BASED IRRADIANCE-TO-POWER CONVERSION

Insofar as the regression methodology is concerned, conversion from irradiance (and other auxiliary variables) to PV power is conceptually identical to post-processing; it is just that the response variable, instead of measured irradiance, as in the case of forecast post-processing, is now PV power. In this regard, techniques and tricks discussed in Chapter 8 apply conceptually in every respect. Following the typology of forecast post-processing outlined in Chapter 8, irradiance-to-power conversion can also be separated according to deterministic-to-deterministic (D2D), probabilisticto-deterministic (P2D), deterministic-to-probabilistic (D2P), and probabilistic-toprobabilistic (P2P) directions of conversion. As is the case of D2D post-processing of forecasts, D2D irradiance-to-power conversion constitutes the most fundamental approach in the current literature. The early demonstrations of D2D irradiance-to-power conversion sought to modify the extrapolative time series methods, such as the autoregressive model or neural network, with NWP forecast as exogenous inputs (Bacher et al., 2009; Huang et al., 2010; Kardakos et al., 2013). With the lapse of time, solar forecasters have pushed the boundary of knowledge in several directions, in that, the models are now more numerous, procedures more tortuous, and comparisons more thorough. For instance, Visser et al. (2022) compared 11 regression-based irradiance-to-power conversion methods to a model chain, on data from 152 PV systems in the Netherlands, in a dayahead market (DAM) setting. Using 17 weather variables from the ECMWF HRES and site-related information, it was concluded that ensemble learning and deep learning are more advantageous conversion methods than the variants of linear regression and support vector regression. It is interesting to highlight that although ensemble learning and deep learning methods were able to outperform model chain in terms of mean absolute error (MAE), from an economic perspective, which considers both the

Irradiance-to-power Conversion with Physical Model Chain

449

initial revenues made on the DAM and the net imbalance costs due to the observed forecast error, model chain was found to be superior. P2D irradiance-to-power conversion requires slightly more consideration than P2D post-processing in the sense that two alternatives are present. One can either first post-process an ensemble irradiance forecast to a deterministic irradiance forecast and convert that into PV power, or first convert an ensemble irradiance forecast to an ensemble power forecast and then summarize that to a deterministic one. Regardless, the preference for ensemble modeling (or forecast combination) over leveraging just one model has been justified (Yang and Dong, 2018; Wolff et al., 2016). To create an ensemble of PV power forecast, the strategies for statistical ensemble forecasting as introduced in Section 3.3.2 apply, in that, one may choose data ensemble (i.e., by using forecasts from different NWPs), model ensemble (i.e., by using different regression models), parameter ensemble (i.e., by creating multilayer perceptrons with different numbers of hidden neurons and layers), or a combination of them. One example of P2D irradiance-to-power conversion with ensemble modeling is the study conducted by Pierro et al. (2016), who considered ECMWF and an original and a post-processed version of the Weather Research and Forecasting model (WRF) as three sources of NWP inputs, which, when paired with four regression models, resulted in many combinations. It was concluded that the discrimination in forecast performance of the component models is due mainly to the choice of NWP input, whereas different regression models with the same NWP input produce very similar results. Similarly, for D2P irradiance-to-power conversion, the forecaster also faces the choice of whether the D2P irradiance forecast post-processing should take place before or after the power conversion. But again, this issue is thought to be minor. The setup of the GEFCom2014 solar track is one of D2P form, for the contestants were tasked to convert the deterministic NWP forecasts into quantiles (Hong et al., 2016). As introduced in Section 8.5.3, D2P regression can be either parametric or nonparametric, and the latter is often more suited for solar forecasting, as also evidenced by the outcome of the competition. Besides probabilistic regression, two other forms of D2P irradiance-to-power conversion exist—their mentioning is organized in this section because they are more statistical than physical—namely, analog ensemble (AnEn) and the method of dressing. Conceptually, D2P irradiance-to-power conversion using AnEn and the method of dressing is no different from D2P post-processing of irradiance forecasts. In AnEn, current NWP forecasts are compared to the historical ones, and historical PV power corresponding to the best-match analogs is used as the forecast; in the method of dressing, errors of historical PV power forecasts are dressed onto the current forecast. The reader is referred to Pierro et al. (2022) for a case study using AnEn-based irradiance-to-power conversion, but the literature seems to lack an example of the method of dressing at the time of writing. The last category of irradiance-to-power conversion is P2P. The conversion of this category of methods requires the input to be ensemble NWP forecasts, which can be produced either by running the same NWP model with perturbed initial conditions, or by assembling forecasts from several deterministic NWP models. Doubleday et al. 
(2021) presented the first PV power forecasting application using Bayesian model averaging (BMA). Forecasts from a poor man's ensemble with four NWP models were individually converted to PV power using a method similar to the regression used by Ayompe et al. (2010). With those ensemble PV power forecasts, each member is dressed with a two-part density function, which is a discrete–continuous mixture, intended to explicitly model the effect of inverter clipping, which refers to the trimming of power output when the maximum capacity of the inverter is reached. The method has been thoroughly compared to, and showed superiority over, the ensemble model output statistics (EMOS), which is another P2P method. One drawback of this method may be the lack of comparison to nonparametric approaches. But so far as the application of BMA in solar forecasting is concerned, the case study put forth by those authors ought to be regarded as quite complete, and therefore deserves applause.

11.3 IRRADIANCE-TO-POWER CONVERSION WITH PHYSICAL MODEL CHAINS

Irradiance-to-power conversion was initially perceived as more of a calculation than a prediction. Stated differently, if the plant information and design parameters are known to a high exactitude, the calculation of the PV system's energy yield can be accomplished with exceptional quantitative precision. This belief predates the advent of modern solar forecasting, because PV system design, monitoring, and performance evaluation, among other resource assessment tasks, which also require irradiance-to-power conversion, were already quite advanced at the dawn of the 21st century (Drews et al., 2008, 2007). In the early work of Beyer et al. (2004), when hourly modeled AC power and measured AC power at two kW-scale PV systems were arranged in the form of scatter plots, the points were so tightly packed around the identity line that the irradiance-to-power conversion error seemed negligible compared to the size of the forecast error. In the industry, one of the earliest commercial PV simulation software tools can be dated back to 1992, when the founder of the PVsyst software started to develop tools for three-dimensional (3D) shading constructions, the simulation of stand-alone PV systems, and pumping PV systems (Mermoud, 1994). By the early 2010s, PVsyst version 6 was released, which was already very powerful in its functionalities and thus had gained wide acceptance by not only the PV industry but also the banks. Since both the approach of Beyer et al. (2004) and PVsyst employ a series of models in cascade, the irradiance-to-power conversion is better known as the model chain, and the later popularization of the term may be attributed to the much-celebrated Python package pvlib (Holmgren et al., 2018). Figure 11.1 depicts the anatomy of a model chain. Starting from the time and location of interest, the first step of any model chain is invariably solar positioning, which computes the zenith angle, incidence angle, extraterrestrial GHI, and apparent solar time (AST), among other calculable parameters. Subsequently, the solar positioning outcome is passed to separation and transposition models, alongside GHI and albedo information, to estimate the amount of irradiance reaching the inclined surface of concern.

Figure 11.1 Schematic of irradiance-to-power conversion via model chain. A model chain takes global horizontal irradiance as the main input and outputs PV power. Arrows going into a block indicate the required inputs, whereas arrows leaving a block indicate the outputs.


After accounting for the reflection losses at the panel surface (due to the layers of encapsulation of the solar panels), the effective GTI is converted to DC power via a PV model, in which the temperature effects are also accounted for. Then, by considering various loss mechanisms that take place in an actual environment, such as shading, soiling, wiring, and inverter clipping, the "lossless" version of the DC power is converted to the lossy AC power injection into the grid, which concludes the model chain. One should note, however, that not all parts of a model chain are of equal importance, since some of the stages may often be conveniently approximated by a percentage reduction without sacrificing too much accuracy. In each of the following subsections, the major stages of the model chain are elaborated on a minimum "need-to-know" basis; this is because some topics, such as separation and transposition modeling, have been studied for more than half a century, and one can hardly summarize all that knowledge within a few pages.

11.3.1 SOLAR POSITIONING

One may describe the positioning of the sun relative to a ground observer standing on a horizontal surface with two angles, namely, the solar zenith angle (Z) and the azimuth angle (φS). In a usual solar engineering textbook (e.g., Vignola et al., 2020; Masters, 2013), formulas for computing these angles are given as follows:

$$\sin\alpha = \cos L\cos\delta\cos H + \sin L\sin\delta, \tag{11.1}$$

$$\sin\phi_S = \frac{-\cos\delta\sin H}{\cos\alpha}, \tag{11.2}$$

where α = π/2 − Z is the elevation angle, L is the latitude of the location at which solar positioning is conducted, δ is the solar declination, and H is the hour angle. The journal article reference of these formulas is Michalsky (1988). One should note that some computational issues may arise owing to the involvement of trigonometry. The azimuth angle in Eq. (11.2) has a range of 0° ≤ φS < 360°, with a zero-north–east-positive convention. (In many solar energy applications, the calculation takes a zero-south–east-positive–west-negative convention.) In either case, because the inverse of sine is ambiguous, in that sin φS = sin(π − φS), it is necessary to invoke some test to determine the correct solution (see pg. 17 of Vignola et al., 2020, for example). Besides the solar zenith and azimuth angles, a third angle results from solar positioning, which is the incidence angle (θ). Geometrically, the incidence angle is the angle between the sun and the normal of the inclined surface. The textbook formula of θ is (Vignola et al., 2020; Masters, 2013):

$$\cos\theta = \cos S\cos Z + \sin S\sin Z\cos(\phi_S - \phi_C), \tag{11.3}$$

where S and φC are the tilt and azimuth angles of the inclined collector surface; φC again follows the zero-north–east-positive convention. Unlike the case of Eq. (11.2), no ambiguity results from the trigonometry in Eq. (11.3). The above solar-positioning equations are but approximations, and there are more accurate algorithms available (e.g., Blanc and Wald, 2012; Grena, 2012; Reda and Andreas, 2004). The main differences between these alternative algorithms are in terms of accuracy and computational complexity, which have been compared by Hoadley (2021). All evidence from Table 1 of Hoadley (2021) points to the fact that positioning accuracy comes at the cost of computational complexity. For instance, the algorithm of Michalsky (1988) has a quoted error of 0.01° and requires 530 steps; the SG2 algorithm of Blanc and Wald (2012), 0.00078° and 1305 steps; and the solar position algorithm (SPA) of Reda and Andreas (2008), 0.0003° and 13,623 steps. Since SPA has become the standard high-accuracy implementation used by the solar community (Hoadley, 2021), it should be used in solar forecasting applications whenever possible. Moreover, because solar positioning only needs to be performed in a rolling manner during operational forecasting, computational complexity is rarely a concern. SPA is available from both the pvlib library of Python (Holmgren et al., 2018) and the insol package of R (Corripio, 2021). Owing to potentially different conventions used by different software packages, a sanity check after solar positioning is always advisable.
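To make the above concrete, the following sketch obtains SPA-based solar positions and the incidence angle of Eq. (11.3) via pvlib; the site coordinates, time range, and collector orientation are illustrative assumptions.

```python
# A minimal sketch of SPA-based solar positioning with pvlib; the site,
# times, and panel orientation below are assumptions, not prescriptions.
import pandas as pd
import pvlib

times = pd.date_range("2020-06-21", periods=288, freq="5min", tz="UTC")
lat, lon = 44.083, 5.059    # Carpentras, France, as in Fig. 11.2

# method="nrel_numpy" implements the SPA of Reda and Andreas (2008)
solpos = pvlib.solarposition.get_solarposition(times, lat, lon,
                                               method="nrel_numpy")

# Incidence angle of Eq. (11.3); pvlib follows the zero-north,
# east-positive azimuth convention
theta = pvlib.irradiance.aoi(surface_tilt=30, surface_azimuth=180,
                             solar_zenith=solpos["apparent_zenith"],
                             solar_azimuth=solpos["azimuth"])
```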

11.3.2 SEPARATION MODELING

Separation modeling, alongside the estimation of the solar constant, spectral irradiance modeling, modeling of the angular distribution of radiance, clear-sky modeling, transposition modeling, and the site adaptation of solar irradiance, is jointly known as radiation modeling, which accounts for about half of the scope of solar energy meteorology, with the other half being forecasting itself (Yang et al., 2022d). Radiation modeling revolves around irradiance components and k-indexes, which have been introduced in Section 5.2.1; the main ones relevant to separation modeling are GHI (Gh), diffuse horizontal irradiance (DHI, Dh), the clearness index (kt), and the diffuse fraction (k). As the main aim of separation modeling is to split, or to decompose, GHI into the beam and diffuse components, it also goes by the name decomposition modeling. In some cases the beam component is output by the NWP model, e.g., ECMWF offers both high-resolution and ensemble forecasts of beam horizontal irradiance (BHI, Bh), and one can skip the separation modeling part of the model chain, since with Gh and Bh forecasts, Dh forecasts can be obtained from the three-component closure relationship. In other circumstances, where the beam component is not output by the NWP model or is not made available to the forecaster, one has to estimate it via a separation model, as is the case of how beam and diffuse components in satellite-derived irradiance databases are usually produced (Yang, 2022b). Owing to the zenith dependence of surface irradiance, which, as a general rule of thumb, is preferred to be set aside as far as possible during radiation modeling, separation modeling predicts k using kt, which are both normalized quantities. Notwithstanding, since the kt–k mapping is non-injective, in that a single kt value may correspond to a range of k values, additional input variables need to be introduced to explain the spread of the scatters. To graphically demonstrate the non-injective nature of the kt–k relationship, Fig. 11.2 plots as the gray background the measured kt–k scatter, using four years of 1-min data from Carpentras, France. If kt is used as the sole input and fed into a logistic function, k can only be predicted as those values along the logistic curve, which cannot explain most data points in the gray background, and the error can be expected to be substantial.


Figure 11.2 One-minute diffuse fraction prediction using the logistic function, BRL, and STARKE1 models, using data from Carpentras (44.083°N, 5.059°E), France, over 2015–2018. Measurements are shown as a gray background, and predictions are shown as black scatters.

When auxiliary variables, such as the zenith angle, AST, or clear-sky index, are added to the model, as in the cases of the BRL model (Ridley et al., 2010) and the STARKE1 model (Starke et al., 2018) shown in Fig. 11.2, the response is no longer a line but a hyperplane, which, after projection onto the 2D plot for visualization purposes, is able to cover a larger portion of the gray background.

As the largest class of radiation models, separation models number more than 150 in the literature thus far. Because the performance of models is heterogeneous across geographical locations and time periods, heated debates in regard to model predictive performance often take place whenever a new model is proposed. Fortunately, the perennial controversy was settled at large by the seminal comparative study conducted by Gueymard and Ruiz-Arias (2016), who compared 140 separation models with data from 54 research-class radiometry stations, distributed across all seven continents and on islands of all four oceans. The scope of validation was, at that time, unprecedented, in terms of both the number of data points and the models compared. It was concluded that the ENGERER2 model (Engerer, 2015) obtained the best overall results and thus was deemed quasi-universal. Ever since, newly proposed models have been compelled to benchmark their performance against that of ENGERER2, in order to convincingly support any claimed superiority. Years have passed, and owing to the popularity of the topic, the literature has again accumulated many more separation models, which once more makes comparative analysis difficult and reconciliation urgently needed. To that end, Yang (2022b) published another review, specifically dealing with separation models proposed after 2016. In that review, more than 80 million valid 1-min data points from 126 research-class radiometry stations worldwide were utilized, and it was found that the YANG4 model (Yang, 2021a) has the highest overall performance among the models compared, and thus should replace ENGERER2 as the new quasi-universal model. To understand the working principles of ENGERER2 and YANG4, a short overview of separation modeling is presented over the following pages.

11.3.2.1 A brief review of some major separation models

Many early separation models take kt as the sole predictor. Typifying such models is the ERBS model (Erbs et al., 1982), which models k as a piecewise function of kt:

$$k^{\text{ERBS}} = \begin{cases} 1.0 - 0.09k_t, & k_t \le 0.22;\\ 0.9511 - 0.1604k_t + 4.388k_t^2 - 16.638k_t^3 + 12.336k_t^4, & 0.22 < k_t \le 0.80;\\ 0.165, & k_t > 0.80. \end{cases} \tag{11.4}$$

The coefficients were obtained from experimental data, which, at that time, were hourly in resolution and few in count. Regardless, owing to the aforementioned need for expanding the dimensionality during separation modeling, numerous models with multiple predictors have been proposed. In developing the theories for these models, exceedingly few have considered atmospheric physics, namely, Hollands and Crha (1987); Hollands (1985), but all others have submitted to empirical approaches for their simplicity. Among those empirical approaches, the BRL model (Ridley et al., 2010), whose name follows the initials of its inventors' last names, has gained perhaps the most traction, and it takes the form:

$$k^{\text{BRL}} = \frac{1}{1 + e^{\beta_0 + \beta_1 k_t + \beta_2\text{AST} + \beta_3\alpha + \beta_4 k_{t,\text{daily}} + \beta_5\psi}}, \tag{11.5}$$

where β0 = −5.38, β1 = 6.63, β2 = 0.006, β3 = −0.007, β4 = 1.75, and β5 = 1.31 are the fitted values of model coefficients, α = 90° − Z is the solar elevation angle in degrees, kt,daily is the daily average of kt, and ψ is a variability index, computed by smoothing successive kt values, or formally, the three-point moving average of kt. The coefficients of the BRL model were originally fitted using hourly data, and thus the model can no longer meet the requisite of high-resolution solar applications today. Notwithstanding, its modeling strategy—using a logistic function to resemble the shape of the kt–k scatter—has strongly impacted the evolution of separation modeling. Indeed, many succeeding models (e.g., Starke et al., 2021, 2018; Every et al., 2020), including ENGERER2 and YANG4, are based upon this functional form. The success of ENGERER2 can be attributed mainly to its explicit modeling of cloud enhancement, which is a phenomenon observable only in high-resolution irradiance. Recall that cloud enhancement occurs when the sun is completely exposed to the observer while the clouds in the vicinity of the sun contribute additional diffuse radiation, such that the cloud-enhanced irradiance is higher than the irradiance under clear skies. Engerer (2015) took advantage of that and proposed a new input:

$$k_{de} = \max\left(0,\; 1 - \frac{G_{hc}}{G_h}\right), \tag{11.6}$$

which accounts for the excess amount of diffuse irradiance due to cloud enhancement. The criterion for detecting cloud enhancement is based on GHI: if GHI is below the clear-sky GHI (Ghc), i.e., Ghc ≥ Gh, then kde = 0, and the occurrence of cloud enhancement is not assumed, whereas when Gh > Ghc, kde > 0, and cloud enhancement is assumed to have occurred. Besides kde, another input variable introduced for ENGERER2 is Δktc, which is the difference between the clearness index of Ghc and the clearness index of Gh, i.e.,

$$\Delta k_{tc} = k_{tc} - k_t = \frac{G_{hc}}{E_0} - \frac{G_h}{E_0}, \tag{11.7}$$

where E0 is the extraterrestrial GHI. The reader is referred to Engerer (2015) for the motivation of introducing Δktc. Putting these together, ENGERER2 takes the form:

$$k^{\text{ENGERER2}} = C + \frac{1 - C}{1 + e^{\beta_0 + \beta_1 k_t + \beta_2\text{AST} + \beta_3 Z + \beta_4\Delta k_{tc}}} + \beta_5 k_{de}, \tag{11.8}$$

with C = 0.042336, β0 = −3.7912, β1 = 7.5479, β2 = −0.010036, β3 = 0.003148, β4 = −5.3146, and β5 = 1.7073 being the parameters fitted using 1-min data collected at six Bureau of Meteorology stations in southeastern Australia. At a later time, the coefficients of ENGERER2 were refitted using global data at different temporal resolutions (Bright and Engerer, 2019). However, it has been reported that, due to the overwhelming diversity of the fitting samples, the new ENGERER4 model suffers from under-fitting, which results in an inferior performance compared to the original model (Yang, 2022b, 2021a).

After ENGERER2 outperformed all of its competitors in the 2016 separation modeling "contest," the subsequent advances in separation modeling took off in different directions. The lowest-hanging fruit is model refitting, of which the procedure is exemplified by Bright and Engerer (2019). Since models fitted with local data usually result in better performance locally, one may opt to refit separation models using local data, which is a general strategy for accuracy improvement, but with the loss of generalizability as the trade-off. The second direction of advance is piecewise modeling, which had already been recognized as useful in the 1970s (Orgill and Hollands, 1977), except that now the piecewise functions often assume more intricate mathematical forms, as typified by the work of Starke et al. (2018). Next, Yang and Boland (2019) explored the possibility of including satellite-derived irradiance during separation modeling, and it seems that the strategy is truly effective, since the YANG2 model proposed thereof shows dominating performance against ENGERER2 and other recently proposed models. Another general strategy for accuracy improvement is engaging condition-specific model coefficients. In the work of Engerer (2015), the ENGERER3 model was in fact fitted with only clear-sky data, in that the conditioning is one of clear versus cloudy sky. Alternatively, Starke et al. (2021); Every et al. (2020); Abreu et al. (2019) considered conditioning on the Köppen–Geiger climate classes, that is, a separate set of model coefficients is fitted for each climate class. Whereas there are a few other displays of innovation (see Yang, 2022b, for a review), we should mention just one more, the temporal resolution cascade, which has led to YANG4, the current-best 1-min separation model.

YANG4 is based upon YANG2, which is based upon ENGERER2. While retaining all input variables of ENGERER2, YANG2 additionally uses as input the half-hourly or hourly satellite-derived diffuse fraction, k^satellite, which can be thought of as a low-frequency estimate of the diffuse fraction. YANG2 is given as:

$$k^{\text{YANG2}} = C + \frac{1 - C}{1 + e^{\beta_0 + \beta_1 k_t + \beta_2\text{AST} + \beta_3 Z + \beta_4\Delta k_{tc} + \beta_6 k^{\text{satellite}}}} + \beta_5 k_{de}, \tag{11.9}$$

where C = 0.0361, β0 = −0.5744, β1 = 4.3184, β2 = −0.0011, β3 = 0.0004, β4 = −4.7952, β5 = 1.4414, and β6 = −2.8396. The problem of YANG2 is that the acquisition of k^satellite translates to additional effort, especially when satellite-derived irradiance databases use somewhat different conventions for data dissemination; also, there is usually a delay in the production of satellite-derived irradiance, which voids the real-time applications of YANG2. To remedy this problem, Yang (2021a) proposed replacing k^satellite with other forms of low-frequency estimates of the diffuse fraction, for instance, the hourly diffuse fraction estimate obtained by applying ENGERER2 on hourly data. Denoting that as k^ENGERER2_hourly, YANG4 is written as:

$$k^{\text{YANG4}} = C + \frac{1 - C}{1 + e^{\beta_0 + \beta_1 k_t + \beta_2\text{AST} + \beta_3 Z + \beta_4\Delta k_{tc} + \beta_6 k^{\text{ENGERER2}}_{\text{hourly}}}} + \beta_5 k_{de}, \tag{11.10}$$

in which all coefficients follow those of YANG2. YANG4 applies sequentially two separation models of low and high temporal resolutions, which has led to the coining of the term "temporal resolution cascade." The temporal resolution cascade is a general strategy, not restricted to pairing YANG4 and ENGERER2. However, the strategy itself still welcomes further extension; for instance, one may incorporate conditioning during model coefficient fitting.

11.3.2.2 Comparison of recent separation models

This section compares the performance of some recent separation models. In particular, 10 models are considered: (1) ENGERER2 (Engerer, 2015), (2) ENGERER4 (Bright and Engerer, 2019), (3) STARKE1 (Starke et al., 2018), (4) STARKE2 (Starke et al., 2018), (5) STARKE3 (Starke et al., 2021), (6) ABREU (Abreu et al., 2019), (7) PAULESCU (Paulescu and Blaga, 2019), (8) EVERY1 (Every et al., 2020), (9) EVERY2 (Every et al., 2020), and (10) YANG4 (Yang, 2021a), which, besides ENGERER2 itself, were all proposed after the Gueymard and Ruiz-Arias (2016) review. Among these models, ENGERER2 and YANG4 have been expressed in their full forms earlier. As for the other models, their explicit forms are thought to be less critical for the current purpose of discussion, and therefore are omitted, but the reader can find full details in Yang (2022b). The experimental datasets come from Yang (2022b), which contain 1-min measurements of GHI, DHI, and BNI from 126 radiometry stations worldwide. For each station, all available data over the period 2016–2020 are used. The gathering and quality control of the datasets were carried out collectively by members of the International Energy Agency, Photovoltaic Power Systems Programme, Task 16, which is a consortium of 53 institutions from 21 countries, represented by relevant experts in solar resourcing for high-penetration and large-scale applications. In short, the effort in ensuring both data quantity and quality is unprecedented, and the authority of the datasets at hand can hence be assumed at once.

A practical difficulty in assessing the predictive performance of large collections of data and models is dimensionality, for each of the m models is to be paired with each of the n datasets, resulting in a total of nm experiments. On top of that, if several evaluation metrics and/or several predictands are considered, the dimensionality further multiplies. To avoid tabulating model- and dataset-specific results individually, reporting aggregated values is more efficient, and the results are more interpretable. However, since the sample size of each dataset may differ, one may consider weighted averaging. Besides computing average errors, there are other alternatives for depicting the overall performance of high-dimensional verification exercises. In particular, assessments based on ranking statistics and assessments based on pairwise Diebold–Mariano (DM) tests often find relevance. These methods are general, and thus also apply to high-dimensional forecast verification. A linear ranking method (Alvo and Yu, 2014), which seeks to obtain the mean rank of different models under assessment, is herein considered. Given m models, the mean ranks can be collected into a vector r = (r1, r2, ..., rm)⊤, and the rank for the jth model is expressed as:

$$r_j = \sum_{k=1}^{m!} \frac{n_k\,\nu_k(j)}{n}, \tag{11.11}$$

where νk, with k = 1, 2, ..., m!, represents all possible rankings (i.e., permutations) of the m models; nk is the frequency of occurrence of ranking k; n = Σ_{k=1}^{m!} nk is the number of samples, or in the present case, the number of datasets; and νk(j) denotes the score of model j in ranking k. A negatively oriented ranking is herein followed, so that a better model would receive a smaller νk(j). Stated differently, if model j ranks the highest in ranking k, νk(j) = 1; if it ranks the lowest, then νk(j) = 10. The ranking is conducted, without loss of generality, according to the root mean square error (RMSE) of each model.

Separation models predict k, which can be converted back to Dh by multiplying with Gh. Then, the beam normal irradiance (Bn) can be obtained through the three-component closure relationship, i.e., Bn = (Gh − Dh)/cos Z. The ranking results for Bn and Dh predictions from the 10 separation models are shown in Tables 11.1 and 11.2, respectively. The mean ranks are displayed in the last column, whereas the ranks at individual stations are tabulated in the preceding columns. For example, YANG4 ranks first at Station 1, and EVERY1 ranks last at that station. Overall, with a mean rank of 2.47 for Bn predictions, and 2.45 for Dh, YANG4 has attained the best rank among the 10 models compared.
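As a concrete reference for the logistic family reviewed above, the sketch below implements ENGERER2, Eqs. (11.6)-(11.8), with the original coefficients quoted earlier; the input arrays and the source of clear-sky GHI are assumptions.

```python
# A minimal sketch of ENGERER2; ghi, ghi_clear, e0, zenith, and ast are
# assumed NumPy arrays (angles in degrees, apparent solar time in hours).
import numpy as np

def engerer2(ghi, ghi_clear, e0, zenith, ast):
    kt = ghi / e0                                  # clearness index
    dktc = ghi_clear / e0 - kt                     # Eq. (11.7)
    kde = np.maximum(0, 1 - ghi_clear / ghi)       # Eq. (11.6)
    C = 0.042336
    b = (-3.7912, 7.5479, -0.010036, 0.003148, -5.3146, 1.7073)
    expo = b[0] + b[1]*kt + b[2]*ast + b[3]*zenith + b[4]*dktc
    return C + (1 - C) / (1 + np.exp(expo)) + b[5]*kde   # Eq. (11.8)
```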
Table 11.1 Ranking results of 10 separation models, based on the root mean square error of Bn estimates, at 126 sites. For each site, the best model is ranked "1," and the worst model is ranked "10." Whereas the middle columns are omitted, the last column shows the mean rank of each model, the smaller the better.

Model      Station 1  Station 2  Station 3  ···  Station 125  Station 126  Mean rank
ENGERER2       4          1          6      ···       5            6          4.32
ENGERER4       7          6          7      ···       7            8          6.68
STARKE1        3          5          2      ···       4            3          3.00
STARKE2        5          8          3      ···       3            2          5.04
STARKE3        2          4          4      ···       2            1          3.01
ABREU          8          7          8      ···      10            7          7.87
PAULESCU       6          2          5      ···       6            5          4.67
EVERY1        10          9          9      ···       9            9          9.10
EVERY2         9         10         10      ···       8           10          8.84
YANG4          1          3          1      ···       1            4          2.47

Table 11.2 Same as Table 11.1, but based on the RMSE of Dh estimates.

Model      Station 1  Station 2  Station 3  ···  Station 125  Station 126  Mean rank
ENGERER2       4          1          4      ···       5            6          4.81
ENGERER4       7          6          7      ···       7            8          7.01
STARKE1        3          5          2      ···       4            4          3.47
STARKE2        5          7          6      ···       2            3          5.04
STARKE3        2          2          3      ···       3            1          2.55
ABREU          8          8          8      ···      10            7          7.68
PAULESCU       6          3          5      ···       6            5          4.06
EVERY1        10          9          9      ···       9            9          9.08
EVERY2         9         10         10      ···       8           10          8.86
YANG4          1          4          1      ···       1            2          2.45

Next, DM tests are conducted to further assess, and compare to its peers, the overall performance of each model. The null hypothesis of the DM test is that the expectation of the loss differential is zero. Denoting a prediction from model A as xA, that of model B as xB, and the corresponding observation as y, the loss differential is defined as:

$$d = S(e_A) - S(e_B), \tag{11.12}$$

where S is a scoring function of choice, and eA = xA − y and eB = xB − y are the errors (or losses) of the predictions made by models A and B, respectively. Mathematically, the null hypothesis is written as H0: E(d) = 0, and because either model could be better than the other, a two-sided alternative H1: E(d) ≠ 0 is used. To decide whether the null hypothesis should be rejected, one compares the test statistic to the critical value. For instance, for a two-sided alternative with 95% confidence, the critical values are ±1.96. If the test statistic is smaller than −1.96 or greater than 1.96, the null hypothesis is rejected, which suggests that one set of predictions is significantly better than the other. Given that there are 10 models and 126 stations, a total of C(10, 2) × 126 = 5670 DM tests needs to be conducted.

Figure 11.3 Pairwise Diebold–Mariano (DM) tests for comparing the predictive accuracy of various separation models, in terms of Bn. The numbers show the number of instances the DM test statistic falls in the lower or upper 2.5% tail of a standard normal distribution. In other words, the entries denote the number of "Model A is better than Model B" instances. For example, the entry in the lower-right corner denotes that YANG4 performs significantly better than ENGERER2 at 103 out of 126 sites.

Figure 11.3 graphically summarizes the outcome of the 5670 DM tests conducted for Bn predictions, and that of the tests conducted for Dh predictions is depicted in Fig. 11.4. The entries in both figures denote the number of times Model A outperforms Model B. For example, the entry in the top-left corner of Fig. 11.3 reads "23," which means ENGERER2 outperforms YANG4 at 23 out of 126 stations. Since larger entries correspond to darker colors, higher-performance models would have darker columns and lighter rows (e.g., YANG4), whereas lower-performance models would have lighter columns and darker rows (e.g., EVERY2). It can then be concluded from Fig. 11.3 that YANG4 has the best overall performance with a total of 942 wins, followed by STARKE1 (879 wins), STARKE3 (874 wins), ENGERER2 (712 wins), and so on. This result is consistent with the earlier findings based on ranking statistics. The conclusion made from Fig. 11.4 is similar.
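The sketch below shows how one such pairwise, two-sided DM test may be computed under squared-error loss; the error arrays are assumptions, and the plain variance estimate neglects autocorrelation in the loss differential.

```python
# A minimal sketch of a pairwise DM test; err_a and err_b are assumed
# arrays of forecast errors (x - y) from two models at one station.
import numpy as np
from scipy import stats

def dm_test(err_a, err_b):
    d = err_a**2 - err_b**2                      # loss differential, Eq. (11.12)
    stat = d.mean() / np.sqrt(d.var(ddof=1) / d.size)
    p_value = 2 * stats.norm.sf(abs(stat))       # two-sided p-value
    return stat, p_value

# Reject the null of equal accuracy at the 5% level when |stat| > 1.96.
```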


Figure 11.4 Same as Fig. 11.3, but for the results of the DM test on Dh predictions.

11.3.2.3 Common pitfalls of separation modeling in model chain

A vast majority of the latest-generation separation models are designed to work with 1-min data. When these models become part of a model chain, several pitfalls should be understood. The first of these pitfalls is related to temporal resolution, for most NWP forecasts are produced at 15-min, 1-h, or 3-h resolutions, which do not match the data resolution at which the separation models are fitted. One can certainly feed the low-resolution GHI forecasts to a 1-min separation model and obtain some estimates of Bn and Dh, but due to the different variabilities of these forecasts and of the data used for separation model fitting, the degree to which the models can perform consistently on low-resolution data is unclear. From a methodological viewpoint, the success of many high-performance models can be attributed to the special treatment they include when dealing with high-frequency irradiance features, such as cloud enhancement. Since these high-frequency irradiance features do not exist in low-resolution data, the advantage which the high-performance models possess could diminish, as compared to some of the older models fitted using hourly data. There have been some works investigating the issue, but the conclusions can hardly be deemed useful, due to the small number of models compared. To put forth any compelling recommendation in this regard, the investigation must be conducted on the scale of Gueymard and Ruiz-Arias (2016), in terms of both the number of models and the number of stations.


The next pitfall, which is not general but applies only to some models, is the inability to serve real-time applications. The BRL model includes several auxiliary input variables, of which two require future information with respect to the time stamp for which separation is performed. The daily average kt (i.e., kt,daily) needed by the BRL model can only be computed after the day has elapsed, and similarly, the three-point average kt (i.e., ψ) needs 1-step-ahead kt values. Since this sort of averaging variable is the basis of several subsequent models, such as those appearing in Starke et al. (2021, 2018); Every et al. (2020); Rojas et al. (2019); Paulescu and Blaga (2019), these latter models are subject to the same problem as the BRL model. Third, conditioning is a conspicuous technique for improving the accuracy of separation models. As seen in the cases of Starke et al. (2021); Every et al. (2020); Abreu et al. (2019), the climate class is often taken to be the conditioning variable. The problem, however, is this: the Köppen–Geiger climate class of a location, for instance, is mostly defined by the temperature (and thereafter, by the precipitation) of that location, which is only marginally related to the diffuse–beam composition of surface solar radiation. On this point, if conditioning variables are to be introduced into separation modeling, in order to devise different sets of model coefficients, one should consider the climatology of cloudiness as the dominant conditioning variable. Besides cloudiness, aerosol and albedo are variables thought to have the next level of importance (Hollands and Crha, 1987). When multiple conditioning variables prevail, a clustering method such as k-means is useful, as sketched below.
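The following sketch illustrates one way such conditioning might be set up: stations are clustered on cloudiness, aerosol, and albedo climatologies, and one coefficient set is fitted per cluster. The feature matrix is an assumed placeholder with one row per station.

```python
# A minimal sketch of climatology-based conditioning for separation-model
# coefficients; station_climatology is an assumed placeholder array.
import numpy as np
from sklearn.cluster import KMeans

station_climatology = np.random.rand(126, 3)   # [cloudiness, aerosol, albedo]
cluster_id = KMeans(n_clusters=4, n_init=10,
                    random_state=0).fit_predict(station_climatology)
# For each c in range(4), refit the separation model using only the data
# from stations with cluster_id == c, yielding condition-specific coefficients.
```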

11.3.3 TRANSPOSITION MODELING

Whereas separation models retrieve the diffuse and beam components from the global one, transposition models convert the three horizontal irradiance components to GTI via the transposition equation: Gc = Bn cos θ + Rd Dh + ρg Rr Gh. Insofar as PV system design, simulation, performance evaluation, and forecasting are concerned, it is not feasible to measure and forecast GTI for all possible tilt and azimuth angles, and thus transposition is essential. Among the various quantities on the right-hand side of the transposition equation, the incidence angle (θ) is obtained from solar positioning with the known orientation of the inclined surface, whereas the transposition factor due to the ground's reflection (Rr) and the diffuse transposition factor (Rd) are to be modeled. In dealing with Rr, most studies consider the ground reflection process to be isotropic (Gueymard, 2009), in that Rr is only a function of the surface tilt:

$$R_r^{\text{ISO}} = \frac{1 - \cos S}{2} = \sin^2\left(\frac{S}{2}\right). \tag{11.13}$$

Generally, isotropy means that a property of a physical quantity is the same in any direction of space and time. The isotropic reflection process on a surface, which is often called "Lambertian reflectance" (Kamphuis et al., 2020), has long been known to be an approximation. Coulson et al. (1965) showed early on a considerable dependence of the reflection properties of semi-desert and grass surfaces on the solar zenith and azimuth angles. Hence, in a later paper, Temps and Coulson (1977) proposed an anisotropic formulation of Rr:

$$R_r^{\text{TEMPS}} = \sin^2\left(\frac{S}{2}\right)\left[1 + \sin^2\left(\frac{Z}{2}\right)\right]\left|\cos(\phi_S - \phi_C)\right|, \tag{11.14}$$

in which the latter two multiplicative terms serve as a correction for the anisotropy of surface reflection. Notwithstanding, owing to the coupling between the reflected and diffuse irradiance components on the tilted surface, which requires a dedicated experimental design to disentangle, there seem to be very few attempts at validating the extent to which R_r^TEMPS is superior to R_r^ISO. On the other hand, because most studies use just R_r^ISO and the results appear satisfactory at large, the bulk of work on transposition modeling has been focusing on the modeling of Rd.
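One transposition step of a model chain can be sketched with pvlib as follows; the isotropic sky-diffuse option is paired with the ground reflection of Eq. (11.13), and all numeric inputs are illustrative placeholders.

```python
# A minimal sketch of GHI-to-GTI transposition with pvlib; the numbers
# below are placeholders, e.g., dni and dhi from a separation model.
import pvlib

poa = pvlib.irradiance.get_total_irradiance(
    surface_tilt=30, surface_azimuth=180,       # collector orientation
    solar_zenith=45, solar_azimuth=170,         # from solar positioning
    dni=600, ghi=500, dhi=120,                  # W/m2
    albedo=0.2, model="isotropic")              # "perez" etc. also available
gti = poa["poa_global"]                         # Gc of the transposition equation
```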


Figure 11.5 An illustration of an inclined plane under an isotropic sky.

Indeed, as pointed out by Yang (2016), whenever transposition models are mentioned, it is mainly the different formulations of Rd that are referred to. As Rd = Dc/Dh, which is the ratio between the diffuse tilted irradiance (DTI, or Dc) and DHI, many transposition modeling techniques proceed from the fundamental definitions of atmospheric radiation. Dc can be obtained via integration of the sky radiance L(ϑ, ϕ), where ϑ and ϕ are the polar and azimuthal angles of a spherical coordinate system, respectively. Sky radiance has a unit of W/m²/sr, where sr is steradian, the unit for solid angles (see Appendix B). Rekioua and Matagne (2012) gave one such integral expression for Dc.



$$R_{ac,\,wire} = \frac{l_{wire}}{\pi\sigma\left(2r_{wire}\delta - \delta^2\right)} \approx \frac{l_{wire}}{2\pi\sigma r_{wire}\delta},$$

where lwire and rwire are the length and radius of the cylindrical conductor, respectively. The AC ohmic loss can then be calculated with Joule's law as I²rms·Rac,wire, where Irms is the root-mean-square (RMS) amplitude of the transmitted current. In practice, to acquire the power entering the transformer, which is the final stage of the model chain, the AC power output by the inverter is subtracted by the AC loss:

$$P_{trans} = P_{ac,\,inv} - \left(\frac{P_{ac,\,inv}}{V_{rms}}\right)^2 R_{ac,\,wire}, \tag{11.104}$$

where Vrms is the RMS value of the line voltage at the low-voltage side of the transformer. Besides the skin effect, the current density distribution of one conductor is also impacted by other current-carrying conductors nearby, which is known as the proximity effect. While the proximity effect is also caused by electromagnetic induction, it differs from the skin effect in that it results from the mutual induction between insulated conductors rather than from self-induction. Like the skin effect, the proximity effect enhances the AC resistance of a conductor, as well as its thermal loss, by preventing the current distribution from being even over the cross-section. The quantification of the proximity effect must be performed by considering the distance between, as well as the cross-sectional areas of, the acting and acted conductors: the nearer the conductors are placed and the larger the cross-sections, the more prominent the proximity effect. On this point, if one is to conduct an investigation in regard to the quantification of the proximity effect on AC resistance, the usual strategy is to engage the finite element method, which is not only a very specialized skill but also time-consuming. In this regard, it is possible to estimate the AC resistance based on the DC resistance, as exemplified in the work of Hafez et al. (2014); this approach is suitable for medium-voltage grid interconnection, which is typical of MW-scale PV plants.

11.3.7.6 Transformer

The transformer, which is the final component of a grid-tied PV system, steps up the AC voltage of the inverter output to match the grid voltage. There are various international and national standards on how low, medium, and high grid voltage ratings are categorized. For solar energy systems, the Institute of Electrical and Electronics Engineers (IEEE) 1547 Standard for Interconnection and Interoperability of Distributed Energy Resources with Associated Electric Power Systems Interfaces (IEEE Std 1547-2018) is often taken as one of the foundational documents (Narang et al., 2021). In IEEE Std 1547-2018, low voltage is defined to be a class of nominal voltages below 1 kV, medium voltage ranges from 1 kV to 35 kV, and high voltage is anything beyond that. The topology of a PV system refers to the way in which the constituent parts are connected and interrelated. AC collection grid topologies can be described as radial, ring, or star (Cabrera-Tobar et al., 2016); these three configurations also work for wind power plants (De Prada Gil et al., 2015). A radial collection grid connects several transformers in series. With the benefit of being economical, the radial configuration has low reliability, since losing one transformer disables the entire collection grid. A ring collection grid is able to improve reliability, for it closes the serially connected transformers by joining the open ends, such that when one transformer is lost, the grid can still act as a radial one. In a star collection grid, all transformers share the same medium-voltage connection point, which implies the highest reliability as compared to the other two configurations. The procedure of sizing a transformer is closely analogous to that of sizing an inverter, in that one generally seeks a balance between the conservative approach of sizing according to the rated power of the plant and the economical approach of under-sizing. In the former case, the transformer tends to operate at a reduced efficiency for most instances—this is because the operational ambient conditions rarely match the STC—which may introduce large oscillations in the power injected into the grid (Testa et al., 2012). In the latter case, a transformer that is too small acts as a bottleneck on the exported power, which means energy waste. Testa et al. (2012) proposed a transformer sizing strategy based on the loss of produced power probability (LPPP) index, which estimates the probability that the transformer is unable to deliver to the grid a part of the power reaching it, due to either power losses in the transformer or transformer overloads. Conceptually, LPPP should naturally be minimized, and LPPP depends on the load profile, solar resource availability, and the presence of energy storage devices (Testa et al., 2012). In any case, if the sizing information is opaque to the forecaster, her ability to precisely model the transformer loss would again be impaired.

506

Solar Irradiance and Photovoltaic Power Forecasting

Power losses due to transformers are caused by two mechanisms. One of those is known as the core loss, which is an umbrella term for the various losses occurring in the transformer under no-load situations, namely, eddy current loss, hysteresis loss, stray eddy current loss, and dielectric loss. Since the transformer core is made of iron, the core loss is also referred to as the iron loss (PFe). Core loss depends upon the voltage level and the operating frequency, and can be assumed to be constant insofar as the transformer is excited. The other type of loss is the copper loss (PCu), which is the heat produced by the currents in the transformer windings. Copper loss depends upon the loading of the transformer, and therefore appears only during operation. In the case of oil-cooled (oil-immersed) transformers, both the iron and copper losses vary quasi-linearly with the nominal transformer rating Ptrans,ref, which has a unit of kVA:

$$P_{Fe} = \beta_0 + \beta_1 P_{trans,\,ref}, \tag{11.105}$$

$$P_{Cu} = \beta_2 P_{trans,\,ref}, \tag{11.106}$$

of which the coefficients β0, β1, and β2 for oil-cooled transformers, with a nominal transformer rating ranging from 50 to 2500 kVA, and with the highest voltage for equipment not exceeding 36 kV, have been tabulated by Malamaki and Demoulias (2014). For cast-resin (dry-type) transformers, the iron loss still varies quasi-linearly with Ptrans,ref, but the copper loss is quadratic in Ptrans,ref, i.e.,

$$P_{Fe} = \beta_0 + \beta_1 P_{trans,\,ref}, \tag{11.107}$$

$$P_{Cu} = \beta_2 P_{trans,\,ref} + \beta_3 P_{trans,\,ref}^2, \tag{11.108}$$

of which the coefficients β0, ..., β3 for cast-resin transformers, with a nominal transformer rating ranging from 100 to 3150 kVA, and with the highest voltage for equipment not exceeding 36 kV, have again been tabulated by Malamaki and Demoulias (2014). Suppose there are Ntrans identical transformers connected in parallel; the total transformer loss under an input power Ptrans—recall Eq. (11.104)—is given as the sum of the iron and copper losses, which needs to be subtracted from Ptrans during model chain evaluation, and the final power injected into the grid is:

$$P_{grid} = P_{trans} - N_{trans}\left[P_{Fe} + \left(\frac{P_{trans}}{N_{trans}P_{trans,\,ref}}\right)^2 P_{Cu}\right]. \tag{11.109}$$

Alternatively, when information for identifying PFe and PCu is unavailable, one may account for the transformer loss via a lumped medium-voltage loss factor at nominal power, herein denoted as ltrans. The power injection into the grid in this case may be written as:

$$P_{grid} = P_{trans}\left(1 - l_{trans}\frac{P_{trans}}{P_{trans,\,ref}}\right). \tag{11.110}$$

As a rule of thumb, transformers consume approximately 1.5% of the nominal energy output by the PV plant, i.e., ltrans ≈ 1.5%.
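The transformer stage translates directly into a short routine; the sketch below follows Eqs. (11.105), (11.106), and (11.109) for the oil-cooled case, with placeholder coefficient values rather than the tabulated ones of Malamaki and Demoulias (2014).

```python
# A minimal sketch of the transformer-loss stage; beta holds illustrative
# values only, not the published coefficients.
def grid_injection(p_trans, p_trans_ref, n_trans=1, beta=(0.17, 0.001, 0.01)):
    b0, b1, b2 = beta
    p_fe = b0 + b1 * p_trans_ref            # iron (no-load) loss, Eq. (11.105)
    p_cu = b2 * p_trans_ref                 # copper loss at rating, Eq. (11.106)
    loading = p_trans / (n_trans * p_trans_ref)
    return p_trans - n_trans * (p_fe + loading**2 * p_cu)   # Eq. (11.109)
```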

11.4 IRRADIANCE-TO-POWER CONVERSION VIA HYBRID METHODS

The preceding section explains and exemplifies the irradiance-to-power conversion via physical model chains in great detail, alongside the practical difficulties in fully exploiting such approaches. In short, the most desirable kind of realization of a physical model chain requires the PV design parameters (i.e., system specifications) to be fully known by the forecaster. A follow-up question is therefore: How does the unavailability of certain design parameters affect the eventual forecast performance? The preceding section also reveals that there are many, often overwhelming, alternative model choices present for each stage of the model chain. The second question of concern is thus: How can one identify the optimal model chain, be it complete or only partial, based on given information? Needless to say, neither question lends itself to a simple answer, as the regimes under which forecasting is needed and thus performed can be profoundly diverse. There are, fortunately, some early studies that shed light on these issues.

The first above-mentioned question has been the subject of investigation of Mayer (2021), who compared the forecast performance of model chains with design parameters known to different extents, at 16 PV plants in Hungary, with nominal power ranging from 0.5 to 20 MW. A total of five data-availability scenarios were devised, as listed in Table 11.6. Whereas S0 corresponds to the scenario in which the design parameters are known to the best extent, scenarios S1, S2, and S3 gradually censor the available design parameters, until scenario S4, in which only the location and nominal power rating of the system are known. The day-ahead NWP forecasts used in that study come from Météo-France's Application of Research to Operations at Mesoscale (AROME) model, which has a 15-min resolution and a 48-h horizon. The last two rows of Table 11.6 tabulate the overall normalized MAE (nMAE) and normalized RMSE (nRMSE) of the five scenarios.

Table 11.6 Five data-availability scenarios devised by Mayer (2021), for studying the effect of the availability of design parameters on model-chain-based PV power forecast accuracy. The nMAE and nRMSE for each scenario are shown respectively in the last two rows.

Design parameters                   S0      S1      S2      S3      S4
Geographical location               ✓       ✓       ✓       ✓       ✓
Nominal DC power                    ✓       ✓       ✓       ✓       ✓
Nominal AC power                    ✓       ✓       ✓       ✓       ✓
Module tilt angle                   ✓       ✓       ✓       ✓
Module azimuth angle                ✓       ✓       ✓       ✓
Row distance                        ✓       ✓       ✓
Module rows on a structure line     ✓       ✓       ✓
Nominal DC cable losses             ✓       ✓
AC cable and transformer losses     ✓       ✓
Other losses                        ✓       ✓
Detailed PV module data             ✓
Detailed inverter data              ✓
nMAE                              29.42%  29.42%  29.48%  29.79%  36.25%
nRMSE                             47.53%  47.56%  47.64%  47.75%  51.91%

Somewhat surprisingly, the experiment has led to the conclusion that having full information (scenario S0) is only marginally better than the cases using reduced information (scenarios S1, ..., S3); the only exception is scenario S4, which has a much lower performance. Put differently, the outcome of the study suggests that if the location, nominal power, and orientation of the PV system are known, irradiance-to-power conversion can already be done rather satisfactorily, although such a claim might attract the objection that more validation under other climatic and weather regimes needs to be conducted if one wishes to truly affirm the finding.

Identifying an optimal model chain has only attracted attention in the literature for a few years. Mayer and Gróf (2021) conducted what is likely the first comprehensive comparison of model chains. A set of 9 separation models, 10 transposition models, 3 reflection loss models, 5 cell temperature models, 4 DC power models, 2 shading models, and 3 inverter models was taken as elements, resulting in 32,400 combinatorial model chain choices. These model chains were each used to convert AROME forecasts to power output forecasts, under both intra-day and day-ahead settings, for each of the 16 PV systems in Hungary. The conclusion of that work is again surprising in two respects:

First, no model chain is optimal for all systems, and second, the stage-wise best models often result in suboptimal model chains. Averaging over all plants and forecast horizons, the nMAEs of the model chains range from 28.3% to 32.6%, whereas the nRMSEs range from 46.1% to 52.1%, which translate to differences of (32.6 − 28.3)/32.6 = 13% in nMAE and (52.1 − 46.1)/52.1 = 12% in nRMSE between the best and worst model chains in relative terms, or 4.3% in nMAE and 6.0% in nRMSE in absolute terms. Such differences are quite substantial, making the conclusions of those studies that used only arbitrarily chosen model chains open to doubt.

In view of these two field-advancing studies, there is an urge to explore hybrid model chains. A hybrid model chain considers both the physical and statistical aspects of irradiance-to-power conversion. In that, those relatively mature energy meteorology models, in particular, separation, transposition, reflection loss, and cell temperature models, are first applied, so as to arrive at intermediate quantities, such as the effective incident irradiance or cell temperature. Then, these intermediate quantities are fed to a regression model, so as to obtain the final PV power forecast. As compared to a purely regressive irradiance-to-power conversion model, hybrid conversion methods leverage the knowledge of energy meteorology, which is preferred a priori. One may regard those intermediate quantities as transformed variables or as engineered features, e.g., Gh is transformed/engineered to Gc, and Tamb to Tcell, which have been shown to be useful in numerous forecasting contexts. As compared to a full model chain, hybrid methods demand much less site-specific information to operate, which enhances their practicality. The drawback of hybrid methods, on the other hand, inherits that of the purely regressive methods: model training necessitates historical PV power data, which is often insufficient in amount, or not at all available for newly commissioned power plants. Table 11.7 presents a quick comparison of the three classes of methods.

Table 11.7 Summary of regression, physical, and hybrid methods for irradiance-to-power conversion.

                   Regression                          Model chain                          Hybrid
Design parameter   Not required                        Required (as complete as possible)   Required (only partially)
Historical data    Required (for both training and     Optional (only for optimization)     Required (for both training and
                   optimization)                                                            optimization)
Knowledge base     Data science                        Solar energy meteorology             Both data science and solar
                                                                                            energy meteorology
Advantages         Simplicity                          Applicable since commissioning       Possible best accuracy
                                                       of the plant

11.4.1 CONSTRUCTING A HYBRID MODEL CHAIN

One must not take the hybrid model chain as a new concept, since the idea is too intuitive to be missed by anyone who faces the problem of unknown design parameters during forecasting. And machine learning, which is capable of performing a "black-box" mapping from input to output, is the most apparent solution. This impulse toward constructing hybrid model chains was felt by Ogliari et al. (2017), who engaged the clear-sky irradiance as a predictor of the so-called physical hybrid artificial neural network (PHANN), to inject a physical sense into machine learning. It was also acknowledged by Luo et al. (2021), who proposed the so-called physics-constrained long short-term memory network (PC-LSTM), which is an LSTM model with a pre-processing filter, a post-processing filter, and a penalty function stronger than the squared loss employed by the original LSTM, which are designed to ensure that the input and output values are always within the physically possible limits (e.g., no negative irradiance and power). Although the common conclusion of works of this sort is the superiority of hybrid models, which promotes further proposals of such models, few studies actually consider a complete physical model chain, and which part should be modeled physically and which part statistically has been selected in an expedient manner without much justification. What emerges from these past works is the need for a systematic way of constructing hybrid model chains.


What are the intermediate quantities one can compute based upon the available system design parameters? Can the benefits of using a particular energy meteorology model outweigh its uncertainty? These are the questions that ought to be contemplated when constructing a hybrid model chain. Suppose Gh, Tamb, W, and ρg are available from NWP; then, from the geographical location alone, one can compute Ghc from a clear-sky model, and Bn and Dh from a separation model. If the system's orientation is known, one can further perform transposition, so as to arrive at Gc. With the PV module type (i.e., material and encapsulation) known, Tcell results from a cell temperature model, and the effective Gc from a reflection loss model. In a similar fashion, when more design parameters are known, more intermediate quantities can be derived. That said, when some design parameters are missing, it is entirely possible to make inferences on them, such that energy meteorology models can still be of use, though probably with a higher associated uncertainty. For instance, if the orientation of the system is unknown, different combinations of the tilt angle S and the PV azimuth angle φC can be tested via a model chain, and the set that maximizes the agreement between the estimated and observed power concludes the inference. The drawback of this inference rests upon the fact that one can never ascertain its validity, because no two model chains give the exact same inferred set of S and φC. This limitation is particularly prominent when the input to the model chain is a forecast, which, in itself, is highly uncertain. In this regard, the forecaster might be better off not making any inference on unknown design parameters, but simply leaving that to regression.

Although the hybrid model chain is a hitherto insufficiently studied topic, there are some strategies that are thought to be useful moving forward. The first strategy consists of optimizing a hybrid model chain, which is to be further discussed in the next section. As the performance of virtually all energy meteorology models is location-dependent, betting on a pre-defined model chain is unwise, even if that model chain has gained traction elsewhere. The second direction of investigation concerns how the regression model should be constructed. In that, various well-tested practices for feature selection, model identification, and parameter estimation and tuning can all be applied. Moreover, one does not need to limit the regression predictors to those intermediate quantities available at a particular stage of a model chain; instead, intermediate quantities available at different stages of a model chain, or from different model chains, can all act as predictors, with which various data-driven methods, such as deep learning or penalized regression, are able to automatically pick up the most relevant ones while shrinking or discarding the rest. Next, in order to understand the contribution of each part of the construction process to the final observed error, it is useful to imitate the practice of the physicists, and study each separate part in artificial isolation. This sort of sensitivity analysis would be very helpful in identifying the weakest part, and thus devising targeted actions for improvement. Last but not least, to echo the earlier discussion in Section 11.2.1, it is logically attractive to exploit the clear-sky PV power output, so as to arrive at the clear-sky index of PV power, also known as kPV (Engerer and Mills, 2014). Consequently, the regression takes kPV as the predictand, which can separate out the variability due to the diurnal movement of the sun; reducing or stabilizing heterogeneity in the response is a well-tested practice in regression.
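A hybrid chain of this kind can be sketched as follows: physical models supply engineered features (GTI and cell temperature), and a regression maps them to power. The data frame, its column names, the fixed orientation, and the choice of gradient boosting are all assumptions.

```python
# A minimal sketch of a hybrid irradiance-to-power conversion; nwp is an
# assumed DataFrame indexed by time with ghi, temp_air, wind_speed columns.
import numpy as np
import pvlib
from sklearn.ensemble import GradientBoostingRegressor

def physical_features(nwp, lat, lon, tilt=20, azim=180):
    sp = pvlib.solarposition.get_solarposition(nwp.index, lat, lon)
    # Decompose GHI with the Erbs model, then transpose to the plane of array
    erbs = pvlib.irradiance.erbs(nwp["ghi"], sp["zenith"], nwp.index)
    poa = pvlib.irradiance.get_total_irradiance(
        tilt, azim, sp["apparent_zenith"], sp["azimuth"],
        dni=erbs["dni"], ghi=nwp["ghi"], dhi=erbs["dhi"])
    # Cell temperature from a standard steady-state model
    tcell = pvlib.temperature.faiman(poa["poa_global"], nwp["temp_air"],
                                     nwp["wind_speed"])
    return np.column_stack([poa["poa_global"], tcell])

# X = physical_features(nwp_train, lat=47.5, lon=19.0)
# model = GradientBoostingRegressor().fit(X, pv_power_train)
```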

11.4.2 OPTIMIZING A HYBRID MODEL CHAIN

Optimizing a model chain means finding a particular combination of component models (i.e., separation model A plus transposition model B plus temperature model C, and so on) that minimizes a prespecified loss function. Whenever optimization is mentioned, one unhesitatingly perceives the problem as a mathematical one, in which rigor and elegance are of primary importance. Owing to the intricacy of the model chain, the objective function is very unlikely to have any analytic form, let alone allow the problem to be framed as one of those that can be handled by standard mathematical optimization routines. Optimization, of course, is an exceedingly large body of knowledge, and we cannot rule out at any rate the possibility that there exists some mathematical optimization technique that handles this form of problem exactly. But from a practical viewpoint, it seems that optimization by enumeration, though blunt, is sufficient. Stated differently, the optimization of a model chain is achieved by first testing a large number of model chains, each composed of a different set of component models, and then selecting the combination that results in the lowest error; this has indeed been the method adopted by Mayer and Gróf (2021). One advantage of a hybrid model chain over a full model chain is that the former requires fewer component models to operate, since models in the second half of the model chain can be lumped and handled as a whole by regression. This greatly reduces the number of enumerations that one has to conduct to optimize a model chain. Even so, the pool of available models can still be too large; recall that there are, for instance, hundreds of separation models and tens of transposition models. Clearly then, one has to resort to a subset of available component models. Subsetting component models can be done in many ways, as long as the justification carries a notion of objectivity. For example, Gueymard and Ruiz-Arias (2016) categorized separation models according to the number of input variables required, and it is reasonable to select one representative from each category. Yang (2016) divided transposition models into generations based on their sophistication, and it is possible to pick one or two models from each generation. Besides pooling component models, the optimization of model chains is also concerned with the goodness of the final PV power forecasts. Most, if not all, previous content in this book concerning verification is therefore relevant. Knowing the forecasts are to be penalized based on one scoring rule, there is no motivation to optimize the model chain according to another. This may appear to be trivial, but one must not overlook the notion of consistency throughout the optimization process. To give perspective, suppose the model chain is to be optimized for MAE; using a least squares regression as part of the hybrid model chain could, in itself, already be problematic, even if the combination of certain energy meteorology models in the preceding part of the model chain could lead to better performance under MAE than other combinations. In other words, the overall optimality of a hybrid model chain can only be ensured if the optimality of individual parts is ensured.
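The enumeration described above is simple to express in code. The following sketch is illustrative only: stage_pools, run_chain, and the data arguments are hypothetical placeholders for the candidate component models of each stage, a routine that executes one full model chain, and the verification data, respectively; the loss is MAE, in keeping with the example in the text.

    import itertools
    import numpy as np

    def optimize_model_chain(stage_pools, run_chain, weather, observed):
        """Optimization by enumeration: test every combination of component
        models (one per stage) and keep the one with the lowest MAE."""
        best_combo, best_mae = None, np.inf
        for combo in itertools.product(*stage_pools):
            est = run_chain(combo, weather)          # estimated PV power
            mae = float(np.mean(np.abs(est - observed)))
            if mae < best_mae:
                best_combo, best_mae = combo, mae
        return best_combo, best_mae

Because the number of combinations is the product of the pool sizes, subsetting the pools, as discussed above, is what keeps the enumeration tractable.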

11.5 PROBABILISTIC IRRADIANCE-TO-POWER CONVERSION

Irradiance-to-power conversion constitutes the third main step of PV power forecasting, with the first two being generating and post-processing weather forecasts. Neither of the first two steps concerns this chapter, and they have been discussed in earlier chapters of this book. Nonetheless, one should be aware that forecasts of weather variables can be either deterministic or probabilistic. Hence, how these forecasts interact with the model chain gives rise to a new topic: the probabilistic model chain. As uncertainty quantification is an integral part of any forecasting process, it is thought important to discuss it in this last part of the chapter. Recall that Section 11.2.3 discusses several post-processing means to improve the accuracy and to alter the form/style of those PV power estimates resulting from regression-based irradiance-to-power conversion methods. Among those options, all D2P and P2P methods can by definition lead to probabilistic PV power forecasts. In a sense, those methods should be regarded as ways to perform probabilistic irradiance-to-power conversion. Nonetheless, this section should move beyond those statistical means of generating probabilistic PV power forecasts. More specifically, the following discussion deals exclusively with the probabilistic methods for solar power curve modeling that involve model chains. Indeed, regardless of whether the design parameters are known or to what degree they are known, each stage of the model chain is still subject to modeling uncertainties of varying degrees, and the error propagation and communication are intricate and complex. Probabilistic modeling is definitely not a new concept; yet, its realization in model chain applications, or solar radiation modeling in general, is exceedingly recent. Probabilistic solar radiation modeling started with Yang (2020c,d), who performed site adaptation (a class of bias-correction methods for satellite-derived irradiance) on eight different products. In the following months, several other studies of a like nature were published, each dealing with a different (sub)stage of the model chain (Yang and Gueymard, 2021a,b, 2020; Quan and Yang, 2020). In general, two distinct approaches to probabilistic modeling for energy meteorology models (i.e., the component models of a model chain) are possible. One of those is to modify an existing model by upgrading its construct, so as to integrate the notion of uncertainty into the modeling itself. For instance, the Perez model, recall Section 11.3.3, is essentially a least squares problem assuming homogeneous Gaussian errors (Yang et al., 2014b; Perez et al., 1988), which by nature offers not just the predictive mean but also the standard error, and thereby allows probabilistic estimation of GTI (see Quan and Yang, 2020). Notwithstanding, making probabilistic extensions to an energy meteorology model can be challenging, and statistical calculations such as the one done for the Perez model can become laborious. Additionally, the predicted distributions are often confined to a prespecified parametric structure. As a result, utilizing an ensemble is a more favorable alternative for probabilistic modeling. In that, a straightforward approach is to gather multiple models of a particular class and consider their outputs as members of an ensemble. This method was utilized by Quan and Yang (2020) and Yang and Gueymard (2020) for transposition and decomposition modeling.
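As an illustration of the ensemble approach just described, the sketch below gathers three separation models available in pvlib (Erbs, DISC, and DIRINT) and treats their beam irradiance outputs as ensemble members; the function name and the choice of models are assumptions for illustration, not those used in the cited studies.

    import pandas as pd
    import pvlib

    def separation_ensemble(ghi, lat, lon):
        """Form a mini-ensemble of DNI estimates from several separation
        models; ghi is a pandas Series with a timezone-aware index."""
        solpos = pvlib.solarposition.get_solarposition(ghi.index, lat, lon)
        z = solpos["zenith"]
        members = {
            "erbs": pvlib.irradiance.erbs(ghi, z, ghi.index)["dni"],
            "disc": pvlib.irradiance.disc(ghi, z, ghi.index)["dni"],
            "dirint": pvlib.irradiance.dirint(ghi, z, ghi.index),
        }
        return pd.DataFrame(members)  # one column per ensemble member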


Another benefit of ensemble-based probabilistic modeling of energy meteorology models is that such predictions allow further calibration. This can be achieved through various P2P post-processing methods, such as BMA or EMOS, which provide predictive densities of different kinds. Ensemble modeling can be applied to every stage of a model chain, given the availability of numerous component models for each stage. However, the main challenge lies in dealing with the large dimensionality involved. To give perspective, suppose $m_1$ predictions result from an ensemble of separation models; one may feed each member to each of the $m_2$ transposition models, and then to each of the $m_3$ reflection models, and so on and so forth, resulting in a total of $\prod_{k=1}^{M} m_k$ predictions, where $M$ is the number of stages in the model chain. The dimensionality scales exponentially: ten candidate models at each of five stages already imply $10^5$ predictions. A more effective method would therefore be to utilize a collection of model chains, which entails treating each model chain as a member of an ensemble. In the provided illustration, Fig. 11.20, the thin dashed paths indicate potential members, whereas the thick solid path indicates the most likely option, which is the optimized model chain. This framework was first formally put forward by Mayer and Yang (2022), who presented a model-chain ensemble in a PV power forecasting setting.

Figure 11.20 A schematic diagram of the probabilistic model chain, in which weather forecasts pass through the separation, transposition, reflection, temperature, DC model, shading, and inverter stages to yield PV power forecasts. Figure inspired by Mayer and Yang (2022).
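Two member-selection strategies feature in the study discussed next: random selection, and selection that guarantees every component model appears at least once. A minimal sketch of both is given below, assuming each stage's candidate models are held in a list; the cycling scheme used to satisfy the all-component-models requirement is one possible realization, not necessarily the exact scheme of the original paper.

    import random

    def random_members(stage_pools, n_members, seed=1):
        """Strategy 1: each member model chain is drawn uniformly at random."""
        rng = random.Random(seed)
        return [tuple(rng.choice(pool) for pool in stage_pools)
                for _ in range(n_members)]

    def acm_members(stage_pools, seed=1):
        """Strategy 2 (ACM): cycle through a shuffled copy of each stage's
        pool, so that every component model is selected at least once."""
        rng = random.Random(seed)
        pools = [rng.sample(pool, len(pool)) for pool in stage_pools]
        n_members = max(len(pool) for pool in pools)
        return [tuple(pool[i % len(pool)] for pool in pools)
                for i in range(n_members)]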

The study conducted by Mayer and Yang (2022) utilized power measurements from eight ground-mounted PV plants and deterministic day-ahead forecasts (24–48 h in advance) from the Hungarian Meteorological Service's operational AROME model. The study tested five different probabilistic model chains, which varied in terms of the number of members or the method used to select the members. Two straightforward strategies for selecting member model chains are: (1) random selection, and (2) ensuring that each component model gets selected at least once; the latter is termed ACM, which stands for "all component models." The authors analyzed the model-chain ensemble's predictive performance and also examined the impact of applying quantile regression as a post-processing tool.


The conclusions obtained therein are highly informative. The initial discovery was that the model-chain ensemble lacked proper dispersion, highlighting the need for calibration. From the gathered evidence, it appears that linear quantile regression utilizing one year of fitting data is adequate, considering that the model chains have already dealt with the nonlinearity in PV power modeling. Calibration also assists in mitigating the impact of subjective decision-making during ensemble member selection, reducing the importance of the number of members. Lastly, it was determined that model-chain ensembles are advantageous even when only deterministic forecasts are required. Naturally, one follow-up question is whether using deterministic weather input is already good enough, or whether there would be added benefits if probabilistic weather forecasts are used in conjunction with a model-chain ensemble. Recall Fig. 3.10, in which a schematic illustration of the trajectory of a dynamical ensemble is shown: an ensemble NWP model issues several equally probable forecast trajectories with slightly perturbed initial conditions. Similarly, other forms of ensemble weather forecasts, such as the poor man's ensemble or the analog ensemble, could all deliver probabilistic weather input. In this regard, there are three ways to materialize a probabilistic model chain without post-processing: (1) deterministic weather forecasts with an ensemble model chain, (2) probabilistic weather forecasts with a single optimized model chain, and (3) probabilistic weather forecasts with an ensemble model chain. The first possibility was examined by Mayer and Yang (2022), whereas the latter two options were investigated by Mayer and Yang (2023b) in a follow-up work. The ensemble in its raw form does not fully cover all sources of uncertainty, resulting in under-dispersion. This is particularly noticeable in the model-chain ensemble, as it only considers the uncertainty of the irradiance-to-power conversion, whereas the majority of uncertainties stem from the NWP. Reliability is a prerequisite of good probabilistic forecasts; therefore, ensemble models always need to be calibrated, which can be done practically with any P2P method. In a more general sense, even deterministic predictions can be post-processed into probabilistic ones, which makes the workflows of creating probabilistic solar power curves with post-processing extremely versatile:
1. Deterministic weather input + D2P post-processing + single model chain,
2. Deterministic weather input + D2D post-processing + ensemble model chain,
3. Deterministic weather input + D2D post-processing + single model chain + D2P post-processing,
4. Deterministic weather input + D2P post-processing + single model chain + P2P post-processing,
5. Deterministic weather input + D2D post-processing + ensemble model chain + P2P post-processing,
6. Probabilistic weather input + P2P post-processing + single model chain,


7. Probabilistic weather input + P2D post-processing + ensemble model chain,
8. Probabilistic weather input + P2D post-processing + single model chain + D2P post-processing,
9. Probabilistic weather input + P2P post-processing + single model chain + P2P post-processing,
10. Probabilistic weather input + P2D post-processing + ensemble model chain + P2P post-processing.
It seems that having the freedom to choose comes with a significant amount of effort required to determine the best workflow and its reasoning. As of now, there are no publications on this matter, but it has been noted that several are currently being prepared (pers. comm. with Sebastian Lerch, Karlsruhe Institute of Technology, 2023). Preliminary results from Lerch, as well as from the present authors, suggest that including an intermediate post-processing step for irradiance may not be efficient, as the final stage of processing is able to fix most of the calibration issues without sacrificing too much accuracy. If that is the case, the possible options then reduce to:
1. Deterministic weather input + single model chain + D2P post-processing,
2. Deterministic weather input + ensemble model chain + P2P post-processing,
3. Probabilistic weather input + single model chain + P2P post-processing,
4. Probabilistic weather input + ensemble model chain + P2P post-processing.
These four workflows correspond to methods 0, 1C, 2C, and 3C in the article by Mayer and Yang (2023b), where "C" stands for "calibration," in contrast to methods 1R, 2R, and 3R, which denote the corresponding "raw" versions without P2P post-processing. It is worth noting that all four workflows involve post-processing, which could be seen as a hybrid approach where the model chain and probabilistic regression collaborate closely. Data from 14 utility-scale PV plants in Hungary over the years 2019–2020 was analyzed by Mayer and Yang (2023b). The analysis was conducted using ensemble NWP forecasts from the ECMWF at a 15-min temporal resolution. Whereas the ensemble model chain construction largely follows the previous work (Mayer and Yang, 2022), quantile regression was selected as the P2P post-processing tool. It was found that the most accurate method is the combination of ensemble NWP and ensemble model chain with post-processing, i.e., method 3C. The differences in the continuous ranked probability score (CRPS) among the calibrated options are only slight; it is post-processing that makes all the difference, for when post-processing is not applied, the CRPS is generally higher. In terms of deterministic forecasting, i.e., eliciting deterministic forecasts from ensemble forecasts, method 3C again achieved the smallest error. This indicates the importance of utilizing a probabilistic model, even if the end goal is to obtain deterministic forecasts. This pioneering work has shed light on the ways in which probabilistic irradiance-to-power conversion can be of service.
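To illustrate the role of quantile regression in these workflows, the sketch below calibrates an ensemble of PV power forecasts into predictive quantiles with linear quantile regression from statsmodels; using the ensemble mean and spread as predictors is a simplifying assumption, and the function is a sketch rather than the exact procedure of Mayer and Yang (2023b).

    import numpy as np
    import statsmodels.api as sm

    def calibrate_quantiles(ens_train, y_train, ens_new,
                            taus=(0.1, 0.25, 0.5, 0.75, 0.9)):
        """P2P post-processing sketch: map ensemble statistics to
        predictive quantiles of PV power via quantile regression.
        ens_*: (n, J) arrays of member forecasts; y_train: (n,) observations."""
        def features(ens):
            # Ensemble mean and spread as simple summary predictors
            return sm.add_constant(
                np.column_stack([ens.mean(axis=1), ens.std(axis=1)]))
        X_train, X_new = features(ens_train), features(ens_new)
        return {tau: sm.QuantReg(y_train, X_train).fit(q=tau).predict(X_new)
                for tau in taus}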

12 Hierarchical Forecasting and Firm Power Delivery
“In all social animals, including Man, cooperation and the unity of a group has some foundation in instinct.” — Bertrand Russell

Throughout the history of humankind, social cohesion has always been a vital premise of any progress of civilization. Early men knew hunting and gathering could be more efficient if they were cooperative, which led to the transition from families to small tribes. The size of tribes grew with the passing of time until they reached the vast conglomerations that are now known as nations. In today's society, the consensus on the needs and benefits of cooperation is stronger than ever, in that synergy (the interaction or cooperation of two or more entities, substances, or other agents to produce a combined effect greater than the sum of their separate effects) is so desirable an effect that it is ubiquitously emphasized and pursued by almost all industries and social groups. On this note, this chapter of the book is centered around cooperation in forecasting, not just in the mere form of an ensemble where a decision-maker solicits opinions from several forecasters, but under the conception of a hierarchical structure, where forecasts are first produced separately by individual entities in the hierarchy and then reconciled into a coherent set of forecasts that maximizes the collective benefits of everyone. In many forecasting applications, the process of concern is not just an isolated time series, but consists of a set of time series that can be organized into one or more hierarchies. The various levels in the hierarchy have a two-fold arithmetic connection, traveling one up, one down; the upward connection proceeds by aggregation, the downward by allocation. Depending on the context, the hierarchy can be organized by product, location, time period, or some other features (or attributes) that have a natural appeal to grouping. To give perspective, several examples are enumerated next. In the retail industry, the demand structure in a supply chain can be viewed as a hierarchy, where goods produced by a manufacturer are shipped to its distributors, who then distribute their shares of allocation of the goods to various retailers (Yang and Zhang, 2019). In tourism, the total domestic tourism market can be split into state-wise markets, each of which can be further split into city- or attraction-wise markets (Athanasopoulos et al., 2009). In power systems, the electricity demand over a region can be regarded as the sum of the electricity demand in each city, which, in itself, is the sum of the demand of each substation within each city (Ben Taieb et al., 2021). Similarly, and to what this book is concerned with the most, the photovoltaic (PV) power generation within an interconnection can be disaggregated into PV power generation in each transmission zone, by each utility PV plant, at each distribution node, by each distributed PV plant, or even by each inverter, depending on the level of granularity required (Yang et al., 2017b); see Fig. 12.1 for an illustration.


Albeit the hierarchical structure of the power system exists by construction and therefore must have been known since the system's inception, the focus of traditional forecasting has mainly been on the load and renewable power generation for the whole system or at the nodal level, using only information available at one particular level. This is true for both industry and research. For example, the Global Energy Forecasting Competition 2017 (GEFCom2017), though quite recent with respect to the history of load forecasting, is nevertheless the first forecasting competition that provided its contestants with hierarchical load data on more than two levels (Hong et al., 2019). Unfortunately, perhaps owing to unfamiliarity with the hierarchical forecasting framework, the use of hierarchical information during the competition was rather modest: amongst the 12 finalists that emerged from the qualifying round, only one-third leveraged the hierarchy, and none was ranked amongst the top finishers (Hong et al., 2020). Regardless, forecasting limited to a system or nodal level is already being challenged by the increasing share of renewables in power systems and the decreasing visibility of the actual consumption, which pose substantial barriers to high-quality net load forecasting, itself an essential premise of subsequent power system operations, such as unit commitment or economic dispatch. For load forecasting particularly, with the prevalence of smart meters, electricity consumption information at user and household granularity is now being revealed at an unprecedented level of detail (Ben Taieb et al., 2021). In this regard, we must reevaluate and reimagine the forecasting procedures in a power system setting, as to how they can best adapt to the hierarchical forecasting framework. Insofar as the motivation for developing hierarchical forecasting methods for solar is concerned, arguments may be easily presented to justify why solar forecasts are needed at various levels of the hierarchy. Individual plant owners can benefit from plant-level forecasting, in managing and optimizing their electricity self-consumption using on-site load or energy storage and energy economics (Beltran et al., 2021). Distribution system operators (DSOs) are motivated by minimizing solar power integration costs at the medium- and low-voltage levels, and forecasting for nodal solar power injection benefits voltage control, state estimation, and other operations in distribution systems (Bessa et al., 2015a). Transmission system operators (TSOs) require solar forecasts to conduct net load forecasting, which is a necessary premise of unit commitment (Orwig et al., 2015). Clearly, the main challenge here is this: when the relevant entities become too numerous and have diverse goals, there is a need for some mechanism for arriving at collective decisions. In terms of hierarchical forecasting, making good collective decisions motivates a set of forecasts that can benefit every entity in the hierarchy. Ben Taieb et al. (2021) summarized three difficulties of making hierarchical forecasts. First, time series at various levels of a hierarchy exhibit notably different features, and the choice of forecasting methods therefore depends upon these features, as well as on the information available to capture these features during modeling.
Figure 12.1 An illustration of the hierarchical nature of grid-tied PV plants in a power system: an interconnection aggregates transmission zones, which in turn aggregate utility-scale PV plants (each with its central inverters) and distribution nodes (each with its distributed PV plants and their inverters). Abbreviations: UPV, utility-scale PV; DPV, distributed PV.


Whereas plant-level forecasts, as elaborated in the preceding chapter, should be generated by passing numerical weather prediction (NWP) forecasts through a complete model chain (plant owners would have the system design information available for irradiance-to-power conversion), nodal-level forecasts are often generated by extrapolative methods based on historical observations (knowing that the time series at the nodal level are less variable, statistical extrapolation may be sufficient). Alternatively, forecasts at higher levels of the hierarchy may be issued by simply summing up the bottom-level forecasts submitted by plant owners, which constitutes a bottom-up aggregation (Tuohy et al., 2015). This bottom-up approach, on the one hand, has been demonstrated to be suboptimal on many occasions, both in and outside of the solar context (e.g., Yang et al., 2017b; Hyndman et al., 2011), but on the other hand, bottom-up forecasts are aggregate consistent, which is a desirable trait. Aggregation consistency, or coherency, refers to the desirable property that the forecast of an aggregated series equals the sum of the forecasts of the corresponding disaggregated series. However, if one wishes to go beyond the bottom-up approach and produce the forecasts for each series in the hierarchy independently, having aggregate-consistent forecasts is nearly impossible, due to the different forecasting methods used by, and the inhomogeneous information available to, forecasters at different levels; this is the second major challenge of hierarchical forecasting, i.e., ensuring coherency. The last challenge of hierarchical forecasting lies in its high dimensionality, which is a defining characteristic of power systems, in that the hierarchy contains tens of thousands of time series; this dimensionality introduces various mathematical and computational inconveniences, for example, during the estimation of the variance–covariance matrix of forecast errors (Hyndman et al., 2016), which is to be discussed in more detail below. Despite its challenges, hierarchical forecasting presents a list of benefits, among which the most prominent is the capacity to allow coherent decision-making across the hierarchy (Athanasopoulos et al., 2020). Since hierarchical forecasts, in their ideal form, are aggregate consistent, one has reason to expect the decisions made therefrom to be more consistent than those made based on unrelated and disagreeing forecasts. This links to a second significant benefit of hierarchical forecasting, which is its proficiency in exploiting spatio-temporal dependencies among various series in the hierarchy (Wickramasuriya et al., 2019). As we shall see from the technical details below, optimal hierarchical forecasting can be regarded as a post-processing tool, which assigns weights to, and therefore adjusts the importance of, individual forecasts made on each level. Third, in a hierarchical forecasting setting, lower-level entities submit their forecasts to the higher-level entities, which relieves the burdens of TSOs and DSOs, who would otherwise need to produce lower-level forecasts on their own and maintain a large database of plant-level information, including design parameters and historical observations. Stated differently, the hierarchical forecasting framework decentralizes the forecasting task to individual lower-level participants, while the utility that would have accrued from centralized decision-making is not at all undermined.
Given the current trend of enacting energy policies related to grid integration, mandated forecast submission is becoming the norm (Yang et al., 2022b); in that respect, hierarchical forecasting is no longer the theoretical conception it was just a few years ago, but possesses actual applicability.


Before we move on to the main content of this chapter, previewing some definitions and concepts is thought to be useful. Hierarchical forecasting is a two-step procedure: the first step is to be performed by each individual player in the hierarchy, who is responsible for generating forecasts for the respective series, and the second involves an aggregator, who is tasked to post-process the initial forecasts into a set of aggregate-consistent ones. Forecasts generated in the first step are known as the base forecasts, and those generated in the second step are known as the reconciled forecasts. It should be noted that the aforementioned bottom-up hierarchical forecasting approach does not require reconciliation, since the higher-level forecasts can directly result from aggregating the base forecasts at the bottom level; in this special case, base and reconciled forecasts are identical. In most other circumstances, where reconciliation is explicitly needed, forecasters are concerned with two types of errors: the base forecast errors and the reconciled forecast errors. As is the case for many other statistical forecasting procedures, the optimality of a technique is reached if the variance of the forecast errors is minimized. In that, optimal forecast reconciliation seeks to minimize the variances of the reconciled forecast errors, which are further related to the variances of the h-step-ahead base forecast errors. However, due to error propagation in multi-step-ahead forecasting, even estimating the variances of the h-step-ahead base forecast errors is challenging, which has led to the proposal of several alternative but weaker estimators that are much simpler to compute. It should be noted that hierarchical forecasting can be performed in both deterministic and probabilistic settings. Recent developments in hierarchical forecasting were mostly contributed by Hyndman's research group (Hyndman et al., 2011, 2016, 2020; Athanasopoulos et al., 2009, 2017, 2020; Wickramasuriya et al., 2019; Ben Taieb et al., 2021; Panagiotelis et al., 2021; Spiliotis et al., 2021; Ashouri et al., 2022; Abolghasemi et al., 2022), to which those interested in the statistical theory and proofs are referred.

12.1 NOTATION AND GENERAL FORM OF HIERARCHICAL FORECASTING

Suppose there exists a multi-level hierarchy, in which level 0 ($L_0$) denotes the most aggregated series, level 1 ($L_1$) the first-level disaggregation, and so on and so forth, until level $b$ ($L_b$), the most disaggregated, or bottom-level, time series. One may then assign as subscript a letter to index a particular series at $L_1$. And for each additional level ranging from $L_2$ to $L_b$, an additional letter is added to the subscript. To elaborate, Fig. 12.2 depicts a simple two-level hierarchy, $b = 2$ (the convention of describing the number of levels in a hierarchy is to exclude the top level, or $L_0$), with two distributed PV (DPV) plants connected to a distribution node, and two inverters connected to each DPV. Following the convention of this book where $x$ and $y$ denote respectively forecast and observation, we may write the observed power generation of DPV$_A$ at time $t$ as $y_{A,t}$ and that of DPV$_B$ as $y_{B,t}$. Similarly, observed power generation from the bottom-level series can be written as $y_{AA,t}$, $y_{AB,t}$, $y_{BA,t}$, and $y_{BB,t}$. If the total observed PV power generation at the distribution node is denoted as $y_{0,t}$, all observations within the hierarchy over the time instance $t$ can be gathered into a vector:
$$\mathbf{y}_t = \left(y_{0,t}, y_{A,t}, y_{B,t}, y_{AA,t}, y_{AB,t}, y_{BA,t}, y_{BB,t}\right)^\top, \qquad (12.1)$$
where the symbol $\top$ denotes the matrix transpose, suggesting $\mathbf{y}_t$ is a column vector.


Figure 12.2 An illustration of a simple two-level hierarchy that contains a distribution node and two distributed PV systems, each with two inverters.

To further simplify the notation, it is possible to collect all observations at time $t$ at level $i$ into a vector, denoted as $\mathbf{y}_{i,t}$, e.g.,
$$\mathbf{y}_{1,t} = \left(y_{A,t}, y_{B,t}\right)^\top, \quad \mathbf{y}_{2,t} = \left(y_{AA,t}, y_{AB,t}, y_{BA,t}, y_{BB,t}\right)^\top, \qquad (12.2)$$
and Eq. (12.1) is simplified to:
$$\mathbf{y}_t = \left(y_{0,t}, \mathbf{y}_{1,t}^\top, \mathbf{y}_{2,t}^\top\right)^\top. \qquad (12.3)$$
More generally, when there are $b$ levels, one writes:
$$\mathbf{y}_t = \left(y_{0,t}, \mathbf{y}_{1,t}^\top, \cdots, \mathbf{y}_{b,t}^\top\right)^\top. \qquad (12.4)$$
Since observations, with measurement uncertainty assumed to be negligible, are aggregate consistent, we may arrive at the following relationship:
$$\mathbf{y}_t = \mathbf{S}\,\mathbf{y}_{b,t}, \qquad (12.5)$$
where $\mathbf{S}$ is known as the summing matrix, which, in our specific example, is given as:
$$\mathbf{S} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \qquad (12.6)$$


The idea of the summing matrix is intuitive and mathematically concise. The first row of $\mathbf{S}$, when multiplied with $\mathbf{y}_{b,t}$, is equivalent to saying that the sum of all bottom-level observations gives the top-level observation, or, in our specific example,
$$y_{0,t} = y_{AA,t} + y_{AB,t} + y_{BA,t} + y_{BB,t}. \qquad (12.7)$$
The next two rows of $\mathbf{S}$ give:
$$y_{A,t} = y_{AA,t} + y_{AB,t}, \qquad (12.8)$$
$$y_{B,t} = y_{BA,t} + y_{BB,t}. \qquad (12.9)$$

Last but not least, the last few rows of the summing matrix always form an identity matrix with a dimensionality equal to the number of bottom-level series. Mathematically, multiplying an identity matrix by a column vector returns the column vector itself. The summing matrix has a dimension of $\mathbb{R}^{m \times m_b}$, where $m_b$ is the number of bottom-level time series, and $m = 1 + m_1 + \cdots + m_b$ is the total number of time series in the hierarchy. Observations in a hierarchy are aggregate consistent by construct, but if we are to replace the $y$'s in Eq. (12.5) by $x$'s, the equation is generally invalid, unless some very special prerequisites can be satisfied. Notating the $h$-step-ahead base forecasts produced for various levels of the hierarchy at time $t$ in the same fashion as observations, one has, in the case of Fig. 12.2,
$$\mathbf{x}_{t+h} = \left(x_{0,t+h}, x_{A,t+h}, x_{B,t+h}, x_{AA,t+h}, x_{AB,t+h}, x_{BA,t+h}, x_{BB,t+h}\right)^\top = \left(x_{0,t+h}, \mathbf{x}_{1,t+h}^\top, \mathbf{x}_{2,t+h}^\top\right)^\top, \qquad (12.10)$$
and more generally when there are $b$ levels,
$$\mathbf{x}_{t+h} = \left(x_{0,t+h}, \mathbf{x}_{1,t+h}^\top, \cdots, \mathbf{x}_{b,t+h}^\top\right)^\top. \qquad (12.11)$$
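The summing matrix of Eq. (12.6) is readily expressed in code. The following sketch, with made-up bottom-level generation values, verifies the aggregation relationship of Eq. (12.5) for the hierarchy of Fig. 12.2.

    import numpy as np

    # Summing matrix of Eq. (12.6) for the hierarchy in Fig. 12.2
    S = np.array([[1, 1, 1, 1],   # distribution node = sum of all inverters
                  [1, 1, 0, 0],   # DPV_A = inverter_AA + inverter_AB
                  [0, 0, 1, 1],   # DPV_B = inverter_BA + inverter_BB
                  [1, 0, 0, 0],   # identity block: bottom level maps to itself
                  [0, 1, 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]])

    # Hypothetical bottom-level observations (e.g., in kW) at one instance
    y_b = np.array([3.2, 2.8, 4.1, 3.9])
    y = S @ y_b       # Eq. (12.5): full hierarchy vector
    print(y)          # [14.  6.  8.  3.2 2.8 4.1 3.9]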

Mathematically, aggregation inconsistency reads:
$$\mathbf{x}_{t+h} \neq \mathbf{S}\,\mathbf{x}_{b,t+h}. \qquad (12.12)$$
Since aggregation consistency is a profoundly advantageous characteristic of hierarchical forecasting over any other forecasting framework, one should wish to have a certain set of reconciled bottom-level forecasts $\tilde{\mathbf{x}}_{b,t+h}$ that satisfies
$$\tilde{\mathbf{x}}_{t+h} = \mathbf{S}\,\tilde{\mathbf{x}}_{b,t+h}, \qquad (12.13)$$
where $\tilde{\mathbf{x}}_{t+h}$ is the vector holding the aggregate-consistent forecasts for the entire hierarchy. On this point, Athanasopoulos et al. (2009) noted that all existing hierarchical methods, insofar as linearity in aggregation can be assumed, can be written in the general form:
$$\tilde{\mathbf{x}}_{t+h} = \mathbf{S}\,\mathbf{P}\,\mathbf{x}_{t+h}, \qquad (12.14)$$


where $\mathbf{P} \in \mathbb{R}^{m_b \times m}$ is to be appropriately chosen, and differs among different hierarchical forecasting approaches. Functionality-wise, matrix $\mathbf{P}$ extracts and combines various elements of the base forecast vector $\mathbf{x}_{t+h}$, which are then summed by $\mathbf{S}$ to give the aggregate-consistent (or reconciled) forecasts; $\mathbf{P}$ may be referred to as the reconciliation matrix. Dimension-wise, $\mathbf{P}\,\mathbf{x}_{t+h}$ is a length-$m_b$ column vector, which holds the post-processed (i.e., reconciled) version of the bottom-level forecasts, that is, $\tilde{\mathbf{x}}_{b,t+h}$. In the case of bottom-up hierarchical forecasts, $\mathbf{P}$ takes the form:
$$\mathbf{P}_{\mathrm{BU}} = \left(\mathbf{0}_{m_b \times (m - m_b)} \,|\, \mathbf{I}_{m_b}\right), \qquad (12.15)$$
where $\mathbf{0}_{m_b \times (m - m_b)}$ is a matrix of zeros with dimension as indicated by the subscript, and $\mathbf{I}_{m_b}$ is an identity matrix of size $m_b$. In the case of top-down hierarchical forecasts, $\mathbf{P}$ takes the form:
$$\mathbf{P}_{\mathrm{TD}} = \left(\mathbf{p}_{m_b \times 1} \,|\, \mathbf{0}_{m_b \times (m-1)}\right), \qquad (12.16)$$
where $\mathbf{p}_{m_b \times 1} = (g_1, g_2, \cdots, g_{m_b})^\top$ is a column vector of known proportions subject to $\sum_{i=1}^{m_b} g_i = 1$, which describes how the top-level forecast is disaggregated into the forecast of each bottom-level series. Neither the bottom-up nor the top-down approach requires a complete $\mathbf{x}_{t+h}$, i.e., not every series in the hierarchy needs to be forecast. Since the aggregation, as in the case of bottom-up, and the allocation, as in the case of top-down, automatically make the hierarchical forecasts aggregate consistent, both approaches do not revise forecasts. This can be seen mathematically, in that:
$$\tilde{\mathbf{x}}^{\mathrm{BU}}_{b,t+h} = \mathbf{P}_{\mathrm{BU}}\,\mathbf{x}_{t+h} = \mathbf{x}_{b,t+h}, \qquad (12.17)$$
$$\tilde{\mathbf{x}}^{\mathrm{TD}}_{b,t+h} = \mathbf{P}_{\mathrm{TD}}\,\mathbf{x}_{t+h} = \left(g_1 x_{0,t+h}, g_2 x_{0,t+h}, \cdots, g_{m_b} x_{0,t+h}\right)^\top. \qquad (12.18)$$

12.2 OPTIMAL FORECAST RECONCILIATION USING REGRESSION

The trivial bottom-up and top-down approaches are not conceptually attractive, because neither option explores the hierarchy in full. Instead, it is of interest to have a statistical method to estimate $\mathbf{P}$, such that reconciliation can be conducted with a notion of optimality for any set of base forecasts $\mathbf{x}_{t+h}$. To that end, Hyndman et al. (2011) used a regression to describe the relationship between the base forecasts for all series, $\mathbf{x}_{t+h}$, and the statistical mean of the bottom-level series. This ingenious approach has been the basis of a large number of works on hierarchical forecasting, and, terminology-wise, the regression-based approach is often referred to as optimal reconciliation, where the word "optimal" suggests that the variance of the forecast error is minimized; this idea is to be revisited several times in this section. The regression goes as follows. The vector of random variables denoting the base forecasts ($\mathbf{X}_{t+h}$), of which $\mathbf{x}_{t+h}$ is a particular realization, can be written as:
$$\mathbf{X}_{t+h} = \mathbf{S}\,\boldsymbol{\beta}_{t+h} + \boldsymbol{\varepsilon}_h, \qquad (12.19)$$
where
$$\boldsymbol{\beta}_{t+h} = \mathrm{E}\left(\mathbf{Y}_{b,t+h} \,\middle|\, \mathbf{y}_1, \cdots, \mathbf{y}_t\right) \qquad (12.20)$$


is the unknown mean of the bottom-level series at time $t+h$ given all observations up to and including $t$ (the values of the bottom-level series at $t+h$ have yet to be observed at the forecast-issuing time, and thus should be regarded as random variables, over which the expectation can be taken; they are therefore denoted using the capital letter $\mathbf{Y}_{b,t+h}$, so as to be consistent with the notation of this book), and $\boldsymbol{\varepsilon}_h$ is the $h$-step-ahead reconciliation error, or coherency error, which is independent of the observations $\mathbf{y}_1, \ldots, \mathbf{y}_t$, with zero mean and variance–covariance matrix $\mathrm{V}(\boldsymbol{\varepsilon}_h) = \boldsymbol{\Sigma}_h$. In Eq. (12.19), the predictand is the base forecasts $\mathbf{X}_{t+h}$, the predictors are the summing matrix $\mathbf{S}$, and the regression coefficients are $\boldsymbol{\beta}_{t+h}$; this setting should not be confused with the usual regression which we use during solar forecasting, which is carried out between forecasts and observations. Before we examine various approaches to estimate $\boldsymbol{\beta}_{t+h}$, the statistical property of unbiasedness is first discussed.

12.2.1 CONDITION FOR UNBIASED RECONCILED FORECASTS

In forecast reconciliation, there are two kinds of forecast error: one is the base forecast error,
$$\mathbf{e}_{t+h} = \mathbf{y}_{t+h} - \mathbf{x}_{t+h}, \qquad (12.21)$$
and the other is the reconciled forecast error,
$$\tilde{\mathbf{e}}_{t+h} = \mathbf{y}_{t+h} - \tilde{\mathbf{x}}_{t+h}. \qquad (12.22)$$

It is noted that the reconciled forecast error $\tilde{\mathbf{e}}_{t+h}$ is conceptually different from, and thus ought not to be confused with, the reconciliation error $\boldsymbol{\varepsilon}_h$ in Eq. (12.19); the initial proposition on optimal reconciliation by Hyndman et al. (2011) confused $\tilde{\mathbf{e}}_{t+h}$ with $\boldsymbol{\varepsilon}_h$, but the problem was subsequently corrected by Wickramasuriya et al. (2019). First, it is straightforward to see from Eq. (12.21) that if the expectation of the base forecast error is zero, the base forecasts are unbiased (recall that an unbiased forecast has an expectation equal to the expectation of the observation). Mathematically, by taking expectation on both sides of Eq. (12.21), one has:
$$\mathrm{E}\left(\mathbf{X}_{t+h}\right) = \mathrm{E}\left(\mathbf{Y}_{t+h}\right) = \mathbf{S}\,\mathrm{E}\left(\mathbf{Y}_{b,t+h}\right), \qquad (12.23)$$
where the second part of Eq. (12.23) follows from the fact that the observations are aggregate consistent. Next, from the regression in Eq. (12.19) we know:
$$\mathrm{E}\left(\mathbf{X}_{t+h}\right) = \mathbf{S}\,\boldsymbol{\beta}_{t+h}. \qquad (12.24)$$
Lastly, by taking expectation on the general form of reconciliation, that is, Eq. (12.14), one obtains:
$$\mathrm{E}\left(\tilde{\mathbf{X}}_{t+h}\right) = \mathbf{S}\,\mathbf{P}\,\mathrm{E}\left(\mathbf{X}_{t+h}\right) = \mathbf{S}\,\mathbf{P}\,\mathbf{S}\,\boldsymbol{\beta}_{t+h} = \mathbf{S}\,\mathbf{P}\,\mathbf{S}\,\mathrm{E}\left(\mathbf{Y}_{b,t+h}\right). \qquad (12.25)$$
Therefore, the reconciled forecasts are unbiased if and only if the condition
$$\mathbf{S}\,\mathbf{P}\,\mathbf{S} = \mathbf{S} \qquad (12.26)$$
can be satisfied; this result was first noted by Hyndman et al. (2011).
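The condition of Eq. (12.26) is easy to check numerically; the sketch below does so for the bottom-up reconciliation matrix of Eq. (12.15), reusing the summing matrix of Eq. (12.6).

    import numpy as np

    S = np.array([[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
                  [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
    m, m_b = S.shape
    # Bottom-up reconciliation matrix, Eq. (12.15)
    P_bu = np.hstack([np.zeros((m_b, m - m_b)), np.eye(m_b)])
    # Unbiasedness condition, Eq. (12.26): S P S = S
    assert np.array_equal(S @ P_bu @ S, S)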

12.2.2 GENERALIZED LEAST SQUARES RECONCILIATION

Given the regression setting of the forecast reconciliation problem, the techniques for estimating $\boldsymbol{\beta}_{t+h}$ have long been available. If $\boldsymbol{\Sigma}_h$, which is the variance of the $h$-step-ahead reconciliation error $\boldsymbol{\varepsilon}_h$, is known, the regression coefficients can be estimated via generalized least squares (GLS), that is:
$$\hat{\boldsymbol{\beta}}^{\mathrm{GLS}}_{t+h} = \left(\mathbf{S}^\top \boldsymbol{\Sigma}_h^{\dagger} \mathbf{S}\right)^{-1} \mathbf{S}^\top \boldsymbol{\Sigma}_h^{\dagger}\, \mathbf{x}_{t+h}, \qquad (12.27)$$
where $\boldsymbol{\Sigma}_h^{\dagger}$ is the Moore–Penrose generalized inverse of $\boldsymbol{\Sigma}_h$. The GLS estimator is the minimum variance unbiased estimator (MVUE), which gives lower variance than any other unbiased estimator for all possible values of the parameter; with all other things being equal, the MVUE should naturally be pursued, which gives GLS reconciliation an a priori preference. With the GLS approach, the reconciled forecasts are:
$$\tilde{\mathbf{x}}^{\mathrm{GLS}}_{t+h} = \mathbf{S}\,\hat{\boldsymbol{\beta}}^{\mathrm{GLS}}_{t+h} = \mathbf{S}\left(\mathbf{S}^\top \boldsymbol{\Sigma}_h^{\dagger} \mathbf{S}\right)^{-1} \mathbf{S}^\top \boldsymbol{\Sigma}_h^{\dagger}\, \mathbf{x}_{t+h}. \qquad (12.28)$$
Comparing the above equation with Eq. (12.14), one can immediately see that the reconciliation matrix $\mathbf{P}$ in the GLS setting takes the form:
$$\mathbf{P}_{\mathrm{GLS}} = \left(\mathbf{S}^\top \boldsymbol{\Sigma}_h^{\dagger} \mathbf{S}\right)^{-1} \mathbf{S}^\top \boldsymbol{\Sigma}_h^{\dagger}. \qquad (12.29)$$
The difficulty in using GLS reconciliation is that $\boldsymbol{\Sigma}_h$ is not known and not identifiable (Wickramasuriya et al., 2019). In other words, it is not possible, in general, to estimate $\boldsymbol{\Sigma}_h$. Hence, alternative estimators for $\boldsymbol{\beta}_{t+h}$, or equivalently, for $\mathbf{P}$, must be sought.

12.2.3 MINIMUM TRACE RECONCILIATION

Combining Eqs. (12.21) and (12.22), the reconciled forecast error can be linked to the base forecast error:
$$\tilde{\mathbf{e}}_{t+h} = \mathbf{e}_{t+h} + \mathbf{x}_{t+h} - \tilde{\mathbf{x}}_{t+h}. \qquad (12.30)$$
Furthermore, from Eq. (12.14), the above equation can be written as:
$$\tilde{\mathbf{e}}_{t+h} = \mathbf{e}_{t+h} + \mathbf{x}_{t+h} - \mathbf{S}\mathbf{P}\mathbf{x}_{t+h} = \mathbf{e}_{t+h} + (\mathbf{I} - \mathbf{S}\mathbf{P})\mathbf{x}_{t+h} = \mathbf{e}_{t+h} + (\mathbf{I} - \mathbf{S}\mathbf{P})(\mathbf{y}_{t+h} - \mathbf{e}_{t+h}) = \mathbf{S}\mathbf{P}\mathbf{e}_{t+h} + (\mathbf{I} - \mathbf{S}\mathbf{P})\mathbf{y}_{t+h}. \qquad (12.31)$$
Recalling $\mathbf{y}_{t+h} = \mathbf{S}\mathbf{y}_{b,t+h}$, the second term on the right-hand side of Eq. (12.31) becomes $(\mathbf{I} - \mathbf{S}\mathbf{P})\mathbf{S}\mathbf{y}_{b,t+h}$. This implies that if $\mathbf{S}\mathbf{P}\mathbf{S} = \mathbf{S}$, then $(\mathbf{I} - \mathbf{S}\mathbf{P})\mathbf{S} = \mathbf{S} - \mathbf{S}\mathbf{P}\mathbf{S} = \mathbf{0}$, such that this term can be dropped and $\tilde{\mathbf{e}}_{t+h} = \mathbf{S}\mathbf{P}\mathbf{e}_{t+h}$. Consequently, the variance–covariance matrix of the $h$-step-ahead reconciled forecast error is:
$$\mathrm{V}\left(\tilde{\mathbf{e}}_{t+h}\right) \equiv \tilde{\mathbf{W}}_h = \mathbf{S}\mathbf{P}\mathbf{W}_h\mathbf{P}^\top\mathbf{S}^\top, \qquad (12.32)$$


where $\mathbf{W}_h$ is the variance–covariance matrix of the $h$-step-ahead base forecast error, i.e., $\mathrm{V}(\mathbf{e}_{t+h}) \equiv \mathbf{W}_h$; this result was given as Lemma 1 of Wickramasuriya et al. (2019). The expression in Eq. (12.32) has far-reaching implications, since it allows one to perform minimization and to attain optimality in terms of the variances of the $h$-step-ahead reconciled forecast errors. More specifically, because the diagonal entries of $\tilde{\mathbf{W}}_h$ correspond to the variances of reconciled forecast errors of individual series in the hierarchy, one wishes to minimize the trace of $\tilde{\mathbf{W}}_h$, which is the sum of diagonal entries; this method is thus coined "minimum-trace" (MinT) reconciliation. Then, based on this ideology, the MinT reconciliation theorem, which is perhaps the most significant result of modern hierarchical forecasting, was formally proposed by Wickramasuriya et al. (2019), although the conception can be traced back to a much earlier date. The MinT reconciliation theorem states that if $\mathbf{W}_h$ is positive definite, the optimal reconciliation matrix, which minimizes the trace of $\tilde{\mathbf{W}}_h$ subject to $\mathbf{S}\mathbf{P}\mathbf{S} = \mathbf{S}$, is given by:
$$\mathbf{P}_{\mathrm{MinT}} = \left(\mathbf{S}^\top \mathbf{W}_h^{-1} \mathbf{S}\right)^{-1} \mathbf{S}^\top \mathbf{W}_h^{-1}. \qquad (12.33)$$
The reader is referred to Wickramasuriya et al. (2019) for the mathematical proof of the theorem. Comparing Eq. (12.33) to (12.29), one notices that the distinction lies only in the variance–covariance matrix. With $\mathbf{P}_{\mathrm{MinT}}$, the reconciled forecasts, given any set of base forecasts $\mathbf{x}_{t+h}$, are:
$$\tilde{\mathbf{x}}^{\mathrm{MinT}}_{t+h} = \mathbf{S}\left(\mathbf{S}^\top \mathbf{W}_h^{-1} \mathbf{S}\right)^{-1} \mathbf{S}^\top \mathbf{W}_h^{-1}\, \mathbf{x}_{t+h}. \qquad (12.34)$$
An exceedingly significant corollary of this trace-minimization approach is that the MinT reconciled forecasts are at least as good as the incoherent base forecasts (Wickramasuriya et al., 2019), i.e.,
$$\left(\mathbf{y}_{t+h} - \mathbf{x}_{t+h}\right)^\top \mathbf{W}_h^{-1} \left(\mathbf{y}_{t+h} - \mathbf{x}_{t+h}\right) \geq \left(\mathbf{y}_{t+h} - \tilde{\mathbf{x}}_{t+h}\right)^\top \mathbf{W}_h^{-1} \left(\mathbf{y}_{t+h} - \tilde{\mathbf{x}}_{t+h}\right), \qquad (12.35)$$
which guarantees the confidence in performing MinT reconciliation.

12.2.4 FURTHER SIMPLIFICATIONS TO THE MINT RECONCILIATION

Performing MinT reconciliation requires the estimation of $\mathbf{W}_h$. This task is challenging for $h > 1$, because of the error propagation typically associated with $h$-step-ahead extrapolative forecasting. In other words, because the error at horizon $h$ depends on that at horizon $h-1$, which further depends on the error at horizon $h-2$, and so on and so forth, deriving an expression for $\mathbf{W}_h$ is not always possible. Therefore, five different simplified estimators for $\mathbf{W}_h$ were consolidated/proposed by Wickramasuriya et al. (2019) and reiterated by Athanasopoulos et al. (2020).
• If $\mathbf{W}_h = \mathbf{I}_m$, where $\mathbf{I}_m$ is an identity matrix of size $m$, which is the total number of series in the hierarchy, the reconciliation matrix can be reduced to:
$$\mathbf{P}_{\mathrm{OLS}} = \left(\mathbf{S}^\top \mathbf{S}\right)^{-1} \mathbf{S}^\top. \qquad (12.36)$$


This expression is evidently an ordinary least squares (OLS) solution, which was used by Athanasopoulos et al. (2009) in their early works. A major pitfall of this simplification is that when the bottom-level series are of noncomparable scales (e.g., PV plants with different installed capacities), reconciliation via OLS usually delivers very poor forecasts. The way to overcome this is to consider weighted least squares (WLS), which is able to effectively scale the series.
• If $\mathbf{W}_h = \hat{\mathbf{W}}_{1,D}$, where $\hat{\mathbf{W}}_{1,D}$ is a diagonal matrix containing the in-sample estimates of the variances of the 1-step-ahead forecast errors of each series, the reconciliation matrix becomes:
$$\mathbf{P}_{\mathrm{WLS}} = \left(\mathbf{S}^\top \hat{\mathbf{W}}_{1,D}^{-1} \mathbf{S}\right)^{-1} \mathbf{S}^\top \hat{\mathbf{W}}_{1,D}^{-1}. \qquad (12.37)$$
Since $\hat{\mathbf{W}}_{1,D}$ only concentrates on the diagonal, with all off-diagonal entries being zero, this approach is a WLS solution.
• If $\mathbf{W}_h = \mathrm{diag}\left(\mathbf{S}\,\mathbf{1}_{m_b}\right) = \boldsymbol{\Lambda}$, where $\mathbf{1}_{m_b}$ is a length-$m_b$ vector of ones, the reconciliation matrix is:
$$\mathbf{P}_{\mathrm{HLS}} = \left(\mathbf{S}^\top \boldsymbol{\Lambda}^{-1} \mathbf{S}\right)^{-1} \mathbf{S}^\top \boldsymbol{\Lambda}^{-1}. \qquad (12.38)$$
Mathematically, the product of $\mathbf{S}$ and $\mathbf{1}_{m_b}$ gives a vector of integers that describes the number of series that aggregate to a particular node, e.g., in the case of the hierarchy in Fig. 12.2, $\boldsymbol{\Lambda} = \mathrm{diag}(4, 2, 2, 1, 1, 1, 1)$. This variant of reconciliation, originally proposed by Athanasopoulos et al. (2017), is evidently an alternative form of WLS. Since the computation of $\boldsymbol{\Lambda}$ depends solely on the hierarchical structure, Yang et al. (2017b,c) called it "hierarchical least squares" (HLS). Although HLS is a form of WLS, because it assigns weights according to the number of series rather than the scale of errors, it shares the same drawback as OLS, in that it does not perform well with bottom-level series that are of diverse scales.
• If $\mathbf{W}_h = \hat{\mathbf{W}}_1$, where
$$\hat{\mathbf{W}}_1 = \frac{1}{t} \sum_{i=1}^{t} \mathbf{e}_i \mathbf{e}_i^\top \qquad (12.39)$$
is the in-sample estimate of the variance–covariance matrix of the 1-step-ahead base forecast errors, which leads to:
$$\mathbf{P}_{\mathrm{MinT\text{-}sample}} = \left(\mathbf{S}^\top \hat{\mathbf{W}}_1^{-1} \mathbf{S}\right)^{-1} \mathbf{S}^\top \hat{\mathbf{W}}_1^{-1}. \qquad (12.40)$$
Although $\hat{\mathbf{W}}_1$ is easy to compute, it may not be a good estimate when the number of series in the hierarchy is larger than or comparable to the number of temporal samples, i.e., when $m \gtrsim t$ (Wickramasuriya et al., 2019). For clarity, this method is the closest option to the original MinT reconciliation, and is therefore referred to as MinT-sample henceforth. The $m \gtrsim t$ situation can be mitigated by a shrinkage estimator, which is introduced next.


• If $\mathbf{W}_h = \hat{\mathbf{W}}^{*}_{1,D}$, where
$$\hat{\mathbf{W}}^{*}_{1,D} = \lambda_D \hat{\mathbf{W}}_{1,D} + (1 - \lambda_D)\,\hat{\mathbf{W}}_1, \qquad (12.41)$$
with $\hat{\mathbf{W}}_{1,D}$ as appeared in WLS reconciliation and $\hat{\mathbf{W}}_1$ as in MinT-sample reconciliation. Parameter $\lambda_D$ controls the strength of shrinkage. It is obvious from Eq. (12.41) that when $\lambda_D = 0$, $\hat{\mathbf{W}}^{*}_{1,D} = \hat{\mathbf{W}}_1$ and there is no shrinkage, and when $\lambda_D = 1$, $\hat{\mathbf{W}}^{*}_{1,D} = \hat{\mathbf{W}}_{1,D}$ and there are no off-diagonal terms. With intermediate $\lambda_D$, the off-diagonal terms are shrunk towards zero while the diagonal terms remain unchanged. The estimation of $\lambda_D$, as recommended by Wickramasuriya et al. (2019), can adopt the method proposed by Schäfer and Strimmer (2005). And finally, referring to the method as MinT-shrink, the reconciliation matrix is:
$$\mathbf{P}_{\mathrm{MinT\text{-}shrink}} = \left(\mathbf{S}^\top \hat{\mathbf{W}}^{*\,-1}_{1,D} \mathbf{S}\right)^{-1} \mathbf{S}^\top \hat{\mathbf{W}}^{*\,-1}_{1,D}. \qquad (12.42)$$

PROBABILISTIC FORECAST RECONCILIATION

Given the fact that deterministic forecasts provide no information about the uncertainty associated with the forecasts, one would naturally want to extend the hierarchical forecasting framework into the probability space. However, defining “aggregation consistency” is much more challenging in a probabilistic sense. Therefore, in the first part of this section, two equivalent definitions of probabilistic coherency, one by Ben Taieb et al. (2021) and the other by Athanasopoulos et al. (2020), are introduced. Then, two probabilistic reconciliation frameworks, Gaussian and nonparametric, are elaborated. 12.3.1

TWO EQUIVALENT DEFINITIONS OF PROBABILISTIC COHERENCY

The first attempt made in regard to the definition of probabilistic coherency should be attributed to Ben Taieb et al. (2021),3 who defined probabilistic coherency through equality in distribution,4 in that, predictive distribution of each aggregate series is equal to the convolution of the predictive distributions of the corresponding disaggregate series. Recall the summing matrix S has a dimension of Rm×mb , which can be horizontally divided into two parts: The top part has a dimension of R(m−mb )×mb , which is responsible for summing of lower-level series into higher-level ones, and the bottom part is I mb , which is a size-mb identity matrix. Let i = 1, 2, . . . , m − mb index the non-bottom-level series in the hierarchy, Ben Taieb et al. (2021) defined probabilistic coherency as d (12.43) Xi,t+h = s i∗ X b,t+h , 3 The idea was first made available in 2017 if not earlier, but the paper was only finalized in 2021, which is due to the excessive delay in the publication process, which is fairly common in the field of statistics; this fact has been acknowledged by Jeon et al. (2019); Athanasopoulos et al. (2020), whose publications appeared online at earlier dates than that of Ben Taieb et al. (2021). 4 Two random variables X and Y are said to be equivalent, or equal in law, or equal in distribution, if and only if they have the same distribution function.

Hierarchical Forecasting and Firm Power Delivery

529

d

where s i∗ denotes the ith row of S , and = denotes equality in distribution. Since X b,t+h is a vector of random variables, Eq. (12.43) involves the sum of these random variables, and the random variable resulting from the summation should have the same distribution as Xi,t+h . x0,t xBB,t,t 4

s ∗2

2

S

2 1 s ∗1 1 2

xA,t

Figure 12.3 Visualization of a coherent subspace S in a three-dimensional hierarchy where x0,t = xA,t + xB,t . The dots lying on S can be either observations or coherent forecasts.

The definition of Athanasopoulos et al. (2020) requires the concept of coherent subspace, which is the mb -dimensional linear subspace S ⊂ Rm for which some linear constraints hold for all x ∈ S. Figure 12.3 elaborates this definition with the simplest possible hierarchy, i.e., a single-level hierarchy with two bottom-level nodes such that x0,t = xA,t + xB,t . It follows that mb = 2, m = 3, and the summing matrix is ⎛ ⎞ 1 1   (12.44) S = s ∗1 s ∗2 = ⎝1 0⎠ , 0 1 where s ∗ j is the jth column of S . From the figure, it can be seen that the coherent subspace S is spanned by the two column vectors s ∗1 = (1, 1, 0) and s ∗2 = (1, 0, 1) . More importantly, a set of base forecasts {x0,t , xA,t , xB,t } can be represented by a point, which can lie anywhere in the three-dimensional space. However, a set of reconciled forecasts {= x0,t , x=A,t , x=B,t } or a set of observations {y0,t , yA,t , yB,t }, which can also be represented by a point, can only lie in the two-dimensional subspace S; Fig. 12.3 shows numerous instances of reconciled forecasts or observations. With such a preliminary, Athanasopoulos et al. (2020) provides the following definition for probabilistic coherency: Let A be a subset of S and let B be all points in Rm that can be mapped onto A after the premultiplication by S P , and let ν be a probability function (i.e., a base probabilistic forecast) for the full hierarchy, the coherent measure ν= reconciles ν if ν=(A ) = ν(B) for all A . The two definitions are equivalent (Athanasopoulos et al., 2020). Procedurally, these definitions allow reconciled probabilistic forecasts to be produced from base

530

Solar Irradiance and Photovoltaic Power Forecasting

forecasts that are either deterministic or probabilistic. For cases in which special distributional assumptions can be made, reconciling probabilistic forecasts is straightforward. However, away from these special cases, producing reconciled probabilistic forecasts from base probabilistic forecasts is more tedious than from base deterministic forecasts, as evidenced by the proposal of Ben Taieb et al. (2021), which involves Monte Carlo sampling and copula modeling. In what follows, we first show probabilistic forecast reconciliation in the Gaussian framework, and then a nonparametric framework that proceeds from base deterministic forecasts. Both the Gaussian and nonparametric frameworks have been applied in a PV power forecasting context (Yagli et al., 2020c; Yang, 2020f). 12.3.2

PROBABILISTIC FORECAST RECONCILIATION IN THE GAUSSIAN FRAMEWORK

If the base forecasts can be characterized by an elliptical distribution, or any distribution that can be completely specified by its first two moments, such as Gaussian, the reconciled probabilistic forecasts are readily available if the distributional parameters for the base forecasts are known or can be estimated (Athanasopoulos et al., 2020). More particularly, suppose the base forecasts are normally distributed, that is, W h) , X t+h ∼ N (xxt+h ,W

(12.45)

then the reconciled forecasts have the distribution:   =h , = t+h ∼ N = xt+h , W X

(12.46)

where the reconciled mean and variance can be obtained per Eqs. (12.14) and (12.32), respectively. To generate base forecasts that adhere to this Gaussian criterion, there are several options, as outlined by Athanasopoulos et al. (2020). The first option is to fit a collection of individual-series models, i.e., each model forecasts only a single series of the hierarchy, with a standard forecasting method that provides the estimation of standard error. In this way, W h is diagonal, and different series in the hierarchy have no interactions. The well-known autoregressive integrated moving average (ARIMA) and exponential smoothing (ETS) families of univariate forecasting models are appropriate in this regard. The second option is to build multivariate statistical models, on a level-by-level basis, such that the covariance matrix is block diagonal—only series at the same level of the hierarchy would receive cross-covariances. The third option uses in-sample estimates of W h , which is similar to the case of MinT-sample. Comparing these three options, the first two may not be appropriate for solar forecasting, since they are of purely statistical treatment, which has been repeatedly emphasized throughout this book as being insufficient to capture the weather dynamics. The third option is empirical, and thus can be used with physics-based forecasting. Notwithstanding, the overarching normality assumption within the Gaussian reconciliation framework may be in itself problematic, as the predictive distribution of solar power is likely to be skewed or may not even be unimodal. Empirically, it has

Hierarchical Forecasting and Firm Power Delivery

531

also been demonstrated that the Gaussian framework is suboptimal in solar forecasting as compared to the nonparametric one: In the pair of papers by Yagli et al. (2020c) and Yang (2020f), who applied Gaussian and nonparametric reconciliation methods to the same dataset, which contains 318 simulated PV systems in California, over a one-year period in 2006, it was found that the CRPS skill score of the former is generally lower than that of the latter. 12.3.3

PROBABILISTIC FORECAST RECONCILIATION IN THE NONPARAMETRIC FRAMEWORK

The fundamental idea of nonparametric statistics is to leverage data to infer unknown quantities while making as few assumptions as possible. So long as one needs to compute standard errors and confidence intervals, bootstrapping is perhaps the most popular nonparametric method (Wasserman, 2006); it is hence used for nonparametric probabilistic forecast reconciliation. We should outline the block bootstrapping method proposed by Athanasopoulos et al. (2020). The word “block” suggests that the errors for different forecast horizons are not sampled individually and independently, but as a block (i.e., a time series of errors), so as to preserve the heteroscedasticity in errors of PV power forecasts. The block bootstrapping method is to be separated into seven steps as follows: 1. Generate h-step-ahead (deterministic) base forecasts for every series in the hierarchy, using training data samples up to and including t. For each horizon h ∈ {1, · · · , H}, where H is the maximum forecast horizon of interest, the forecasts can be consolidated into a length-m column vector xt+h . 2. Frame the H number of length-m vectors into a matrix, i.e.,   X ≡ xt+1 · · · xt+H , ∈ Rm×H .

(12.47)

3. Compute 1-step-ahead in-sample base forecast errors and collect them into an m × t matrix, that is,   E ≡ e 1 · · · et , ∈ Rm×t , (12.48) where e i = y i − x i , with i ∈ {1, · · · ,t}, is the length-m column vector of errors computed between the ith historical observations and base forecasts. 4. Block bootstrap from E, which is to choose a block of m × H sub-matrix from E. This procedure is repeated a total of J times, and the m × H matrix at the jth repetition is denoted by E ( j) , where j ∈ {1, · · · , J}. 5. For each j ∈ {1, · · · , J}, compute X ( j) ≡ X + E ( j) , ∈ Rm×H .

(12.49)

In this way, each row of X ( j) is a bootstrapped path of H forecasts for the series in that row. For instance, the first row of X ( j) contains the length-H bootstrapped forecasts for the top-level series.

532

Solar Irradiance and Photovoltaic Power Forecasting

6. For each X ( j) , extract the hth column, and consolidate these columns into a new m × J matrix, denoted as Ft+h . Since X ( j) contains H columns, there would be a total of H such new matrices, namely, Ft+1 , . . . , Ft+H . 7. For each h ∈ {1, · · · , H}, and with a known P , which is computed using a method of choice outlined in Section 12.2.4, reconcile the forecasts using Eq. (12.14), except that the reconciliation is performed with matrix Ft+h , that is, =t+h = S P Ft+h , ∈ Rm×J . F (12.50) =t+h is a particular realization from It should be noted that each column of F the joint predictive distribution of all the series in the hierarchy, and each row =t+h contains J bootstrapped forecasts representing the marginal predictive of F distribution corresponding to that row. In short, this method applies deterministic forecast reconciliation to samples drawn from the incoherent base predictive distributions, which, in turn, results in samples from the coherent reconciled predictive distributions. This concludes the methodological discussion on forecast reconciliation. In what follows, a case study that depicts both deterministic and probabilistic forecast reconciliation is presented.

12.4 A CASE STUDY ON FORECAST RECONCILIATION

Owing to the hitherto lack of actual areal PV generation data, which hampers solar forecasting research at large, it is customary to use simulated data for large-scale solar integration studies. In this part of the book, we consider the Solar Power Data for Integration Studies (SPDIS),5 which was produced as part of the deliverables of the Western Wind and Solar Integration Study (GE Energy, 2010; Lew et al., 2013; Miller et al., 2014) and the Eastern Renewable Generation Integration Study (Bloom et al., 2016), two sister projects led by the National Renewable Energy Laboratory (NREL) with the participation of various subcontractors. The aim of the pair of projects was to explore the operational impact of high penetrations of wind and solar on the power systems operated by the Western Interconnection and the Eastern Interconnection, which are two groups of transmission providers working collaboratively in the western and eastern United States, respectively. In this regard, the SPDIS dataset seems particularly amenable to hierarchical forecasting research, which requires a multitude of PV systems of differing installation capacities and designs, over wide geographical areas.

The SPDIS dataset consists of 1 year (2006) of 5-min-resolution PV power data and 1-h-resolution day-ahead forecasts for approximately 6000 simulated PV plants in all states except Alaska, North Dakota, and Hawaii (Feng et al., 2019). The simulation of the PV power data was handled by NREL, based upon hourly satellite-derived irradiance and a sub-hour irradiance downscaling algorithm, of which the details can be found in Hummon et al. (2012). On the other hand, the production of the NWP-based day-ahead forecasts was performed by 3TIER (2010), a subcontractor for phase 1 of the Western Wind and Solar Integration Study. (The day-ahead forecasts for the PV systems in the Eastern Renewable Generation Integration Study were produced by NREL using the Weather Research and Forecasting model; but this does not concern the current case study, as only a subset of the western SPDIS dataset is used, see below.) It should be noted that the SPDIS dataset is only valid for the specific year of 2006; it is unable to represent the long-term irradiance conditions of a site, and thus should not be used for resource assessment purposes.

The typology of PV systems in the SPDIS dataset can be divided based on two groupings. Depending on the power system structure, there are utility-scale PV plants, which were located in areas with high solar resources and were assumed to be connected to the nearest transmission buses, and distributed PV plants, which were located close to population centers. Depending on the mounting structure, the PV systems can be differentiated into those with fixed-tilt mounting and the rest with single-axis tracking. Since the transients of PV power from fixed-tilt and tracking systems are different, it is straightforward to distinguish the systems by employing a binary classification using time series features, as demonstrated by Yang et al. (2017b). The SPDIS features a total of 405 plants in the state of California, among which 318 are fixed-tilt plants while the others are single-axis tracking plants. Since the system type does not affect reconciliation, all 405 plants are used in the case study below. Figure 12.4 (a) shows the locations and sizes of these 405 systems.

Performing hierarchical forecasting in a power system context requires certain geographical information of the grid to be known, such that the hierarchy can be constructed according to that information. One option is to consider the balancing authorities within the area of interest. In California, there are eight balancing authorities;6 however, besides the California Independent System Operator (CAISO), all other balancing authorities only have control over small geographical areas, see Fig. 12.4 (b). On this point, assigning PV plants according to the balancing authority area seems unattractive, for most PV plants would fall within CAISO, whereas the other balancing authorities would contain just a few plants, making the hierarchy highly imbalanced in terms of structure.

An alternative strategy for constructing a hierarchy is to consider the grid topology, which is shown in Fig. 12.4 (c). All transmission lines within California with an AC voltage equal to or higher than 115 kV are shown, alongside 36 high-voltage substations with AC voltages of either 345 kV or 500 kV. Comparing Fig. 12.4 (a) to (c), it can be seen that the spatial distribution of PV systems aligns well with the grid topology. In principle, when the full grid topology is known, one can construct a multiple-level hierarchy, such as the one shown in Fig. 12.1, in that distributed PV plants are assigned to lower-voltage substations, whereas utility-scale PV plants and lower-voltage substations are assigned to high-voltage substations. Notwithstanding, to facilitate understanding, we employ in what follows a simple two-level hierarchy, in which the levels corresponding to lower-voltage substations are omitted, i.e., distributed and utility-scale PV systems are all assigned to the nearest high-voltage substations, see Fig. 12.4 (d). This assignment leads to 34 L1 series (i.e., 34 high-voltage substations),7 and 405 L2 or Lb series (i.e., PV plants), with which both deterministic and probabilistic forecast reconciliation frameworks are demonstrated.

Figure 12.4 (a) Locations and sizes of the 405 simulated PV plants of varying installed capacities in California. (b) The California Independent System Operator territory is shown in gray, whereas the white patches and holes are handled by other smaller balancing authorities. (c) Transmission network (≥115 kV) in California, with high-voltage substations (345 or 500 kV) marked with circles. (d) PV systems assigned (through nearest neighbor) to the 34 high-voltage substations in California.

5 https://www.nrel.gov/grid/solar-power-data.html
6 https://cecgis-caenergy.opendata.arcgis.com/datasets/california-electric-balancing-authority-areas/
7 Two substations have no PV system nearest to them.
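To make the nearest-neighbor assignment and the resulting two-level hierarchy concrete, a minimal sketch follows. The coordinate arrays and function names are hypothetical, and, for simplicity, plain Euclidean distance on projected coordinates stands in for the geodesic distance one would use with raw longitude-latitude pairs:

```python
import numpy as np

def build_hierarchy(plant_xy, sub_xy):
    """Assign each plant to its nearest substation and build the summing
    matrix S of a two-level hierarchy (total, substations, plants).

    plant_xy : (n_plant, 2) projected plant coordinates
    sub_xy   : (n_sub, 2) projected substation coordinates
    """
    # Pairwise plant-to-substation distances, shape (n_plant, n_sub)
    d = np.linalg.norm(plant_xy[:, None, :] - sub_xy[None, :, :], axis=2)
    nearest = d.argmin(axis=1)                   # index of nearest substation
    n_sub, n_plant = sub_xy.shape[0], plant_xy.shape[0]
    A = np.zeros((n_sub, n_plant))
    A[nearest, np.arange(n_plant)] = 1.0         # substation-level aggregation
    # Stack: total (row of ones), L1 aggregation, L2 identity
    S = np.vstack([np.ones((1, n_plant)), A, np.eye(n_plant)])
    return nearest, S
```

With 34 substations and 405 plants, S would have 1 + 34 + 405 = 440 rows, consistent with the m = 440 used in the case study below.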

12.4.1 DETERMINISTIC FORECAST RECONCILIATION

Following the mathematical notation introduced earlier in this chapter, the hierarchy has m = 440, m_b = 405, and S ∈ R^{440×405}. As the first step, the 5-min simulated PV power data are aggregated into hourly values—for clarity, these hourly time series are referred to as "measurements" or "observations" hereafter. In the second step, the bottom-level measurements are aggregated with respect to the hierarchical structure into L0 and L1 measurements. The nodes at L1 consist of varying numbers of L2 series, of which the exact numbers can be found in the second column of Table 12.1 below. From the above two steps, we obtain y_i, with i = 1, ..., 8760 indexing the hours in the year 2006.

Next, the base forecasts x_i are prepared. Since the bottom-level base forecasts were already made available by 3TIER, we only need to generate base forecasts for L0 and L1. In this regard, the ARIMA family of models is applied to each series at L0 and L1, in a day-by-day rolling fashion. However, it should be noted that, due to the diurnal transient in the PV power time series, it is customary to deseasonalize the data prior to forecasting. Here, owing to the lack of detailed plant information, the Fourier-based clear-sky identification method of Huang and Perry (2016) is used, and the forecasting method is referred to as ARIMA with Fourier terms (AFT) hereafter. The rolling forecasts are produced as follows. For each series of concern, an AFT model is fitted using the first 7 days of data, namely, January 1–7, and 24-h-ahead forecasts for January 8 are generated using the fitted model. Once the forecasts are made, we refit another AFT model using the measurements over January 2–8, and produce forecasts for January 9, and so on and so forth, until the forecasts for December 31 are made. Although AFT is a time series method, due to the Fourier terms employed, it is able to handle the diurnal cycles in substation-level PV power somewhat satisfactorily.

With both x_i and y_i, four reconciliation techniques as introduced in Section 12.2.4, namely, OLS, WLS, HLS, and MinT-shrink, are applied. Additionally, the bottom-up (BU) approach is used as a benchmark. Recalling Eqs. (12.36) and (12.38), OLS and HLS reconciliations do not require training, which implies that the reconciled forecasts can be generated for the entire year 2006 (except for January 1–7, during which L0 and L1 base forecasts are not available). In contrast, WLS and MinT-shrink require, respectively, $\hat{W}_{1,D}$ and $\hat{W}^{*}_{1,D}$, which have to be estimated. On this point, data points from the first half of the year are used for training, and the remaining ones are kept for verification. It is worth mentioning that although $\hat{W}_{1,D}$ and $\hat{W}^{*}_{1,D}$ represent 1-step-ahead base forecast error variances, in our case, due to the lack of samples, they are simply calculated using all in-sample errors—considering nighttime gaps, which halve the training sample size, the matrix holding all in-sample errors has a size of 2339 × 440. With the estimated $\hat{W}_{1,D}$ and $\hat{W}^{*}_{1,D}$, the reconciliation matrices $P_{\text{WLS}}$ and $P_{\text{MinT-shrink}}$ can be obtained via Eqs. (12.37) and (12.42). Finally, by substituting each of $P_{\text{BU}}$, $P_{\text{OLS}}$, $P_{\text{WLS}}$, $P_{\text{HLS}}$, and $P_{\text{MinT-shrink}}$ into Eq. (12.14), various versions of reconciled forecasts $\tilde{x}_i$ can be obtained, for all i in the validation set. In other words, the same $\hat{W}_{1,D}$ and $\hat{W}^{*}_{1,D}$ are used for all verification samples.
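Eqs. (12.36), (12.37), and (12.42) are stated earlier in the chapter and not reproduced here; the sketch below implements the generic trace-minimization form $P = (S^{\top}W^{-1}S)^{-1}S^{\top}W^{-1}$, which reduces to OLS, WLS, or MinT depending on the choice of W. The shrinkage intensity lam is a placeholder; in practice, MinT-shrink estimates it from the data:

```python
import numpy as np

def recon_matrix(S, W=None):
    """Trace-minimization reconciliation matrix P for summing matrix S.

    W = I gives OLS; a diagonal W of base-error variances gives WLS;
    a shrinkage estimate of the full base-error covariance gives
    MinT-shrink (cf. Eqs. (12.36)-(12.42), not reproduced here)."""
    m = S.shape[0]
    W = np.eye(m) if W is None else W
    Winv = np.linalg.inv(W)
    # Solve (S' W^-1 S) P = S' W^-1 for P, shape (mb, m)
    return np.linalg.solve(S.T @ Winv @ S, S.T @ Winv)

def shrink_cov(E, lam=0.5):
    """Convex shrinkage of the sample error covariance toward its diagonal;
    E is the (m, n) in-sample error matrix; lam is a placeholder value."""
    W1 = np.cov(E)
    return lam * np.diag(np.diag(W1)) + (1 - lam) * W1

# Reconciled forecasts, Eq. (12.14): x_tilde = S @ P @ x, e.g.,
# x_tilde = S @ recon_matrix(S, shrink_cov(E)) @ x
```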

Table 12.1 Normalized root mean square error [%] of L1 series (i.e., substation level), using ARIMA with Fourier terms (AFT) and five different reconciliation techniques, namely, BU, OLS, WLS, HLS, and MinT-shrink. Row-wise best results are in bold. The installed capacity (Cap.) is in MW.

Substn.   # series   Cap.    AFT    BU     OLS     WLS    HLS    MinT-shrink
1         7          171     28.5   37.0   30.0    28.3   27.0   24.0
2         3          136     32.0   38.3   30.8    29.9   29.3   26.1
3         2          225     28.2   33.7   27.1    26.8   26.4   25.2
4         8          759     24.3   23.1   23.3    20.6   19.5   20.1
5         12         965     11.9   17.6   11.9    15.4   12.1   12.5
6         13         813     23.4   21.9   23.0    20.1   19.3   20.4
7         1          7       34.4   37.0   125.7   30.6   40.3   30.3
8         6          630     24.8   26.2   23.9    23.5   21.8   22.0
9         18         1866    23.4   24.4   23.0    23.0   20.2   20.4
10        1          121     27.6   28.0   25.9    24.1   24.2   24.8
11        24         1429    18.6   21.1   18.6    20.2   17.3   16.8
12        12         650     23.7   23.4   23.5    20.9   20.3   19.9
13        14         966     18.3   20.5   18.1    18.8   16.4   16.2
14        3          68      28.8   38.0   35.5    27.3   28.3   24.6
15        30         2564    25.8   29.5   25.4    27.9   22.8   23.4
16        22         1290    16.1   19.5   16.0    17.2   14.8   14.5
17        24         1005    24.0   25.2   23.7    23.0   19.8   20.0
18        5          652     18.1   20.1   17.3    17.1   16.0   16.2
19        14         1928    19.1   21.0   18.5    19.2   16.6   16.5
20        3          306     26.4   27.8   23.7    22.7   22.1   21.8
21        10         482     26.3   26.7   25.7    24.4   22.1   22.1
22        30         451     26.8   30.9   27.0    28.7   25.9   24.5
23        1          11      39.6   47.8   74.5    36.7   38.9   36.6
24        12         224     35.7   49.0   37.2    42.0   36.8   34.2
25        3          112     37.9   40.9   38.0    34.0   33.9   33.6
26        20         1126    18.2   21.0   18.1    19.8   16.5   16.3
27        14         332     30.0   44.9   30.9    37.0   33.0   23.9
28        10         465     25.5   25.9   25.0    23.9   21.8   21.3
29        16         378     29.2   37.6   29.6    32.1   27.5   24.0
30        6          36      37.0   47.4   63.6    40.9   51.3   33.9
31        4          64      30.3   43.9   41.7    35.2   35.1   25.8
32        27         531     31.0   38.4   31.5    35.4   30.0   27.1
33        12         169     25.7   30.4   29.2    24.6   28.6   22.7
34        18         1571    13.7   19.2   13.5    17.5   13.1   14.1
Overall   405        22503   26.0   30.5   30.9    26.1   25.0   22.8

To evaluate the performance of the various reconciliation techniques, the normalized root mean square errors (nRMSEs) of the reconciled forecasts, for all 34 L1 series, are computed and tabulated in Table 12.1—as usual, only forecast–observation pairs during daylight hours are involved in the error calculation. Table 12.1 reveals that MinT-shrink attained the best overall nRMSE amongst its peers, with HLS being the closest competitor. The bottom-up approach performs worse than AFT at L1, which is interesting in the sense that the more granular bottom-level information by itself does not contribute much to accuracy. This may be due to the fact that the bottom-level series are highly variable, but after geographical smoothing, the aggregated series become easier to forecast. The present finding has profound implications for how system operators should make operational forecasts. As the current grid code often mandates plant-level forecast submission, system operators should not simply sum up the submitted forecasts without performing their own substation-level forecasting.

Through validating the L1 series, forecast reconciliation has been shown to be beneficial for system operators. Do plant owners also benefit from it? This question is answered by validating the L2 series. Unlike the L1 series, which are few in number, the bottom-level series are more numerous, and tabulating the errors is not efficient. Hence, two verification strategies are devised. First, given any pair of methods, the 405 pairs of nRMSE values of the L2 reconciled forecasts can be represented in the form of a scatter plot; these are depicted in the lower triangular entries of Fig. 12.5. Next, to evaluate the statistical difference in forecast accuracy, the Diebold–Mariano (DM) test as introduced in Section 11.3.2.2 is employed. For any pair of methods, a total of 405 DM tests are to be conducted, and the outcome of the tests—either one method is better than the other, or the two methods have statistically comparable accuracies—can be summarized as percentages; these are depicted in the upper triangular entries of Fig. 12.5. Lastly, to visualize how the nRMSEs of the L2 series are statistically distributed, their probability density functions are plotted in the diagonal entries of Fig. 12.5.

Several observations can be made from the figure. Most evident is that all reconciliation methods except for OLS are able to improve the accuracy of the L2 base forecasts of 3TIER, confirming the benefit of hierarchical forecasting. MinT and WLS have a similar performance against 3TIER, and both compare favorably with OLS and HLS. It can also be noticed that OLS and HLS occasionally produce very large errors. This unsatisfactory performance of OLS and HLS is due to the reason mentioned in Section 12.2.4—they do not perform well if the bottom-level series are on scales (i.e., installed capacities in this case) that are not comparable, which is the case for the SPDIS dataset. Among the four reconciliation methods, MinT has the highest number of winning cases, followed by HLS, WLS, and OLS, in that order. Because WLS considers neither the interaction between the various bottom-level series, as MinT does, nor the hierarchical structure, as HLS does, it is unable to fully exploit the advantages of hierarchical modeling, which implies that its "strength" in adjusting the base forecasts is but marginal. This case study demonstrates well the performance boost of hierarchical forecasting.
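Since Section 11.3.2.2 is not reproduced here, the following is a minimal, generic sketch of the two-sided DM test on a squared-error loss differential; the function name and the HAC variance truncation are our choices, not the book's:

```python
import numpy as np
from math import erf, sqrt

def dm_test(e1, e2, h=1):
    """Two-sided Diebold-Mariano test on squared-error loss differentials.

    e1, e2 : arrays of forecast errors from two competing methods,
             aligned on the same verification samples
    h      : forecast horizon; autocovariances up to lag h-1 enter the
             long-run variance estimate
    Returns the DM statistic and its asymptotic two-sided p-value."""
    d = e1**2 - e2**2                       # loss differential series
    n = d.size
    dbar = d.mean()
    # Truncated (Newey-West style) estimate of the long-run variance of d
    gamma = [np.sum((d[k:] - dbar) * (d[:n - k] - dbar)) / n for k in range(h)]
    var_d = (gamma[0] + 2.0 * sum(gamma[1:])) / n
    dm = dbar / np.sqrt(var_d)
    p = 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(dm) / sqrt(2.0))))
    return dm, p
```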
Figure 12.5 (Lower triangular entries) Pairwise scatter plots of 405 L2 nRMSEs [%] of various reconciliation techniques; identity lines are shown in black. (Diagonal entries) Probability density functions of the nRMSE distribution of each reconciliation technique. (Upper triangular entries) Pairwise Diebold–Mariano test results using the squared-error loss differential of L2 forecasts: OLS is 40.2% better and 44.9% worse than 3TIER; WLS is 99.8% better and 0.0% worse than 3TIER; HLS is 66.7% better and 23.5% worse than 3TIER; MinT is 96.5% better and 0.2% worse than 3TIER; WLS is 54.3% better and 28.1% worse than OLS; HLS is 75.3% better and 15.8% worse than OLS; MinT is 85.7% better and 3.5% worse than OLS; HLS is 51.9% better and 31.1% worse than WLS; MinT is 91.6% better and 1.2% worse than WLS; and MinT is 72.1% better and 6.7% worse than HLS. The last entry, for example, reads: MinT is 72.1% (or 292 out of 405 locations) better than HLS, and HLS is 6.7% (or 27 out of 405 locations) better than MinT, whereas the MinT and HLS forecasts at the remaining 86 stations do not differ significantly.

Policy-wise, hierarchical forecasting not only aligns with current grid codes but also has an advisory effect on future grid codes. Recall that current grid codes suggest a top-down–bottom-up information flow, where weather forecasts are disseminated by national weather centers and PV power forecasts are submitted by system owners. What may be further added to this process is another round of forecast dissemination; this time, it is the system operators who disseminate the reconciled plant-level forecasts to individual plant owners, who are tasked to firmly meet the generation target, perhaps through on-site energy storage. We shall revisit this concept of planned firm forecasting towards the end of this chapter.

12.4.2 PROBABILISTIC FORECAST RECONCILIATION

Given the fact that nonparametric forecast reconciliation places no distributional assumption, it is usually preferred over probabilistic reconciliation in the Gaussian framework. On this point, the probabilistic part of the case study adopts the block bootstrapping algorithm outlined in Section 12.3.3. Reconciliation in the nonparametric framework proceeds by generating the base deterministic forecasts, see step 1 of the algorithm. For consistency reasons, AFT is again used as the method to produce base forecasts on L0 and L1, whereas, for the L2 base forecasts, it is again the raw 3TIER forecasts that are relied upon. After gathering the base deterministic forecasts for the entire year 2006, the same splitting ratio as in the case of deterministic forecast reconciliation is used to divide the data into two halves, one for training and the other for testing. The block bootstrapping is then performed for each day in the testing set, i.e., steps 2–7 are repeated for each day. To echo the day-ahead setting, the block size H is taken to be 24. The base forecast error matrix E in Eq. (12.48) has a dimension of 440 × 4296—the number 4296 corresponds to the 179-day period in the first half of the year (January 8 to July 5). Subsequently, in step 4 of the algorithm, a total of 99 bootstrapped paths (J = 99) are randomly sampled from E; for each forecast horizon h, the corresponding values in the 99 paths may be regarded as the 0.01, 0.02, ..., 0.99 quantiles of the predictive distribution at horizon h. Step 5 of the algorithm is straightforward, and 99 $X^{(j)}$ matrices of base probabilistic forecasts become available through Eq. (12.49). After these $X^{(j)}$ matrices are rearranged into $F_{t+1}, \ldots, F_{t+H}$ following step 6, the reconciled probabilistic forecasts $\tilde{F}_{t+h}$ are obtained using five different reconciliation techniques—this is done by substituting each of $P_{\text{BU}}$, $P_{\text{OLS}}$, $P_{\text{WLS}}$, $P_{\text{HLS}}$, and $P_{\text{MinT-shrink}}$ into Eq. (12.50).

Just like the deterministic case, the verification of the reconciled probabilistic forecasts also needs to be performed for both L1 and L2, for we wish to examine the benefit of probabilistic reconciliation from the perspectives of both the grid operators and the individual plant owners. Furthermore, we also wish to examine whether or not probabilistic reconciliation can lead to better deterministic forecasting results. In this regard, in the first part of the verification exercise, deterministic forecasts on L1 are elicited from the corresponding reconciled probabilistic forecasts through the mean functional, which is consistent with nRMSE. Table 12.2 lists these L1 nRMSE values. Comparing Table 12.2 to Table 12.1, one is able to notice that the reconciled deterministic forecasts drawn from predictive distributions have a slightly different and often better performance than those obtained via deterministic forecast reconciliation. Particularly for BU reconciliation, the improvements are quite substantial (i.e., the overall nRMSE reduced from 30.5% to 28.0%), which must be wholly attributed to the better L2 forecasts due to ensemble modeling—instead of forecasting the bottom level just once, block bootstrapping creates an ensemble of bottom-level forecasts, which is able to reduce the bias in the member forecasts. As for the best reconciliation technique, that is, MinT-shrink, the overall performance has also improved, from the former case of 22.8% to the present 22.3%.
Table 12.2 Normalized root mean square error [%] of the L1 (substation-level) deterministic forecasts elicited from the five sets of reconciled probabilistic forecasts. Row-wise best results are in bold. The installed capacity (Cap.) is in MW.

Substn.   # series   Cap.    AFT    BU     OLS     WLS    HLS    MinT-shrink
1         7          171     28.5   33.8   30.2    26.7   27.1   23.2
2         3          136     32.0   35.1   30.4    28.9   28.6   25.5
3         2          225     28.2   31.7   27.3    26.2   26.0   24.6
4         8          759     24.3   22.3   23.5    20.2   19.4   19.3
5         12         965     11.9   15.7   11.7    14.0   11.8   12.3
6         13         813     23.4   20.6   23.2    19.2   19.1   19.8
7         1          7       34.4   36.4   119.4   30.6   38.8   29.8
8         6          630     24.8   24.6   24.0    22.3   21.2   21.6
9         18         1866    23.4   22.6   23.2    21.4   19.6   20.0
10        1          121     27.6   27.0   25.8    23.7   23.8   24.4
11        24         1429    18.6   17.8   18.5    17.2   16.5   16.0
12        12         650     23.7   21.2   23.6    19.2   19.7   19.3
13        14         966     18.3   18.4   18.0    17.0   15.9   15.6
14        3          68      28.8   38.9   35.6    28.1   29.5   24.1
15        30         2564    25.8   26.6   25.5    25.3   22.0   23.1
16        22         1290    16.1   16.5   15.8    14.9   14.2   14.1
17        24         1005    24.0   23.4   23.4    21.7   19.6   19.9
18        5          652     18.1   17.6   17.1    15.5   15.4   15.6
19        14         1928    19.0   18.6   18.4    17.2   16.0   16.2
20        3          306     26.3   26.1   23.4    21.9   21.9   21.6
21        10         482     26.3   24.9   25.6    23.1   21.8   21.9
22        30         451     26.8   28.6   26.9    26.7   25.9   24.5
23        1          11      39.6   45.8   72.7    36.8   39.3   36.5
24        12         224     35.7   39.4   37.3    35.3   35.5   32.8
25        3          112     37.9   38.2   37.5    33.8   33.4   33.1
26        20         1126    18.2   17.8   18.0    17.0   15.9   15.5
27        14         332     30.0   39.3   30.9    33.7   31.9   23.2
28        10         465     25.5   23.8   24.9    22.3   21.5   21.0
29        16         378     29.2   32.4   29.8    28.3   27.1   23.2
30        6          36      37.0   44.1   62.1    39.0   51.0   33.1
31        4          64      30.3   39.3   40.9    32.6   34.2   25.2
32        27         531     31.0   36.4   31.8    33.7   30.8   26.0
33        12         169     25.7   29.4   29.3    24.3   28.3   22.3
34        18         1571    13.7   17.5   13.3    16.0   12.9   13.7
Overall   405        22503   26.0   28.0   30.6    24.5   24.6   22.3

In the second part of the verification exercise, deterministic forecasts on L2 are elicited from the corresponding reconciled probabilistic forecasts through the mean functional. Subsequently, pairwise DM tests are conducted not only among these elicited forecasts issued by the different probabilistic reconciliation techniques, but also among those forecasts directly generated using the deterministic forecast reconciliation techniques. The results are depicted in Fig. 12.6, where each entry shows the percentage of instances in which "Method A is better than Method B."

Figure 12.6 Pairwise Diebold–Mariano (DM) tests for L2 reconciled deterministic forecasts. Two sets of reconciled forecasts, namely, those directly generated with different deterministic forecast reconciliation techniques (annotated with "det") and those elicited from predictive distributions issued by different probabilistic reconciliation techniques (annotated with "prob"), are compared.

It is noted that the results represented by the 5 × 5 sub-matrix at the bottom-left corner of Fig. 12.6 (annotated with a solid-line box) have already been presented in the upper triangular entries of Fig. 12.5, but are reiterated here for convenience. Figure 12.6 reveals that eliciting forecasts from distributions is a generally credible strategy, as evidenced by the two sub-matrices marked by the dashed boxes. In particular, probabilistic MinT-shrink is able to produce better deterministic L2 forecasts than deterministic MinT-shrink 56.8% of the time. Similarly, probabilistic HLS outperforms its deterministic dual 63.2% of the time.

After evaluating the performance of the reconciled deterministic forecasts, verification of the reconciled probabilistic forecasts is conducted next, on both L1 and L2.

The normalized continuous ranked probability score (nCRPS) in percent, which is obtained by dividing the CRPS by the mean observation of each series, is taken as the main verification metric for the L1 forecasts, without loss of generality. Table 12.3 tabulates the nCRPS values of all five hierarchical forecasting methods, alongside those of the AFT benchmark. The method for estimating the predictive distributions of forecasts from the ARIMA family of models is detailed in Box et al. (2015), and a standard implementation is available from the forecast package in R.

Table 12.3 Normalized continuous ranked probability score [%] of L1 (substation-level) probabilistic forecasts. Row-wise best results are in bold. The installed capacity (Cap.) is in MW.

Substn.   # series   Cap.    AFT    BU     OLS    WLS    HLS    MinT-shrink
1         7          171     18.1   15.8   15.1   12.8   13.4   11.5
2         3          136     20.2   17.4   16.3   15.1   15.1   13.5
3         2          225     18.3   15.4   14.2   13.4   13.4   12.5
4         8          759     15.8   10.4   11.5   9.6    9.6    9.2
5         12         965     8.2    7.0    6.1    6.3    5.6    5.9
6         13         813     15.0   9.2    10.9   8.6    8.9    9.2
7         1          7       23.6   15.8   63.3   14.1   20.2   13.8
8         6          630     17.1   10.9   11.5   9.9    9.8    9.9
9         18         1866    16.9   10.5   11.4   9.9    9.3    9.4
10        1          121     17.5   12.1   12.3   11.0   11.1   11.4
11        24         1429    13.2   8.3    9.3    7.9    7.9    7.9
12        12         650     16.9   10.3   11.9   9.3    9.8    9.5
13        14         966     16.7   8.7    9.2    8.0    7.8    7.9
14        3          68      17.8   16.1   17.9   12.1   13.4   11.2
15        30         2564    16.6   12.8   12.8   12.1   10.7   11.0
16        22         1290    12.1   8.3    8.3    7.4    7.3    7.2
17        24         1005    16.6   10.8   11.6   9.9    9.5    9.4
18        5          652     13.2   8.1    8.5    7.3    7.4    7.6
19        14         1928    12.2   8.7    9.1    8.1    7.7    7.9
20        3          306     17.7   12.5   12.0   10.7   10.9   10.8
21        10         482     17.3   11.1   12.4   10.1   10.1   10.0
22        30         451     16.9   13.2   13.2   12.3   12.9   11.5
23        1          11      21.1   23.9   38.2   19.3   20.8   19.2
24        12         224     21.4   19.1   18.8   17.4   17.8   16.4
25        3          112     21.2   16.9   18.7   15.8   15.9   15.8
26        20         1126    13.3   8.5    9.2    8.1    7.9    7.8
27        14         332     19.4   18.8   15.3   16.0   15.6   11.6
28        10         465     18.1   10.4   12.2   9.7    10.0   9.6
29        16         378     19.4   15.4   14.7   13.6   13.6   11.7
30        6          36      22.4   20.8   32.7   18.5   26.7   16.1
31        4          64      19.9   16.6   20.6   14.0   16.2   11.8
32        27         531     19.6   16.7   15.6   15.6   15.0   12.8
33        12         169     16.2   12.7   14.3   11.0   13.9   10.5
34        18         1571    9.6    8.0    6.8    7.3    6.2    6.5
Overall   405        22503   17.0   13.0   15.5   11.5   12.1   10.8

With little surprise, Table 12.3 confirms that MinT-shrink is able to achieve the overall best performance, whereas WLS and HLS also demonstrate satisfactory results. One may notice the high nCRPS of the AFT forecasts, which can be largely attributed to the improper assumption of a Gaussian predictive distribution made by AFT. To that end, forecast reconciliation in a probabilistic setting carries even more importance than reconciliation in a deterministic setting.

To verify the L2 reconciled probabilistic forecasts, we again employ the style of Fig. 12.5, which gives a comprehensive view of the scatter plots, probability densities, and pairwise DM test results of the forecast errors of the various reconciliation methods. What is different from Fig. 12.5 is that the error of interest is now CRPS instead of RMSE. More specifically, the lower triangular region of Fig. 12.7 depicts the pairwise scatter plots of the nCRPS of forecasts from the various probabilistic reconciliation methods, whereas the diagonal entries show the densities of the nCRPS values. As for the DM test, since it only requires the loss differential to operate, the squared-error loss differential used for Fig. 12.5 is replaced with the CRPS differential, the outcome of which is shown in the upper triangular region of Fig. 12.7. Examining the L2 forecast verification results, one can conclude quite certainly that the MinT-shrink reconciliation compares favorably to the other alternatives.
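For reference, the CRPS of an ensemble forecast (e.g., a row of $\tilde{F}_{t+h}$ holding the J = 99 members) can be estimated directly from the samples via the standard kernel representation; the following is a minimal sketch with names of our choosing:

```python
import numpy as np

def crps_ensemble(ens, obs):
    """Sample-based CRPS for one forecast: ens is a length-J ensemble,
    obs the scalar verification. Kernel representation:
    CRPS = E|X - y| - 0.5 E|X - X'|."""
    ens = np.asarray(ens, dtype=float)
    term1 = np.mean(np.abs(ens - obs))
    term2 = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))
    return term1 - term2

def ncrps(ens_matrix, obs_vector):
    """nCRPS [%]: mean CRPS over the verification set, divided by the
    mean observation of the series."""
    scores = [crps_ensemble(e, y) for e, y in zip(ens_matrix, obs_vector)]
    return 100.0 * np.mean(scores) / np.mean(obs_vector)
```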

Figure 12.7 (Lower triangular entries) Pairwise scatter plots of 405 L2 nCRPS [%] of various probabilistic reconciliation techniques; identity lines are shown in black. (Diagonal entries) Probability density functions of the nCRPS distribution of each probabilistic reconciliation technique. (Upper triangular entries) Pairwise Diebold–Mariano test results using the CRPS differential of L2 forecasts: OLS is 16.8% better and 73.3% worse than BU; WLS is 92.6% better and 0.2% worse than BU; HLS is 40.2% better and 45.7% worse than BU; MinT is 73.8% better and 7.4% worse than BU; WLS is 83.5% better and 7.9% worse than OLS; HLS is 85.2% better and 9.9% worse than OLS; MinT is 86.4% better and 5.7% worse than OLS; HLS is 19.5% better and 59.0% worse than WLS; MinT is 55.8% better and 15.1% worse than WLS; and MinT is 68.6% better and 11.6% worse than HLS.

12.5 FROM HIERARCHICAL FORECASTING TO FIRM POWER DELIVERY

Thus far, we have explained and demonstrated the benefits of hierarchical forecasting—it is able to produce a set of coherent forecasts across the entire power system, which is desirable for joint decision-making. However, even if the best techniques are used throughout the solar forecasting process, which includes NWP forecast generation, post-processing of NWP forecasts, irradiance-to-power conversion, and hierarchical reconciliation, the accuracy of the forecasts thus obtained would still be far from allowing solar power to be regarded as dispatchable like conventional generators. This inevitably invites complementary technologies, such as energy storage or demand response, to participate in modern power system operation and control.

The perpetual goal of power system operation is balancing generation with load, on all relevant time and spatial scales. This goal, as has been repeatedly mentioned throughout the book, concerns two groups of people in the main, namely, PV plant owners and power system operators. PV plant owners wish their forecasts to be as accurate as possible, such that the forecast penalty is minimized. Recalling the last paragraph of Section 12.4.1, the thesis of having system operators disseminate the reconciled plant-level forecasts to individual owners was advocated. If these reconciled plant-level forecasts, which can be regarded as generation targets set for individual plants, can be met with absolute certainty, the forecasts are said to be "firm," and the name of the corresponding technology was coined by Perez et al. (2020) as firm forecasting. Under the firm forecasting framework, generation follows the forecast exactly, which mitigates entirely any forecast inaccuracy. From the perspective of power system operators, even if the solar forecasts are firm (i.e., solar power is dispatchable), they still have to attend to balancing needs, as is the case with the conventional power system. On this point, if the generation is able to meet the load demand 24/365, including times of low or no resource, such as winters or nights, it is termed firm generation.

Firm forecasting and firm generation are two highly similar concepts, for they both aim at delivering firm power to their recipients.


What is different is that the recipients of the firm power resulting from firm forecasting are the grid operators, whereas the recipients of the firm power from firm generation are consumers on the demand side. Whereas the resemblance and distinction between the two concepts are to be made clear progressively in what follows, the overarching question that guides the discussion is this: What is the most cost-effective way of ensuring firm solar power? The works of Perez et al. (2021, 2019a, 2016a) shed early light on this question. The reader is highly recommended to read the review on firm power delivery by Remund et al. (2023), who exhaustively collected and discussed all published works on firm power delivery up to November 2023. In a nutshell, firm power enabling technologies—or firm power enablers—can be broadly categorized into four kinds: (1) energy storage, (2) geographical smoothing and generation blending, (3) load shaping, and (4) overbuilding & proactive curtailment (Perez et al., 2016a). Among these four, the first three enablers are traditional and have been well known and applied for some time, whereas the last one is a counter-intuitive proposition made quite recently, in the mid-2010s. We shall give a brief account of all four technologies in the next few pages.

12.5.1 FIRM POWER ENABLERS

By "traditional" we do not mean that the technologies are mature in terms of technology readiness level; rather, it suggests that their theoretical conceptions have been around for a long time. All three traditional technologies are intuitive, in that they seek to modify the generation and/or load curves, such that the variability of, and the mismatch between, these curves can be reduced. But at the same time, they each face technical or socioeconomic challenges that are yet to be overcome. In contrast, overbuilding & proactive curtailment profoundly contradicts the conventional prejudice of minimizing energy wastage. However, in a future power system predominated by renewable generation, overbuilding & proactive curtailment can lead to substantial cost reductions in achieving firm power compared to storage-only solutions. In fact, as suggested by Remund et al. (2023), all empirical studies conducted at worldwide locations thus far have pointed to the conclusion that overbuilding & proactive curtailment is an absolutely necessary constituent of lowest-cost firm power.

12.5.1.1 Electric storage

Energy can be stored in different forms, such as rotational mechanical energy (e.g., flywheels), electrostatic charges (e.g., capacitors), potential energy (pumped hydro and other gravity energy storage), or thermal energy (sensible heat, latent heat, and thermochemical energy storage). If any of the aforementioned energy storage options is used to support grid integration, the stored energy has to be converted eventually to electricity; we shall gather all techno-economically feasible options under the umbrella term of "electric storage."

Electric storage devices differ in their response rates and reserve capacities, among which the popular ones include capacitors, flywheels, batteries, and pumped hydro. On one end of the spectrum, both capacitors and flywheels have very fast response but low reserve, and for that reason, they are expected to be used to provide frequency regulation and voltage fluctuation support. On the other end of the spectrum, there is pumped hydro, which has a slower response but much higher reserve; it provides all services from reactive power support to frequency control, synchronous or virtual inertia, and black-start capabilities. Utility-scale battery storage occupies the middle ground. At present, most existing utility-scale batteries are used for short-term (e.g., primary frequency response) and longer-duration (e.g., load-following and ramping services) ancillary services, but only a small number are used for arbitrage, black start, and firm power delivery. (However, new policies and grid codes are being developed and implemented, for instance, the requirement for battery storage to be installed in newly constructed utility-scale PV plants in China beginning in 2023. The mandatory storage capacity ranges from 10% to 30% of the PV capacity depending on the operating zone.)

Lithium-ion batteries are often perceived as the storage option with the greatest prospects, for this technology has become commercially available, such as the Tesla Powerwall and Powerpack, which are products for home and utility-scale applications, respectively. Notwithstanding, the high price of lithium-ion batteries has hitherto been limiting their uptake. Although one can expect price drops for lithium-ion batteries, the price would still remain, for the foreseeable future, too high for widespread grid-scale applications. Additionally, lithium-ion batteries pose safety risks and their ability to hold charge fades over time (i.e., degradation), as such the necessity of battery operations & maintenance (O&M) and replacement adds to the overall cost. As we shall see below, the equivalent annual cost of a PV–battery hybrid system using battery storage alone to deliver firm generation is tens of times that of a normal PV plant producing unconstrained8 solar power. In short, this outrageous multiplier is due to the cost structure of a firm PV plant: Whereas storing excess solar energy for a few hours poses no problem, in reality the batteries need to serve tens to hundreds of hours with low to no solar resource, aka seasonal storage, which drives up the required nominal capacity and thus the cost of battery storage, even if such prolonged low-resource periods or large energy deficits occur just a few times a year.

8 The word "unconstrained" refers to feeding solar power into the grid as it is generated.

12.5.1.2 Geographical smoothing and generation blending

As the main source causing variability in PV power output, clouds only have local effects on generation. Stated differently, the total solar power generation of a region is unlikely to exhibit as strong a variability as any single PV plant does, due to the random cancellation of intra-hour to hourly cloud-induced power ramp-ups and -downs among the PV plants within that region. (This echoes the earlier finding that L1 time series are often easier to forecast than individual L2 time series.) This phenomenon is known as geographic smoothing or geographic dispersion, which has been known since at least the early 2010s (Lave and Kleissl, 2010). Although in most circumstances relevant to grid integration, geographical smoothing is studied on a spatial scale comparable to or beyond the physical area of a distribution network, such a smoothing effect may also be observed in large utility-scale PV plants, but on a much smaller time scale of a few minutes (Hoff and Perez, 2012).

The geographical smoothing effect may be quantified using solar ramp rate correlation (Perez and Fthenakis, 2015; Arias-Castro et al., 2014; Hinkelman, 2013). If multiple sites exist within a region, one may fit a correlation function (also known as a correlogram) to the empirical correlations, which is a standard procedure in spatial statistics (Cressie and Wikle, 2015). Since the correlation function gradually decreases as the plant-separation distance increases, the position where the function saturates is taken by solar engineers as the decorrelation distance, which marks the desirable separation distance of plants if one is to maximize the geographical smoothing effect. Another noticeable fact is that the parameters of the correlation functions used in studying geographical smoothing are often estimated based on physical laws (e.g., the birth, movement, and extinction of clouds), as opposed to statistical parameter estimation (e.g., via least squares).

Geographical dispersion as discussed above is confined to the decorrelation of power ramps among plants, such that the overall variability is reduced. In a more general sense, geographical dispersion is also helpful in coinciding generation with load. For instance, if long-distance transmission of power is permitted and economically viable, one can site large amounts of solar generation far to the west of the load centers, in that the noon to early-afternoon solar generation can be used to satisfy the late-afternoon to nighttime load peaks. Of course, the time zone difference between the plants and load centers needs to be a few hours, which, aside from several countries such as China or the United States whose administrative areas span several time zones, may require additional international power transmission policies and legislation of various sorts on top of the current ones. The eventual form of this ideal is known as the energy internet, which is beyond the scope of this book.

Another technology that facilitates firm power delivery is generation blending, which refers to the optimal mix of multiple renewable energy sources. PV can be managed jointly with wind, biomass, geothermal, and hydro energy. Whereas wind power tends to be uncorrelated or even negatively correlated with solar power, biomass and geothermal are dispatchable, so one can deliberately make them negatively correlated with solar. Similar to the concept of geographic smoothing, aggregating uncorrelated sources can mitigate solar variability, which makes the optimization of the power-generation mix highly relevant.
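As a minimal illustration of the empirical correlogram procedure (with hypothetical input arrays; the physically based parameter estimation mentioned above is not attempted here, and a simple exponential decay stands in for whatever correlation function one would actually fit):

```python
import numpy as np
from itertools import combinations

def ramp_correlogram(power, coords, dt=1):
    """Pairwise ramp-rate correlations versus site-separation distance.

    power  : (n_sites, n_times) array of PV power or irradiance
    coords : (n_sites, 2) array of projected site coordinates [km]
    dt     : ramp interval, in time steps
    """
    ramps = power[:, dt:] - power[:, :-dt]           # ramp-rate series
    dists, corrs = [], []
    for i, j in combinations(range(power.shape[0]), 2):
        dists.append(np.linalg.norm(coords[i] - coords[j]))
        corrs.append(np.corrcoef(ramps[i], ramps[j])[0, 1])
    return np.asarray(dists), np.asarray(corrs)

def fit_efolding_length(dists, corrs, eps=1e-6):
    """Least-squares fit of rho(d) = exp(-d / L) on the log scale; the
    e-folding length L is a crude proxy for the decorrelation distance."""
    mask = corrs > eps                               # log defined only here
    slope = np.polyfit(dists[mask], np.log(corrs[mask]), 1)[0]
    return -1.0 / slope
```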

12.5.1.3 Demand response

Demand response encourages the consumption of electricity when solar generation is locally abundant, and discourages it when it is not. Hence, in its final implication for firm power delivery, demand response is very similar to electric storage. What is different is that electric storage, through discharging and charging, is able to modify both the generation and the demand curve, whereas demand response only deals with the latter. This implies that the optimal control and management of electric storage, be it in the form of utility-scale batteries, electric vehicles, or others, is an important strategy for demand response.

Demand response can be classified into different strategies that operate at different time scales: "shape," "shift," "shed," and "shimmy," which form the Lawrence Berkeley Laboratory taxonomy of flexible demand load. "Shape" captures the kind of demand response that reshapes customer load profiles with advance notice of months to days, such as energy efficiency. "Shift" represents the kind of demand response that encourages the movement of energy consumption from times of high demand to times of day when there is a surplus of renewable generation. "Shed" describes loads that can be curtailed to provide peak capacity and support the system in emergency or contingency events. "Shimmy" involves using loads to dynamically adjust demand at time scales ranging from seconds up to an hour.

Load shaping through energy efficiency upgrades means replacing inefficient appliances with low-energy-consumption ones. On this point, many countries such as Singapore have mandated "energy labels" (or energy performance standards) among registered suppliers supplying regulated goods, such as air-conditioners, refrigerators, clothes dryers, or televisions. The most critical information that energy labels indicate is the potential cost saving, such that consumers can make informed decisions during purchases. A recent study has shown that the guiding role of energy efficiency labels is effective in converting consumers' reference willingness to purchase intention (Wang et al., 2019e).

Load shifting, shedding, and shimmying aim at altering the electricity consumption timing. The (in)famous residential examples are the proposals of using washing machines at night (shift) and temporarily setting the thermostat two degrees higher (shed). These ideas are logically attractive but practically difficult to enforce when there are no large and concrete incentives. Therefore, programs that contractually incentivize demand response actions with businesses and households have been widely explored in the literature (e.g., Astriani et al., 2021; Alasseri et al., 2020; Aalami et al., 2019), but to date have seen limited widespread usage, for too much of the existing work has been based upon simplistic models with superficial results. As the value of demand response is increasingly realized by policymakers, new programs are created that benefit consumers and businesses. Demand response aggregators, such as OhmConnect, GridPoint, or THG Energy Solutions, contract with individuals and (more commonly) businesses to shed or shift load and sell this capacity to markets. Notably, in California the Demand Response Auction Mechanism market accepts bids for capacity and energy from demand response. California also pioneered the Emergency Load Reduction Program (ELRP) in 2022. ELRP rewards consumers and businesses with $2 per kWh for load reduction during periods of high grid stress.

Both energy efficiency upgrades and optimizing electricity consumption timing require changes in consumer behavior and psychology, which are both beyond what technologies can offer. Especially for working-class people, energy saving and environmental conservation seem less important than tomorrow's bread and butter. It is for this reason that promoting demand response is expected to face a much stronger undercurrent of resistance in developing countries than in developed countries. In contrast to other firm power enablers, which can primarily be handled by the energy sector, demand response depends for its success upon the participation of a significant number of consumers and businesses. It is clearly more challenging to materialize.

12.5.1.4 Overbuilding and proactive curtailment

Not so long ago, Wall Street analysts did not believe solar PV was ever going to stand on its own without subsidies. At present, even the most conservative analysts have realized that solar can, and in fact has, become the most economical way to generate electricity in most parts of the world. This abrupt price drop has led to the massive deployment of solar PV. The next step in the success story of solar power is a new strategy for achieving firm power, namely, overbuilding & proactive curtailment. The first impression is that both overbuilding and curtailment imply unnecessary or undesired energy wastage. Overbuilding refers to deliberately planned capacity expansion on top of what has been initially planned or filed with the grid operators, in that the overbuilt part of the PV plant does not participate in market operations, such as bidding or forecast submission. On the other hand, curtailment, which refers to the act of restricting or reducing generation or load, has hitherto been used as a last resort for maintaining system balance. In this book, we argue that the radical idea of supply shaping through overbuilding & proactive curtailment is actually beneficial to achieving firm generation at the lowest cost.

There are two situations from which the need for overbuilding & proactive curtailment arises. One of those is when the system operator sets a threshold on the maximum AC power injection into the grid, after considering local load or balancing requirements. This means that the maximum AC power is not allowed to exceed the set threshold at any instance. In situations where a threshold is placed, plant owners can profit more from overbuilding than from matching the rated capacity exactly to that threshold. The underlying principle is that a PV plant rarely operates at its rated capacity, and overbuilding elevates the generation profile and thus the electricity sales, from which the additional cost of overbuilding can be recovered. Figure 12.8 illustrates the situation with a toy example. The bell-shaped curve in the figure corresponds to the AC power generation of a "climatological day," that is, the long-term generation averaged by the hour of day. The left panel shows the exact sizing, with which 69.9 MWh/day of electricity is generated; the right panel shows 1.2x overbuilding, with which 78.3 MWh/day of electricity is generated. If the additional electricity sales over the lifetime of the PV plant are able to offset the additional installation cost, which can easily be assured given the low cost of PV, overbuilding & proactive curtailment is viable. Whereas this example is overly simplistic, in reality, one can optimize the overbuilding factor in accordance with long-term solar resource data.

Figure 12.8 Two strategies for sizing a PV plant under the constraint of a maximum allowable AC power injection of 10 MW. The shaded area corresponds to electricity sold to the grid: 69.9 MWh/day for exact sizing and 78.3 MWh/day for overbuilding.
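The toy example can be reproduced in spirit with a few lines of code. The half-sine profile below is merely a stand-in for the climatological-day curve of Fig. 12.8, so the resulting energy figures will differ from the 69.9 and 78.3 MWh/day quoted above:

```python
import numpy as np

# Hypothetical hour-of-day climatological AC generation profile [MW] of a
# plant whose rated capacity equals the 10-MW injection threshold; a
# half-sine between 06:00 and 18:00 stands in for the bell-shaped curve.
hours = np.arange(24)
profile = np.where((hours >= 6) & (hours <= 18),
                   10.0 * np.sin(np.pi * (hours - 6) / 12), 0.0)

def daily_energy(chi, cap_mw=10.0):
    """Daily energy [MWh] sold to the grid for oversizing ratio chi, with
    generation clipped at the allowable injection threshold."""
    return np.minimum(chi * profile, cap_mw).sum()  # 1-h steps: MW -> MWh

for chi in (1.0, 1.2):
    print(f"chi = {chi}: {daily_energy(chi):.1f} MWh/day")
```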

If the grid operator does not set any threshold on AC power injection, the above rationale is no longer valid. The second situation that motivates overbuilding & proactive curtailment is described by Perez et al. (2019a). The central idea is to elevate the generation profile to an "exaggeratedly high" level, such that the instances where generation falls below load are few as compared to the case of unconstrained generation, as illustrated in Fig. 12.9. This idea does not seem reasonable, for it suggests the curtailment of a large amount of energy. However, if we take into consideration the hard constraint of firm power delivery, which would otherwise require energy storage or other measures, this strategy becomes feasible.

Figure 12.9 Comparing unconstrained and 2.5x oversized/curtailed daily PV generation relative to a constant load of 2.5 MW (or 60 MWh/day, i.e., the dashed line) for a period of one year. The gray areas below the dashed line mark the energy deficit that must be met to ensure firm generation. The areas above the dashed line mark the energy to be stored or proactively curtailed.

Examining the unconstrained generation as shown in the left panel of Fig. 12.9, the gray areas below the load (illustrated as a constant line here for simplicity) correspond to energy deficits that must be met by other means, such as electric storage or conventional thermal generators.

Since both electric storage and spinning reserves are expensive, the large deficit implies a high cost of firm power. Particularly worth noting is that the deficit is large and persistent during the winter months, and thus requires seasonal storage, which is technically difficult to realize and economically difficult to justify. In comparison, the 2.5x overbuilt generation in the right panel of Fig. 12.9 has only a few spikes of deficit that need to be fulfilled, which means a much lower cost for electric storage and spinning reserves, but a 2.5x higher cost for generation. Overbuilding of solar implies substantial curtailment of solar generation during the summer. The validity of overbuilding & proactive curtailment therefore resides in whether or not the savings on electric storage and spinning reserves are able to cover the cost of overbuilding, subject to the hard constraint of firm power delivery. According to the evidence gathered thus far (e.g., Pierro et al., 2021; Perez et al., 2021, 2020, 2019a,b), the answer is an affirmative "yes." The follow-up problem is one of optimization, which aims at determining two things: (1) the oversizing ratio, and (2) the rated capacity of electric storage that can enable firm power.

12.5.2 OPTIMIZING THE FIRM POWER PREMIUM

In order to formulate the optimization problem, a concept called the "firm power premium," which is applicable to both firm forecasting and firm generation, needs to be introduced first. In the initial proposition, which is related to firm generation, Perez et al. (2019a) defined the firm kWh premium as the ratio of the levelized cost of electricity (LCOE) of a firm kWh to that of an unconstrained kWh:
\[
\text{Firm kWh premium} = \frac{\text{Firm PV generation LCOE}}{\text{Unconstrained PV LCOE}},
\tag{12.51}
\]
where the LCOE is the ratio of the equivalent annual cost of a generation option over the equivalent annual electrical energy produced by that option, in that it is a measure of the average net present cost of electricity generation for a generation option over its lifetime. Based on Eq. (12.51), the firm kWh premium can be interpreted as the cost multiplier for firming up unconstrained solar power generation, which is to be minimized. In a sister paper, which is related to firm forecasting, Perez et al. (2019b) proposed something similar,
\[
\text{Firm forecasting premium} = \frac{\text{Firm forecasting LCOE}}{\text{Unconstrained PV LCOE}},
\tag{12.52}
\]
to account for the firm forecasting premium (also known as the perfect forecast metric, the term originally employed by Perez et al., 2019b). Differing from Eq. (12.51), the numerator of the firm forecasting premium is the LCOE of ensuring that all forecast amounts of power can be delivered. Based on Eq. (12.52), the firm forecasting premium can be interpreted as the cost multiplier for delivering the forecast generation amount (i.e., the generation target set by the grid operator after hierarchical reconciliation), which is also to be minimized.

Generally speaking, the firm forecasting premium is much lower than the firm kWh premium, because the former only needs to account for the mismatch between the actual and forecast generation, which is not affected by winters, nights, and other prolonged low-resource periods, whereas the latter needs to account for the mismatch between the actual generation and the load, which includes those prolonged low-resource periods. This implies that firm forecasting represents an entry-level application of firm power, and it is a relatively easy task; firm generation corresponds to the ultimate (i.e., ideal) application of firm power, where the demand of the entire power system is to be firmly met, and it is a much harder task (pers. comm. with Richard Perez, University at Albany, SUNY, 2022).

In the next part, we show how to estimate the firm power premium, which, as mentioned before, is an optimization problem. The objective function is the overall cost of a system that delivers firm power, which includes both capital (CapEx) and operating (OpEx) costs. Accompanying the objective function is a set of constraints, which narrates the conditions under which the system operates. Clearly then, to conduct such an optimization, one should possess a long enough (one year and beyond) time series of the actual PV power output, in that the curtailing strategy is dynamic, see the logic diagram in Fig. 12.10. For firm forecasting, the actual power output interacts with the forecast power output time series, whereas in firm generation, the actual power output interacts with the actual load time series. Since the two quantities, namely, the firm forecasting premium and the firm kWh premium, are highly analogous, only the optimization of the latter is demonstrated; that of the former can be done in the exact same fashion by replacing the load time series with the forecast power output time series.

12.5.2.1 Formulation of the firm-generation optimization

Calculating the firm kWh premium, recalling Eq. (12.51), requires four parts: (1) the equivalent annual cost of firm PV, (2) the equivalent annual electrical energy produced by firm PV, (3) the equivalent annual cost of unconstrained PV, and (4) the equivalent annual electrical energy produced by unconstrained PV. Following the definition of "firm," part (2) is simply the annual load demand, which is to be met exactly by firm generation. As for the cost of unconstrained PV, i.e., part (3), it is usually known or can be assumed—e.g., Perez et al. (2019b) considered a current utility-scale cost scenario with PV at $1200 per kW AC turnkey with a 30-year lifetime. Lastly, since unconstrained PV feeds power into the grid as it is generated, part (4) is just the annual energy yield of the unconstrained PV. Hence, only the equivalent annual cost of firm PV remains to be formulated. Stated differently, optimizing the firm kWh premium is equivalent to minimizing the equivalent annual cost of firm PV.

For a given oversizing ratio (χ), the minimization of the equivalent annual cost of a firm-generation system with both battery storage and the overbuilding & curtailment strategy can be written as (Yang et al., 2023b):
\[
\arg\min\left[\left(\xi_{\text{PV}}+\zeta_{\text{PV}}\right)c_{\text{PV}}\,\chi P_{\text{PV,rated}}+\left(\xi_{\text{bat}}+\zeta_{\text{bat}}\sum_{t=1}^{8760}\frac{P_{\text{dis},t}}{E_{\text{bat,rated}}}\right)c_{\text{bat}}\,E_{\text{bat,rated}}\right],
\tag{12.53}
\]
where $\xi_{\text{PV}}$, $\zeta_{\text{PV}}$, $c_{\text{PV}}$, and $P_{\text{PV,rated}}$ are the capital recovery factor, O&M factor, unit cost, and rated power of PV, respectively; $\xi_{\text{bat}}$, $\zeta_{\text{bat}}$, $c_{\text{bat}}$, and $E_{\text{bat,rated}}$ are the capital recovery factor, O&M factor, unit cost, and rated capacity of the battery storage, respectively. It should be noted that, on top of defining an O&M factor for the battery storage, its O&M cost also depends upon the number of complete charging (or equivalently, full discharging) cycles—a complete cycle consumes $\zeta_{\text{bat}}$ times the investment cost; this is reflected through the summation term in Eq. (12.53). The capital recovery factor is calculated using the discount rate (r) and the lifetime of a component (ℓ):
\[
\xi=\frac{r(1+r)^{\ell}}{(1+r)^{\ell}-1}.
\tag{12.54}
\]
The above equation is used to calculate both $\xi_{\text{PV}}$ and $\xi_{\text{bat}}$, but with potentially different r and ℓ.

Figure 12.10 The flowchart of the dynamic curtailing strategy. When PV generation is greater than the load (or forecast, in the context of firm forecasting), the excess power is either curtailed or stored (the solid path); when PV generation is less than the load (or forecast), the deficit power is supplemented by the storage (the dashed path).

During the operation of the PV–battery hybrid system, some constraints must be satisfied, and thus have to be formulated into the optimization problem. These constraints include the operation constraints of the battery storage, the operation constraint of the PV plant, and the power system balancing constraint:

• The charging/discharging power ($P_{\text{ch},t}$ and $P_{\text{dis},t}$) of the battery cannot exceed at any instance the maximum allowable power ($P_{\text{ch,max}}$ and $P_{\text{dis,max}}$), which is usually obtainable from the battery's datasheet, i.e.,
\[
0 \le P_{\text{ch},t} \le B_{\text{ch},t}P_{\text{ch,max}},
\tag{12.55}
\]
\[
0 \le P_{\text{dis},t} \le B_{\text{dis},t}P_{\text{dis,max}},
\tag{12.56}
\]
where $B_{\text{ch},t}$ and $B_{\text{dis},t}$ are binary variables representing the charging/discharging mode of the battery over time period t.

• The battery cannot be in charging and discharging modes at the same time, which leads to another inequality constraint:
\[
B_{\text{ch},t}+B_{\text{dis},t}\le 1.
\tag{12.57}
\]

• The energy stored in the battery at any instance t ($E_{\text{bat},t}$) can neither exceed the rated capacity nor fall below zero:
\[
0 \le E_{\text{bat},t} \le E_{\text{bat,rated}}.
\tag{12.58}
\]

• The energy states of the battery at two consecutive time stamps are governed by the charging/discharging over the period:
\[
E_{\text{bat},t+1}=E_{\text{bat},t}+P_{\text{ch},t}\Delta t-P_{\text{dis},t}\Delta t,
\tag{12.59}
\]
where Δt is the unit time interval, which acts to convert the power unit to the energy unit.

• The power generated by PV ($P_{\text{PV},t}$), at any instance, must be decomposed into the summation of three components, which leads to an equality constraint:
\[
P_{\text{PV},t}=P_{\text{ch},t}+P_{\text{grid},t}+P_{\text{curtail},t},
\tag{12.60}
\]
where $P_{\text{grid},t}$ and $P_{\text{curtail},t}$ are the power fed into the grid and the power curtailed, respectively.

• Lastly, the load demand ($P_{\text{load},t}$) needs to be satisfied at all times:
\[
P_{\text{load},t}=P_{\text{dis},t}+P_{\text{grid},t}.
\tag{12.61}
\]

The above constraints constitute but a basic set, and one may introduce further refined parameters, such as the charging/discharging efficiency or the self-discharge rate of the battery; see, for example, Yang et al. (2023b), who employed a sophisticated measurement-based battery model proposed by Gonzalez-Castellanos et al. (2020), which introduces additional modeling complexity but improves the resemblance of the model to the actual operation of lithium-ion batteries. Nonetheless, insofar as the type of optimization problem involved in the firm kWh premium calculation is concerned, the elaboration thus far is sufficient in narrating the situation.

Hierarchical Forecasting and Firm Power Delivery

555

The optimization problem defined in (12.53) under the constraints of (12.55)– (12.61) is a mixed-integer linear program (MILP), which can be solved to exactitude with any off-the-shelf mathematical solvers such as Gurobi, which is available in many languages including Python. However, if additional constraints are placed due to applying a more sophisticated battery model or other refinements to the optimization model, the problem may no longer be an MILP, which would then require customized solvers. For optimization-related issues, the reader is referred to Yang et al. (2023b), who first formulated the firm power delivery as an MILP, in clear contrast to the original thesis of Perez (2014), who used an iterative approach via bisection, which is not only tedious to code but also time-consuming to execute. 12.5.2.2

A case study on firm forecasting

Firm forecasting can be conducted at various levels of a hierarchy. For the hierarchical solar data in Section 12.4, one can perform firm forecasting either at the plant level or at the substation level. In the case of the former, each PV plant needs to be overbuilt and equipped with battery energy storage devices. However, if all PV plants tied to a substation are managed as a whole, the overbuilt part of PV and the battery storage can be centralized. Moreover, performing firm forecasting at the substation level implicitly takes advantage of the geographical smoothing effect, which is also beneficial in absorbing high-frequency variability. For that reason, data from two substations, namely, substations 32 and 34, are chosen for the present case study. The installed capacities at the two substations of concern are 531 MW and 1571 MW, respectively. Location-wise, PV plants under substation 32 are located in the San Francisco Bay Area along the coast of California, which means the forecast error would be higher than that of PV plants under substation 24, which are located within the Sonoran Desert, cf. Fig. 12.4 and Table 12.1. Recall that since the optimization concerning firm forecasting should work with a long enough time series of generation and forecast, at least one year’s worth of forecast–observation pairs are needed. On top of that, we should naturally employ the best possible forecasts, in order to minimize the discrepancies between forecasts and observations and thus the cost of additional firm-power components; the MinT-shrink reconciliation should therefore be selected. Notwithstanding, since MinT-shrink requires training, the one-year out-of-sample reconciled forecasts are produced in two batches, each using a reconciliation matrix P MinT-shrink trained with the other. Stated differently, data points from the first half of the year are used to construct P MinT-shrink via Eq. (12.42), and the remaining base forecasts from the second half of the year are reconciled via Eq. (12.14); and then the two halves switch roles. Next, various parameters, such as discount rate or unit cost of PV, must be known and set. We solicit the parameter values from a few credible sources and list them in Table 12.4. It should be noted that Statista, which is a portal that hosts a wide range of statistics on market and consumer data, is dynamic, and the listed cPV and cbat values correspond to the numbers for the year 2020 and are likely to change in the future. However, for demonstration purposes, the accuracy and recency of the parameter values should not be a primary concern.

556

Solar Irradiance and Photovoltaic Power Forecasting

Table 12.4 Parameter setting used for the case study. Parameter

Symbol

Value

Source

Discount rate Lifetime of PV Lifetime of battery Unit cost of PV Unit cost of battery O&M factor of PV O&M factor of battery

r PV bat cPV cbat ζPV ζbat

8% 30 years 15 years 883 $/kW 137 $/kWh 1% of ξPV 0.02% of ξbat

Huang et al. (2020) Cole et al. (2018) Cole et al. (2018) Statista Statista Perez et al. (2019a) Perez et al. (2019a)

Firm forecasting premium

The optimization is conducted with the gurobipy package9 of Python, which offers a trial license of the Gurobi Optimizer, which, as mentioned earlier, is a mathematical optimization solver for MILPs and quadratic programming problems. Although the trial license only handles problems of limited size, it is sufficient for the present case study. The optimization is repeated for a vector of oversizing ratios (χ), ranging from 1 to 5 with a step size of 0.01. For each χ, an Ebat, rated can be obtained, with which the costs for PV and storage can be computed. Figure 12.11 shows the firm forecasting premium with the contributions from PV and storage broken down, over χ ∈ [1, 5], for both substations. Substation 32

8

Substation 34

6 4 2 0

1

2

3

4

5 1 Oversizing ratio

Battery

PV

2

3

4

5

Total

Figure 12.11 Firm forecasting premium optimization results for two substations in California, which have 531 MW and 1571 MW of unconstrained PV connected. Whereas the optimal oversizing ratio is 1.50 for substation 32, which corresponds to a premium of 2.15 (or $100 per kW AC turnkey, per year), the numbers for substation 34 are 1.13 and 2.02 (or $88), respectively.

For substation 32 with 531 MW of unconstrained PV installed, the lowest firm forecasting premium is found to be 2.15, or equivalently, an additional $100 per kW AC turnkey per year. This premium corresponds to an oversizing ratio of 1.50, which 9 https://pypi.org/project/gurobipy/

Hierarchical Forecasting and Firm Power Delivery

557

then implies the need for a battery storage ratio of 0.65 of the unconstrained PV cost. In monetary terms, the unconstrained PV annualized cost is $46.34 million USD, and the total annualized cost of the optimal firm-power system is $99.52 million USD.10 For substation 34 with 1,571 MW of unconstrained PV installed, the lowest firm forecasting premium is found to be 2.02, or equivalently, an additional $88 per kW AC turnkey per year, which corresponds to an optimal oversizing ratio of 1.13 and a battery storage ratio of 0.89. In monetary terms, the costs of the unconstrained PV and the optimal firm-power system are $137.09 million USD and $276.58 million USD, respectively. These results agree with our initial assessment, in that, PV plants connected to substation 34 see smaller forecast errors, and thus are cheaper to firm up, than PV plants connected to substation 32. 12.5.2.3

A case study on firm generation

Similar to the case of firm forecasting, firm generation can also be enabled on different scales. There are three natural options, namely, firm generation within (1) each electricity demand forecast zone, (2) each balancing authority area, and (3) the entire power system or area of interest. Without loss of generality, the third option is used here to exemplify the procedure. Echoing the California subset of the SPDIS dataset used in earlier parts of this chapter, historical hourly load data for California as available from the CAISO website11 is downloaded. However, since the data is only available since 2019, we choose the year 2021, which was the most recent complete-year data at the time of writing. Another factor of concern is that the California load is much larger in magnitude as compared to the total installed PV capacity in the SPDIS dataset, which is 22,530 MW. As such, to make the case study more meaningful, the load is downsized by a factor of 10. Since hourly data are too numerous to be legibly displayed as time series, Fig. 12.12 shows the transient of daily PV generation and load aggregated from the hourly values, for both the unconstrained and the 1.5x oversized versions; nonetheless, the optimization is performed with hourly data. As mentioned earlier, the technique for optimizing the firm kWh premium is no different in technique from that of the firm forecasting premium: one seeks to optimize the rated capacity of the battery storage, Ebat, rated , for a range of oversizing ratio, χ ∈ [1, 5], for example. Put more plainly, by replacing the forecast time series with the load time series, the optimization model and routine used for the previous case study can be of service “as is.” On this point, the parameters listed in Table 12.4 are employed for the present case study as well. Figure 12.13 shows how the firm kWh premium varies with oversizing ratio, cf. Fig. 12.11. The result points to an optimal oversizing ratio of 1.54, which corresponds to a firm kWh premium of 3.04. PV contributes 1.40 of the firm kWh premium,12 10 The

ratio between these two costs yields 2.15.

11 https://www.caiso.com/Documents/HistoricalEMSHourlyLoadDataAvailable.html 12 Recall

that the denominator of the LCOE of firm generation is load demand, but not the total generation of unconstrained PV, the contribution of PV to the firm kWh premium, which is 1.40, should not be confused with the cost multiplier of PV, which is 1.54.

558

Solar Irradiance and Photovoltaic Power Forecasting

Unconstrained

Daily energy production [GWh]

200 150 100 50 Oversized by a factor of 1.5 200 150 100 50 Apr

Jul Time [day]

Oct

Figure 12.12 Comparing unconstrained and 1.5x oversized/curtailed daily PV generation (solid line) relative to 1/10 of the actual daily load (dashed line) of California. The gray areas below the dashed line mark the energy deficit that must be met to ensure firm generation. The areas above the dashed line mark the energy to be stored or proactively curtailed.

Firm kWh premium

16 12 8 4 0

1

2

3 Oversizing ratio Battery

PV

4

5

Total

Figure 12.13 Firm kWh premium optimization results for the entire state of California with 22,530 MW of unconstrained PV installed. The optimal PV oversizing ratio is 1.54, which corresponds to a premium of 3.04 (or $205 per kW AC turnkey, per year).

whereas the battery contributes 1.64. Given that the total annualized cost of all unconstrained PV in California is about $1.96 billion USD, and that of firm generation cost is about $6.59 billion USD, firming up the generation across the entire state of California requires $205 per kW AC turnkey, per year. Once can confirm that achieving firm generation is more costly than achieving firm forecasting—recall that firm

Hierarchical Forecasting and Firm Power Delivery

559

generation requires an excessive amount of storage for nighttime and other lowresource periods. 12.5.3

MOVING FORWARD WITH FIRM POWER DELIVERY

Through the above elaboration and case studies, we have shown that firm power is an attractive concept that facilitates large-scale solar integration. Nonetheless, there are still numerous unsolved issues that prevent the immediate uptake of firm power technology. Amongst all the challenges that may or may not be foreseen at present, financing is the foremost one. The case study on firm forecasting indicates that achieving firm forecasting requires a cost multiplier of more than 2, whereas ensuring firm generation suggests a cost multiplier of more than 3. Even though the unconstrained PV production cost has already reached grid parity in many parts of the world, delivering firm solar power is considerably more expensive, which makes the overall cost still exceed grid parity. Environmental incentives (and penalties) would be needed to help justify the higher cost of firm solar power. Another possible path to lower the cost is to utilize the curtailed power for other purposes, such as hydrogen production, irrigation, or pumped hydro, insofar as the electricity usage of the application is not impacted by the intermittency of the curtailed power. At any rate, real grid parity can only be achieved with technological advancements and further cost reduction of PV and battery. As is the case for regular solar projects, the financial viability of firm solar projects would depend highly on the financing structure and contractual terms. Indeed, it should be now clear that the optimal combination of various firm power enablers, in particular, battery storage, overbuilding & proactive curtailment, and geographical smoothing, is the most cost-effective way of ensuring firm power. However, the key consideration for banks and investors when making the commercial decision to provide a loan to finance any project has hitherto been the borrower’s ability to repay the loan, which must be assessed with care. Even before we attend to this discussion, the question “Who is the borrower?” must be first addressed. This seemingly rhetorical question does not lend itself to an obvious answer. Recall that firm forecasting can be conducted at the plant level or in an arealaggregate fashion (e.g., at the substation level). At the plant level, PV plant owners could be motivated and tasked with installing storage and overbuilding their plants, if the additional cost can be covered in terms of forecast penalty avoidance and (if any) government subsidy. Under the current remuneration framework, this is unlikely to be a case that can be reliably analyzed, for even the penalty schemes are immature and sometimes flawed (Yang and Kleissl, 2023). In that, we can only anticipate the gradual perfection of the operations management of the modern power system with high renewable penetration. For the areal aggregate, the situation is less obvious. One option is crowdsourcing, in which each plant owner bears a part of the overall cost of storage and overbuilding. However, curtailing existing PV plants at the (dramatic) levels suggested above is infeasible as these plants were often contractually guaranteed a zero or minimum amount of curtailment. In other words, the financial viability of these plants

560

Solar Irradiance and Photovoltaic Power Forecasting

would be threatened if additional curtailment was applied. On the other hand, if only new PV plants would have to carry the burden of curtailment, then those plants would not be financially viable. That is because, marginally, for the last few PV plants in the fleet, nearly all of the generation would have to be curtailed; these plants would only feed into the grid during a few hours of the year. Therefore it seems unavoidable that the power grid would have to bear the cost of curtailment. Currently, power system expansion projects are already planned and carried out around the world anyway. At present, a holistic strategy on how to properly monetize firmness is lacking. Besides financing, another unresolved issue hindering the adoption of firm power technology is the conceptually abrupt transition from firm forecasting to firm generation. Firm forecasting, as mentioned earlier, constitutes an entry-level application of firm power, whereas firm generation refers to the ultimate objective of modern power systems, in which forecasting is no longer needed. The two are not meant to operate simultaneously (pers. comm. with Richard Perez, University at Albany, SUNY, 2022). On the other hand, the adoption of firm power technologies is necessarily a slow and gradual process. Combining both facts, it is difficult to discern, during power system operations, which battery–overbuilding installation is meant for firm forecasting and which is meant for firm generation. It is also unclear when should certain battery–overbuilding installations built for firm forecasting be expropriated to be used for firm generation. Although Perez et al. (2020) have attempted to respond to these doubts from the perspective of load morphing—i.e., how the “load” in firm forecasting, which is simply the forecast PV power, is morphed into the actual full load in firm generation—the framework still needs much refinement. These are problems that desperately need to be solved. With that, we end the very long journey of unfolding state-of-the-art solar forecasting.

Epilogue It took me roughly two years to complete this book. The current version of the book has ended up significantly bulkier than initially planned, which suggests my ontology of solar forecasting has expanded as writing progressed. Many concepts and ideas, which were previously known to me only superficially, have been strengthened and made more tangible. Consequently, I have endeavored to give my best attempt to deliver these newly acquired knowledge and material to the readers, and to highlight as lucidly as I am able the challenges and difficulties in understanding by which I myself was confronted. I received help from not just my coauthor but also Martin J´anos Mayer (on model chain construction), Xiang’ao Xia (on radiometry-related topics), Pierre Pinson (on probabilistic forecast verification), Rob J. Hyndman (on hierarchical forecasting), and Richard Perez (on firm power delivery), to whom I wish to express my sincere appreciation. As the outcome of the discussions in this book and in line with the initial proposition, I suggest, hopefully convincingly, that solar forecasting research topics, at present and as to serving grid-integration purposes, may be collected into five aspects: (1) base methods, (2) post-processing, (3) verification, (4) irradiance-to-power conversion, and (5) grid-integration applications. It is probable that more can be devised, but I have not myself succeeded in doing so. Each of these aspects has gathered over the past decade theories and practices that have wholly transformed the field of solar forecasting since its birth; each nevertheless is still subject to indefinite epistemological refinement and expansion. These aspects collectively form state-of-the-art solar forecasting, have advanced the ways which solar is integrated into the existing power grid, and are believed to be able to serve the same purposes better moving forward when more public acceptance is gained. In the remaining paragraphs, I shall give some bold advice in regard to each of the five aspects. With the exception of the earlier years, I have not been supportive, by any means, to purely extrapolative data-driven base forecasting methods, for ignoring the physical and spatio-temporal nature of solar irradiance is illogical, especially when data that facilitate the latter style of forecasting are available in bulk and for worldwide locations. This necessarily reduces the options for generating base forecasts to methods leveraging information from sky camera, satellite, and numerical weather prediction (NWP). Insofar as grid integration is concerned in the main with day-ahead and intra-day time scales, NWP finds the highest relevance among the physical forecasting methods. NWP, as it was conceived, has been a two-step iterative procedure, one diagnosis and the other prognosis. While data assimilation is an integral part of the weather enterprise, and it is there to stay, for the prognosis step, there is on-going and intense discussion in the meteorological community on the possibility of replacing the current physics-based numerical models with learning-based models. Although at medium-range and at global scales, such transformation is unlikely to take place in a very near future, on smaller scales of a few days and over regions compatible 561

562

Solar Irradiance and Photovoltaic Power Forecasting

to local grids, morphing from using numerical to learning-based methods may occur fairly soon. Post-processing is thought to be an area where solar forecasters can operate and innovate very conveniently, for it neither requires the meteorological sophistication as to produce the raw forecasts through weather models, nor demands the often proprietary information pertaining to photovoltaic plants and power grids. That said, the bidirectional conversion between deterministic and probabilistic forecasts does not seem to have received, at present, as much attention as the deterministic-todeterministic or probabilistic-to-probabilistic post-processing, of which the reason may be argued to be the redundancy of starting from a different form of initial forecasts than the one which a forecaster aims to arrive at. The question to be urgently addressed is thus this: Which post-processing sequence would maximize the goodness of the final forecasts? There are at least two directions in which further investigation should be conducted. One of those is whether using a cascade of several post-processing methods can lead to quantifiable benefits, e.g., one may proceed from a set of ensemble forecasts, calibrate it and then summarize the calibrated forecasts into a set of deterministic forecasts. The other inquiry that needs to be made is whether post-processing should be applied to irradiance or power forecasts. As far as I know, there are already some concurrent projects commencing on these topics, led by several independent groups of researchers, which confirm the relevance of these topics. The virtue of forecast verification which I am accustomed to ought to be attributed to two scientists of different eras. First there was Allan Murphy, who devoted his entire professional life to verification of weather forecasts, and there is Tilmann Gneiting, who wrote most of those seminal statistical pieces underpinning the verification practices well accepted by solar forecasters today. Their works contain so much detail, technical or not, that I could most certainly discover something new and exciting each time I revisit them. Following their verification frameworks wholly to most forecasters is perhaps a daunting task, as exemplified by the myriad facets of qualitative and quantitative analyses brought forward by the Murphy–Winkler decomposition of mean square error. Most journals and possibly the readers of those journals do not have pages and bandwidth to accommodate and to scrutinize long lists of figures and tables. What could be done instead? To address this question we should turn to the success story of machine learning, which ought not to be regarded as independent of the myriads of competitions and public datasets, inviting fair rivalry of skills of various kinds. On the other hand, the lack of reward and motivation may prevent all those from happening. In the case of former, the “winner takes all” reality of competitions could block less skilled forecasters from participating, as the time cost of participating could be rather demanding. In the case of the latter, solar forecasting concerns more on local weather regimes, and the measure of goodness of forecasts is largely driven by domestic grid policies, which both diminish the return of conducting forecasting research on public datasets. All these appear insoluble at the moment, I urge the solar forecasting community to be more attentive to the importance of furthering its research in these important areas.

Epilogue

563

Concerning irradiance-to-power conversion, in the main, is the amount of information available to the modeler. Depending upon whether the plant is newly commissioned and the availability of design parameters, indirect and direct approaches of conversion can be opted for. In the case of model chain, which refers to the indirect approach, the degree of validity of the entire chain relies on the accuracy of individual component models. Therefore, continued efforts pouring into developing better component models are justified. Notwithstanding, some component models such as the Perez transposition model may already achieved an asymptotic level of optimization, and pushing its accuracy further may very likely be an arduous journey. It may be more rewarding to identify and improve those component models that are underdeveloped at present, for which the data supporting the analyses is often the main barrier. In the case of regression-based conversion, which refers to the direct approach, machine learning would no doubt be an indispensable tool. To maximize the utility of machine learning, however, knowledge of energy meteorology is to be integrated into modeling, which gives rise to hybrid irradiance-to-power conversion methods, in which component models based on largely known physics can be used to derive transformed inputs for machine-learning models. Moving beyond the irradiance-to-power conversion for individual plants, methods dealing with the aggregated power output of a cluster of plants are, at least as of now, in shortage, which presents another tempting research direction. Inasmuch as this book is concerned, the ultimate aim of solar forecasting is to facilitate grid integration. Various high-level theoretical frameworks of doing so are already available, as exemplified by hierarchical forecasting and firm forecasting. Yet, none of those would be meaningful without actual accepting them into an operational context by grid operators. It is almost certain that not every solar forecaster would have the opportunity to develop their research fully based on a predetermined research direction while working under a corporate or policy context. And those who do have such luck are often confined to the small set of techniques that they themselves are acquainted with and can benefit from. If most solar forecasters are to submit to this cruel reality, the only way to advance any proposition is through collaborating with those who are connected to the industry and policy makers. Even so, it would normally take many long years before the conveyed ideas can turn into operational practices and grid code, which has been the case of load forecasting. Be that as it may, largely driven by the climate mitigation needs, the logical action here is to remain steadfast in our belief that power grids with high solar penetration bound to become reality in a few decades. I should, at this moment, quote the words of Bertrand Russell, which were answered when Russell was asked to outline the advice he would give to future generations; this advice still is timely and congruous with our current challenge: “I should like to say two things, one intellectual and one moral. The intellectual thing I should want to say to them is this: When you are studying any matter or considering any philosophy, ask yourself only what are the facts and what is the truth that the facts bear out. Never let yourself be diverted either by what

564

Solar Irradiance and Photovoltaic Power Forecasting

you wish to believe, or by what you think would have beneficent social effects if it were believed. But look only, and solely, at what are the facts. The moral thing I should wish to say to them is very simple. I should say love is wise, hatred is foolish. In this world, which is getting more and more interconnected, we have to learn to tolerate each other, we have to learn to put up with the fact that some people say things that we don’t like. We can only live together in that way. And if we are to live together and not die together, we should learn the kind of tolerance which is absolutely vital to the continuation of human life on this planet.” Bertrand Russell The 1959 interview with BBC Narrowing that down to the present case: Any proposition in regard to solar forecasting must be based solely on objective facts, not wishful thinking; and we have to learn to live with differing views or we will end up with nothing but disappointment. It is simple advice, but it bears repeating. Dazhi Yang Children’s Day, 2023 Harbin

A Statistical Derivations A.1

EXPECTATIONS

The expectation of a continuous random variable X ∼ F can be expressed in a variety of ways (Wasserman, 2013; Lichtendahl et al., 2013): E(X) = EX (X) = EF (X) = E[F(x)] =



x f (x)dx =



xdF(x) ≡ μX ,

(A.1)

where F and f are the cumulative distribution function (CDF) and probability density function (PDF) of X, respectively. Though the expressions in Eq. (A.1) all refer to the same thing, the differences in notation are often useful to clarify the context. For instance, EX (X) reads “taking the expectation of X with respect to X.” It may appear redundant for now, it is nevertheless useful when the expectation needs to be evaluated for expressions in which multiple random variables are involved. On the other hand, the notation EF (X) reads: the expectation of random variable X, and X is distributed according to F. This notation is useful when there are multiple forecasters placing different judgements in regard to the distribution of a random variable—e.g., one assumes X ∼ F, and another X ∼ G, and using EF (X) and EG (X) is able to discriminate those. And E[F(x)] has the same purpose. One important theorem in regard to expectation is the rule of iterated expectations, which states that for random variables X and Y : EY [E(X|Y )] = E(X),

EX [E(Y |X)] = E(Y ).

and

(A.2)

The proof is as follows: EY [E(X|Y )] = =





E(X|Y = y) f (y)dy = x f (x|y) f (y)dxdy =





x f (x|y)dx f (y)dy x f (x, y)dxdy = E(X).

(A.3)

It should be clarified that unlike E(X), which is a number, the conditional expectation E(X|Y = y) is in fact a function of y, since it takes different values for different y’s. Before Y is observed, one does not know what value E(X|Y = y) takes, therefore, it is a random variable which can be denoted as E(X|Y ). Another point which should be clarified is that the first step of Eq. (A.3) invoked the law of the unconscious statistician. The law states that: Suppose a random variable Z is a function of Y , i.e., Z = g(Y ), then E(Z) = E [g(Y )] =



g(y) f (y)dy.

(A.4)

One should not confuse g and f , where the former denotes the function linking Z to Y , and the latter denotes the density of Y . The law of the unconscious statistician DOI: 10.1201/9781003203971-A

565

566

Solar Irradiance and Photovoltaic Power Forecasting

is also true for the case where a random variable Z = g(X,Y ) is a function of two random variables X and Y, which have a joint PDF f (x, y). In this case, E(Z) =

A.2



g(x, y) f (x, y)dxdy.

(A.5)

MSE DECOMPOSITION IN LINEAR FORECAST COMBINATION

With the preliminary in Section A.1, we look at the expression for the mean square error (MSE) between a random variable Y and its forecast, which takes the form of a linear combination ∑mj=1 w j X j , where X j and w j are the jth component forecast and its corresponding weight, with w j ≥ 0 and ∑mj=1 w j = 1. This is to provide a full derivation of Eq. (8.25). / 0 m

∑ w j X j ,Y

MSE

j=1

02 ⎤ =E ⎣ ∑ w j X j −Y ⎦ ⎡/

=E

⎧ ⎨ ⎩ 

m

j=1

2 ⎫ ⎬

m

∑ (w j X j − w jY )

j=1 m

⎭ 8

m

m−1

∑ w2j (X j −Y )2 + 2 ∑ ∑

=E 

j=1 m

m−1

"

m

∑ w2j (X j −Y )2 + ∑ ∑

=E

j=1



m−1

w j wk (X j −Y )(Xk −Y )

j=1 k= j+1

w2j (X j −Y )2 + w2k (Xk −Y )2

j=1 k= j+1

m

∑ ∑

"

w j (X j −Y ) − wk (Xk −Y )

#2

#

8

j=1 k= j+1



m

m

∑ w2j (X j −Y )2 + (m − 1) ∑ w2j (X j −Y )2

=E

j=1



m−1

j=1

m

∑ ∑

"

#2 w j (X j −Y ) − wk (Xk −Y )

8

j=1 k= j+1

 =E

m

m−1

∑ mw2j (X j −Y )2 − ∑ ∑

j=1

m

= ∑ mw2j MSE(X j ,Y ) − j=1

w j (X j −Y ) − wk (Xk −Y )

#2

8

j=1 k= j+1

m # m−1 " = ∑ mw2j E (X j −Y )2 − ∑ j=1

"

m

m



E

5"

w j (X j −Y ) − wk (Xk −Y )

#2 6

j=1 k= j+1

m−1

m

∑ ∑

j=1 k= j+1

E

5"

w j (X j −Y ) − wk (Xk −Y )

#2 6

(A.6)

Statistical Derivations

A.3

567

MEAN AND VARIANCE OF A TRADITIONAL LINEAR POOL

The need for combining predictive distributions arises when there are several forecasters who each issues a probabilistic forecast. The forecasts issued by individuals are called component forecasts, or in this case, component predictive distributions. Given m component predictive distributions, with the jth one denoted as Fj with a corresponding PDF f j , the traditional linear pool (TLP) suggests: m

∑ w j Fj (x),

G(x) =

(A.7)

j=1

where w j is the weight of the jth component, G is the combined predictive distribution with a corresponding PDF g. Taking expectation of the above equation yields: E [G(x)] = =

∞ −∞



=



m

−∞

= w1

xg(x)dx 

∑ w j f j (x)

x

∞ −∞

dx

j=1

x f1 (x)dx + · · · + wm

∞ −∞

x fm (x)dx

m

∑ w j E [Fj (x)] ≡ μ,

(A.8)

j=1

where E [Fj (x)] =

∞ −∞

x f j (x)dx ≡ μ j

(A.9)

is the expectation of the random variable representing the ith component forecast. On the other hand, the variance of the TLP distribution is: V [G(x)] = = = = = =

∞ −∞



−∞

(x − μ)2 g(x)dx 

∑ wj

j=1 m

∑ wj

j=1 m

∑ wj

j=1 m

∑ w j f j (x)

(x − μ)

m







−∞



dx

j=1

−∞





m

2



−∞

(x − μ) f j (x)dx 2

(x − μ − μ j + μ j ) f j (x)dx 2

"

2

m

∑ w j V [Fj (x)] + ∑ w j (μ j − μ)2 ≡ σ 2 ,

j=1

#

(x − μ j ) + (μ j − μ) + 2(x − μ j )(μ j − μ) f j (x)dx 2

j=1

(A.10)

568

Solar Irradiance and Photovoltaic Power Forecasting

where V [Fj (x)] =

∞ −∞

(x − μ j )2 f j (x)dx ≡ σ 2j .

(A.11)

Note that the last term in the second last row of Eq. (A.10), namely,

∞ −∞

=μ j

(xμ j − μ 2j + μ j μ − xμ) f j (x)

∞ −∞

x f j (x)dx − μ 2j

∞ −∞

f j (x)dx + μ j μ

∞ −∞

f j (x)dx − μ

=μ 2j − μ 2j + μ j μ − μ j μ = 0.

∞ −∞

x f j (x)dx (A.12)

Therefore, one has: μ= σ2 =

A.4

m

∑ w j μ j,

(A.13)

j=1 m

m

j=1

j=1

∑ w j σ 2j + ∑ w j (μ j − μ)2 .

(A.14)

RETRIEVING MEAN AND VARIANCE OF GAUSSIAN DISTRIBUTION FROM PREDICTION INTERVALS

The aim of this section is to retrieve the parameters of a Gaussian random variable X ∼ N (μ, σ 2 ), from a pair of prediction intervals [l, u] with P(X < l) =

α , 2

P(X < u) = 1 −

(A.15) α . 2

(A.16)

It is known that the random variable X has the same distribution as σ Z + μ, where Z ∼ N (0, 1). Let Φ be the CDF of Z, then     l−μ l−μ α (A.17) P(X < l) = P Z < =Φ = , σ σ 2     u−μ u−μ α (A.18) P(X < u) = P Z < =Φ = 1− , σ σ 2 which give rise to the linear equations: l = σ · Φ−1

α 

+ μ, 2 α u = σ · Φ−1 1 − + μ. 2

(A.19) (A.20)

Statistical Derivations

569

Solving the equations, one obtains μ and σ as:  α  α l · Φ−1 1 − − u · Φ−1 2   α 2 , μ= (A.21) α Φ−1 1 − − Φ−1 2 2 u−l  α . (A.22) σ= α Φ−1 1 − − Φ−1 2 2 In fact, this method can be used to retrieve Gaussian distribution parameters from any two quantiles.

A.5

MSE DECOMPOSITIONS FOR FORECAST VERIFICATION

The properties of MSE is further examined in this section. In particular, we are interested in knowing how various aspects of forecast quality can be decomposed from MSE. Let X and Y denote random variables representing forecast and verification, respectively. Furthermore, a third random variable Z = g(X,Y ) = (X − Y )2 is used to represent the squared error between X and Y . More specifically,



# " 2 (x − y)2 f (x, y)dxdy. (A.23) MSE(X,Y ) = E (X −Y ) = E(Z) = y x

Though this expression, calibration–refinement and likelihood–base rate decompositions of MSE (also known respectively as cof and cox decompositions of MSE) as seen in Chapter 9 can be derived. The trick to derive those decompositions is to add and subtract terms to/from Eq. (A.23), regroup them, and thus simplify into desired forms. Before that, several symbols are defined: E(X) ≡ μX , E(Y ) ≡ μY , E(X|Y = y) ≡ μX|Y , and E(Y |X = x) ≡ μY |X . The term E(X|Y = y) denotes the expected value of the forecast, conditioning on an observation value of y; and E(Y |X = x) is the expected value of the observation, conditioning on a forecast value of x. Of course, when the value of the given y or x changes, the evaluated conditional expectation also changes. A.5.1

CALIBRATION–REFINEMENT DECOMPOSITION

For calibration–refinement decomposition, one starts by adding and subtracting both (μY − y)2 and (μY |X − y)2 inside the integral of Eq. (A.23): MSE(X,Y ) = =



y x



y x

− +

(x − y)2 f (x, y)dxdy (μY − y)2 f (x, y)dxdy

" y x

"

# (μY − y)2 − (μY |X − y)2 f (x, y)dxdy

y x

# (x − y)2 − (μY |X − y)2 f (x, y)dxdy.

(A.24)

570

Solar Irradiance and Photovoltaic Power Forecasting

The first term of Eq. (A.24) can be arranged into:



y x

(μY − y)2 f (x, y)dxdy



= (μY − y)2 y



x

f (x, y)dxdy

= (μY − y)2 f (y)dy ≡ V(Y ).

(A.25)

y

The second line of Eq. (A.25) is due to the fact that (μY − y)2 does not contain x and, thus, can be taken out of the inner integral. The second term of Eq. (A.24), on the other hand, is slightly more tedious:

" y x



# (μY − y)2 − (μY |X − y)2 f (x, y)dxdy

"

# (μY − y)2 − (μY |X − y)2 f (y|x)dydx x y

  = f (x) μY2 − 2μY y − μY2|X + 2μY |X y f (y|x)dydx x y  



 = f (x) μY2 − μY2|X − 2 μY − μY |X y f (y|x)dy dx x y

  = f (x) μY2 − μY2|X − 2μY μY |X + 2μY2|X dx =

=

f (x)

x  x

μY |X − μY

2

 2 f (x)dx ≡ EX μY |X − μY .

(A.26)

The third term of Eq. (A.24) can be treated similarly to the second term:

" y x

=

 x

# (x − y)2 − (μY |X − y)2 f (x, y)dxdy

x − μY |X

2

 2 f (x)dx ≡ EX X − μY |X .

(A.27)

By combining Eqs. (A.25)–(A.27), one yields Eq. (9.15):  2  2 MSE(X,Y ) = V(Y ) + EX X − μY |X − EX μY |X − μY .

(A.28)

Murphy and Winkler used f and x to represent forecast and verification, with that, Eq. (A.28) can be rewritten as: MSE( f , x) = V(x) + E f [ f − E(x| f )]2 − E f [E(x| f ) − E(x)]2 ,

(A.29)

which is identical to Eq. (11) of their seminal 1987 paper, A General Framework for Forecast Verification.

Statistical Derivations

A.5.2

571

LIKELIHOOD–BASE RATE DECOMPOSITION

For likelihood–base rate decomposition, one starts by adding and subtracting both (μX − x)2 and (μX|Y − x)2 inside the integral of Eq. (A.23): MSE(X,Y ) = =



x y



x y

(x − y)2 f (x, y)dydx (μX − x)2 f (x, y)dydx

"



x y

"

+

x y

# (μX − x)2 − (μX|Y − x)2 f (x, y)dydx # (x − y)2 − (μX|Y − x)2 f (x, y)dydx.

(A.30)

The first term of Eq. (A.30) can be arranged into:



x y

(μX − x)2 f (x, y)dydx



= (μX − x)2 x



y

f (x, y)dydx

= (μX − x)2 f (x)dx ≡ V(X).

(A.31)

x

The second term of Eq. (A.30) is manipulated as:

" x y



# (μX − x)2 − (μX|Y − x)2 f (x, y)dydx

"

# (μX − x)2 − (μX|Y − x)2 f (x|y)dxdy y x

  2 = f (y) + 2μX|Y x f (x|y)dxdy μX2 − 2μX x − μX|Y y x  



 2 2 = f (y) μX − μX|Y − 2 μX − μX|Y x f (x|y)dx dy y x

  2 2 = f (y) μX2 − μX|Y − 2μX μX|Y + 2μX|Y dy

=

f (y)

y

=

 y

μX|Y − μX

2

 2 f (y)dy ≡ EY μX|Y − μX .

(A.32)

The third term of Eq. (A.30), by analogy, yields:

" x y

=

 y

# (x − y)2 − (μX|Y − x)2 f (x, y)dydx

y − μX|Y

2

 2 f (y)dy ≡ EY Y − μX|Y .

By combining Eqs. (A.31)–(A.33), one yields Eq. (9.16):  2  2 MSE(X,Y ) = V(X) + EY Y − μX|Y − EY μX|Y − μX .

(A.33)

(A.34)

572

Solar Irradiance and Photovoltaic Power Forecasting

With the notation as used by Murphy and Winkler, Eq. (A.34) becomes: MSE( f , x) = V( f ) + Ex [x − E( f |x)]2 − Ex [E( f |x) − E( f )]2 ,

(A.35)

which is identical to Eq. (12) of their seminal 1987 paper, A General Framework for Forecast Verification. A.5.3

BIAS–VARIANCE DECOMPOSITION

The bias–variance decomposition of MSE is much simpler to derive, as one   just needs to employ a new variable Z = X −Y , such that the property, V(Z) = E Z 2 − [E(Z)]2 , is able to make the decomposition readily available:   MSE(X,Y ) = E Z 2 = V(Z) + [E(Z)]2 = V (X −Y ) + [E(X) − E(Y )]2 = V(X) + V(Y ) − 2Cov(X,Y ) + [E(X) − E(Y )]2 ,

(A.36)

where Cov(X,Y ) denotes the covariance between X and Y . This is written as: MSE( f , x) = V( f ) + V(x) − 2Cov( f , x) + [E( f ) − E(x)]2 ,

(A.37)

in Murphy and Winkler’s notation.

A.6

KERNEL CONDITIONAL DENSITY ESTIMATION

Let x1 , · · · , xn be independent observations from a CDF F and let f = F  be the PDF. Then, for a given kernel K and a bandwidth hx , the kernel density estimator of the f of a random variable X is:   x − xi 1 n 1 1 n f (x) = ∑ K (A.38) = ∑ Khx (x − xi ), n i=1 hx hx n i=1 where Khx (x − xi ) is just a more compact way of writing the kernel. Now, suppose we observe pairs of data (x1 , y1 ), · · · , (xn , yn ), and one is interested in estimating the joint density of X and Y , then, the estimator is the product kernel: 1 n f (x, y) = ∑ Khx (x − xi )Khy (y − yi ). n i=1

(A.39)

Since it is natural to have f (y|x) = f (x, y)/ f (x), the conditional density estimator is: n

f (y|x) = ∑ wi (x)Khy (y − yi ), i=1

(A.40)

Statistical Derivations

573

(a)

(b)

0.50

0.50 y

0.75

y

0.75

0.25

0.25

0.00

0.00 0.00

0.25

0.50 x

0.75

0.00

0.25

0.50 x

0.75

Figure A.1 Gaussian kernels for 20 pairs of (xi , yi ). Depicted in (a) are Khx (x − xi ), and in (b) are Khy (y − yi ).

where wi (x) =

Khx (x − xi ) n

∑ Khx (x − xi )

.

(A.41)

i=1

To understand what these estimators represent, it is thought useful to adopt a graphical approach. Figure A.1 shows 20 pairs of (xi , yi ), with their Gaussian kernels Khx (x − xi ) and Khy (y − yi ). (The y positions of these kernels are lifted to the levels of their respective yi for visualization purposes.) From Fig. A.1 (a), it is clear that Khx (x − xi ) is a function of x shifted to xi . Evidently, wi (x) in Eq. (A.41) is also a function of x. In other words, when the argument of the function takes a specific value, say x∗ , wi (x∗ ) evaluates into a number, and for different i, wi (x∗ ) evaluates into different numbers. Similarly from Fig. A.1 (b), we know that Khy (y − yi ) is a function of y. When these functions of y are summed according to the evaluated wi (x∗ )’s, the resultant function, i.e., f (y|x = x∗ ) is still a function of y. In this regard, Fig. A.2 (a) shows f (y|x = x∗ ) for 20 different x∗ values, namely, {x1 , · · · , x2 0}, using bandwidths hx = hy = 0.03. Similarly, Fig. A.2 (b) depicts the result using another set of bandwidths, hx = hy = 0.2. The smoothness of f (y|x) depends on the bandwidth chosen.

A.7

CALCULATING CRPS FROM ENSEMBLE FORECAST

In probabilistic forecasting, the forecast for a time instance at a location is not a number, instead, it is characterized by a predictive distribution, F, in that, the forecast can be thought of as a random variable X ∼ F, i.e., X is distributed according to F.

574

Solar Irradiance and Photovoltaic Power Forecasting

(a)

(b) 0.75

0.50

0.50 y

y

0.75

0.25

0.25

0.00

0.00 0.00

0.25

0.50 x

0.75

0.00

0.25

0.50 x

0.75

Figure A.2 Kernel conditional density estimates of f (y|x = x∗ ), at 20 different x∗ values, using bandwidths of (a) hx = hy = 0.03 and (b) hx = hy = 0.2.

The continuous ranked probability score (CRPS) of F, with respect to the corresponding verification y, is defined as: crps(F, y) =

∞ −∞

[F(x) −

2 x≥y ] dx,

(A.42)

where x is a generic integration variable, and x≥y is the Heaviside (or unit) step function shifted to y. In other words, x≥y takes the value of 1 if x ≥ y, 0 otherwise. This integral implies that the CRPS is evaluated for a continuous random variable with range (−∞, ∞). Of course, if the variable of interest, such as irradiance, is known to be bounded between (a, b), one can replace the limits of the integral accordingly. Now, instead of F, only an empirical cumulative distribution function (ECDF), F m , is available from an m-member ensemble forecast with members denoted as x1 , · · · , xm . And we are interested in the expression of CRPS(F m , y). Recall the ECDF is by definition F m (x) = m−1 ∑m i=1 x≥xi , Eq. (A.42) thus becomes: 2 1 m ∑ x≥xi − x≥y dx m i=1 a 

b 2 m 1 m m = − ∑ ∑ x≥xi x≥x j m ∑ m2 i=1 a j=1 i=1

 

crps F m , y =

=

b



8 x≥xi x≥y + x≥y

dx

2 m 1 m m [b − max(x , x )] − i j ∑∑ ∑ [b − max(xi , y)] + b − y m2 i=1 m i=1 j=1

=−

2 m 1 m m max(x , x ) + i j ∑∑ ∑ max(xi , y) − y. m2 i=1 m i=1 j=1

(A.43)

Statistical Derivations

575

Because 2 max(x, y) = |x − y| + x + y, Eq. (A.43) can be written as:   1 m m |xi − x j | + xi + x j crps F m , y = − 2 ∑ ∑ m i=1 j=1 2 + =−

1 m ∑ (|xi − y| + xi + y) − y m i=1 1 m m 1 m |x − x | + i j ∑ ∑ ∑ |xi − y|. 2m2 i=1 j=1 m i=1

(A.44)

It should be noted that when Eq. (A.44) can be written in terms of expectations, that is,     1 crps F m , y = − EF m |X − X  | + EF m (|X − y|) , 2 where X and X  are independent copies of the random variable that is distributed according to F m . Moreover, the more general, analogous expression for F is also valid. That is,  1  (A.45) crps (F, y) = − EF |X − X  | + EF (|X − y|) , 2

Derivation of the B Detailed Perez Model B.1

FUNDAMENTALS, DEFINITIONS, AND UNITS

Quantitative analysis of atmospheric radiation field often requires the consideration of the amount of radiant energy confined to an element of solid angle. A solid angle Ω, with a unit of steradian (sr), is defined as Ω = σ /r2 , where σ is the area of a segment of a spherical surface intercepted at the core, and r is the radius of the sphere, as shown in Fig. B.1. The differential solid angle element, dΩ, can thus be expressed as: dσ dΩ = 2 = sin ϑ dϑ dϕ, (B.1) r where ϑ and ϕ are the polar angle and azimuthal angle of a spherical coordinate system, respectively. The differential area, dσ , is illustrated in Fig. B.2. Ω σ r

Figure B.1 Definition of a solid angle. Given the monochromatic radiance at wavelength λ , namely, Lλ in W/m2 /sr, the monochromatic flux density (or monochromatic irradiance) is obtained by integrating the normal component1 of Lλ over the hemispheric solid angle, that is, Iλ = =



Ω

Lλ cos ϑ dΩ

2π π/2 0

0

Lλ (ϑ , ϕ) cos ϑ sin ϑ dϑ dϕ.

(B.2)

Finally, by integrating Iλ over the desired spectral range, i.e., I=

λ2 λ1

Iλ dλ ,

(B.3)

the corresponding irradiance of radiant energy over that spectral range can be obtained, which has the familiar unit of W/m2 . In radiation modeling, the word “irradiance” often refers to the shortwave irradiance, which is typically defined over a spectral range from 290 to 3000 nm (Blanc et al., 2014). 1 This is computed by multiplying L with the cosine of the angle between the incident direction and λ the surface normal.

DOI: 10.1201/9781003203971-B

576

Detailed Derivation of the Perez Model

577

r sin ϑ dϕ

rdϑ dσ ϑ



dA ϕ



Figure B.2 Illustration of a differential solid angle and its representation in polar coordinates.

If one wishes to derive irradiance from the above procedure through integration, the monochromatic radiance distribution, i.e., Lλ (ϑ , ϕ), or the total radiance over a spectral range, L(ϑ , ϕ) =

λ2 λ1

Lλ (ϑ , ϕ)dλ ,

(B.4)

is needed. As suggested by the need for having arguments ϑ and ϕ, L(ϑ , ϕ) is anisotropic, and its value depends on the position in the sky, as marked by the polar angle and azimuthal angle. For instance, the intensity of the beam radiance, often attributed to a point source at the center of the sun disk, would certainly be different from that of the diffuse radiance from the rest of the sky dome (Gracia et al., 2011; Ivanova and Gueymard, 2019). On this point, separated modeling of beam and diffuse radiation components was already a well-established practice in 1960s (Liu and Jordan, 1960). The beam irradiance on an arbitrary flat surface can be calculated using geometry. Thus, the literature thereafter has been focusing on the modeling of diffuse radiance and diffuse irradiance. In what follows, whenever the word radiance or the symbol L is used, it refers to diffuse radiance. Let Lλ (ϑ , ϕ) be the monochromatic radiance, by combining Eqs. (B.2), (B.3), and (B.4), we have Dh =

2π π/2 0

0

L(ϑ , ϕ) cos ϑ sin ϑ dϑ dϕ,

(B.5)

where Dh denotes broadband diffuse irradiance received by a horizontal surface. Therefore, to arrive at Dh , we are interested in knowing the expression of the radiance distribution of L(ϑ , ϕ). The earliest theoretical description of such a distribution was provided by Lord Rayleigh (Strutt, 1871). Since then, many attempts have been made to fit theories

578

Solar Irradiance and Photovoltaic Power Forecasting

of various schools to observed data, as exemplified by the works of Steven and Unsworth (1977, 1979, 1980). To give perspective of the procedure, consider the expression of the diffuse radiance distribution under an overcast sky (Steven and Unsworth, 1980): 1 + b cos ϑ L(ϑ ) = L(0) , (B.6) 1+b where the value of the constant b needs to be fitted, and L(0) is the radiance at the zenith, which can be measured by a radiance sensor, such as the EKO Sky Scanner (Torres et al., 2010). For example, Moon and Spencer (1942) suggested b = 2, Walsh (1961) suggested b = 1.5, whereas Goudriaan (1977) noticed an albedo dependency in the value of b, i.e., b varies from 1.5 to 1.14 for an albedo ranging from 0.1 to 0.2. The model described in Eq. (B.6) suggests that the anisotropy in diffuse radiance distribution is only a function of cos ϑ . Such ϑ -dependence might be sufficient for an overcast sky,2 provided that b can be fitted appropriately. However, in all-sky conditions, some other obvious treatments are required to better model the anisotropy. Recall that the beam radiance is assumed coming from a point source, however, due to forward scattering by aerosols, the sky in the vicinity of the sun—or the circumsolar region—appears brighter than the regions of the sky dome far from the sun. In another case, due to the larger airmass at the horizon as compared to that at the zenith, the blue light created by Rayleigh scattering is “diluted” by the white light created by Mie scattering. Hence, on a clear day, the horizon band appears white and bright. In this regard, the Perez model (Perez et al., 1986, 1987, 1988, 1990) separates the sky dome into three parts: (1) the circumsolar region, (2) the horizon band, and (3) the isotropic background.

B.2

THE THREE-PART GEOMETRICAL FRAMEWORK OF THE ORIGINAL PEREZ MODEL

Figure B.3 shows the three-part geometrical framework of the original Perez model (Perez et al., 1986). In the figure, the polar angle ϑ is replaced by the solar zenith angle Z, which is the angle between the normal of a horizon surface and the direction of the sun. Perez et al. (1986) assumed that the circumsolar region has a radius α = 15◦ , whereas the horizon band has an angular thickness of ξ = 6.5◦ . The overarching assumption of the original Perez model is that the radiances originated from these three parts are different, but remain constant within each part. The radiance of the circumsolar region is expressed as F1 × L, where F1 is a sky-condition-dependent coefficient, which will be discussed later. Following the integration method discussed in the previous section, the diffuse irradiance from this (cr) region seen by a horizontal surface, denoted as Dh , where the superscript “(cr)” 2 Under overcast skies, the horizontal directional effect is of little to no importance, hence the variation in radiance distribution on ϕ, the azimuth of the sun, may be ignored.

Detailed Derivation of the Perez Model

579

L F1 ×

α L

Z ξ

F2 × L

Figure B.3 An illustration of the three-part geometrical framework used in the original Perez model, with respect to a horizontal plane.

stands for “circumsolar region,” is given by: (cr) Dh

=



(cr)

(cr)

Ωh

F1 L dΩh

= F1 L cos Z = F1 L cos Z



(cr)

(cr)

dΩn

Ωn

2π α 0

0

sin ϑ dϑ dϕ

= 2π(1 − cos α)F1 L cos Z.

(B.7)

The integral in the first line of Eq. (B.7) represents the irradiance due to the circum(cr) solar region as seen by a horizontal surface, where Ωh denotes the corresponding solid angle, with subscript “h” indicating “horizontal,” i.e., the viewer’s plane. Since integrating the solid angle for an arbitrary region in the sky dome is difficult, it is possible to consider the solid angle as viewed by a plane normal to the incoming rays; this is expressed in the second line of Eq. (B.7), where the cos Z term “shifts” the circumsolar region to the zenith position, cf. Fig. B.3. Due to this change of viewing (cr) plane, Ωn , can be evaluated according to standard formula for the solid angle of a cone whose cross-section subtends the angle 2α, which is 2π (1 − cos α) sr; this is shown in the third and fourth lines of Eq. (B.7). There is however a caveat. Since for any Z > π/2−α, part of the circumsolar region falls below the horizon, hence should not contribute to the irradiance, and Eq. (B.7) must be modified to: (cr)

Dh

= 2π(1 − cos α)F1 LXh (Z) cos Z  ,

(B.8)

where Xh (·) is a function of Z, denoting the fraction of the circumsolar region above the horizon, and Z  is the average zenith angle of the visible part of the circumsolar

580

Solar Irradiance and Photovoltaic Power Forecasting

region. When the circumsolar region is fully visible, Xh (Z) = 1 and Z  = Z. The precise calculation method for Xh (Z) and Z  is complex, thus, in a later paper, Perez et al. (1987) gave an approximation of Xh (Z) cos Z  , that is,  cos Z, if Z < π2 − α,  χh ≡ Xh (Z) cos Z ≈ (B.9) ψh sin(ψh α), otherwise, ⎧ ⎨ π/2 − Z + α , if Z > π2 − α, ψh = 2α ⎩1, otherwise.

where

(B.10)

The second condition in Eq. (B.10) may appear redundant for now, it is nevertheless useful when the diffuse irradiance on a tilted surface is of interest, as we shall see below. The radiance originated from the horizon band is denoted as F2 × L, and F2 is another sky-condition-dependent coefficient. The diffuse irradiance coming from this (hb) part, denoted as Dh , where the superscript “(hb)” denotes “horizon band,” is thus: (hb)

Dh

=



(hb)

(hb)

Ωh

= F2 L =

F2 L dΩh

2π π/2 π/2−ξ

0

cos ϑ sin ϑ dϑ dϕ

πF2 L (1 − cos 2ξ ) , 2

(B.11)

cf. Eq. (B.5). In other words, instead of integrating over the entire sky dome, ϑ is only integrated from π/2 − ξ to π/2, where ξ is the elevation angle of the upper edge of the horizon band. The evaluation of this integral requires the double-angle formula.
Lastly, the isotropic background contributes the diffuse irradiance $D_h^{(ib)}$, where the superscript “(ib)” denotes “isotropic background,” that is,
$$D_h^{(ib)} = \int_{\Omega_h^{(ib)}} L\,\mathrm{d}\Omega_h^{(ib)} = \int_{\Omega_h} L\,\mathrm{d}\Omega_h - \int_{\Omega_h^{(cr)}} L\,\mathrm{d}\Omega_h^{(cr)} - \int_{\Omega_h^{(hb)}} L\,\mathrm{d}\Omega_h^{(hb)} = \int_0^{2\pi}\!\!\int_0^{\pi/2} L\cos\vartheta\sin\vartheta\,\mathrm{d}\vartheta\,\mathrm{d}\varphi - L\cdot\frac{D_h^{(cr)}}{F_1 L} - L\cdot\frac{D_h^{(hb)}}{F_2 L} = \pi L - \frac{D_h^{(cr)}}{F_1} - \frac{D_h^{(hb)}}{F_2}, \quad (B.12)$$

where the subtractions in the second line of Eq. (B.12) are due to the double-counting of the circumsolar region and horizon band during integration. Adding Eqs. (B.8), (B.11), and (B.12), it follows that the diffuse irradiance on a horizontal surface, namely, the diffuse horizontal irradiance (DHI), can be expressed as:
$$D_h = \pi L\left\{1 + 2(1-\cos\alpha)(F_1-1)X_h(Z)\cos Z' + 0.5(F_2-1)(1-\cos 2\xi)\right\}. \quad (B.13)$$
With this, we arrive at Eq. (1) of Perez et al. (1986).
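Putting Eqs. (B.9), (B.10), and (B.13) together, the DHI implied by the original three-part model can be sketched as follows. This is a vectorized NumPy illustration of ours; the radiance L and the coefficients F₁ and F₂ must be supplied externally:

```python
import numpy as np

def dhi_original(L, F1, F2, Z, alpha=np.radians(15), xi=np.radians(6.5)):
    """DHI of the original three-part model, Eq. (B.13); angles in radians.

    X_h(Z) cos Z' is replaced by the chi_h approximation of
    Eqs. (B.9)-(B.10); L is the background radiance.
    """
    psi_h = np.where(Z > np.pi/2 - alpha, (np.pi/2 - Z + alpha)/(2*alpha), 1.0)
    chi_h = np.where(Z < np.pi/2 - alpha, np.cos(Z), psi_h*np.sin(psi_h*alpha))
    return np.pi*L*(1 + 2*(1 - np.cos(alpha))*(F1 - 1)*chi_h
                    + 0.5*(F2 - 1)*(1 - np.cos(2*xi)))
```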

Following a similar approach, the diffuse tilted irradiance (DTI), denoted by $D_c$, where “c” stands for “collector plane,” can be modeled. A diagram depicting the geometry involved in DTI modeling is shown in Fig. B.4. The treatment for $D_c^{(cr)}$, i.e., the portion of DTI that is attributed to the circumsolar region, is analogous to that of $D_h^{(cr)}$:
$$D_c^{(cr)} = 2\pi(1-\cos\alpha)\,F_1 L\,X_c(\theta)\cos\theta', \quad (B.14)$$
where $X_c(\cdot)$ is a function of the incidence angle θ, denoting the fraction of the circumsolar region seen by the collector plane, and θ′ is the average incidence angle of the visible part of the circumsolar region. Here, the method for calculating $X_c(\theta)$ and θ′ is again complex, but Perez et al. (1987) suggested the following approximation:
$$\chi_c \equiv X_c(\theta)\cos\theta' \approx \begin{cases}\psi_h\cos\theta, & \text{if } \theta < \pi/2-\alpha,\\ \psi_h\psi_c\sin(\psi_c\alpha), & \text{if } \pi/2-\alpha \le \theta \le \pi/2+\alpha,\\ 0, & \text{otherwise},\end{cases} \quad (B.15)$$
with
$$\psi_c = \frac{\pi/2-\theta+\alpha}{2\alpha}. \quad (B.16)$$
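Mirroring the earlier sketch for χₕ, Eqs. (B.15) and (B.16) can be coded as follows (again a minimal sketch of ours; ψₕ enters because the circumsolar region may also be partially cut off by the horizon):

```python
import numpy as np

def chi_c(theta, Z, alpha=np.radians(15)):
    """Circumsolar factor chi_c = X_c(theta) cos(theta') of Eqs. (B.15)-(B.16).

    theta is the incidence angle and Z the solar zenith angle, both in
    radians; scalars or NumPy arrays are accepted.
    """
    theta = np.asarray(theta, dtype=float)
    psi_h = np.where(Z > np.pi/2 - alpha, (np.pi/2 - Z + alpha) / (2*alpha), 1.0)
    psi_c = (np.pi/2 - theta + alpha) / (2*alpha)
    out = np.where(theta < np.pi/2 - alpha,
                   psi_h * np.cos(theta),
                   psi_h * psi_c * np.sin(psi_c * alpha))
    # Beyond theta = pi/2 + alpha the circumsolar disk is entirely behind
    # the collector plane, so its contribution vanishes.
    return np.where(theta > np.pi/2 + alpha, 0.0, out)
```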

Figure B.4 An illustration of the three-part geometrical framework used in the original Perez model, with respect to a tilted plane. The collector normal makes an incidence angle θ with the circumsolar region (radius α, radiance F₁ × L); the horizon band (thickness ξ) has radiance F₂ × L and the background radiance is L.

The diffuse radiation on a tilted surface due to the horizon band, namely, $D_c^{(hb)}$, is somewhat more complicated because of the integration involved. In Perez et al. (1986), $D_c^{(hb)}$ was given as a sinusoidal approximation based on the solid angle of the horizon band, seen by the collector plane, i.e.,
$$D_c^{(hb)} \approx 2 F_2 L\,\xi\sin\xi'. \quad (B.17)$$

As mentioned by Torres et al. (2006), there are several expressions for ξ′ in the literature, some of them wrong (e.g., the ones given in Perez et al., 1987; Utrillas and Martinez-Lozano, 1994). The correct one was given in Perez et al. (1986), and it should be:
$$\xi' = \frac{1}{2}\left(S + \xi - \frac{S}{\pi}\right). \quad (B.18)$$
Whereas the detailed integration procedure was not given in Perez et al. (1986), at least two attempts were made later on (Torres et al., 2006; Matagne and El Bachtiri, 2014). Torres et al. (2006) noted that if the horizon band were to be treated as a rectangular stripe up to 180° with flat ends (see Fig. 2 of their paper), the integration, even without the sinusoidal approximation, would be inexact. Torres et al. (2006) then proposed to separate the horizon band into 3 or 5 thinner stripes (see Fig. 4 of that paper), so that the missing area near the band edges could be filled. Nonetheless, this procedure by Torres et al. (2006) is still not exact, and it did not consider the S < ξ case, namely, the case where the collector’s tilt is lower than the upper edge of the horizon band. On this point, the issues were closed by Matagne and El Bachtiri (2014), who presented the exact analytical expression for the solid angle of the horizon band intercepted by an arbitrarily tilted surface. In fact, their expression is suitable for integrating an isotropic sky up to an arbitrary angular height. More specifically, for any ξ, define Θ ≡ π/2 − ξ; then, if ξ < S ≤ π/2,
$$\Omega_c^{(hb)} = \frac{\cos\Theta}{\pi}\sqrt{\sin^2 S-\cos^2\Theta} + \frac{1}{\pi}\arcsin\!\left(\frac{\cos\Theta}{\sin S}\right) + \frac{1}{4}\cos S + \frac{1}{2\pi}\cos S\cos 2\Theta\arccos\left(-\cot S\cot\Theta\right) - \frac{1}{2\pi}\cos S\arctan\!\left(\frac{\cos S\cos\Theta}{\sqrt{\sin^2 S-\cos^2\Theta}}\right); \quad (B.19)$$
if S ≤ ξ ≤ π/2,
$$\Omega_c^{(hb)} = \frac{1}{2} + \frac{1}{2}\cos S\cos 2\Theta; \quad (B.20)$$
and if S ≥ π/2 and ξ > π − S, Eq. (B.19) can be used after replacing Θ with S − π/2. A comparison of different versions of $\Omega_c^{(hb)}$, namely, the solid angle of the horizon band seen by a tilted collector, is shown in Fig. B.5. For a small ξ = 6.5°, the integration results of Torres et al. (2006) with 1 and 3 stripes are similar to the exact integration, whereas the approximation of Perez et al. (1986) differs slightly. As ξ increases, the differences between the approximations and the exact integration become obvious.
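Because Eqs. (B.19) and (B.20) are easy to mistype, it is prudent to verify any implementation against a brute-force integration over the band. The sketch below is ours; the normalization is such that the full sky dome yields (1 + cos S)/2, and only tilts 0 ≤ S ≤ π/2 are handled:

```python
import numpy as np

def omega_hb(S, xi):
    """Normalized projected solid angle of the horizon band, Eqs. (B.19)-(B.20).

    S is the collector tilt and xi the band's angular thickness (radians).
    """
    T = np.pi/2 - xi  # capital Theta
    if S <= xi:  # Eq. (B.20): the whole band is above the collector's horizon
        return 0.5 + 0.5*np.cos(S)*np.cos(2*T)
    root = np.sqrt(np.sin(S)**2 - np.cos(T)**2)
    return (np.cos(T)*root/np.pi
            + np.arcsin(np.cos(T)/np.sin(S))/np.pi
            + np.cos(S)/4
            + np.cos(S)*np.cos(2*T)*np.arccos(-1/(np.tan(S)*np.tan(T)))/(2*np.pi)
            - np.cos(S)*np.arctan(np.cos(S)*np.cos(T)/root)/(2*np.pi))

def omega_hb_numeric(S, xi, n=2000):
    """Coarse Riemann sum of max(cos(incidence), 0) over the band, for checking."""
    theta, phi = np.meshgrid(np.linspace(np.pi/2 - xi, np.pi/2, n),
                             np.linspace(0.0, 2*np.pi, n), indexing="ij")
    cos_inc = np.sin(S)*np.sin(theta)*np.cos(phi) + np.cos(S)*np.cos(theta)
    integrand = np.clip(cos_inc, 0.0, None)*np.sin(theta)
    return integrand.sum()*(xi/(n - 1))*(2*np.pi/(n - 1))/np.pi

print(omega_hb(np.radians(60), np.radians(30)))          # ~0.3304
print(omega_hb_numeric(np.radians(60), np.radians(30)))  # ~0.330
```

At the boundary S = ξ, Eq. (B.19) reduces to Eq. (B.20), which provides another useful sanity check.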

Figure B.5 Comparison of different methods (Perez et al., 1986; Torres et al., 2006, with 1 and 3 stripes; Matagne and El Bachtiri, 2014) for calculating the solid angle $\Omega_c^{(hb)}$ due to the horizon band seen by a tilted collector. Three angular widths, ξ = 6.5°, 15°, and 30°, are considered, versus the collector tilt S ranging from 0° to 90°.

Lastly, the portion of DTI due to the isotropic background is denoted as $D_c^{(ib)}$. By setting ξ = π/2 and using Eq. (B.20), one arrives at the familiar expression $\Omega_c^{(hb)} = (1+\cos S)/2$. Another version of the derivation, using coordinate transformation, has been provided by Xie et al. (2018b), which gives an identical result. Consequently,
$$D_c^{(ib)} = \pi L\,\frac{1+\cos S}{2} - \frac{D_c^{(cr)}}{F_1} - \frac{D_c^{(hb)}}{F_2}, \quad (B.21)$$

if the circumsolar region and horizon band are subtracted from the hemisphere. By adding Eqs. (B.14), (B.17), and (B.21), the expression for DTI is hence:
$$D_c = \pi L\left\{0.5(1+\cos S) + 2(1-\cos\alpha)(F_1-1)X_c(\theta)\cos\theta' + 2\xi\sin\xi'(F_2-1)/\pi\right\}. \quad (B.22)$$

This equation is identical to Eq. (2) of Perez et al. (1986). Dividing Eq. (B.22) by Eq. (B.13) gives the ratio of DTI to DHI, i.e.,
$$\frac{D_c}{D_h} = \frac{0.5(1+\cos S) + a(\theta)(F_1-1) + b(F_2-1)}{1 + c(Z)(F_1-1) + d(F_2-1)}, \quad (B.23)$$
where
$$a(\theta) = 2(1-\cos\alpha)X_c(\theta)\cos\theta', \quad (B.24)$$
$$b = 2\xi\sin\xi'/\pi, \quad (B.25)$$
$$c(Z) = 2(1-\cos\alpha)X_h(Z)\cos Z', \quad (B.26)$$
$$d = 0.5(1-\cos 2\xi). \quad (B.27)$$
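For completeness, Eqs. (B.23)–(B.27) can be assembled into a single routine. The sketch below is ours: it uses the χₕ and χ꜀ approximations above and the ξ′ of Eq. (B.18), and the coefficients F₁ and F₂ must come from the look-up procedure described next:

```python
import numpy as np

def rd_original(F1, F2, S, theta, Z,
                alpha=np.radians(15), xi=np.radians(6.5)):
    """Diffuse transposition factor of the original Perez model, Eq. (B.23).

    All angles are in radians. chi_h and chi_c follow Eqs. (B.9)-(B.10) and
    (B.15)-(B.16); xi' follows Eq. (B.18) as reconstructed in this appendix.
    """
    psi_h = np.where(Z > np.pi/2 - alpha, (np.pi/2 - Z + alpha)/(2*alpha), 1.0)
    chi_h = np.where(Z < np.pi/2 - alpha, np.cos(Z), psi_h*np.sin(psi_h*alpha))
    psi_c = (np.pi/2 - theta + alpha)/(2*alpha)
    chi_c = np.where(theta < np.pi/2 - alpha, psi_h*np.cos(theta),
                     psi_h*psi_c*np.sin(psi_c*alpha))
    chi_c = np.where(theta > np.pi/2 + alpha, 0.0, chi_c)

    a = 2*(1 - np.cos(alpha))*chi_c                    # Eq. (B.24)
    b = 2*xi*np.sin(0.5*(S + xi - S/np.pi))/np.pi      # Eq. (B.25) with Eq. (B.18)
    c = 2*(1 - np.cos(alpha))*chi_h                    # Eq. (B.26)
    d = 0.5*(1 - np.cos(2*xi))                         # Eq. (B.27)
    return ((0.5*(1 + np.cos(S)) + a*(F1 - 1) + b*(F2 - 1))
            / (1 + c*(F1 - 1) + d*(F2 - 1)))
```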

This concludes the modeling part of the original Perez model. It is worth noting that the ratio of DTI to DHI is called the diffuse transposition factor, commonly denoted as $R_d$ (Yang, 2016). When the model coefficients $F_1 = F_2 = 1$, Eq. (B.23) reduces to $R_d = 0.5(1+\cos S)$, which is the isotropic model, which can be traced back at least to the early 1950s (Kamphuis et al., 2020).
The original Perez model is complex. Moreover, there are still outstanding issues that have not been fully addressed. For instance, the effects on model performance due to the approximations used for $\chi_h$, $\chi_c$, and $D_c^{(hb)}$ are generally unclear. Under low-sun conditions, the circumsolar region overlaps with the horizon band, which leads to a double-counting problem during radiance integration. However, the most important drawback of the original Perez model is the lack of robustness and efficiency in its model coefficients. As mentioned earlier, $F_1$ and $F_2$ are sky-condition-dependent coefficients that need to be fitted using data. To describe the different sky conditions, Perez et al. (1986) considered three variables, namely, Z, $D_h$, and ε, the last of which is defined as:
$$\varepsilon = \frac{D_h + B_n}{D_h}, \quad (B.28)$$
where $B_n$ is the beam normal irradiance (BNI). For each of these variables, several intervals, or bins, are defined, resulting in a three-dimensional array indexed by these intervals. More specifically, Z is divided into 5 bins, $D_h$ into 6 bins, and ε into 8 bins (see Table 1 of Perez et al., 1986). Subsequently, for each of the 5 × 6 × 8 = 240 combinations, $F_1$ and $F_2$ are fitted using least squares, creating a total of 480 model coefficients (not reported in the original paper). Obviously, this is not very efficient. Furthermore, considering the small amount of experimental data used by Perez et al. (1986), these fitted coefficients might not be robust for prediction at worldwide locations.
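To make the indexing concrete, the three-dimensional look-up implied by this binning can be sketched as follows. The bin edges and the coefficient values below are placeholders, not the fitted values of Perez et al. (1986), which would have to be taken from Table 1 of that paper:

```python
import numpy as np

# Placeholder bin edges: 5 zenith bins, 6 DHI bins, 8 clearness bins.
Z_edges = np.linspace(0, np.pi/2, 6)
Dh_edges = np.array([0, 50, 100, 200, 300, 400, np.inf])       # W/m^2
eps_edges = np.array([1, 1.2, 1.5, 2, 3, 4.5, 6, 8, np.inf])

# 5 x 6 x 8 = 240 cells each for F1 and F2, i.e., 480 coefficients in total;
# initialized to the isotropic value 1 here, to be fitted from data.
F1_table = np.ones((5, 6, 8))
F2_table = np.ones((5, 6, 8))

def lookup_F1_F2(Z, Dh, eps):
    """Return the (F1, F2) pair of the cell containing (Z, Dh, eps)."""
    i = min(np.searchsorted(Z_edges, Z, side="right") - 1, 4)
    j = min(np.searchsorted(Dh_edges, Dh, side="right") - 1, 5)
    k = min(np.searchsorted(eps_edges, eps, side="right") - 1, 7)
    return F1_table[i, j, k], F2_table[i, j, k]
```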

B.3 THE FRAMEWORK OF THE SIMPLIFIED PEREZ MODEL

A year after the original Perez model was proposed, a new, more accurate, and much simpler version was put forward by Perez et al. (1987). The simplified Perez model made four major modifications to the original Perez model: (1) a reparameterization of the model coefficients; (2) allowance for negative coefficients; (3) a simplified geometric framework; and (4) a revised binning strategy for differentiating the sky conditions. The following paragraphs give a brief account of these modifications.

B.3.1 REPARAMETERIZATION OF THE MODEL COEFFICIENTS

Let us recall that the overall diffuse irradiance on a tilted plane is written as the sum of the three diffuse components coming from the circumsolar region, horizon band, and isotropic background, i.e.,
$$D_c = D_c^{(cr)} + D_c^{(hb)} + D_c^{(ib)}. \quad (B.29)$$

Then, dividing both sides by $D_h$ and inserting some cancellable terms yields:
$$\frac{D_c}{D_h} = \frac{D_c^{(cr)}}{D_h^{(cr)}}\frac{D_h^{(cr)}}{D_h} + \frac{D_c^{(hb)}}{D_h^{(hb)}}\frac{D_h^{(hb)}}{D_h} + \frac{D_c^{(ib)}}{D_h} = \frac{a(\theta)}{c(Z)}\frac{D_h^{(cr)}}{D_h} + \frac{b}{d}\frac{D_h^{(hb)}}{D_h} + \frac{D_c^{(ib)}}{D_h}. \quad (B.30)$$
By definition, we know that
$$D_c^{(ib)} = \frac{1+\cos S}{2}\left(D_h - D_h^{(cr)} - D_h^{(hb)}\right). \quad (B.31)$$
Hence, by combining Eqs. (B.30) and (B.31), and defining $F_1' \equiv D_h^{(cr)}/D_h$ and $F_2' \equiv D_h^{(hb)}/D_h$, one obtains a reparameterized original Perez model:
$$\frac{D_c}{D_h} = \frac{1+\cos S}{2}\left(1 - F_1' - F_2'\right) + \frac{a(\theta)}{c(Z)}F_1' + \frac{b}{d}F_2'. \quad (B.32)$$

Perez et al. (1987) referred to $F_1'$ and $F_2'$ as the reduced brightness coefficients, which represent the normalized (by $D_h$) contributions of the circumsolar region and horizon band to DHI. The original coefficients, on the other hand, represent the increase in radiance over the isotropic background in those regions. The new coefficients $F_1'$ and $F_2'$ are related to the original coefficients $F_1$ and $F_2$ through the following equations:
$$F_1' = \frac{c(Z)F_1}{1 + c(Z)(F_1-1) + d(F_2-1)}, \quad (B.33)$$
$$F_2' = \frac{dF_2}{1 + c(Z)(F_1-1) + d(F_2-1)}, \quad (B.34)$$
which can be derived by plugging the expressions for $D_h^{(cr)}$ and $D_h$ into $F_1' = D_h^{(cr)}/D_h$, and those for $D_h^{(hb)}$ and $D_h$ into $F_2' = D_h^{(hb)}/D_h$, respectively. On this point, there are two minor typographic errors in Eqs. (6) and (7) of Perez et al. (1987).

B.3.2 ALLOWANCE FOR NEGATIVE COEFFICIENTS

In Perez et al. (1986), it was noted that the coefficients of the original Perez model, namely, $F_1$ and $F_2$, have a lower bound of 1. Consequently, the model could explain only the brightening of the circumsolar region and horizon band, as compared to the isotropic background. Notwithstanding, Perez et al. (1987) argued that during overcast situations, the zenith radiance is the highest—this is also reflected by the diffuse radiance model of Steven and Unsworth (1980), i.e., Eq. (B.6)—hence, the allowance of a negative $F_2'$ could explain situations in which the top of the sky dome is brightest.

B.3.3 PHYSICAL SURROGATE OF THE THREE-PART GEOMETRICAL FRAMEWORK

The three-part geometrical framework shown in Figs. B.3 and B.4 is evidently quite complex, especially in terms of solid angle integration, as we have seen in the previous section. To that end, the simplified Perez model considered a physical surrogate model, which maintains the concept of a three-part modeling framework, but reduces the modeling complexity. More specifically, the simplified Perez model assumes that the brightening of the horizon band is concentrated at the horizon, and that the brightening of the circumsolar region is concentrated at the center of the solar disk. Firstly, if the brightening is assumed to occur exactly at the horizon, it can no longer be seen by a horizontal surface. Moreover, the irradiance originating from the infinitesimally thin line can be fully captured by a vertical surface—due to circular symmetry, the orientation of the vertical surface does not matter. When the collector surface is tilted at an angle S, the fraction of irradiance in the direction normal to the collector surface is sin S, see Fig. B.6. It follows that Eq. (B.32) can be reduced to:
$$\frac{D_c}{D_h} = \frac{1+\cos S}{2}\left(1-F_1'\right) + \frac{a(\theta)}{c(Z)}F_1' + F_2'\sin S. \quad (B.35)$$
However, it should be noted that the physical meaning of $F_2'$ has now changed from

that in Eq. (B.32). More specifically, $F_2' = \left(D_c^{(hb)}/\sin S\right)/D_h$, that is, the ratio between the diffuse irradiance originating from the (half) horizon band intercepted by the tilted surface and DHI. Furthermore, owing to the fact that the infinitesimal horizon band is not seen by horizontal surfaces, the “double-counting” problem when computing the diffuse irradiance from the isotropic background is no longer there. To that effect, $F_2'$ is also removed from the first term of Eq. (B.32).

Figure B.6 Irradiance contribution from an infinitesimally thin horizon band, seen by a plane tilted at angle S (the fraction intercepted along the collector normal is sin S). A side view is presented.

As for the circumsolar region, once it is concentrated at a point, its effect can be accounted for using simple geometry, which leads to:
$$\frac{D_c}{D_h} = \frac{1+\cos S}{2}\left(1-F_1'\right) + F_1'\frac{\cos\theta}{\cos Z} + F_2'\sin S. \quad (B.36)$$
Equation (B.36) is known as the simplified Perez model.
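In code, Eq. (B.36) amounts to a one-liner. The sketch below is ours; it adds the customary numerical guards near sunrise and sunset, which are an implementation convention rather than part of the equation:

```python
import numpy as np

def rd_simplified(F1p, F2p, S, theta, Z):
    """Diffuse transposition factor of the simplified Perez model, Eq. (B.36).

    F1p and F2p are the reduced brightness coefficients; S, theta, and Z are
    the collector tilt, incidence angle, and solar zenith angle in radians.
    The clipping of cos(theta) at zero and the floor on cos(Z) keep the
    circumsolar term well behaved at low sun.
    """
    a = np.maximum(np.cos(theta), 0.0)
    b = np.maximum(np.cos(Z), np.cos(np.radians(85)))  # guard near the horizon
    return 0.5*(1 + np.cos(S))*(1 - F1p) + F1p*a/b + F2p*np.sin(S)
```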

B.3.4 REVISED BINNING FOR DIFFERENTIATING THE SKY CONDITIONS

The sky conditions, as considered by the original Perez model, are defined through three variables, namely, the zenith angle Z, the DHI $D_h$, and the parameter ε defined in Eq. (B.28). In the simplified Perez model, $D_h$ is replaced by a new parameter, Δ, in that:
$$\Delta = \frac{D_h m_r}{E_{0n}}, \quad (B.37)$$
where $m_r$ is the relative (not pressure-corrected) air mass and $E_{0n}$ is the extraterrestrial irradiance (ETR). Whereas $E_{0n}$ can be obtained from a solar positioning algorithm, there are several formulations of $m_r$ (e.g., Kasten and Young, 1989; Gueymard, 1993). Given the time when the 1987 article was prepared, it is likely that the formulation of Kasten (1966) was used:
$$m_r = \left[\cos Z + 0.15\times(93.885-\theta_Z)^{-1.253}\right]^{-1}, \quad (B.38)$$
where $\theta_Z$ is the solar zenith angle in degrees; this is the only occasion, throughout this appendix, where the zenith angle is used in degrees. In any case, the choice of air mass formulation is thought to have only a marginal effect on the performance of the Perez model. It should be noted, however, that the parameter Δ is called the sky’s brightness, which, to some extent, describes the physical aspect of the sky radiance.
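Equations (B.37) and (B.38) translate directly into code. In the sketch below (ours), the extraterrestrial irradiance defaults to a nominal solar constant and should, in practice, be replaced by the date-dependent value from a solar positioning algorithm:

```python
import numpy as np

def airmass_kasten(Z):
    """Relative air mass of Kasten (1966), Eq. (B.38); Z in radians."""
    z_deg = np.degrees(Z)
    return 1.0 / (np.cos(Z) + 0.15 * (93.885 - z_deg) ** (-1.253))

def sky_brightness(Dh, Z, E0n=1361.0):
    """Sky's brightness Delta of Eq. (B.37).

    Dh and E0n are in W/m^2; E0n defaults to a nominal solar constant.
    """
    return Dh * airmass_kasten(Z) / E0n
```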

(B.39) (B.40)

 (ε), F  (ε), · · · , F  (ε) depend on the bin in which ε rewhere the values of F11 12 23 sides. Solar zenith angles in Eqs. (B.39) and (B.40) are in radians. This concludes the review for the simplified Perez model as appeared in Perez et al. (1987).

B.4 THE CANONICAL PEREZ MODEL

Three years after the simplified Perez model was published, the canonical version of the Perez model became available (Perez et al., 1990); it is also the most popular version of the Perez model. The chief reason for the widespread uptake of this model is that its set of model coefficients can be considered to have reached an asymptotic level of optimization. In other words, the model coefficients lead to satisfactory performance in most cases, regardless of geographical location and orientation of the inclined surface. As compared to the simplified Perez model as it appeared in Perez et al. (1987), three changes were made in Perez et al. (1990). The first change concerns the modeling of the sky’s clearness, ε. In order to remove the zenith dependence of this parameter, its formulation has been modified to:
$$\varepsilon' = \frac{(D_h+B_n)/D_h + 1.041Z^3}{1+1.041Z^3}, \quad (B.41)$$

where Z is in radians. The second change concerns the binning of the sky’s clearness. Since the partition of ε is performed such that the mean variation of $F_1'$ and $F_2'$ in each bin is approximately equal to that in the other bins (Perez et al., 1987), once the data are updated, the binning automatically changes. As compared to the earlier work, which only used data from two French sites, Perez et al. (1990) considered a large collection of data from nine locations, namely, Albany, Geneva, Los Angeles, Albuquerque, Phoenix, Cape Canaveral, Osage, Trappes, and Carpentras. In this regard, the new partitions of ε′ are shown in the left-most column of Table 11.3. The last and most important change is to the model coefficients, which are also listed in Table 11.3, resulting from fitting the new data to the simplified Perez model. With that, we complete the derivation of the Perez model.
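The canonical model can thus be driven by Eqs. (B.39)–(B.41) plus a bin look-up on ε′. The sketch below (ours) uses the widely circulated clearness bin edges of Perez et al. (1990); the 8 × 3 coefficient tables themselves are inputs, to be filled from Table 11.3:

```python
import numpy as np

def sky_clearness(Dh, Bn, Z):
    """Sky's clearness of Eq. (B.41); Z in radians."""
    return ((Dh + Bn) / Dh + 1.041 * Z**3) / (1 + 1.041 * Z**3)

# Clearness bin edges of the canonical model (Perez et al., 1990).
EPS_EDGES = np.array([1.000, 1.065, 1.230, 1.500, 1.950, 2.800, 4.500, 6.200,
                      np.inf])

def eps_bin(eps):
    """Return the clearness bin index (0-7) used to select F'11, ..., F'23."""
    return int(np.clip(np.searchsorted(EPS_EDGES, eps, side="right") - 1, 0, 7))

def reduced_coefficients(eps, delta, Z, F1_rows, F2_rows):
    """Eqs. (B.39)-(B.40); F1_rows and F2_rows are 8 x 3 coefficient tables."""
    k = eps_bin(eps)
    F1p = F1_rows[k, 0] + F1_rows[k, 1] * delta + F1_rows[k, 2] * Z
    F2p = F2_rows[k, 0] + F2_rows[k, 1] * delta + F2_rows[k, 2] * Z
    return F1p, F2p
```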

C Derivation of the Momentum Equations

In Chapter 7 the momentum equations are introduced. In many textbooks and on many occasions, Eqs. (7.1)–(7.3) are combined and expressed as a vector momentum equation:
$$\frac{\mathrm{D}\boldsymbol{V}}{\mathrm{D}t} = -2\boldsymbol{\Omega}\times\boldsymbol{V} - \frac{1}{\rho}\nabla p + \boldsymbol{g} + \boldsymbol{F}_r, \quad (C.1)$$
where $\boldsymbol{V} = u\hat{i} + v\hat{j} + w\hat{k}$ is the wind vector. Recall that the (x, y, z) coordinates are defined to point east, north, and up; as such, the coordinate system is not strictly Cartesian. Since the directions of the unit vectors depend on their position on the earth’s surface, such dependence should be accounted for mathematically, by adding terms to each component of the total derivative:
$$\underbrace{\frac{\mathrm{D}\boldsymbol{V}}{\mathrm{D}t}}_{\text{total derivative}} = \left(\frac{\mathrm{D}u}{\mathrm{D}t} - \frac{uv\tan\phi}{r} + \frac{uw}{r}\right)\hat{i} + \left(\frac{\mathrm{D}v}{\mathrm{D}t} + \frac{u^2\tan\phi}{r} + \frac{vw}{r}\right)\hat{j} + \left(\frac{\mathrm{D}w}{\mathrm{D}t} - \frac{u^2+v^2}{r}\right)\hat{k}. \quad (C.2)$$

Terminology-wise, the last (two) terms in each bracket on the right-hand side of Eq. (C.2) are known as the curvature terms. The first right-hand-side term in Eq. (C.1) denotes the Coriolis acceleration, which expands into:
$$\underbrace{-2\boldsymbol{\Omega}\times\boldsymbol{V}}_{\text{Coriolis acceleration}} = -2\Omega\begin{vmatrix}\hat{i} & \hat{j} & \hat{k}\\ 0 & \cos\phi & \sin\phi\\ u & v & w\end{vmatrix} = \left(2\Omega v\sin\phi - 2\Omega w\cos\phi\right)\hat{i} - 2\Omega u\sin\phi\,\hat{j} + 2\Omega u\cos\phi\,\hat{k}. \quad (C.3)$$
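The expansion in Eq. (C.3) is easy to verify numerically. The following NumPy sketch (ours, with arbitrary test inputs) compares it against a direct cross product:

```python
import numpy as np

Omega = 7.2921e-5            # earth's rotation rate [rad/s]
phi = np.radians(43.0)       # an arbitrary latitude
u, v, w = 12.0, -5.0, 0.3    # arbitrary wind components [m/s]

# Rotation vector in local (east, north, up) coordinates.
Omega_vec = Omega * np.array([0.0, np.cos(phi), np.sin(phi)])
V = np.array([u, v, w])

direct = -2.0 * np.cross(Omega_vec, V)
expanded = np.array([2*Omega*(v*np.sin(phi) - w*np.cos(phi)),
                     -2*Omega*u*np.sin(phi),
                     2*Omega*u*np.cos(phi)])
print(np.allclose(direct, expanded))  # True
```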

The second right-hand-side term in Eq. (C.1) is the pressure gradient term, in that,
$$\underbrace{\frac{1}{\rho}\nabla p}_{\text{pressure gradient}} = \frac{1}{\rho}\left(\frac{\partial p}{\partial x}\hat{i} + \frac{\partial p}{\partial y}\hat{j} + \frac{\partial p}{\partial z}\hat{k}\right), \quad (C.4)$$

where ∇ is the del operator—∇p is the gradient of p, and thus the name of the term. Next, the gravity term in Eq. (C.1) is simply
$$\underbrace{\boldsymbol{g}}_{\text{gravity}} = -g\hat{k}, \quad (C.5)$$
and the friction term is:
$$\underbrace{\boldsymbol{F}_r}_{\text{friction}} = F_{rx}\hat{i} + F_{ry}\hat{j} + F_{rz}\hat{k}. \quad (C.6)$$

By combining Eqs. (C.2)–(C.6) for each of the $\hat{i}$, $\hat{j}$, and $\hat{k}$ components, one recovers Eqs. (7.1)–(7.3).

Acronyms

4D-Var: four-dimensional variational data assimilation
ABI: Advanced Baseline Imager
ADF: augmented Dickey–Fuller (hypothesis test)
ADS: automated dispatch system; Atmosphere Data Store
AFT: ARIMA with Fourier terms
AGC: automatic generation control
AGRI: Advanced Geosynchronous Radiation Imager
AIC: Akaike's information criterion
AnEn: analog ensemble
ANN: artificial neural network
ANOC: Assimilation of NWP winds and Optical flow CMVs (system)
AOD: aerosol optical depth
API: application programming interface
ARIMA: autoregressive integrated moving average
AROME: Météo-France's Application of Research to Operations at Mesoscale
ASHRAE: American Society of Heating, Refrigeration, and Air Conditioning Engineers
AST: apparent solar time
ASTM: American Society for Testing and Materials
BHI: beam horizontal irradiance
BLP: beta-transformed linear pool
BMA: Bayesian model averaging
BMS: Baseline Measurement System
BNI: beam normal irradiance
BQN: Bernstein quantile network
BRDF: bidirectional reflectance distribution function (MODIS, remote sensing)
BS: Brier score
BSRN: Baseline Surface Radiation Network
CAISO: California Independent System Operator
CAMS: Copernicus Atmosphere Monitoring Service
CAMS-Rad: CAMS Radiation Service
CapEx: capital expenditure
CCG: Central China Grid
CCN: cloud condensation nuclei
CDF: cumulative distribution function
CDS: Climate Data Store
CEC: California Energy Commission (module library)
CERES: Clouds and the Earth's Radiant Energy System
CERN: Chinese Ecosystem Research Network
CFL: Courant–Friedrichs–Lewy (condition)
CFSR: NCEP's Climate Forecast System Reanalysis
CIRACast: Cooperative Institute for Research in the Atmosphere Nowcast (model)
CLIPER: optimal convex combination of climatology and persistence
CMA: China Meteorological Administration
CMV: cloud motion vector
CNN: convolutional neural network
CONUS: continental United States
COSMO: Consortium for Small-scale Modeling (system)
CQRS: constrained quantile regression splines
CRAN: Comprehensive R Archive Network
CRPS: continuous ranked probability score
CSG: China Southern Power Grid
CSP: concentrating solar power
CV: cross validation
CWC: coverage width-based criterion
D2D: deterministic-to-deterministic (post-processing)
D2P: deterministic-to-probabilistic (post-processing)
DAM: day-ahead market
DEM: digital elevation model
DHI: diffuse horizontal irradiance
DM: Diebold–Mariano (hypothesis test)
DOT: dispatch operating target
DPV: distributed photovoltaic
DRA: Desert Rock (station)
DSO: distribution system operator
DSTATCOM: distribution static compensator
DSTM: dynamical spatio-temporal model
EBAF: Energy Balanced and Filled (a CERES product)
ECDF: empirical cumulative distribution function
ECG: East China Grid
ECMWF: European Centre for Medium-Range Weather Forecasts
ELM: extreme learning machine
ELRP: Emergency Load Reduction Program
ELUBE: ensemble lower-upper bound estimation
EMOS: ensemble model output statistics
ENIAC: Electronic Numerical Integrator and Computer
ENS: ECMWF's Ensemble model
EPS: Ensemble Prediction System
ERA5: fifth-generation ECMWF Reanalysis
ERL: extremely rare limits
ETR: extraterrestrial radiation
ETS: exponential smoothing
EUMETSAT: European Organisation for the Exploitation of Meteorological Satellites
FARMS: Fast All-sky Radiation Model for Solar applications
FPK: Fort Peck (station)
FY-4A: Fengyun-4A
GAM: generalized additive model
GAMLSS: generalized additive models for location, scale and shape
GBRT: gradient boosted regression tree
GCR: ground coverage ratio
GEFCom: Global Energy Forecasting Competition
GEOS: Goddard Earth Observing System
GES DISC: Goddard Earth Sciences Data and Information Services Center
GEV: generalized extreme value (distribution)
GHI: global horizontal irradiance
GLM: generalized linear model
GLS: generalized least squares
GMAO: Global Modeling and Assimilation Office
GOES: Geostationary Operational Environmental Satellites
GRU: gated recurrent units
GTI: global tilted irradiance
GUI: graphical user interface
GWN: Goodwin Creek (station)
HDF: hierarchical data format
HLS: hierarchical least squares
HRES: ECMWF's High-Resolution model
HRRR: High-Resolution Rapid Refresh
IAM: incident angle modifier
IDM: intra-day market
IEEE: Institute of Electrical and Electronics Engineers
IFS: Integrated Forecasting System
IGN: ignorance score
iid: independent and identically distributed
IMS: Interactive Multisensor Snow and Ice Mapping System
IoT: internet of things
ISO: independent system operator; International Organization for Standardization
JAXA: Japan Aerospace Exploration Agency
JMA: Japan Meteorological Agency
KCDE: kernel conditional density estimation
KGC: Köppen–Geiger climate classification
KGPV: Köppen–Geiger–photovoltaic (climate classification)
KPSS: Kwiatkowski–Phillips–Schmidt–Shin (hypothesis test)
KS: Kolmogorov–Smirnov (hypothesis test)
LAD: least absolute deviation (regression)
lasso: least absolute shrinkage and selection operator
LCOE: levelized cost of electricity
LPPP: loss of produced power probability
LPQR: lasso-penalized quantile regression
LSTM: long short-term memory
LTLF: long-term load forecasting
LUT: look-up table
MACC: Monitoring Atmospheric Composition and Climate (reanalysis)
MAE: mean absolute error
MAPE: mean absolute percentage error
MARS: Meteorological Archival and Retrieval System
MBE: mean bias error
MCM: Markov-chain mixture (distribution model)
MCP: measure–correlate–predict
MERRA-2: Modern Era Retrospective analysis for Research and Applications, version 2
MFG: Meteosat First Generation
MIDC: Measurement and Instrumentation Data Center
MinT: minimum trace (estimator)
MISR: Multi-angle Imaging SpectroRadiometer
MLP: multilayer perceptron
MLR: multiple linear regression
MODIS: Moderate-resolution Imaging Spectroradiometer
MOS: model output statistics
MPP: maximum power point
MPPT: maximum power point tracker
MRE: mean relative error
MSE: mean square error
MSG: Meteosat Second Generation
MTSAT: Multifunction Transport Satellite
MVUE: minimum variance unbiased estimator
NAM: North American Mesoscale Forecast System
NASA: National Aeronautics and Space Administration
NCEP: National Centers for Environmental Prediction
NEA: National Energy Administration
NEG: Northeast China Grid
NGR: nonhomogeneous Gaussian regression
NOAA: National Oceanic and Atmospheric Administration
NOCT: nominal operating cell temperature
NOMADS: NOAA's Operational Model Archive and Distribution System
NREL: National Renewable Energy Laboratory
NSRDB: National Solar Radiation Database
NWP: numerical weather prediction
O&M: operations & maintenance
OLR: outgoing longwave radiation
OLS: ordinary least squares
OLTC: on-load tap changer
OpEx: operational expenditure
OSMG: Oahu Solar Measurement Grid
P2D: probabilistic-to-deterministic (post-processing)
P2P: probabilistic-to-probabilistic (post-processing)
PBL: planetary boundary layer
PDF: probability density function
PeEn: persistence ensemble
PIT: probability integral transform
PMF: probability mass function
POA: plane of array
PPF: probabilistic power flow
PPL: physically possible limits
PSM: Physical Solar Model
PSU: Penn. State University (station)
PV: photovoltaic
PyPI: Python Package Index
QRF: quantile regression forest
QRNN: quantile regression neural network
QS: quantile score
RAP: Rapid Refresh
REST2: Reference Evaluation of Solar Transmittance, 2 bands
RMSE: root mean square error
RMSPE: root mean square percentage error
RMV: root mean variance
RNN: recurrent neural network
RPS: ranked probability score
RRTM: Rapid Radiative Transfer Model
RSS: residual sum of squares
RTED: real-time economic dispatch
RTM: real-time market
RTUC: real-time unit commitment
RUC: Rapid Update Cycle
SAM: System Advisory Model
SAPM: Sandia Array Performance Model
SCADA: supervisory control and data acquisition (power system)
SCOPE: Spectral Cloud Optical Property Estimation
SEVIRI: Spinning Enhanced Visible and Infrared Imager
SGCC: State Grid Corporation of China
SLP: spread-adjusted linear pool
SOLRAD: Solar Radiation Network
SPA: solar position algorithm
SPDIS: Solar Power Data for Integration Studies
SRML: Solar Radiation Monitoring Laboratory
SRTM: Shuttle Radar Topography Mission
SSRD: surface solar radiation downwards
STC: standard test condition
STLF: short-term load forecasting
STUC: short-term unit commitment
SURFRAD: Surface Radiation Budget Network
SVM: support vector machine
SXF: Sioux Falls (station)
TBL: Table Mountain (station)
TKE: turbulence kinetic energy
TLP: traditional linear pool
TMY: typical meteorological year
TOA: top of atmosphere
TSO: transmission system operator
UTC: Coordinated Universal Time
VAR: vector autoregressive
VIIRS: Visible Infrared Imaging Radiometer Suite
VST: variance stabilizing transformation
WLS: weighted least squares
WRF: Weather Research and Forecasting
WRMC: World Radiation Monitoring Center

References 3TIER (2010). Development of regional wind resource and wind plant output datasets. Technical Report NREL/SR-550-47676, 3TIER, Seattle, WA (United States). Aalami, H. A., Pashaei-Didani, H., and Nojavan, S. (2019). Deriving nonlinear models for incentive-based demand response programs. International Journal of Electrical Power & Energy Systems, 106:223–231. Abdeen, E., Orabi, M., and Hasaneen, E.-S. (2017). Optimum tilt angle for photovoltaic system in desert environment. Solar Energy, 155:267–280. Abedinia, O., Amjady, N., and Zareipour, H. (2017). A new feature selection technique for load and price forecast of electrical power systems. IEEE Transactions on Power Systems, 32(1):62–74. Abolghasemi, M., Hyndman, R. J., Spiliotis, E., and Bergmeir, C. (2022). Model selection in reconciling hierarchical time series. Machine Learning, 111(2). Abreu, E. F. M., Canhoto, P., and Costa, M. J. (2019). Prediction of diffuse horizontal irradiance using a new climate zone model. Renewable and Sustainable Energy Reviews, 110:28–42. Acikgoz, H. (2022). A novel approach based on integration of convolutional neural networks and deep feature selection for short-term solar radiation forecasting. Applied Energy, 305:117912. Agoua, X. G., Girard, R., and Kariniotakis, G. (2019). Probabilistic models for spatiotemporal photovoltaic power forecasting. IEEE Transactions on Sustainable Energy, 10(2):780–789. Ahmed, R., Sreeram, V., Mishra, Y., and Arif, M. D. (2020). A review and evaluation of the state-of-the-art in PV solar power forecasting: Techniques and optimization. Renewable and Sustainable Energy Reviews, 124:109792. Aicardi, D., Mus´e, P., and Alonso-Su´arez, R. (2022). A comparison of satellite cloud motion vectors techniques to forecast intra-day hourly solar global horizontal irradiation. Solar Energy, 233:46–60. Akarslan, E. and Hocaoglu, F. O. (2017). A novel method based on similarity for hourly solar irradiance forecasting. Renewable Energy, 112:337–346. Akarslan, E., Hocaoglu, F. O., and Edizkan, R. (2018). Novel short term solar irradiance forecasting models. Renewable Energy, 123:58–66. Alasseri, R., Rao, T. J., and Sreekanth, K. J. (2020). Institution of incentive-based demand response programs and prospective policy assessments for a subsidized electricity market. Renewable and Sustainable Energy Reviews, 117:109490. Alessandrini, S., Delle Monache, L., Sperati, S., and Cervone, G. (2015a). An analog ensemble for short-term probabilistic solar power forecast. Applied Energy, 157:95–110. Alessandrini, S., Delle Monache, L., Sperati, S., and Nissen, J. (2015b). A novel application of an analog ensemble for short-term wind power forecasting. Renewable Energy, 76:768– 781. Alessandrini, S., Sperati, S., and Pinson, P. (2013). A comparison between the ECMWF and COSMO Ensemble Prediction Systems applied to short-term wind power forecasting on real data. Applied Energy, 107:271–280.




Alvo, M. and Yu, P. L. H. (2014). Exploratory analysis of ranking data. In Statistical Methods for Ranking Data, Frontiers in Probability and the Statistical Sciences, pages 7–21. Springer, New York. Amara-Ouali, Y., Fasiolo, M., Goude, Y., and Yan, H. (2023). Daily peak electrical load forecasting with a multi-resolution approach. International Journal of Forecasting, 39(3):1272– 1286. Amaro e Silva, R. and Brito, M. C. (2018). Impact of network layout and time resolution on spatio-temporal solar forecasting. Solar Energy, 163:329–337. Amillo, A. G., Huld, T., and M¨uller, R. (2014). A new database of global and direct solar radiation using the Eastern Meteosat satellite, models and validation. Remote Sensing, 6(9):8165–8189. Amjady, N. (2017). Short-term electricity price forecasting. In Catal˜ao, J. P. S., editor, Electric Power Systems: Advanced Forecasting Techniques and Optimal Generation Scheduling. Routledge. Andales, A. A., Bauder, T. A., and Doesken, N. J. (2009). Colorado Agricultural meteorological network (CoAgMet) and crop ET reports. Technical Report No. 4.723, Colorado State University. Anderson, J. L. (1996). A method for producing and evaluating probabilistic forecasts from ensemble model integrations. Journal of Climate, 9(7):1518–1530. Andrawis, R. R., Atiya, A. F., and El-Shishiny, H. (2011). Combination of long term and short term forecasts, with application to tourism demand forecasting. International Journal of Forecasting, 27(3):870–886. Andr´e, M., Perez, R., Soubdhan, T., Schlemmer, J., Calif, R., and Monjoly, S. (2019). Preliminary assessment of two spatio-temporal forecasting technics for hourly satellite-derived irradiance in a complex meteorological context. Solar Energy, 177:703–712. Appelbaum, J. and Bany, J. (1979). Shadow effect of adjacent solar collectors in large scale systems. Solar Energy, 23(6):497–507. Appelbaum, J., Massalha, Y., and Aronescu, A. (2019). Corrections to anisotropic diffuse radiation model. Solar Energy, 193:523–528. Arbizu-Barrena, C., Ruiz-Arias, J. A., Rodr´ıguez-Ben´ıtez, F. J., Pozo-V´azquez, D., and TovarPescador, J. (2017). Short-term solar radiation forecasting by advecting and diffusing MSG cloud index. Solar Energy, 155:1092–1103. Arias-Castro, E., Kleissl, J., and Lave, M. (2014). A Poisson model for anisotropic solar ramp rate correlations. Solar Energy, 101:192–202. Arlot, S. and Celisse, A. (2010). A survey of cross-validation procedures for model selection. Statistics Surveys, 4:40–79. Armstrong, J. S. (1980). Unintelligible management research and academic prestige. INFORMS Journal on Applied Analytics, 10(2):80–86. Armstrong, J. S. (2001). Combining forecasts. In Principles of Forecasting, pages 417–439. Springer. Aryaputera, A. W., Verbois, H., and Walsh, W. M. (2016). Probabilistic accumulated irradiance forecast for singapore using ensemble techniques. In 2016 IEEE 43rd Photovoltaic Specialists Conference (PVSC), pages 1113–1118. Aryaputera, A. W., Yang, D., Zhao, L., and Walsh, W. M. (2015). Very short-term irradiance forecasting at unobserved locations using spatio-temporal kriging. Solar Energy, 122:1266–1278. Ascencio-V´asquez, J., Brecl, K., and Topiˇc, M. (2019). Methodology of K¨oppen-GeigerPhotovoltaic climate classification and implications to worldwide mapping of PV system performance. Solar Energy, 191:672–685.



Ashouri, M., Hyndman, R. J., and Shmueli, G. (2022). Fast forecast reconciliation using linear models. Journal of Computational and Graphical Statistics, 31(1):263–282. Astriani, Y., Shafiullah, G., and Shahnia, F. (2021). Incentive determination of a demand response program for microgrids. Applied Energy, 292:116624. Athanasopoulos, G., Ahmed, R. A., and Hyndman, R. J. (2009). Hierarchical forecasts for Australian domestic tourism. International Journal of Forecasting, 25(1):146–166. Athanasopoulos, G., Gamakumara, P., Panagiotelis, A., Hyndman, R. J., and Affan, M. (2020). Hierarchical forecasting. In Fuleky, P., editor, Macroeconomic Forecasting in the Era of Big Data: Theory and Practice, pages 689–719. Springer International Publishing. Athanasopoulos, G., Hyndman, R. J., Kourentzes, N., and Petropoulos, F. (2017). Forecasting with temporal hierarchies. European Journal of Operational Research, 262(1):60–74. Atiya, A. F. (2020). Why does forecast combination work so well? International Journal of Forecasting, 36(1):197–200. Augustine, J. A., DeLuisi, J. J., and Long, C. N. (2000). SURFRAD–A National Surface Radiation Budget Network for atmospheric research. Bulletin of the American Meteorological Society, 81(10):2341–2358. Augustine, J. A., Hodges, G. B., Cornwall, C. R., Michalsky, J. J., and Medina, C. I. (2005). An update on SURFRAD—The GCOS Surface Radiation Budget Network for the continental United States. Journal of Atmospheric and Oceanic Technology, 22(10):1460–1472. Ayet, A. and Tandeo, P. (2018). Nowcasting solar irradiance using an analog method and geostationary satellite images. Solar Energy, 164:301–315. Ayompe, L. M., Duffy, A., McCormack, S. J., and Conlon, M. (2010). Validated real-time energy models for small-scale grid-connected PV-systems. Energy, 35(10):4086–4091. Bacher, P., Madsen, H., and Nielsen, H. A. (2009). Online short-term solar power forecasting. Solar Energy, 83(10):1772–1783. Badescu, V. (2002). 3d isotropic approximation for solar diffuse irradiance on tilted surfaces. Renewable Energy, 26(2):221–233. Bakker, K., Whan, K., Knap, W., and Schmeits, M. (2019). Comparison of statistical postprocessing methods for probabilistic NWP forecasts of solar radiation. Solar Energy, 191:138–150. Baran, S. and Lerch, S. (2016). Mixture EMOS model for calibrating ensemble forecasts of wind speed. Environmetrics, 27(2):116–130. Baran, S. and Lerch, S. (2018). Combining predictive distributions for the statistical postprocessing of ensemble forecasts. International Journal of Forecasting, 34(3):477–496. Baran, S. and Nemoda, D. (2016). Censored and shifted gamma distribution based EMOS model for probabilistic quantitative precipitation forecasting. Environmetrics, 27(5):280– 292. Barnett, T. P., Ritchie, J., Foat, J., and Stokes, G. (1998). On the space–time scales of the surface solar radiation field. Journal of Climate, 11(1):88–96. Barry, J., B¨ottcher, D., Pfeilsticker, K., Herman-Czezuch, A., Kimiaie, N., Meilinger, S., Schirrmeister, C., Deneke, H., Witthuhn, J., and G¨odde, F. (2020). Dynamic model of photovoltaic module temperature as a function of atmospheric conditions. Advances in Science and Research, 17:165–173. Bates, J. M. and Granger, C. W. J. (1969). The combination of forecasts. Journal of the Operational Research Society, 20(4):451–468. Bauer, P., Dueben, P., Chantry, M., Doblas-Reyes, F., Hoefler, T., McGovern, A., and Stevens, B. (2023). Deep learning and a changing economy in weather and climate prediction. 
Nature Reviews Earth & Environment, 4(8):507–509.



Bauer, P., Thorpe, A., and Brunet, G. (2015). The quiet revolution of numerical weather prediction. Nature, 525(7567):47–55. Baumhefner, D. P. (1984). The relationship between present large-scale forecast skill and new estimates of predictability error growth. AIP Conference Proceedings, 106(1):169–180. Beltran, H., Cardo-Miota, J., Segarra-Tamarit, J., and P´erez, E. (2021). Battery size determination for photovoltaic capacity firming using deep learning irradiance forecasts. Journal of Energy Storage, 33:102036. Ben Bouall`egue, Z. (2017). Statistical postprocessing of ensemble global radiation forecasts with penalized quantile regression. Meteorologische Zeitschrift, 26(3):253–264. Ben Taieb, S. and Hyndman, R. J. (2014). A gradient boosting approach to the Kaggle load forecasting competition. International Journal of Forecasting, 30(2):382–394. Ben Taieb, S., Taylor, J. W., and Hyndman, R. J. (2021). Hierarchical probabilistic forecasting of electricity demand with smart meter data. Journal of the American Statistical Association, 116(533):27–43. ´ and Cira, C.-I. (2022). ReBenavides Cesar, L., Amaro e Silva, R., Manso Callejo, M. A., view on spatio-temporal solar forecasting methods driven by in situ measurements or their combination with satellite and numerical weather prediction (NWP) estimates. Energies, 15(12):4341. Benjamin, S. G., D´ev´enyi, D., Weygandt, S. S., Brundage, K. J., Brown, J. M., Grell, G. A., Kim, D., Schwartz, B. E., Smirnova, T. G., Smith, T. L., and Manikin, G. S. (2004). An hourly assimilation–forecast cycle: The RUC. Monthly Weather Review, 132(2):495–518. Benjamin, S. G., Weygandt, S. S., Brown, J. M., Hu, M., Alexander, C. R., Smirnova, T. G., Olson, J. B., James, E. P., Dowell, D. C., Grell, G. A., Lin, H., Peckham, S. E., Smith, T. L., Moninger, W. R., Kenyon, J. S., and Manikin, G. S. (2016). A North American hourly assimilation and model forecast cycle: The Rapid Refresh. Monthly Weather Review, 144(4):1669–1694. Bentzien, S. and Friederichs, P. (2014). Decomposition and graphical portrayal of the quantile score. Quarterly Journal of the Royal Meteorological Society, 140(683):1924–1934. Berner, J., Achatz, U., Batt´e, L., Bengtsson, L., de la C´amara, A., Christensen, H. M., Colangeli, M., Coleman, D. R. B., Crommelin, D., Dolaptchiev, S. I., Franzke, C. L. E., Friederichs, P., Imkeller, P., J¨arvinen, H., Juricke, S., Kitsios, V., Lott, F., Lucarini, V., Mahajan, S., Palmer, T. N., Penland, C., Sakradzija, M., von Storch, J.-S., Weisheimer, A., Weniger, M., Williams, P. D., and Yano, J.-I. (2017). Stochastic parameterization: Toward a new view of weather and climate models. Bulletin of the American Meteorological Society, 98(3):565–588. Bessa, R. J., Trindade, A., and Miranda, V. (2015a). Spatial-temporal solar power forecasting for smart grids. IEEE Transactions on Industrial Informatics, 11(1):232–241. Bessa, R. J., Trindade, A., Silva, C. S. P., and Miranda, V. (2015b). Probabilistic solar power forecasting in smart grids using distributed information. International Journal of Electrical Power & Energy Systems, 72:16–23. Bessac, J., Constantinescu, E., and Anitescu, M. (2018). Stochastic simulation of predictive space–time scenarios of wind speed using observations and physical model outputs. The Annals of Applied Statistics, 12(1):432–458. Beyer, H. G., Betcke, J., Drews, A., Heinemann, D., Lorenz, E., Heilscher, G., and Bofinger, S. (2004). 
Identification of a general model for the MPP performance of PV-modules for the application in a procedure for the performance check of grid connected systems. In 19th European Photovoltaic Solar Energy Conference and Exhibition, Paris, France, pages 1–5. Bi, K., Xie, L., Zhang, H., Chen, X., Gu, X., and Tian, Q. (2023). Accurate medium-range global weather forecasting with 3D neural networks. Nature, 619(7970):533–538.



Bird, L., Lew, D., Milligan, M., Carlini, E. M., Estanqueiro, A., Flynn, D., Gomez-Lazaro, E., Holttinen, H., Menemenlis, N., Orths, A., Eriksen, P. B., Smith, J. C., Soder, L., Sorensen, P., Altiparmakis, A., Yasuda, Y., and Miller, J. (2016). Wind and solar energy curtailment: A review of international experience. Renewable and Sustainable Energy Reviews, 65:577– 586. Bjerknes, V. (1999). The problem of weather forecasting as a problem in mechanics and physics. In Shapiro, M. A. and Grøn˚as, S., editors, The Life Cycles of Extratropical Cyclones, pages 1–4. American Meteorological Society, Boston, MA. Blaga, R., Sabadus, A., Stefu, N., Dughir, C., Paulescu, M., and Badescu, V. (2019). A current perspective on the accuracy of incoming solar energy forecasting. Progress in Energy and Combustion Science, 70:119–144. Blanc, P., Espinar, B., Geuder, N., Gueymard, C., Meyer, R., Pitz-Paal, R., Reinhardt, B., Renn´e, D., Sengupta, M., Wald, L., and Wilbert, S. (2014). Direct normal irradiance related definitions and applications: The circumsolar issue. Solar Energy, 110:561–577. Blanc, P., Massip, P., Kazantzidis, A., Tzoumanikas, P., Kuhn, P., Wilbert, S., Sch¨uler, D., and Prahl, C. (2017a). Short-term forecasting of high resolution local DNI maps with multiple fish-eye cameras in stereoscopic mode. AIP Conference Proceedings, 1850(1):140004. Blanc, P., Remund, J., and Vallance, L. (2017b). Short-term solar power forecasting based on satellite images. In Kariniotakis, G., editor, Renewable Energy Forecasting, Woodhead Publishing Series in Energy, pages 179–198. Woodhead Publishing. Blanc, P. and Wald, L. (2012). The SG2 algorithm for a fast and accurate computation of the position of the sun for multi-decadal time period. Solar Energy, 86(10):3072–3083. Bloom, A., Townsend, A., Palchak, D., Novacheck, J., King, J., Barrows, C., Ibanez, E., O’Connell, M., Jordan, G., Roberts, B., Draxl, C., and Gruchalla, K. (2016). Eastern renewable generation integration study. Technical Report NREL/TP-6A20-64472, National Renewable Energy Lab.(NREL), Golden, CO (United States). Bl¨oschl, G. and Sivapalan, M. (1995). Scale issues in hydrological modelling: A review. Hydrological Processes, 9(3–4):251–290. Bonavita, M. and Lean, P. (2021). 4D-Var for numerical weather prediction. Weather, 76(2):65–66. Borkowska, B. (1974). Probabilistic load flow. IEEE Transactions on Power Apparatus and Systems, PAS-93(3):752–759. Bosch, J. L., Zheng, Y., and Kleissl, J. (2013). Deriving cloud velocity from an array of solar radiation measurements. Solar Energy, 87:196–203. Bossavy, A., Girard, R., and Kariniotakis, G. (2013). Forecasting ramps of wind power production with numerical weather prediction ensembles. Wind Energy, 16(1):51–63. Bouzgou, H. and Gueymard, C. A. (2019). Fast short-term global solar irradiance forecasting with wrapper mutual information. Renewable Energy, 133:1055–1065. Bowman, D. C. and Lees, J. M. (2015). Near real time weather and ocean model data access with rNOMADS. Computers & Geosciences, 78:88–95. Box, G. E. P., Jenkins, G. M., Reinsel, G. C., and Ljung, G. M. (2015). Time Series Analysis: Forecasting and Control. John Wiley & Sons. Boylan, J. E., Goodwin, P., Mohammadipour, M., and Syntetos, A. A. (2015). Reproducibility in forecasting research. International Journal of Forecasting, 31(1):79–90. Brabec, M., Paulescu, M., and Badescu, V. (2015). Tailored vs black-box models for forecasting hourly average solar irradiance. Solar Energy, 111:320–331. Bracale, A., Carpinelli, G., and De Falco, P. 
(2017). A probabilistic competitive ensemble method for short-term photovoltaic power forecasting. IEEE Transactions on Sustainable Energy, 8(2):551–560.



Bracale, A., Carpinelli, G., and De Falco, P. (2019). Developing and comparing different strategies for combining probabilistic photovoltaic power forecasts in an ensemble method. Energies, 12(6):1011. Brandemuehl, M. J. and Beckman, W. A. (1980). Transmission of diffuse radiation through CPC and flat plate collector glazings. Solar Energy, 24(5):511–513. Breiman, L. (2001). Random forests. Machine Learning, 45:5–32. Bremnes, J. B. (2004). Probabilistic forecasts of precipitation in terms of quantiles using NWP model output. Monthly Weather Review, 132(1):338–347. Bremnes, J. B. (2019). Constrained quantile regression splines for ensemble postprocessing. Monthly Weather Review, 147(5):1769–1780. Bremnes, J. B. (2020). Ensemble postprocessing using quantile function regression based on neural networks and Bernstein polynomials. Monthly Weather Review, 148(1):403–414. Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78(1):1–3. Bright, J. M. (2019a). The impact of globally diverse GHI training data: Evaluation through application of a simple Markov chain downscaling methodology. Journal of Renewable and Sustainable Energy, 11(2):023703. Bright, J. M. (2019b). Solcast: Validation of a satellite-derived solar irradiance dataset. Solar Energy, 189:435–449. Bright, J. M. (2021). Synthetic Solar Irradiance. AIP Publishing LLC, Melville, New York. Bright, J. M., Bai, X., Zhang, Y., Sun, X., Acord, B., and Wang, P. (2020). irradpy: Python package for MERRA-2 download, extraction and usage for clear-sky irradiance modelling. Solar Energy, 199:685–693. Bright, J. M. and Engerer, N. A. (2019). Engerer2: Global re-parameterisation, update, and validation of an irradiance separation model at different temporal resolutions. Journal of Renewable and Sustainable Energy, 11(3):033701. Bright, J. M., Smith, C. J., Taylor, P. G., and Crook, R. (2015). Stochastic generation of synthetic minutely irradiance time series derived from mean hourly weather observation data. Solar Energy, 115:229–242. Brock, F. V., Crawford, K. C., Elliott, R. L., Cuperus, G. W., Stadler, S. J., Johnson, H. L., and Eilts, M. D. (1994). The Oklahoma Mesonet: A technical overview. Journal of Atmospheric and Oceanic Technology, 12(1):5–19. Br¨ocker, J. (2008). On reliability analysis of multi-categorical forecasts. Nonlinear Processes in Geophysics, 15(4):661–673. Br¨ocker, J. (2012). Probability forecasts. In Jolliffe, I. T. and Stephenson, D. B., editors, Forecast Verification: A Practitioner’s Guide in Atmospheric Science, pages 119–139. American Meteorological Society, Boston MA, USA. Br¨ocker, J. and Smith, L. A. (2007a). Increasing the reliability of reliability diagrams. Weather and Forecasting, 22(3):651–661. Br¨ocker, J. and Smith, L. A. (2007b). Scoring probabilistic forecasts: The importance of being proper. Weather and Forecasting, 22(2):382–388. Brotzge, J. A., Wang, J., Thorncroft, C. D., Joseph, E., Bain, N., Bassill, N., Farruggio, N., Freedman, J. M., Hemker, K., Johnston, D., Kane, E., McKim, S., Miller, S. D., Minder, J. R., Naple, P., Perez, S., Schwab, J. J., Schwab, M. J., and Sicker, J. (2020). A technical overview of the New York State Mesonet standard network. Journal of Atmospheric and Oceanic Technology, 37(10):1827–1845. Buchard, V., Randles, C. A., da Silva, A. M., Darmenov, A., Colarco, P. R., Govindaraju, R., Ferrare, R., Hair, J., Beyersdorf, A. J., Ziemba, L. D., and Yu, H. (2017). The MERRA-2



aerosol reanalysis, 1980 onward. Part II: Evaluation and case studies. Journal of Climate, 30(17):6851–6872. Bugler, J. W. (1977). The determination of hourly insolation on an inclined plane using a diffuse irradiance model based on hourly measured global horizontal insolation. Solar Energy, 19(5):477–491. B¨uhlmann, P. (1997). Sieve bootstrap for time series. Bernoulli, 3(2):123–148. B¨uhlmann, P. (2002). Bootstraps for time series. Statistical Science, 17(1):52–72. Burger, B. and R¨uther, R. (2006). Inverter sizing of grid-connected photovoltaic systems in the light of local solar resource distribution characteristics and temperature. Solar Energy, 80(1):32–45. Buster, G., Bannister, M., Habte, A., Hettinger, D., Maclaurin, G., Rossol, M., Sengupta, M., and Xie, Y. (2022). Physics-guided machine learning for improved accuracy of the National Solar Radiation Database. Solar Energy, 232:483–492. Cabrera-Tobar, A., Bullich-Massagu´e, E., Arag¨ue´ s-Pe˜nalba, M., and Gomis-Bellmunt, O. (2016). Topologies for large scale photovoltaic power plants. Renewable and Sustainable Energy Reviews, 59:309–319. Cai, L., Gu, J., and Jin, Z. (2020). Two-layer transfer-learning-based architecture for shortterm load forecasting. IEEE Transactions on Industrial Informatics, 16(3):1722–1732. Ca˜nadillas, D., Valizadeh, H., Kleissl, J., Gonz´alez-D´ıaz, B., and Guerrero-Lemus, R. (2021). Eda-based optimized global control for PV inverters in distribution grids. IET Renewable Power Generation, 15(2):382–396. Candille, G. and Talagrand, O. (2005). Evaluation of probabilistic prediction systems for a scalar variable. Quarterly Journal of the Royal Meteorological Society, 131(609):2131– 2150. Cannon, A. J. (2011). Quantile regression neural networks: Implementation in R and application to precipitation downscaling. Computers & Geosciences, 37(9):1277–1284. Cannon, A. J. (2018). Non-crossing nonlinear regression quantiles by monotone composite quantile regression neural network, with application to rainfall extremes. Stochastic Environmental Research and Risk Assessment, 32:3207–3225. Carrillo, C., Obando Monta˜no, A. F., Cidr´as, J., and D´ıaz-Dorado, E. (2013). Review of power curve modelling for wind turbines. Renewable and Sustainable Energy Reviews, 21:572– 581. Carslaw, D. C. and Ropkins, K. (2012). openair — An R package for air quality data analysis. Environmental Modelling & Software, 27–28:52–61. Carvallo, J. P., Larsen, P. H., Sanstad, A. H., and Goldman, C. A. (2018). Long term load forecasting accuracy in electric utility integrated resource planning. Energy Policy, 119:410– 422. Causi, S. L., Messana, C., Noviello, G., Parretta, A., Sarno, A., Freiesleben, W., Palz, W., Ossenbrink, H. A., and Helm, P. (1995). Performance analysis of single crystal silicon modules in real operating conditions. In Proceedings of 13th European PV Solar Energy Conference, page 1469. Ceamanos, X., Carrer, D., and Roujean, J.-L. (2014). An efficient approach to estimate the transmittance and reflectance of a mixture of aerosol components. Atmospheric Research, 137:125–135. ˇ uri, M. (2012). Correction of satellite-derived DNI time series using Cebecauer, T. and S´ locally-resolved aerosol data. In Proceedings of the SolarPACES Conference, Marrakech, Morocco. ˇ uri, M., and Perez, R. (2010). High performance MSG satellite model for Cebecauer, T., S´ operational solar energy applications. In ASES National Solar Conference, Phoenix, USA.



Cervone, G., Clemente-Harding, L., Alessandrini, S., and Delle Monache, L. (2017). Shortterm photovoltaic power forecasting using artificial neural networks and an analog ensemble. Renewable Energy, 108:274–286. ¨ Erg¨un, A., G¨urel, A. E., Acar, B., and ˙Ilker Aksu, A. (2019). Ceylan, ˙I., Yilmaz, S., ˙Inanc¸, O., Determination of the heat transfer coefficient of PV panels. Energy, 175:978–985. Chan, F. and Pauwels, L. L. (2018). Some theoretical results on forecast combinations. International Journal of Forecasting, 34(1):64–74. Charlton, N. and Singleton, C. (2014). A refined parametric model for short term load forecasting. International Journal of Forecasting, 30(2):364–368. Chen, B.-J., Chang, M.-W., and lin, C.-J. (2004). Load forecasting using support vector machines: A study on EUNITE competition 2001. IEEE Transactions on Power Systems, 19(4):1821–1830. Chen, S., Li, P., Brady, D., and Lehman, B. (2013). Determining the optimum grid-connected photovoltaic inverter size. Solar Energy, 87:96–116. Chen, S., Liang, Z., Guo, S., and Li, M. (2022). Estimation of high-resolution solar irradiance data using optimized semi-empirical satellite method and GOES-16 imagery. Solar Energy, 241:404–415. Chen, X., Du, Y., Lim, E., Wen, H., and Jiang, L. (2019a). Sensor network based PV power nowcasting with spatio-temporal preselection for grid-friendly control. Applied Energy, 255:113760. Chen, Y., Luh, P. B., Guan, C., Zhao, Y., Michel, L. D., Coolbeth, M. A., Friedland, P. B., and Rourke, S. J. (2010). Short-term load forecasting: Similar day-based wavelet neural networks. IEEE Transactions on Power Systems, 25(1):322–330. Chen, Y., Zhang, S., Zhang, W., Peng, J., and Cai, Y. (2019b). Multifactor spatio-temporal correlation model based on a combination of convolutional neural network and long short-term memory neural network for wind speed forecasting. Energy Conversion and Management, 185:783–799. Choi, M., Rachunok, B., and Nateghi, R. (2021). Short-term solar irradiance forecasting using convolutional neural networks and cloud imagery. Environmental Research Letters, 16(4):044045. Chow, C. W., Urquhart, B., Lave, M., Dominguez, A., Kleissl, J., Shields, J., and Washom, B. (2011). Intra-hour forecasting with a total sky imager at the UC San Diego solar energy testbed. Solar Energy, 85(11):2881–2893. Chowdhury, B. H. and Rahman, S. (1987). Forecasting sub-hourly solar irradiance for prediction of photovoltaic output. In 19th IEEE Photovoltaic Specialists Conference, pages 171–176. Claeskens, G., Magnus, J. R., Vasnev, A. L., and Wang, W. (2016). The forecast combination puzzle: A simple theoretical explanation. International Journal of Forecasting, 32(3):754– 762. Clemen, R. T. (1989). Combining forecasts: A review and annotated bibliography. International Journal of Forecasting, 5(4):559–583. Coimbra, C. F. M., Kleissl, J., and Marquez, R. (2013). Chapter 8 - Overview of solarforecasting methods and a metric for accuracy evaluation. In Kleissl, J., editor, Solar Energy Forecasting and Resource Assessment, pages 171–194. Academic Press, Boston. Cole, W., Frazier, W., Donohoo-Vallett, P., Mai, T., and Das, P. (2018). 2018 standard scenarios report: A U.S. electricity sector outlook. Technical Report NREL/TP-6A20-71913, National Renewable Energy Lab.(NREL), Golden, CO (United States).



Conceic¸a˜ o, R., Gonz´alez-Aguilar, J., Merrouni, A. A., and Romero, M. (2022). Soiling effect in solar energy conversion systems: A review. Renewable and Sustainable Energy Reviews, 162:112434. Corripio, J. G. (2021). insol: Solar Radiation. R package version 1.2.2. Costa, A., Crespo, A., Navarro, J., Lizcano, G., Madsen, H., and Feitosa, E. (2008). A review on the young history of the wind power short-term prediction. Renewable and Sustainable Energy Reviews, 12(6):1725–1744. Coulson, K. C., Gray, E. L., and Bouricius, G. M. (1965). A study of the reflection and polarization characteristics of selected natural and artificial surfaces. Technical Report R65SD4, General Electric Co. Philadelphia, PA (United States). Courtier, P., Th´epaut, J.-N., and Hollingsworth, A. (1994). A strategy for operational implementation of 4D-Var, using an incremental approach. Quarterly Journal of the Royal Meteorological Society, 120(519):1367–1387. Cressie, N. and Huang, H.-C. (1999). Classes of nonseparable, spatio-temporal stationary covariance functions. Journal of the American Statistical Association, 94(448):1330–1339. Cressie, N. and Johannesson, G. (2008). Fixed rank kriging for very large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(1):209–226. Cressie, N., Shi, T., and Kang, E. L. (2010). Fixed rank filtering for spatio-temporal data. Journal of Computational and Graphical Statistics, 19(3):724–745. Cressie, N. and Wikle, C. K. (2015). Statistics for Spatio-Temporal Data. John Wiley & Sons. Dahmani, K., Dizene, R., Notton, G., Paoli, C., Voyant, C., and Nivet, M. L. (2014). Estimation of 5-min time-step data of tilted solar global irradiation using ANN (artificial neural network) model. Energy, 70:374–381. Dambreville, R., Blanc, P., Chanussot, J., and Boldo, D. (2014). Very short term forecasting of the global horizontal irradiance using a spatio-temporal autoregressive model. Renewable Energy, 72:291–300. Das, S. and Nason, G. P. (2016). Measuring the degree of non-stationarity of a time series. Stat, 5(1):295–305. David, M., Lauret, P., and Boland, J. (2013). Evaluating tilted plane models for solar radiation using comprehensive testing procedures, at a southern hemisphere location. Renewable Energy, 51:124–131. David, M., Luis, M. A., and Lauret, P. (2018). Comparison of intraday probabilistic forecasting of solar irradiance using only endogenous data. International Journal of Forecasting, 34(3):529–547. Davis-Stober, C. P., Budescu, D. V., Dana, J., and Broomell, S. B. (2014). When is a crowd wise? Decision, 1(2):79–101. Dav`o, F., Alessandrini, S., Sperati, S., Delle Monache, L., Airoldi, D., and Vespucci, M. T. (2016). Post-processing techniques and principal component analysis for regional wind power and solar irradiance forecasting. Solar Energy, 134:327–338. Dawid, A., DeGroot, M., Mortera, J., Cooke, R., French, S., Genest, C., Schervish, M., Lindley, D., McConway, K., and Winkler, R. (1995). Coherent combination of experts’ opinions. TEST, 4(2):263–313. Dawid, A. P. (1984). Present position and potential developments: Some personal views: Statistical theory: The prequential approach. Journal of the Royal Statistical Society. Series A (General), 147(2):278–292. De Prada Gil, M., Dom´ınguez-Garc´ıa, J. L., D´ıaz-Gonz´alez, F., Arag¨ue´ s-Pe˜nalba, M., and Gomis-Bellmunt, O. (2015). Feasibility analysis of offshore wind power plants with DC collection grid. Renewable Energy, 78:467–477.
De Soto, W., Klein, S. A., and Beckman, W. A. (2006). Improvement and validation of a model for photovoltaic array performance. Solar Energy, 80(1):78–88.
Decker, M., Brunke, M. A., Wang, Z., Sakaguchi, K., Zeng, X., and Bosilovich, M. G. (2012). Evaluation of the reanalysis products from GSFC, NCEP, and ECMWF using flux tower observations. Journal of Climate, 25(6):1916–1944.
Dee, D. P., Uppala, S. M., Simmons, A. J., Berrisford, P., Poli, P., Kobayashi, S., Andrae, U., Balmaseda, M. A., Balsamo, G., Bauer, P., Bechtold, P., Beljaars, A. C. M., van de Berg, L., Bidlot, J., Bormann, N., Delsol, C., Dragani, R., Fuentes, M., Geer, A. J., Haimberger, L., Healy, S. B., Hersbach, H., Hólm, E. V., Isaksen, L., Kållberg, P., Köhler, M., Matricardi, M., McNally, A. P., Monge-Sanz, B. M., Morcrette, J.-J., Park, B.-K., Peubey, C., de Rosnay, P., Tavolato, C., Thépaut, J.-N., and Vitart, F. (2011). The ERA-Interim reanalysis: Configuration and performance of the data assimilation system. Quarterly Journal of the Royal Meteorological Society, 137(656):553–597.
Demirhan, H. and Renwick, Z. (2018). Missing value imputation for short to mid-term horizontal solar irradiance data. Applied Energy, 225:998–1012.
Dennett, D. C. (2013). Intuition Pumps and Other Tools for Thinking. W. W. Norton & Company.
Di Fonzo, T. and Girolimetto, D. (2023). Spatio-temporal reconciliation of solar forecasts. Solar Energy, 251:13–29.
Diagne, M., David, M., Boland, J., Schmutz, N., and Lauret, P. (2014). Post-processing of solar irradiance forecasts from WRF model at Reunion Island. Solar Energy, 105:99–108.
Dickey, D. A. (2011). Dickey–Fuller tests. In Lovric, M., editor, International Encyclopedia of Statistical Science, pages 385–388. Springer, Berlin, Heidelberg.
Diebold, F. X., Gunther, T. A., and Tay, A. S. (1998). Evaluating density forecasts with applications to financial risk management. International Economic Review, 39(4):863–883.
Diebold, F. X. and Lopez, J. A. (1996). Forecast evaluation and combination. In Statistical Methods in Finance, volume 14 of Handbook of Statistics, pages 241–268. Elsevier.
Diebold, F. X. and Mariano, R. S. (1995). Comparing predictive accuracy. Journal of Business & Economic Statistics, 13(3):253–263.
Dimet, F.-X. L. and Talagrand, O. (1986). Variational algorithms for analysis and assimilation of meteorological observations: Theoretical aspects. Tellus A, 38A(2):97–110.
Dobos, A. P. (2012). An improved coefficient calculator for the California Energy Commission 6 parameter photovoltaic module model. Journal of Solar Energy Engineering, 134(2):021011.
Dobos, A. P. (2014). PVWatts version 5 manual. Technical Report NREL/TP-6A20-62641, National Renewable Energy Lab. (NREL), Golden, CO (United States).
Doelling, D. R., Sun, M., Nguyen, L. T., Nordeen, M. L., Haney, C. O., Keyes, D. F., and Mlynczak, P. E. (2016). Advances in geostationary-derived longwave fluxes for the CERES Synoptic (SYN1deg) product. Journal of Atmospheric and Oceanic Technology, 33(3):503–521.
Dong, Z., Yang, D., Reindl, T., and Walsh, W. M. (2013). Short-term solar irradiance forecasting using exponential smoothing state space model. Energy, 55:1104–1113.
Dong, Z., Yang, D., Reindl, T., and Walsh, W. M. (2014). Satellite image analysis and a hybrid ESSS/ANN model to forecast solar irradiance in the tropics. Energy Conversion and Management, 79:66–73.
Dorn, H. F. (1950). Pitfalls in population forecasts and projections. Journal of the American Statistical Association, 45(251):311–334.
Doubleday, K., Jascourt, S., Kleiber, W., and Hodge, B.-M. (2021). Probabilistic solar power forecasting using Bayesian model averaging. IEEE Transactions on Sustainable Energy, 12(1):325–337.
Doubleday, K., Van Scyoc Hernandez, V., and Hodge, B.-M. (2020). Benchmark probabilistic solar forecasts: Characteristics and recommendations. Solar Energy, 206:52–67.
Drews, A., Beyer, H. G., and Rindelhardt, U. (2008). Quality of performance assessment of PV plants based on irradiation maps. Solar Energy, 82(11):1067–1075.
Drews, A., de Keizer, A. C., Beyer, H. G., Lorenz, E., Betcke, J., van Sark, W. G. J. H. M., Heydenreich, W., Wiemken, E., Stettler, S., Toggweiler, P., Bofinger, S., Schneider, M., Heilscher, G., and Heinemann, D. (2007). Monitoring and remote failure detection of grid-connected PV systems based on satellite observations. Solar Energy, 81(4):548–564.
Driemel, A., Augustine, J., Behrens, K., Colle, S., Cox, C., Cuevas-Agulló, E., Denn, F. M., Duprat, T., Fukuda, M., Grobe, H., Haeffelin, M., Hodges, G., Hyett, N., Ijima, O., Kallis, A., Knap, W., Kustov, V., Long, C. N., Longenecker, D., Lupi, A., Maturilli, M., Mimouni, M., Ntsangwane, L., Ogihara, H., Olano, X., Olefs, M., Omori, M., Passamani, L., Pereira, E. B., Schmithüsen, H., Schumacher, S., Sieger, R., Tamlyn, J., Vogt, R., Vuilleumier, L., Xia, X., Ohmura, A., and König-Langlo, G. (2018). Baseline Surface Radiation Network (BSRN): Structure and data description (1992–2017). Earth System Science Data, 10(3):1491–1501.
Dudek, G. (2016). Multilayer perceptron for GEFCom2014 probabilistic electricity price forecasting. International Journal of Forecasting, 32(3):1057–1060.
Duffie, J. A. and Beckman, W. A. (2013). Solar Engineering of Thermal Processes. John Wiley & Sons.
Durran, D. R. (2013). Numerical Methods for Wave Equations in Geophysical Fluid Dynamics, volume 32. Springer Science & Business Media.
Eckel, F. A. and Walters, M. K. (1998). Calibrated probabilistic quantitative precipitation forecasts based on the MRF ensemble. Weather and Forecasting, 13(4):1132–1147.
Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32(2):407–499.
Eissa, Y., Marpu, P. R., Gherboudj, I., Ghedira, H., Ouarda, T. B. M. J., and Chiesa, M. (2013). Artificial neural network based model for retrieval of the direct normal, diffuse horizontal and global horizontal irradiances using SEVIRI images. Solar Energy, 89:1–16.
Ekström, J., Koivisto, M., Mellin, I., Millar, R. J., and Lehtonen, M. (2017). A statistical model for hourly large-scale wind and photovoltaic generation in new locations. IEEE Transactions on Sustainable Energy, 8(4):1383–1393.
Emde, C., Buras-Schnell, R., Kylling, A., Mayer, B., Gasteiger, J., Hamann, U., Kylling, J., Richter, B., Pause, C., Dowling, T., and Bugliaro, L. (2016). The libRadtran software package for radiative transfer calculations (version 2.0.1). Geoscientific Model Development, 9(5):1647–1672.
Engerer, N. A. (2015). Minute resolution estimates of the diffuse fraction of global irradiance for southeastern Australia. Solar Energy, 116:215–237.
Engerer, N. A. and Mills, F. P. (2014). KPV: A clear-sky index for photovoltaics. Solar Energy, 105:679–693.
Engerer, N. A. and Mills, F. P. (2015). Validating nine clear sky radiation models in Australia. Solar Energy, 120:9–24.
Erbs, D. G., Klein, S. A., and Duffie, J. A. (1982). Estimation of the diffuse radiation fraction for hourly, daily and monthly-average global radiation. Solar Energy, 28(4):293–302.
Espeholt, L., Agrawal, S., Sønderby, C., Kumar, M., Heek, J., Bromberg, C., Gazen, C., Carver, R., Andrychowicz, M., Hickey, J., Bell, A., and Kalchbrenner, N. (2022). Deep learning for twelve hour precipitation forecasts. Nature Communications, 13(1):5145.
Evans, D. L. and Florschuetz, L. W. (1977). Cost studies on terrestrial photovoltaic power systems with sunlight concentration. Solar Energy, 19(3):255–262.
Every, J. P., Li, L., and Dorrell, D. G. (2020). Köppen-Geiger climate classification adjustment of the BRL diffuse irradiation model for Australian locations. Renewable Energy, 147:2453–2469.
Evseev, E. G. and Kudish, A. I. (2009). An assessment of a revised Olmo et al. model to predict solar global radiation on a tilted surface at Beer Sheva, Israel. Renewable Energy, 34(1):112–119.
Faiman, D. (2008). Assessing the outdoor operating temperature of photovoltaic modules. Progress in Photovoltaics: Research and Applications, 16(4):307–315.
Fan, S. and Hyndman, R. J. (2012). Short-term load forecasting based on a semi-parametric additive model. IEEE Transactions on Power Systems, 27(1):134–141.
Fanelli, D. (2009). How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLOS ONE, 4:1–11.
Fanney, A. H., Dougherty, B. P., and Davis, M. W. (2003). Short-term characterization of building integrated photovoltaic panels. Journal of Solar Energy Engineering, 125(1):13–20.
Farnebäck, G. (2003). Two-frame motion estimation based on polynomial expansion. In Bigun, J. and Gustavsson, T., editors, Image Analysis, pages 363–370. Springer, Berlin, Heidelberg.
Farr, T. G., Rosen, P. A., Caro, E., Crippen, R., Duren, R., Hensley, S., Kobrick, M., Paller, M., Rodriguez, E., Roth, L., Seal, D., Shaffer, S., Shimada, J., Umland, J., Werner, M., Oskin, M., Burbank, D., and Alsdorf, D. (2007). The shuttle radar topography mission. Reviews of Geophysics, 45(2):RG2004.
Fatemi, S. A., Kuh, A., and Fripp, M. (2018). Parametric methods for probabilistic forecasting of solar irradiance. Renewable Energy, 129:666–676.
Fekri, M. N., Grolinger, K., and Mir, S. (2022). Distributed load forecasting using smart meter data: Federated learning with recurrent neural networks. International Journal of Electrical Power & Energy Systems, 137:107669.
Feng, C., Sun, M., and Zhang, J. (2020). Reinforced deterministic and probabilistic load forecasting via Q-learning dynamic model selection. IEEE Transactions on Smart Grid, 11(2):1377–1386.
Feng, C., Yang, D., Hodge, B.-M., and Zhang, J. (2019). OpenSolar: Promoting the openness and accessibility of diverse public solar datasets. Solar Energy, 188:1369–1379.
Fernández-Peruchena, C. M., Polo, J., Martín, L., and Mazorra, L. (2020). Site-adaptation of modeled solar radiation data: The SiteAdapt procedure. Remote Sensing, 12(13):2127.
Ferrera Cobos, F., Valenzuela, R. X., Ramírez, L., Zarzalejo, L. F., Nouri, B., Wilbert, S., and García, G. (2018). Assessment of the impact of meteorological conditions on pyrheliometer calibration. Solar Energy, 168:44–59.
Fiorucci, J. A. and Louzada, F. (2020). GROEC: Combination method via generalized rolling origin evaluation. International Journal of Forecasting, 36(1):105–109.
Flemming, J., Benedetti, A., Inness, A., Engelen, R. J., Jones, L., Huijnen, V., Remy, S., Parrington, M., Suttie, M., Bozzo, A., Peuch, V.-H., Akritidis, D., and Katragkou, E. (2017). The CAMS interim reanalysis of carbon monoxide, ozone and aerosol for 2003–2015. Atmospheric Chemistry and Physics, 17(3):1945–1983.
Forscher, B. K. (1963). Chaos in the brickyard. Science, 142(3590):339.
Fortin, V., Favre, A.-C., and Saïd, M. (2006). Probabilistic forecasting from ensemble prediction systems: Improving upon the best-member method by using a different weight and dressing kernel for each member. Quarterly Journal of the Royal Meteorological Society, 132(617):1349–1369.
Fosl, P. S. and Baggini, J. (2020). The Philosopher's Toolkit: A Compendium of Philosophical Concepts and Methods. John Wiley & Sons.
Fraley, C., Raftery, A. E., Sloughter, J. M., Gneiting, T., and University of Washington (2021). ensembleBMA: Probabilistic Forecasting using Ensembles and Bayesian Model Averaging. R package version 5.1.7.
Frank, C. W., Wahl, S., Keller, J. D., Pospichal, B., Hense, A., and Crewell, S. (2018). Bias correction of a novel European reanalysis data set for solar energy applications. Solar Energy, 164:12–24.
Frimane, A., Bright, J. M., Yang, D., Ouhammou, B., and Aggour, M. (2020). Dirichlet downscaling model for synthetic solar irradiance time series. Journal of Renewable and Sustainable Energy, 12(6):063702.
Fu, D., Gueymard, C. A., Yang, D., Zheng, Y., Xia, X., and Bian, J. (2023). Improving aerosol optical depth retrievals from Himawari-8 with ensemble learning enhancement: Validation over Asia. Atmospheric Research, 284:106624.
Fu, D., Liu, M., Yang, D., Che, H., and Xia, X. (2022). Influences of atmospheric reanalysis on the accuracy of clear-sky irradiance estimates: Comparing MERRA-2 and CAMS. Atmospheric Environment, 277:119080.
Fuentes, M., Nofuentes, G., Aguilera, J., Talavera, D. L., and Castro, M. (2007). Application and validation of algebraic methods to predict the behaviour of crystalline silicon PV modules in Mediterranean climates. Solar Energy, 81(11):1396–1408.
Fuentes, M. K. (1987). A simplified thermal model for flat-plate photovoltaic arrays. Technical Report SAND85-0330, Sandia National Laboratories, Albuquerque, NM (United States).
Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4):193–202.
Fukushima, K. and Miyake, S. (1982). Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recognition, 15(6):455–469.
Furrer, R., Genton, M. G., and Nychka, D. (2006). Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics, 15(3):502–523.
Gaba, A., Tsetlin, I., and Winkler, R. L. (2017). Combining interval forecasts. Decision Analysis, 14(1):1–20.
Gagne, D. J., McGovern, A., Haupt, S. E., and Williams, J. K. (2017). Evaluation of statistical learning configurations for gridded solar irradiance forecasting. Solar Energy, 150:383–393.
Gallo, R., Castangia, M., Macii, A., Macii, E., Patti, E., and Aliberti, A. (2022). Solar radiation forecasting with deep learning techniques integrating geostationary satellite images. Engineering Applications of Artificial Intelligence, 116:105493.
Galton, F. (1907). Vox Populi. Nature, 75(1949):450–451.
Ganger, D., Zhang, J., and Vittal, V. (2018). Forecast-based anticipatory frequency control in power systems. IEEE Transactions on Power Systems, 33(1):1004–1012.
Garay, M. J., Witek, M. L., Kahn, R. A., Seidel, F. C., Limbacher, J. A., Bull, M. A., Diner, D. J., Hansen, E. G., Kalashnikova, O. V., Lee, H., Nastan, A. M., and Yu, Y. (2020). Introducing the 4.4 km spatial resolution Multi-Angle Imaging SpectroRadiometer (MISR) aerosol product. Atmospheric Measurement Techniques, 13(2):593–628.
Garratt, A., Mitchell, J., Vahey, S. P., and Wakerly, E. C. (2011). Real-time inflation forecast densities from ensemble Phillips curves. The North American Journal of Economics and Finance, 22(1):77–87.
GE Energy (2010). Western wind and solar integration study. Technical Report NREL/SR-550-47434, GE Energy Management, Schenectady, NY (United States).
Gelaro, R., McCarty, W., Suárez, M. J., Todling, R., Molod, A., Takacs, L., Randles, C. A., Darmenov, A., Bosilovich, M. G., Reichle, R., Wargan, K., Coy, L., Cullather, R., Draper, C., Akella, S., Buchard, V., Conaty, A., da Silva, A. M., Gu, W., Kim, G.-K., Koster, R., Lucchesi, R., Merkova, D., Nielsen, J. E., Partyka, G., Pawson, S., Putman, W., Rienecker, M., Schubert, S. D., Sienkiewicz, M., and Zhao, B. (2017). The Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2). Journal of Climate, 30(14):5419–5454.
Gensler, A., Henze, J., Sick, B., and Raabe, N. (2016). Deep learning for solar power forecasting – An approach using autoencoder and LSTM neural networks. In 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pages 002858–002865.
Genton, M. G. (2007). Separable approximations of space-time covariance matrices. Environmetrics, 18(7):681–695.
Geuder, N., Wolfertstetter, F., Wilbert, S., Schüler, D., Affolter, R., Kraas, B., Lüpfert, E., and Espinar, B. (2015). Screening and flagging of solar irradiation and ancillary meteorological data. Energy Procedia, 69:1989–1998. International Conference on Concentrating Solar Power and Chemical Energy Systems, SolarPACES 2014.
Ghalanos, A. and Theussl, S. (2015). Rsolnp: General Non-linear Optimization Using Augmented Lagrange Multiplier Method. R package version 1.16.
Ghofrani, M., Ghayekhloo, M., and Azimi, R. (2016). A novel soft computing framework for solar radiation forecasting. Applied Soft Computing, 48:207–216.
Gianfreda, A., Parisio, L., and Pelagatti, M. (2016). The impact of RES in the Italian day-ahead and balancing markets. The Energy Journal, 37:161–184.
Giebel, G. and Kariniotakis, G. (2017). Wind power forecasting—A review of the state of the art. In Kariniotakis, G., editor, Renewable Energy Forecasting, Woodhead Publishing Series in Energy, pages 59–109. Woodhead Publishing.
Gilleland, E., Ahijevych, D. A., Brown, B. G., and Ebert, E. E. (2010). Verifying forecasts spatially. Bulletin of the American Meteorological Society, 91(10):1365–1376.
Gilman, P., Dobos, A., DiOrio, N., Freeman, J., Janzou, S., and Ryberg, D. (2018). SAM photovoltaic model technical reference update. Technical Report NREL/TP-6A20-67399, National Renewable Energy Lab. (NREL), Golden, CO (United States).
Glahn, H. R. and Lowry, D. A. (1972). The use of model output statistics (MOS) in objective weather forecasting. Journal of Applied Meteorology, 11(8):1203–1211.
Gneiting, T. (1999). Correlation functions for atmospheric data analysis. Quarterly Journal of the Royal Meteorological Society, 125(559):2449–2464.
Gneiting, T. (2002). Nonseparable, stationary covariance functions for space–time data. Journal of the American Statistical Association, 97(458):590–600.
Gneiting, T. (2011). Making and evaluating point forecasts. Journal of the American Statistical Association, 106(494):746–762.
Gneiting, T., Balabdaoui, F., and Raftery, A. E. (2007). Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(2):243–268.
Gneiting, T. and Katzfuss, M. (2014). Probabilistic forecasting. Annual Review of Statistics and Its Application, 1(1):125–151.
Gneiting, T., Larson, K., Westrick, K., Genton, M. G., and Aldrich, E. (2006). Calibrated probabilistic forecasting at the Stateline Wind Energy Center. Journal of the American Statistical Association, 101(475):968–979.
Gneiting, T. and Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378.
Gneiting, T., Raftery, A. E., Westveld, A. H., and Goldman, T. (2005). Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Monthly Weather Review, 133(5):1098–1118.
Gneiting, T. and Ranjan, R. (2011). Comparing density forecasts using threshold- and quantile-weighted scoring rules. Journal of Business & Economic Statistics, 29(3):411–422.
Gneiting, T. and Ranjan, R. (2013). Combining predictive distributions. Electronic Journal of Statistics, 7:1747–1782.
Gneiting, T., Stanberry, L. I., Grimit, E. P., Held, L., and Johnson, N. A. (2008). Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds. TEST, 17:211–235.
Goehry, B., Goude, Y., Massart, P., and Poggi, J.-M. (2020). Aggregation of multi-scale experts for bottom-up load forecasting. IEEE Transactions on Smart Grid, 11(3):1895–1904.
Gonzalez-Castellanos, A. J., Pozo, D., and Bischi, A. (2020). Non-ideal linear operation model for Li-ion batteries. IEEE Transactions on Power Systems, 35(1):672–682.
Good, I. J. (1952). Rational decisions. Journal of the Royal Statistical Society: Series B (Methodological), 14(1):107–114.
Goude, Y., Nedellec, R., and Kong, N. (2014). Local short and middle term electricity load forecasting with semi-parametric additive models. IEEE Transactions on Smart Grid, 5(1):440–446.
Goudriaan, J. (1977). Crop Micrometeorology: A Simulation Study. PhD thesis, Pudoc, Department of Theoretical Production Ecology, Agricultural University, Wageningen, the Netherlands.
Gracia, A., Torres, J. L., De Blas, M., García, A., and Perez, R. (2011). Comparison of four luminance and radiance angular distribution models for radiance estimation. Solar Energy, 85(9):2202–2216.
Graham, J. R. (1996). Is a group of economists better than one? Than none? The Journal of Business, 69(2):193–232.
Gräler, B., Pebesma, E., and Heuvelink, G. (2016). Spatio-temporal interpolation using gstat. The R Journal, 8:204–218.
Grantham, A., Gel, Y. R., and Boland, J. (2016). Nonparametric short-term probabilistic forecasting for solar radiation. Solar Energy, 133:465–475.
Grena, R. (2012). Five new algorithms for the computation of sun position from 2010 to 2110. Solar Energy, 86(5):1323–1337.
Gross, G. and Galiana, F. D. (1987). Short-term load forecasting. Proceedings of the IEEE, 75(12):1558–1573.
Grossi, L. and Nan, F. (2019). Robust forecasting of electricity prices: Simulations, models and the impact of renewable sources. Technological Forecasting and Social Change, 141:305–318.
Grunfeld, Y. and Griliches, Z. (1960). Is aggregation necessarily bad? The Review of Economics and Statistics, 42(1):1–13.
Grushka-Cockayne, Y. and Jose, V. R. R. (2020). Combining prediction intervals in the M4 competition. International Journal of Forecasting, 36(1):178–185.
Grushka-Cockayne, Y., Jose, V. R. R., and Lichtendahl, K. C. (2017). Ensembles of overfit and overconfident forecasts. Management Science, 63(4):1110–1130.
Gschwind, B., Wald, L., Blanc, P., Lefèvre, M., Schroedter-Homscheidt, M., and Arola, A. (2019). Improving the McClear model estimating the downwelling solar radiation at ground level in cloud-free conditions – McClear-v3. Meteorologische Zeitschrift, 28(2):147–163.
Gu, C., Yang, D., Jirutitijaroen, P., Walsh, W. M., and Reindl, T. (2014). Spatial load forecasting with communication failure using time-forward kriging. IEEE Transactions on Power Systems, 29(6):2875–2882.
Gu, Y. and Xie, L. (2014). Fast sensitivity analysis approach to assessing congestion induced wind curtailment. IEEE Transactions on Power Systems, 29(1):101–110.
Gueymard, C. (1987). An anisotropic solar irradiance model for tilted surfaces and its comparison with selected engineering algorithms. Solar Energy, 38(5):367–386.
Gueymard, C. (1988). Erratum. Solar Energy, 40(2):175.
Gueymard, C. (1989). A two-band model for the calculation of clear sky solar irradiance, illuminance, and photosynthetically active radiation at the earth's surface. Solar Energy, 43(5):253–265.
Gueymard, C. (1993). Critical analysis and performance assessment of clear sky solar irradiance models using theoretical and measured data. Solar Energy, 51(2):121–138.
Gueymard, C. A. (2001). Parameterized transmittance model for direct beam and circumsolar spectral irradiance. Solar Energy, 71(5):325–346.
Gueymard, C. A. (2003). Direct solar transmittance and irradiance predictions with broadband models. Part I: Detailed theoretical performance assessment. Solar Energy, 74(5):355–379.
Gueymard, C. A. (2004). The sun's total and spectral irradiance for solar energy applications and solar radiation models. Solar Energy, 76(4):423–453.
Gueymard, C. A. (2005). Importance of atmospheric turbidity and associated uncertainties in solar radiation and luminous efficacy modelling. Energy, 30(9):1603–1621.
Gueymard, C. A. (2008). REST2: High-performance solar radiation model for cloudless-sky irradiance, illuminance, and photosynthetically active radiation – Validation with a benchmark dataset. Solar Energy, 82(3):272–285.
Gueymard, C. A. (2009). Direct and indirect uncertainties in the prediction of tilted irradiance for solar engineering applications. Solar Energy, 83(3):432–444.
Gueymard, C. A. (2017a). Cloud and albedo enhancement impacts on solar irradiance using high-frequency measurements from thermopile and photodiode radiometers. Part 1: Impacts on global horizontal irradiance. Solar Energy, 153:755–765.
Gueymard, C. A. (2017b). Cloud and albedo enhancement impacts on solar irradiance using high-frequency measurements from thermopile and photodiode radiometers. Part 2: Performance of separation and transposition models for global tilted irradiance. Solar Energy, 153:766–779.
Gueymard, C. A. (2018). A reevaluation of the solar constant based on a 42-year total solar irradiance time series and a reconciliation of spaceborne observations. Solar Energy, 168:2–9.
Gueymard, C. A. (2019). The SMARTS spectral irradiance model after 25 years: New developments and validation of reference spectra. Solar Energy, 187:233–253.
Gueymard, C. A., Lara-Fanego, V., Sengupta, M., and Xie, Y. (2019). Surface albedo and reflectance: Review of definitions, angular and spectral effects, and intercomparison of major data sources in support of advanced solar irradiance modeling over the Americas. Solar Energy, 182:194–212.
Gueymard, C. A. and Myers, D. R. (2009). Evaluation of conventional and high-performance routine solar radiation measurements for improved solar resource, climatological trends, and radiative modeling. Solar Energy, 83(2):171–185.
Gueymard, C. A. and Ruiz-Arias, J. A. (2016). Extensive worldwide validation and climate sensitivity analysis of direct irradiance predictions from 1-min global irradiance. Solar Energy, 128:1–30.
Gueymard, C. A. and Yang, D. (2020). Worldwide validation of CAMS and MERRA-2 reanalysis aerosol optical depth products using 15 years of AERONET observations. Atmospheric Environment, 225:117216.
Haben, S. and Giasemidis, G. (2016). A hybrid model of kernel density estimation and quantile regression for GEFCom2014 probabilistic load forecasting. International Journal of Forecasting, 32(3):1017–1022.
Habte, A., Sengupta, M., Andreas, A., Wilcox, S., and Stoffel, T. (2016). Intercomparison of 51 radiometers for determining global horizontal irradiance and direct normal irradiance measurements. Solar Energy, 133:372–393.
Hafez, B., Krishnamoorthy, H. S., Enjeti, P., Borup, U., and Ahmed, S. (2014). Medium voltage AC collection grid for large scale photovoltaic plants based on medium frequency transformers. In 2014 IEEE Energy Conversion Congress and Exposition (ECCE), pages 5304–5311.
Haffaf, A., Lakdja, F., Ould Abdeslam, D., and Meziane, R. (2021). Monitoring, measured and simulated performance analysis of a 2.4 kWp grid-connected PV system installed on the Mulhouse campus, France. Energy for Sustainable Development, 62:44–55.
Hallenbeck, C. (1920). Forecasting precipitation in percentages of probability. Monthly Weather Review, 48(11):645–647.
Hamill, T. M. (1997). Reliability diagrams for multicategory probabilistic forecasts. Weather and Forecasting, 12(4):736–741.
Hamill, T. M. (2001). Interpretation of rank histograms for verifying ensemble forecasts. Monthly Weather Review, 129(3):550–560.
Hamill, T. M. and Colucci, S. J. (1997). Verification of Eta–RSM short-range ensemble forecasts. Monthly Weather Review, 125(6):1312–1327.
Hammer, A., Heinemann, D., Hoyer, C., Kuhlemann, R., Lorenz, E., Müller, R., and Beyer, H. G. (2003). Solar energy assessment using remote sensing technologies. Remote Sensing of Environment, 86(3):423–432.
Hammer, A., Heinemann, D., Lorenz, E., and Lückehe, B. (1999). Short-term forecasting of solar radiation: A statistical approach using satellite data. Solar Energy, 67(1):139–150.
Hansen, C. W. (2015). Parameter estimation for single diode models of photovoltaic modules. Technical Report SAND2015-2065, Sandia National Laboratories, Albuquerque, NM (United States).
Hansson, S. O. (2017). Science and pseudo-science. Stanford Encyclopedia of Philosophy.
Harty, T. M., Holmgren, W. F., Lorenzo, A. T., and Morzfeld, M. (2019). Intra-hour cloud index forecasting with data assimilation. Solar Energy, 185:270–282.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media.
Hastie, T., Tibshirani, R., and Wainwright, M. (2019). Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman and Hall/CRC.
Haupt, S. E. and Kosović, B. (2017). Variable generation power forecasting as a big data problem. IEEE Transactions on Sustainable Energy, 8(2):725–732.
Haupt, S. E., Kosović, B., Jensen, T., Lazo, J. K., Lee, J. A., Jiménez, P. A., Cowie, J., Wiener, G., McCandless, T. C., Rogers, M., Miller, S., Sengupta, M., Xie, Y., Hinkelman, L., Kalb, P., and Heiser, J. (2018). Building the Sun4Cast system: Improvements in solar power forecasting. Bulletin of the American Meteorological Society, 99(1):121–136.
Hay, J. E. (1993). Calculating solar radiation for inclined surfaces: Practical approaches. Renewable Energy, 3(4–5):373–380.
Hay, J. E. and Davies, J. A. (1980). Calculation of the solar irradiance incident on an inclined surface. In Hay, J. E. and Won, T. K., editors, First Canadian Solar Radiation Data Workshop, pages 59–72, Toronto, Ontario, Canada.
Hay, J. E. and McKay, D. C. (1985). Estimating solar irradiance on inclined surfaces: A review and assessment of methodologies. International Journal of Solar Energy, 3(4–5):203–240.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778.
Heesen, R. and Bright, L. K. (2020). Is peer review a good idea? The British Journal for the Philosophy of Science, axz029.
Heidinger, A. K., Evan, A. T., Foster, M. J., and Walther, A. (2012). A naive Bayesian cloud-detection scheme derived from CALIPSO and applied within PATMOS-x. Journal of Applied Meteorology and Climatology, 51(6):1129–1144.
Heidinger, A. K., Foster, M. J., Walther, A., and Zhao, X. T. (2014). The Pathfinder Atmospheres–Extended AVHRR climate dataset. Bulletin of the American Meteorological Society, 95(6):909–922.
Heidinger, A. K. and Pavolonis, M. J. (2009). Gazing at cirrus clouds for 25 years through a split window. Part I: Methodology. Journal of Applied Meteorology and Climatology, 48(6):1100–1116.
Hersbach, H. (2000). Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15(5):559–570.
Hersbach, H., Bell, B., Berrisford, P., Hirahara, S., Horányi, A., Muñoz-Sabater, J., Nicolas, J., Peubey, C., Radu, R., Schepers, D., Simmons, A., Soci, C., Abdalla, S., Abellan, X., Balsamo, G., Bechtold, P., Biavati, G., Bidlot, J., Bonavita, M., De Chiara, G., Dahlgren, P., Dee, D., Diamantakis, M., Dragani, R., Flemming, J., Forbes, R., Fuentes, M., Geer, A., Haimberger, L., Healy, S., Hogan, R. J., Hólm, E., Janisková, M., Keeley, S., Laloyaux, P., Lopez, P., Lupu, C., Radnoti, G., de Rosnay, P., Rozum, I., Vamborg, F., Villaume, S., and Thépaut, J.-N. (2020). The ERA5 global reanalysis. Quarterly Journal of the Royal Meteorological Society, 146(730):1999–2049.
Heusinger, J., Broadbent, A. M., Sailor, D. J., and Georgescu, M. (2020). Introduction, evaluation and application of an energy balance model for photovoltaic modules. Solar Energy, 195:382–395.
Heydari, A., Astiaso Garcia, D., Keynia, F., Bisegna, F., and De Santoli, L. (2019). A novel composite neural network based method for wind and solar power forecasting in microgrids. Applied Energy, 251:113353.
Hicks, B. B., DeLuisi, J. J., and Matt, D. R. (1996). The NOAA Integrated Surface Irradiance Study (ISIS)—A new surface radiation monitoring program. Bulletin of the American Meteorological Society, 77(12):2857–2864.
Hijmans, R. J. (2020). raster: Geographic Data Analysis and Modeling. R package version 3.4-5.
Hinkelman, L. M. (2013). Differences between along-wind and cross-wind solar irradiance variability on small spatial scales. Solar Energy, 88:192–203.
Hippert, H. S., Pedreira, C. E., and Souza, R. C. (2001). Neural networks for short-term load forecasting: A review and evaluation. IEEE Transactions on Power Systems, 16(1):44–55.
Hoadley, D. (2021). Efficient calculation of solar position using rectangular coordinates. Solar Energy, 220:80–87.
Hoff, T. E. and Perez, R. (2012). Modeling PV fleet output variability. Solar Energy, 86(8):2177–2189.
Hogan, R. J. and Illingworth, A. J. (2000). Deriving cloud overlap statistics from radar. Quarterly Journal of the Royal Meteorological Society, 126(569):2903–2909.
Hollands, K. G. T. (1985). A derivation of the diffuse fraction's dependence on the clearness index. Solar Energy, 35(2):131–136.
Hollands, K. G. T. and Crha, S. J. (1987). An improved model for diffuse radiation: Correction for atmospheric back-scattering. Solar Energy, 38(4):233–236.
Hollands, K. G. T. and Suehrcke, H. (2013). A three-state model for the probability distribution of instantaneous solar radiation, with applications. Solar Energy, 96:103–112.
Holmgren, W. F., Hansen, C. W., and Mikofski, M. A. (2018). pvlib python: A python package for modeling solar energy systems. Journal of Open Source Software, 3(29):884.
Hong, T. (2020). Forecasting with high frequency data: M4 competition and beyond. International Journal of Forecasting, 36(1):191–194.
Hong, T. and Fan, S. (2016). Probabilistic electric load forecasting: A tutorial review. International Journal of Forecasting, 32(3):914–938.
Hong, T., Pinson, P., and Fan, S. (2014). Global Energy Forecasting Competition 2012. International Journal of Forecasting, 30(2):357–363.
Hong, T., Pinson, P., Fan, S., Zareipour, H., Troccoli, A., and Hyndman, R. J. (2016). Probabilistic energy forecasting: Global Energy Forecasting Competition 2014 and beyond. International Journal of Forecasting, 32(3):896–913.
Hong, T., Pinson, P., Wang, Y., Weron, R., Yang, D., and Zareipour, H. (2020). Energy forecasting: A review and outlook. IEEE Open Access Journal of Power and Energy, 7:376–388.
Hong, T., Xie, J., and Black, J. (2019). Global Energy Forecasting Competition 2017: Hierarchical probabilistic load forecasting. International Journal of Forecasting, 35(4):1389–1399.
Hora, S. C. (2004). Probability judgments for continuous quantities: Linear combinations and calibration. Management Science, 50(5):597–604.
Hottel, H. C. and Sarofim, A. F. (1967). Radiative Transfer. McGraw-Hill.
Hou, N., Zhang, X., Zhang, W., Wei, Y., Jia, K., Yao, Y., Jiang, B., and Cheng, J. (2020). Estimation of surface downward shortwave radiation over China from Himawari-8 AHI data based on random forest. Remote Sensing, 12(1):181.
Howse, J. (2013). OpenCV Computer Vision with Python. Packt Publishing, Birmingham.
Hsu, W. and Murphy, A. H. (1986). The attributes diagram: A geometrical framework for assessing the quality of probability forecasts. International Journal of Forecasting, 2(3):285–293.
Hu, M. and Huang, Y. (2020). atakrig: An R package for multivariate area-to-area and area-to-point kriging predictions. Computers & Geosciences, 139:104471.
Huang, C., Shi, H., Yang, D., Gao, L., Zhang, P., Fu, D., Xia, X., Chen, Q., Yuan, Y., Liu, M., Hu, B., Lin, K., and Li, X. (2023). Retrieval of sub-kilometer resolution solar irradiance from Fengyun-4A satellite using a region-adapted Heliosat-2 method. Solar Energy, 264:112038.
Huang, C., Wang, L., and Lai, L. L. (2019a). Data-driven short-term solar irradiance forecasting based on information of neighboring sites. IEEE Transactions on Industrial Electronics, 66(12):9918–9927.
Huang, G., Li, Z., Li, X., Liang, S., Yang, K., Wang, D., and Zhang, Y. (2019b). Estimating surface solar irradiance from satellites: Past, present, and future perspectives. Remote Sensing of Environment, 233:111371.
Huang, J. and Perry, M. (2016). A semi-empirical approach using gradient boosting and k-nearest neighbors regression for GEFCom2014 probabilistic solar power forecasting. International Journal of Forecasting, 32(3):1081–1086.
Huang, J., Rikus, L. J., Qin, Y., and Katzfey, J. (2018). Assessing model performance of daily solar irradiance forecasts over Australia. Solar Energy, 176:615–626.
Huang, W., Zhang, N., Kang, C., Capuder, T., Holjevac, N., and Kuzle, I. (2020). Beijing subsidiary administrative center multi-energy systems: An optimal configuration planning. Electric Power Systems Research, 179:106082.
Huang, Y., Hasan, N., Deng, C., and Bao, Y. (2022). Multivariate empirical mode decomposition based hybrid model for day-ahead peak load forecasting. Energy, 239:122245.
Huang, Y., Lu, J., Liu, C., Xu, X., Wang, W., and Zhou, X. (2010). Comparative study of power forecasting methods for PV stations. In 2010 International Conference on Power System Technology, pages 1–6.
Huber, P. J. (1964). Robust estimation of a location parameter. Annals of Mathematical Statistics, 35(1):73–101.
Huber, P. J. (1973). Robust regression: Asymptotics, conjectures and Monte Carlo. Annals of Statistics, 1(5):799–821.
Hubicka, K., Marcjasz, G., and Weron, R. (2019). A note on averaging day-ahead electricity price forecasts across calibration windows. IEEE Transactions on Sustainable Energy, 10(1):321–323.
Huertas-Tato, J., Aler, R., Galván, I. M., Rodríguez-Benítez, F. J., Arbizu-Barrena, C., and Pozo-Vázquez, D. (2020). A short-term solar radiation forecasting system for the Iberian Peninsula. Part 2: Model blending approaches based on machine learning. Solar Energy, 195:685–696.
Huld, T., Friesen, G., Skoczek, A., Kenny, R. P., Sample, T., Field, M., and Dunlop, E. D. (2011). A power-rating model for crystalline silicon PV modules. Solar Energy Materials and Solar Cells, 95(12):3359–3369.
Hummon, M., Ibanez, E., Brinkman, G., and Lew, D. (2012). Sub-hour solar data for power system modeling from static spatial variability analysis. In 2nd International Workshop on Integration of Solar Power in Power Systems, Lisbon, Portugal.
Hussain, N., Shahzad, N., Yousaf, T., Waqas, A., Hussain Javed, A., Khan, S., Ali, M., and Liaquat, R. (2021). Designing of homemade soiling station to explore soiling loss effects on PV modules. Solar Energy, 225:624–633.
Huttunen, J., Kokkola, H., Mielonen, T., Mononen, M. E. J., Lipponen, A., Reunanen, J., Lindfors, A. V., Mikkonen, S., Lehtinen, K. E. J., Kouremeti, N., Bais, A., Niska, H., and Arola, A. (2016). Retrieval of aerosol optical depth from surface solar radiation measurements using machine learning algorithms, non-linear regression and a radiative transfer-based look-up table. Atmospheric Chemistry and Physics, 16(13):8181–8191.
Hyndman, R., Koehler, A. B., Ord, J. K., and Snyder, R. D. (2008). Forecasting with Exponential Smoothing: The State Space Approach. Springer Science & Business Media.
Hyndman, R., Lee, A., Wang, E., and Wickramasuriya, S. (2020). hts: Hierarchical and Grouped Time Series. R package version 6.0.1.
Hyndman, R. J. (2020). A brief history of forecasting competitions. International Journal of Forecasting, 36(1):7–14.
Hyndman, R. J., Ahmed, R. A., Athanasopoulos, G., and Shang, H. L. (2011). Optimal combination forecasts for hierarchical time series. Computational Statistics & Data Analysis, 55(9):2579–2589.
Hyndman, R. J. and Athanasopoulos, G. (2018). Forecasting: Principles and Practice. OTexts.
Hyndman, R. J., Bashtannyk, D. M., and Grunwald, G. K. (1996). Estimating and visualizing conditional densities. Journal of Computational and Graphical Statistics, 5(4):315–336.
Hyndman, R. J. and Khandakar, Y. (2008). Automatic time series forecasting: the forecast package for R. Journal of Statistical Software, 26(3):1–22.
Hyndman, R. J., Lee, A. J., and Wang, E. (2016). Fast computation of reconciled forecasts for hierarchical and grouped time series. Computational Statistics & Data Analysis, 97:16–32.
Hyndman, R. J., Wang, E., and Laptev, N. (2015). Large-scale unusual time series detection. In 2015 IEEE International Conference on Data Mining Workshop (ICDMW), pages 1616–1619.
Ilić, M. D., Xie, L., and Joo, J.-Y. (2011). Efficient coordination of wind power and price-responsive demand—Part I: Theoretical foundations. IEEE Transactions on Power Systems, 26(4):1875–1884.
Ineichen, P. and Perez, R. (2002). A new airmass independent formulation for the Linke turbidity coefficient. Solar Energy, 73(3):151–157.
Ingel, A., Shahroudi, N., Kängsepp, M., Tättar, A., Komisarenko, V., and Kull, M. (2020). Correlated daily time series and forecasting in the M4 competition. International Journal of Forecasting, 36(1):121–128.
Inman, R. H., Pedro, H. T. C., and Coimbra, C. F. M. (2013). Solar forecasting methods for renewable energy integration. Progress in Energy and Combustion Science, 39(6):535–576.
Inness, A., Ades, M., Agustí-Panareda, A., Barré, J., Benedictow, A., Blechschmidt, A.-M., Dominguez, J. J., Engelen, R., Eskes, H., Flemming, J., Huijnen, V., Jones, L., Kipling, Z., Massart, S., Parrington, M., Peuch, V.-H., Razinger, M., Remy, S., Schulz, M., and Suttie, M. (2019). The CAMS reanalysis of atmospheric composition. Atmospheric Chemistry and Physics, 19(6):3515–3556.
Inness, P. M. and Dorling, S. (2012). Operational Weather Forecasting. John Wiley & Sons.
Ivanova, S. M. and Gueymard, C. A. (2019). Simulation and applications of cumulative anisotropic sky radiance patterns. Solar Energy, 178:278–294.
Jaganathan, S. and Prakash, P. K. S. (2020). A combination-based forecasting method for the M4-competition. International Journal of Forecasting, 36(1):98–104.
Jain, A. and Kapoor, A. (2004). Exact analytical solutions of the parameters of real solar cells using Lambert W-function. Solar Energy Materials and Solar Cells, 81(2):269–277.
Jamaly, M. and Kleissl, J. (2017). Spatiotemporal interpolation and forecast of irradiance data using kriging. Solar Energy, 158:407–423.
Janczura, J. and Michalak, A. (2020). Optimization of electric energy sales strategy based on probabilistic forecasts. Energies, 13(5):1045.
Janczura, J., Trück, S., Weron, R., and Wolff, R. C. (2013). Identifying spikes and seasonal components in electricity spot price data: A guide to robust modeling. Energy Economics, 38:96–110.
Järvelä, M., Lappalainen, K., and Valkealahti, S. (2020). Characteristics of the cloud enhancement phenomenon and PV power plants. Solar Energy, 196:137–145.
Jefferson, T., Alderson, P., Wager, E., and Davidoff, F. (2002). Effects of editorial peer review: A systematic review. JAMA, 287(21):2784–2786.
Jeon, J., Panagiotelis, A., and Petropoulos, F. (2019). Probabilistic forecast reconciliation with applications to wind power and electric load. European Journal of Operational Research, 279(2):364–379.
Jeppesen, J. H., Jacobsen, R. H., Inceoglu, F., and Toftegaard, T. S. (2019). A cloud detection algorithm for satellite imagery based on deep learning. Remote Sensing of Environment, 229:247–259.
Jiang, H. and Dong, Y. (2016). A nonlinear support vector machine model with hard penalty function based on glowworm swarm optimization for forecasting daily global solar radiation. Energy Conversion and Management, 126:991–1002.
Jiang, H., Lu, N., Qin, J., Tang, W., and Yao, L. (2019). A deep learning algorithm to estimate hourly global solar radiation from geostationary satellite data. Renewable and Sustainable Energy Reviews, 114:109327.
Jiao, J., Tang, Z., Zhang, P., Yue, M., and Yan, J. (2022). Cyberattack-resilient load forecasting with adaptive robust regression. International Journal of Forecasting, 38(3):910–919.
Jimenez, P. A., Hacker, J. P., Dudhia, J., Haupt, S. E., Ruiz-Arias, J. A., Gueymard, C. A., Thompson, G., Eidhammer, T., and Deng, A. (2016). WRF-Solar: Description and clear-sky assessment of an augmented NWP model for solar power prediction. Bulletin of the American Meteorological Society, 97(7):1249–1264.
Jiménez, P. A., Yang, J., Kim, J.-H., Sengupta, M., and Dudhia, J. (2022). Assessing the WRF-Solar model performance using satellite-derived irradiance from the National Solar Radiation Database. Journal of Applied Meteorology and Climatology, 61(2):129–142.
Jolliffe, I. T. (2008). The impenetrable hedge: A note on propriety, equitability and consistency. Meteorological Applications, 15(1):25–29.
Jolliffe, I. T. and Stephenson, D. B. (2012). Forecast Verification: A Practitioner's Guide in Atmospheric Science. John Wiley & Sons.
Jones, A. S. and Fletcher, S. J. (2013). Chapter 13 - Data assimilation in numerical weather prediction and sample applications. In Kleissl, J., editor, Solar Energy Forecasting and Resource Assessment, pages 319–355. Academic Press, Boston.
Jordan, A., Krüger, F., and Lerch, S. (2019). Evaluating probabilistic forecasts with scoringRules. Journal of Statistical Software, 90(12):1–37.
Jose, V. R. R., Grushka-Cockayne, Y., and Lichtendahl, K. C. (2014). Trimmed opinion pools and the crowd's calibration problem. Management Science, 60(2):463–475.
Jose, V. R. R. and Winkler, R. L. (2008). Simple robust averages of forecasts: Some empirical results. International Journal of Forecasting, 24(1):163–169.
Juban, R., Ohlsson, H., Maasoumy, M., Poirier, L., and Kolter, J. Z. (2016). A multiple quantile regression approach to the wind, solar, and price tracks of GEFCom2014. International Journal of Forecasting, 32(3):1094–1102.
Junk, C., Delle Monache, L., and Alessandrini, S. (2015a). Analog-based ensemble model output statistics. Monthly Weather Review, 143(7):2909–2917.
Junk, C., Delle Monache, L., Alessandrini, S., Cervone, G., and von Bremen, L. (2015b). Predictor-weighting strategies for probabilistic wind power forecasting with an analog ensemble. Meteorologische Zeitschrift, 24(4):361–379.
Kaba, K., Sarıgül, M., Avcı, M., and Kandırmaz, H. M. (2018). Estimation of daily global solar radiation using deep learning model. Energy, 162:126–135.
Kallio-Myers, V., Riihelä, A., Lahtinen, P., and Lindfors, A. (2020). Global horizontal irradiance forecast for Finland based on geostationary weather satellite data. Solar Energy, 198:68–80.
Kamphuis, N. R. (2018). A New Direction for Distributed-Scale Solar-Thermal Cogeneration. PhD thesis, Department of Mechanical Engineering, Texas A&M University.
Kamphuis, N. R., Gueymard, C., Holtzapple, M. T., Duggleby, A. T., and Annamalai, K. (2020). Perspectives on the origin, derivation, meaning, and significance of the isotropic sky model. Solar Energy, 201:8–12.
Kang, Y., Hyndman, R. J., and Smith-Miles, K. (2017). Visualising forecasting algorithm performance using time series instance spaces. International Journal of Forecasting, 33(2):345–358.
Kardakos, E. G., Alexiadis, M. C., Vagropoulos, S. I., Simoglou, C. K., Biskas, P. N., and Bakirtzis, A. G. (2013). Application of time series and artificial neural network models in short-term forecasting of PV power generation. In 2013 48th International Universities' Power Engineering Conference (UPEC), pages 1–6.
Kariniotakis, G. N., Stavrakakis, G. S., and Nogaret, E. F. (1996). Wind power forecasting using advanced neural networks models. IEEE Transactions on Energy Conversion, 11(4):762–767.
Kasten, F. (1966). A new table and approximation formula for the relative optical air mass. Technical Report 136, U.S. Army Materiel Command, CRREL, Hanover, NH (United States).
Kasten, F. and Young, A. T. (1989). Revised optical air mass tables and approximation formula. Applied Optics, 28(22):4735–4738.
Kath, C., Nitka, W., Serafin, T., Weron, T., Zaleski, P., and Weron, R. (2020). Balancing generation from renewable energy sources: Profitability of an energy trader. Energies, 13(1):205.
Kath, C. and Ziel, F. (2018). The value of forecasts: Quantifying the economic gains of accurate quarter-hourly electricity price forecasts. Energy Economics, 76:411–423.
Kaur, A., Nonnenmacher, L., Pedro, H. T. C., and Coimbra, C. F. M. (2016). Benefits of solar forecasting for energy imbalance markets. Renewable Energy, 86:819–830.
Kazantzidis, A., Tzoumanikas, P., Blanc, P., Massip, P., Wilbert, S., and Ramirez-Santigosa, L. (2017). Short-term forecasting based on all-sky cameras. In Kariniotakis, G., editor, Renewable Energy Forecasting, Woodhead Publishing Series in Energy, pages 153–178. Woodhead Publishing.
Khalil, S. A. and Shaffie, A. M. (2013). A comparative study of total, direct and diffuse solar irradiance by using different models on horizontal and inclined surfaces for Cairo, Egypt. Renewable and Sustainable Energy Reviews, 27:853–863.
Khosravi, A., Nahavandi, S., and Creighton, D. (2013). A neural network-GARCH-based method for construction of prediction intervals. Electric Power Systems Research, 96:185–193.
Khosravi, A., Nahavandi, S., and Creighton, D. (2013). Prediction intervals for short-term wind farm power generation forecasts. IEEE Transactions on Sustainable Energy, 4(3):602–610.
Khosravi, A., Nahavandi, S., Creighton, D., and Atiya, A. F. (2011). Comprehensive review of neural network-based prediction intervals and new advances. IEEE Transactions on Neural Networks, 22(9):1341–1356.
Killinger, S., Engerer, N., and Müller, B. (2017). QCPV: A quality control algorithm for distributed photovoltaic array power output. Solar Energy, 143:120–131.
King, D. L., Kratochvil, J. A., and Boyson, W. E. (2004). Photovoltaic array performance model. Technical Report SAND2004-3535, Sandia National Laboratories, Albuquerque, NM (United States).
Klæboe, G., Eriksrud, A. L., and Fleten, S.-E. (2015). Benchmarking time series based forecasting models for electricity balancing market prices. Energy Systems, 6(1):43–61.
Klucher, T. M. (1979). Evaluation of models to predict insolation on tilted surfaces. Solar Energy, 23(2):111–114.
Kobayashi, S., Ota, Y., Harada, Y., Ebita, A., Moriya, M., Onoda, H., Onogi, K., Kamahori, H., Kobayashi, C., Endo, H., Miyaoka, K., and Takahashi, K. (2015). The JRA-55 Reanalysis: General specifications and basic characteristics. Journal of the Meteorological Society of Japan. Ser. II, 93(1):5–48.
Koehl, M., Heck, M., Wiesmeier, S., and Wirth, J. (2011). Modeling of the nominal operating cell temperature based on outdoor weathering. Solar Energy Materials and Solar Cells, 95(7):1638–1646.
Koenker, R. (2005). Quantile Regression. Econometric Society Monographs. Cambridge University Press.
Koenker, R. (2020). quantreg: Quantile Regression. R package version 5.67.
Kolassa, S. (2020). Why the "best" point forecast depends on the error or accuracy measure. International Journal of Forecasting, 36(1):208–211.
Kolios, S. and Hatzianastassiou, N. (2019). Quantitative aerosol optical depth detection during dust outbreaks from Meteosat imagery using an artificial neural network model. Remote Sensing, 11(9):1022.
König-Langlo, G., Sieger, R., Schmithüsen, H., Bücker, A., Richter, F., and Dutton, E. (2013). The Baseline Surface Radiation Network and its World Radiation Monitoring Centre at the Alfred Wegener Institute.
Koronakis, P. S. (1986). On the choice of the angle of tilt for south facing solar collectors in the Athens basin area. Solar Energy, 36(3):217–225.
Kourentzes, N. and Petropoulos, F. (2016). Forecasting with multivariate temporal aggregation: The case of promotional modelling. International Journal of Production Economics, 181:145–153.
Kourentzes, N., Petropoulos, F., and Trapero, J. R. (2014). Improving forecasting by estimating time series structural components across multiple frequencies. International Journal of Forecasting, 30(2):291–302.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Pereira, F., Burges, C. J., Bottou, L., and Weinberger, K. Q., editors, Advances in Neural Information Processing Systems, volume 25, pages 1097–1105.
Krüger, F., Lerch, S., Thorarinsdottir, T., and Gneiting, T. (2021). Predictive inference based on Markov chain Monte Carlo output. International Statistical Review, 89(2):274–301.
Kuehnert, J., Lorenz, E., and Heinemann, D. (2013). Chapter 11 - Satellite-based irradiance and power forecasting for the German energy market. In Kleissl, J., editor, Solar Energy Forecasting and Resource Assessment, pages 319–355. Academic Press, Boston.
Kuhn, M. (2021). caret: Classification and Regression Training. R package version 6.0-90.
Kumari, P. and Toshniwal, D. (2021). Long short term memory–convolutional neural network based deep hybrid approach for solar irradiance forecasting. Applied Energy, 295:117061.
Kunda, Z. (1990). The case for motivated reasoning. Psychological Bulletin, 108(3):480–498.
Kurtz, B., Mejia, F., and Kleissl, J. (2017). A virtual sky imager testbed for solar energy forecasting. Solar Energy, 158:753–759.
Kuster, C., Rezgui, Y., and Mourshed, M. (2017). Electrical load forecasting models: A critical systematic review. Sustainable Cities and Society, 35:257–270.
Kyung, M., Gill, J., Ghosh, M., and Casella, G. (2010). Penalized regression, standard errors, and Bayesian lassos. Bayesian Analysis, 5(2):369–411.
Lago, J., De Ridder, F., and De Schutter, B. (2018). Forecasting spot electricity prices: Deep learning approaches and empirical comparison of traditional algorithms. Applied Energy, 221:386–405.
Lago, J., Marcjasz, G., De Schutter, B., and Weron, R. (2021). Forecasting day-ahead electricity prices: A review of state-of-the-art algorithms, best practices and an open-access benchmark. Applied Energy, 293:116983.
Laio, F. and Tamea, S. (2007). Verification tools for probabilistic forecasts of continuous hydrological variables. Hydrology and Earth System Sciences, 11(4):1267–1277.
Lam, R., Sanchez-Gonzalez, A., Willson, M., Wirnsberger, P., Fortunato, M., Pritzel, A., Ravuri, S., Ewalds, T., Alet, F., Eaton-Rosen, Z., Hu, W., Merose, A., Hoyer, S., Holland, G., Stott, J., Vinyals, O., Mohamed, S., and Battaglia, P. (2022). GraphCast: Learning skillful medium-range global weather forecasting. arXiv preprint.
Lanconelli, C., Busetto, M., Dutton, E. G., König-Langlo, G., Maturilli, M., Sieger, R., Vitale, V., and Yamanouchi, T. (2011). Polar baseline surface radiation measurements during the International Polar Year 2007–2009. Earth System Science Data, 3(1):1–8.
Larson, D. P., Li, M., and Coimbra, C. F. M. (2020). SCOPE: Spectral cloud optical property estimation using real-time GOES-R longwave imagery. Journal of Renewable and Sustainable Energy, 12(2):026501.
Larson, V. E. (2013). Chapter 12 - Forecasting solar irradiance with numerical weather prediction models. In Kleissl, J., editor, Solar Energy Forecasting and Resource Assessment, pages 319–355. Academic Press, Boston.
Laudani, A., Lozito, G. M., Mancilla-David, F., Riganti-Fulginei, F., and Salvini, A. (2015). An improved method for SRC parameter estimation for the CEC PV module model. Solar Energy, 120:525–535.
Lauret, P., David, M., and Pinson, P. (2019). Verification of solar irradiance probabilistic forecasts. Solar Energy, 194:254–271.
Lauret, P., Lorenz, E., and David, M. (2016). Solar forecasting in a challenging insular context. Atmosphere, 7(2):18.
Lave, M. and Kleissl, J. (2010). Solar variability of four sites across the state of Colorado. Renewable Energy, 35(12):2867–2873.
Le, N. D. and Zidek, J. V. (2006). Statistical Analysis of Environmental Space-Time Processes. Springer.
Le Gal La Salle, J., Badosa, J., David, M., Pinson, P., and Lauret, P. (2020). Added-value of ensemble prediction system on the quality of solar irradiance probabilistic forecasts. Renewable Energy, 162:1321–1339.
Le Gal La Salle, J., David, M., and Lauret, P. (2021). A new climatology reference model to benchmark probabilistic solar forecasts. Solar Energy, 223:398–414.
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551.
Lee, G., Ding, Y., Genton, M. G., and Xie, L. (2015). Power curve estimation with multivariate environmental factors for inland and offshore wind farms. Journal of the American Statistical Association, 110(509):56–67.
Lee, J. A., Haupt, S. E., Jiménez, P. A., Rogers, M. A., Miller, S. D., and McCandless, T. C. (2017). Solar irradiance nowcasting case studies near Sacramento. Journal of Applied Meteorology and Climatology, 56(1):85–108.
Lefèvre, M., Oumbe, A., Blanc, P., Espinar, B., Gschwind, B., Qu, Z., Wald, L., Schroedter-Homscheidt, M., Hoyer-Klick, C., Arola, A., Benedetti, A., Kaiser, J. W., and Morcrette, J.-J. (2013). McClear: A new model estimating downwelling solar radiation at ground level in clear-sky conditions. Atmospheric Measurement Techniques, 6(9):2403–2418.
Letu, H., Yang, K., Nakajima, T. Y., Ishimoto, H., Nagao, T. M., Riedi, J., Baran, A. J., Ma, R., Wang, T., Shang, H., Khatri, P., Chen, L., Shi, C., and Shi, J. (2020). High-resolution retrieval of cloud microphysical properties and surface solar radiation using Himawari-8/AHI next-generation geostationary satellite. Remote Sensing of Environment, 239:111583.
Levy, R. C., Mattoo, S., Munchak, L. A., Remer, L. A., Sayer, A. M., Patadia, F., and Hsu, N. C. (2013). The Collection 6 MODIS aerosol products over land and ocean. Atmospheric Measurement Techniques, 6(11):2989–3034.

Lew, D., Brinkman, G., Ibanez, E., Florita, A., Heaney, M., Hodge, B.-M., Hummon, M., Stark, G., King, J., Lefton, S. A., Kumar, N., Agen, D., Jordan, G., and Venkataraman, S. (2013). The western wind and solar integration study phase 2. Technical Report NREL/TP-5500-55588, National Renewable Energy Lab. (NREL), Golden, CO (United States).
Li, B. and Zhang, J. (2020). A review on the integration of probabilistic solar forecasting in power systems. Solar Energy, 210:68–86.
Li, C., Disfani, V. R., Pecenak, Z. K., Mohajeryami, S., and Kleissl, J. (2018). Optimal OLTC voltage control scheme to enable high solar penetrations. Electric Power Systems Research, 160:318–326.
Li, R., Zeng, B., and Liou, M. (1994). A new three-step search algorithm for block motion estimation. IEEE Transactions on Circuits and Systems for Video Technology, 4(4):438–442.
Li, T., Wang, Y., and Zhang, N. (2020). Combining probability density forecasts for power electrical loads. IEEE Transactions on Smart Grid, 11(2):1679–1690.
Liaw, A. and Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3):18–22.
Liberzon, A., Käufer, T., Bauer, A., Vennemann, P., and Zimmer, E. (2021). OpenPIV/openpiv-python: OpenPIV-Python v0.23.4.
Lichtendahl, K. C., Grushka-Cockayne, Y., and Winkler, R. L. (2013). Is it better to average probabilities or quantiles? Management Science, 59(7):1594–1611.
Lichtendahl, K. C. and Winkler, R. L. (2020). Why do some combinations perform better than others? International Journal of Forecasting, 36(1):142–149.
Lim, L. H. I., Ye, Z., Ye, J., Yang, D., and Du, H. (2015a). A linear identification of diode models from single I–V characteristics of PV panels. IEEE Transactions on Industrial Electronics, 62(7):4181–4193.
Lim, L. H. I., Ye, Z., Ye, J., Yang, D., and Du, H. (2015b). A linear method to extract diode model parameters of solar panels from a single I–V curve. Renewable Energy, 76:135–142.
Lima, F. J. L., Martins, F. R., Pereira, E. B., Lorenz, E., and Heinemann, D. (2016). Forecast for surface solar irradiance at the Brazilian Northeastern region using NWP model and artificial neural networks. Renewable Energy, 87:807–818.
Lin, P., Peng, Z., Lai, Y., Cheng, S., Chen, Z., and Wu, L. (2018). Short-term power prediction for photovoltaic power plants using a hybrid improved Kmeans-GRA-Elman model based on multivariate meteorological factors and historical power datasets. Energy Conversion and Management, 177:704–717.
Lindberg, K. B., Seljom, P., Madsen, H., Fischer, D., and Korpås, M. (2019). Long-term electricity load forecasting: Current and future trends. Utilities Policy, 58:102–119.
Liou, K.-N. (2002). An Introduction to Atmospheric Radiation. Elsevier.
Lipperheide, M., Bosch, J. L., and Kleissl, J. (2015). Embedded nowcasting method using cloud speed persistence for a photovoltaic power plant. Solar Energy, 112:232–238.
Liu, B., Yang, D., Mayer, M. J., Coimbra, C. F. M., Kleissl, J., Kay, M., Wang, W., Bright, J. M., Xia, X., Lv, X., Srinivasan, D., Wu, Y., Beyer, H. G., Yagli, G. M., and Shen, Y. (2023). Predictability and forecast skill of solar irradiance over the contiguous United States. Renewable and Sustainable Energy Reviews, 182:113359.
Liu, B. Y. H. and Jordan, R. C. (1960). The interrelationship and characteristic distribution of direct, diffuse and total solar radiation. Solar Energy, 4(3):1–19.
Liu, H., Hu, B., Zhang, L., Zhao, X., Shang, K., Wang, Y., and Wang, J. (2017). Ultraviolet radiation over China: Spatial distribution and trends. Renewable and Sustainable Energy Reviews, 76:1371–1383.

Liu, H., Li, Q., Bai, Y., Yang, C., Wang, J., Zhou, Q., Hu, S., Shi, T., Liao, X., and Wu, G. (2021). Improving satellite retrieval of oceanic particulate organic carbon concentrations using machine learning methods. Remote Sensing of Environment, 256:112316.
Liu, L., Zhan, M., and Bai, Y. (2019). A recursive ensemble model for forecasting the power output of photovoltaic systems. Solar Energy, 189:291–298.
Liu, X., Aichhorn, A., Liu, L., and Li, H. (2012). Coordinated control of distributed energy storage system with tap changer transformers for voltage rise mitigation under high photovoltaic penetration. IEEE Transactions on Smart Grid, 3(2):897–906.
Long, C. N. and Dutton, E. G. (2010). BSRN global network recommended QC tests, V2. Technical Report 10013/epic.38770, PANGAEA.
Long, C. N. and Shi, Y. (2008). An automated quality assessment and control algorithm for surface radiation measurements. The Open Atmospheric Science Journal, 2(1).
Lonij, V. P., Brooks, A. E., Cronin, A. D., Leuthold, M., and Koch, K. (2013). Intra-hour forecasts of solar power production using measurements from a network of irradiance sensors. Solar Energy, 97:58–66.
Lorenz, E., Hammer, A., Heinemann, D., et al. (2004). Short term forecasting of solar radiation based on satellite data. In EUROSUN2004 (ISES Europe Solar Congress), volume 1, page 2004, Freiburg, Germany.
Lorenz, E., Hurka, J., Heinemann, D., and Beyer, H. G. (2009). Irradiance forecasting for the power prediction of grid-connected photovoltaic systems. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2(1):2–10.
Lorenz, E. N. (1969). Atmospheric predictability as revealed by naturally occurring analogues. Journal of the Atmospheric Sciences, 26(4):636–646.
Lorenzo, A. T., Holmgren, W. F., and Cronin, A. D. (2015). Irradiance forecasts based on an irradiance monitoring network, cloud motion, and spatial averaging. Solar Energy, 122:1158–1169.
Lu, L., Yang, H., and Burnett, J. (2002). Investigation on wind power potential on Hong Kong islands—an analysis of wind power and wind turbine characteristics. Renewable Energy, 27(1):1–12.
Lucas, B. D. and Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In IJCAI'81: 7th International Joint Conference on Artificial Intelligence, volume 2, pages 674–679, Vancouver, Canada.
Lund, H., Sorknæs, P., Mathiesen, B. V., and Hansen, K. (2018). Beyond sensitivity analysis: A methodology to handle fuel and electricity prices when designing energy scenarios. Energy Research & Social Science, 39:108–116.
Lundstrom, L. (2016). camsRad: Client for CAMS Radiation Service. R package version 0.3.0.
Luo, J., Hong, T., and Fang, S.-C. (2019). Robust regression models for load forecasting. IEEE Transactions on Smart Grid, 10(5):5397–5404.
Luo, J., Hong, T., Gao, Z., and Fang, S.-C. (2023). A robust support vector regression model for electric load forecasting. International Journal of Forecasting, 39(2):1005–1020.
Luo, X., Zhang, D., and Zhu, X. (2021). Deep learning based forecasting of photovoltaic power generation by incorporating domain knowledge. Energy, 225:120240.
Luoma, J., Kleissl, J., and Murray, K. (2012). Optimal inverter sizing considering cloud enhancement. Solar Energy, 86(1):421–429.
Luoma, J., Mathiesen, P., and Kleissl, J. (2014). Forecast value considering energy pricing in California. Applied Energy, 125:230–237.
Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis. Springer Science & Business Media.

Lydia, M., Kumar, S. S., Selvakumar, A. I., and Prem Kumar, G. E. (2014). A comprehensive review on wind turbine power curve modeling techniques. Renewable and Sustainable Energy Reviews, 30:452–460.
Lynch, P. (2006). The Emergence of Numerical Weather Prediction: Richardson's Dream. Cambridge University Press.
Ma, C. (2003). Families of spatio-temporal stationary covariance models. Journal of Statistical Planning and Inference, 116(2):489–501.
Ma, R., Letu, H., Yang, K., Wang, T., Shi, C., Xu, J., Shi, J., Shi, C., and Chen, L. (2020). Estimation of surface shortwave radiation from Himawari-8 satellite data based on a combination of radiative transfer and deep neural network. IEEE Transactions on Geoscience and Remote Sensing, 58(8):5304–5316.
Macêdo, W. N. and Zilles, R. (2007). Operational results of grid-connected photovoltaic system with different inverter's sizing factors (ISF). Progress in Photovoltaics: Research and Applications, 15(4):337–352.
Maciejowska, K., Nitka, W., and Weron, T. (2019). Day-ahead vs. intraday—Forecasting the price spread to maximize economic benefits. Energies, 12(4):631.
Mahmood, R., Boyles, R., Brinson, K., Fiebrich, C., Foster, S., Hubbard, K., Robinson, D., Andresen, J., and Leathers, D. (2017). Mesonets: Mesoscale weather and climate observations for the United States. Bulletin of the American Meteorological Society, 98(7):1349–1361.
Makarov, Y. V., Etingov, P. V., Ma, J., Huang, Z., and Subbarao, K. (2011). Incorporating uncertainty of wind power generation forecast into power system operation, dispatch, and unit commitment procedures. IEEE Transactions on Sustainable Energy, 2(4):433–442.
Makarov, Y. V., Guttromson, R. T., Huang, Z., Subbarao, K., Etingov, P. V., Chakrabarti, B. B., and Ma, J. (2010). Incorporating wind generation and load forecast uncertainties into power grid operations. Technical Report PNNL-19189, Pacific Northwest National Laboratory, Richland, Washington.
Makarov, Y. V., Loutan, C., Ma, J., and de Mello, P. (2009). Operational impacts of wind generation on California power systems. IEEE Transactions on Power Systems, 24(2):1039–1050.
Makridakis, S., Spiliotis, E., and Assimakopoulos, V. (2020). The M4 Competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting, 36(1):54–74.
Makridakis, S., Wheelwright, S. C., and Hyndman, R. J. (2008). Forecasting Methods and Applications. John Wiley & Sons.
Malamaki, K.-N. D. and Demoulias, C. S. (2014). Analytical calculation of the electrical energy losses on fixed-mounted PV plants. IEEE Transactions on Sustainable Energy, 5(4):1080–1089.
Mandal, P., Senjyu, T., Urasaki, N., and Funabashi, T. (2006). A neural network based several-hour-ahead electric load forecasting using similar days approach. International Journal of Electrical Power & Energy Systems, 28(6):367–373.
Maor, T. and Appelbaum, J. (2012). View factors of photovoltaic collector systems. Solar Energy, 86(6):1701–1708.
Marcjasz, G., Uniejewski, B., and Weron, R. (2019). On the importance of the long-term seasonal component in day-ahead electricity price forecasting with NARX neural networks. International Journal of Forecasting, 35(4):1520–1532.
Marion, B. (2002). A method for modeling the current–voltage curve of a PV module for outdoor conditions. Progress in Photovoltaics: Research and Applications, 10(3):205–214.
Marion, B. (2017). Numerical method for angle-of-incidence correction factors for diffuse radiation incident photovoltaic modules. Solar Energy, 147:344–348.

Markovics, D. and Mayer, M. J. (2022). Comparison of machine learning methods for photovoltaic power forecasting based on numerical weather prediction. Renewable and Sustainable Energy Reviews, 161:112364.
Marquez, R. and Coimbra, C. F. M. (2011). A novel metric for evaluation of solar forecasting models. In ASME 2011 5th International Conference on Energy Sustainability, pages 1459–1467. ASME.
Marquez, R. and Coimbra, C. F. M. (2013). Proposed metric for evaluation of solar forecasting models. Journal of Solar Energy Engineering, 135(1):011016.
Martin, N. and Ruiz, J. M. (2001). Calculation of the PV modules angular losses under field conditions by means of an analytical model. Solar Energy Materials and Solar Cells, 70(1):25–38.
Massidda, L. and Marrocu, M. (2018). Quantile regression post-processing of weather forecast for short-term solar power probabilistic forecasting. Energies, 11(7):1763.
Masters, G. M. (2013). Renewable and Efficient Electric Power Systems. John Wiley & Sons.
Matagne, E. and El Bachtiri, R. (2014). Exact analytical expression of the hemispherical irradiance on a sloped plane from the Perez sky. Solar Energy, 99:267–271.
Matheson, J. E. and Winkler, R. L. (1976). Scoring rules for continuous probability distributions. Management Science, 22(10):1087–1096.
Mathiesen, P. and Kleissl, J. (2011). Evaluation of numerical weather prediction for intra-day solar forecasting in the continental United States. Solar Energy, 85(5):967–977.
Matsunobu, L. M., Pedro, H. T. C., and Coimbra, C. F. M. (2021). Cloud detection using convolutional neural networks on remote sensing images. Solar Energy, 230:1020–1032.
Mattei, M., Notton, G., Cristofari, C., Muselli, M., and Poggi, P. (2006). Calculation of the polycrystalline PV module temperature using a simple method of energy balance. Renewable Energy, 31(4):553–567.
Maxwell, E. L. (1987). A quasi-physical model for converting hourly global horizontal to direct normal insolation. Technical Report SERI/TR-215-3087, Solar Energy Research Institute, Golden, CO (United States).
Mayer, B. and Kylling, A. (2005). The libRadtran software package for radiative transfer calculations - description and examples of use. Atmospheric Chemistry and Physics, 5(7):1855–1877.
Mayer, K. and Trück, S. (2018). Electricity markets around the world. Journal of Commodity Markets, 9:77–100.
Mayer, M. J. (2021). Influence of design data availability on the accuracy of physical photovoltaic power forecasts. Solar Energy, 227:532–540.
Mayer, M. J. (2022). Benefits of physical and machine learning hybridization for photovoltaic power forecasting. Renewable and Sustainable Energy Reviews, 168:112772.
Mayer, M. J. and Gróf, G. (2020). Techno-economic optimization of grid-connected, ground-mounted photovoltaic power plants by genetic algorithm based on a comprehensive mathematical model. Solar Energy, 202:210–226.
Mayer, M. J. and Gróf, G. (2021). Extensive comparison of physical models for photovoltaic power forecasting. Applied Energy, 283:116239.
Mayer, M. J. and Yang, D. (2022). Probabilistic photovoltaic power forecasting using a calibrated ensemble of model chains. Renewable and Sustainable Energy Reviews, 168:112821.
Mayer, M. J. and Yang, D. (2023a). Calibration of deterministic NWP forecasts and its impact on verification. International Journal of Forecasting, 39(2):981–991.

Mayer, M. J. and Yang, D. (2023b). Pairing ensemble numerical weather prediction with ensemble physical model chain for probabilistic photovoltaic power forecasting. Renewable and Sustainable Energy Reviews, 175:113171.
Mayer, M. J., Yang, D., and Szintai, B. (2023). Comparing global and regional downscaled NWP models for irradiance and photovoltaic power forecasting: ECMWF versus AROME. Applied Energy, 352:121958.
Mazorra Aguiar, L., Pereira, B., Lauret, P., Díaz, F., and David, M. (2016). Combining solar irradiance measurements, satellite-derived data and a numerical weather prediction model to improve intra-day solar forecasting. Renewable Energy, 97:599–610.
McPherson, R. A., Fiebrich, C. A., Crawford, K. C., Kilby, J. R., Grimsley, D. L., Martinez, J. E., Basara, J. B., Illston, B. G., Morris, D. A., Kloesel, K. A., Melvin, A. D., Shrivastava, H., Wolfinbarger, J. M., Bostic, J. P., Demko, D. B., Elliott, R. L., Stadler, S. J., Carlson, J. D., and Sutherland, A. J. (2007). Statewide monitoring of the mesoscale environment: A technical update on the Oklahoma Mesonet. Journal of Atmospheric and Oceanic Technology, 24(3):301–321.
Meinshausen, N. (2006). Quantile regression forests. Journal of Machine Learning Research, 7:983–999.
Meinshausen, N. (2017). quantregForest: Quantile Regression Forests. R package version 1.3-7.
Mejia, F. A. and Kleissl, J. (2013). Soiling losses for solar photovoltaic systems in California. Solar Energy, 95:357–363.
Mejia, J. F., Giordano, M., and Wilcox, E. (2018). Conditional summertime day-ahead solar irradiance forecast. Solar Energy, 163:610–622.
Mermoud, A. (1994). PVsyst: a user-friendly software for PV-systems simulation. In Twelfth European Photovoltaic Solar Energy Conference, pages 1703–1706. HS Stephens.
Messenger, R. A. and Abtahi, A. (2004). Photovoltaic Systems Engineering. CRC Press.
Messner, J. W., Pinson, P., Browell, J., Bjerregård, M. B., and Schicker, I. (2020). Evaluation of wind power forecasts—an up-to-date view. Wind Energy, 23(6):1461–1481.
Meydbray, J., Emery, K., and Kurtz, S. (2012). Pyranometers and reference cells, what's the difference? Technical Report NREL/JA-5200-54498, National Renewable Energy Lab. (NREL), Golden, CO (United States).
Michalsky, J., Dutton, E., Rubes, M., Nelson, D., Stoffel, T., Wesley, M., Splitt, M., and DeLuisi, J. (1999). Optimal measurement of surface shortwave irradiance using current instrumentation. Journal of Atmospheric and Oceanic Technology, 16(1):55–69.
Michalsky, J. J. (1988). The Astronomical Almanac's algorithm for approximate solar position (1950–2050). Solar Energy, 40(3):227–235.
Michalsky, J. J., Kutchenreiter, M., and Long, C. N. (2017). Significant improvements in pyranometer nighttime offsets using high-flow DC ventilation. Journal of Atmospheric and Oceanic Technology, 34(6):1323–1332.
Micheli, L., Fernández, E. F., Muller, M., and Almonacid, F. (2020). Extracting and generating PV soiling profiles for analysis, forecasting, and cleaning optimization. IEEE Journal of Photovoltaics, 10(1):197–205.
Miller, A. (2002). Subset Selection in Regression. CRC Press.
Miller, N. W., Shao, M., Pajic, S., and D'Aquila, R. (2014). Western wind and solar integration study phase 3 - Frequency response and transient stability. Technical Report NREL/SR-5D00-62906, GE Energy Management, Schenectady, NY (United States).
Miller, S. D., Rogers, M. A., Haynes, J. M., Sengupta, M., and Heidinger, A. K. (2018). Short-term solar irradiance forecasting via satellite/model coupling. Solar Energy, 168:102–117.

Min, M., Li, J., Wang, F., Liu, Z., and Menzel, W. P. (2020). Retrieval of cloud top properties from advanced geostationary satellite imager measurements based on machine learning algorithms. Remote Sensing of Environment, 239:111616.
Minnis, P., Sun-Mack, S., Young, D. F., Heck, P. W., Garber, D. P., Chen, Y., Spangenberg, D. A., Arduini, R. F., Trepte, Q. Z., Smith, W. L., Ayers, J. K., Gibson, S. C., Miller, W. F., Hong, G., Chakrapani, V., Takano, Y., Liou, K.-N., Xie, Y., and Yang, P. (2011). CERES edition-2 cloud property retrievals using TRMM VIRS and Terra and Aqua MODIS data – Part I: Algorithms. IEEE Transactions on Geoscience and Remote Sensing, 49(11):4374–4400.
Mitchell, J. and Hall, S. G. (2005). Evaluating, comparing and combining density forecasts using the KLIC with an application to the Bank of England and NIESR 'fan' charts of inflation. Oxford Bulletin of Economics and Statistics, 67(s1):995–1033.
Mlawer, E. J., Taubman, S. J., Brown, P. D., Iacono, M. J., and Clough, S. A. (1997). Radiative transfer for inhomogeneous atmospheres: RRTM, a validated correlated-k model for the longwave. Journal of Geophysical Research: Atmospheres, 102(D14):16663–16682.
Möller, A. and Groß, J. (2016). Probabilistic temperature forecasting based on an ensemble autoregressive modification. Quarterly Journal of the Royal Meteorological Society, 142(696):1385–1394.
Möller, A. and Groß, J. (2020). Probabilistic temperature forecasting with a heteroscedastic autoregressive ensemble postprocessing model. Quarterly Journal of the Royal Meteorological Society, 146(726):211–224.
Mondol, J. D., Yohanis, Y. G., and Norton, B. (2006). Optimal sizing of array and inverter for grid-connected photovoltaic systems. Solar Energy, 80(12):1517–1539.
Montero, J.-M., Fernández-Avilés, G., and Mateu, J. (2015). Spatial and Spatio-Temporal Geostatistical Modeling and Kriging. John Wiley & Sons.
Montero-Manso, P., Athanasopoulos, G., Hyndman, R. J., and Talagala, T. S. (2020). FFORMA: Feature-based forecast model averaging. International Journal of Forecasting, 36(1):86–92.
Moon, P. and Spencer, D. E. (1942). Illumination from a non-uniform sky. Illum. Engng. (NY), 37:707–726.
Mora Segado, P., Carretero, J., and Sidrach-de Cardona, M. (2015). Models to predict the operating temperature of different photovoltaic modules in outdoor conditions. Progress in Photovoltaics: Research and Applications, 23(10):1267–1282.
Morcrette, J.-J., Barker, H. W., Cole, J. N. S., Iacono, M. J., and Pincus, R. (2008). Impact of a new radiation package, McRad, in the ECMWF Integrated Forecasting System. Monthly Weather Review, 136(12):4773–4798.
Morcrette, J.-J., Boucher, O., Jones, L., Salmond, D., Bechtold, P., Beljaars, A., Benedetti, A., Bonet, A., Kaiser, J. W., Razinger, M., Schulz, M., Serrar, S., Simmons, A. J., Sofiev, M., Suttie, M., Tompkins, A. M., and Untch, A. (2009). Aerosol analysis and forecast in the European Centre for Medium-Range Weather Forecasts Integrated Forecast System: Forward modeling. Journal of Geophysical Research: Atmospheres, 114:D06206.
Moreno-Carbonell, S., Sánchez-Úbeda, E. F., and Muñoz, A. (2020). Rethinking weather station selection for electric load forecasting using genetic algorithms. International Journal of Forecasting, 36(2):695–712.
Müller, R., Behrendt, T., Hammer, A., and Kemper, A. (2012). A new algorithm for the satellite-based retrieval of solar surface irradiance in spectral bands. Remote Sensing, 4(3):622–647.
Muneer, T. (1990). Solar radiation model for Europe. Building Services Engineering Research and Technology, 11(4):153–163.

Muneer, T., Gueymard, C., and Kambezidis, H. (2004). Hourly slope irradiation and illuminance. In Solar Radiation and Daylight Models, pages 143–221. Butterworth-Heinemann, Oxford.
Munkhammar, J., van der Meer, D., and Widén, J. (2019). Probabilistic forecasting of high-resolution clear-sky index time-series using a Markov-chain mixture distribution model. Solar Energy, 184:688–695.
Munkhammar, J., Widén, J., and Hinkelman, L. M. (2017). A copula method for simulating correlated instantaneous solar irradiance in spatial networks. Solar Energy, 143:10–21.
Murphy, A. H. (1971). A note on the ranked probability score. Journal of Applied Meteorology and Climatology, 10(1):155–156.
Murphy, A. H. (1972a). Scalar and vector partitions of the probability score: Part I. Two-state situation. Journal of Applied Meteorology and Climatology, 11(2):273–282.
Murphy, A. H. (1972b). Scalar and vector partitions of the probability score: Part II. N-state situation. Journal of Applied Meteorology and Climatology, 11(8):1183–1192.
Murphy, A. H. (1972c). Scalar and vector partitions of the ranked probability score. Monthly Weather Review, 100(10):701–708.
Murphy, A. H. (1973a). Hedging and skill scores for probability forecasts. Journal of Applied Meteorology and Climatology, 12(1):215–223.
Murphy, A. H. (1973b). A new vector partition of the probability score. Journal of Applied Meteorology (1962-1982), 12(4):595–600.
Murphy, A. H. (1988). Skill scores based on the mean square error and their relationships to the correlation coefficient. Monthly Weather Review, 116(12):2417–2424.
Murphy, A. H. (1991). Forecast verification: Its complexity and dimensionality. Monthly Weather Review, 119(7):1590–1601.
Murphy, A. H. (1992). Climatology, persistence, and their linear combination as standards of reference in skill scores. Weather and Forecasting, 7(4):692–698.
Murphy, A. H. (1993). What is a good forecast? An essay on the nature of goodness in weather forecasting. Weather and Forecasting, 8(2):281–293.
Murphy, A. H. and Daan, H. (1985). Forecast evaluation. In Murphy, A. H. and Katz, R. W., editors, Probability, Statistics, and Decision Making in the Atmospheric Sciences, pages 379–437. CRC Press.
Murphy, A. H. and Epstein, E. S. (1967). A note on probability forecasts and "hedging". Journal of Applied Meteorology (1962-1982), 6(6):1002–1004.
Murphy, A. H. and Winkler, R. L. (1987). A general framework for forecast verification. Monthly Weather Review, 115(7):1330–1338.
Muzathik, A. M. (2014). Photovoltaic modules operating temperature estimation using a simple correlation. International Journal of Energy Engineering, 4(4):151–158.
Naftulin, D. H., Ware, J. R., and Donnelly, F. A. (1973). The Doctor Fox Lecture: a paradigm of educational seduction. Journal of Medical Education, 48(7):630–635.
Nagy, G. I., Barta, G., Kazi, S., Borbély, G., and Simon, G. (2016). GEFCom2014: Probabilistic solar and wind power forecasting using a generalized additive tree ensemble approach. International Journal of Forecasting, 32(3):1087–1093.
Narajewski, M. and Ziel, F. (2020). Econometric modelling and forecasting of intraday electricity prices. Journal of Commodity Markets, 19:100107.
Narang, D., Mahmud, R., Ingram, M., and Hoke, A. (2021). An overview of issues related to IEEE Std 1547-2018 requirements regarding voltage and reactive power control. Technical Report NREL/TP-5D00-77156, National Renewable Energy Lab. (NREL), Golden, CO (United States).
NCAR (2015). verification: Weather Forecast Verification Utilities. R package version 1.42.

Nguyen, T., Brandstetter, J., Kapoor, A., Gupta, J. K., and Grover, A. (2023). ClimaX: A foundation model for weather and climate. ArXiv. Preprint.
Ni, Q., Zhuang, S., Sheng, H., Kang, G., and Xiao, J. (2017). An ensemble prediction intervals approach for short-term PV power forecasting. Solar Energy, 155:1072–1083.
Nielsen, A. H., Iosifidis, A., and Karstoft, H. (2021). IrradianceNet: Spatiotemporal deep learning model for satellite-derived solar irradiance short-term forecasting. Solar Energy, 228:659–669.
Nikodinoska, D., Käso, M., and Müsgens, F. (2022). Solar and wind power generation forecasts using elastic net in time-varying forecast combinations. Applied Energy, 306:117983.
Nishioka, K., Hatayama, T., Uraoka, Y., Fuyuki, T., Hagihara, R., and Watanabe, M. (2003). Field-test analysis of PV system output characteristics focusing on module temperature. Solar Energy Materials and Solar Cells, 75(3):665–671.
Notton, G., Cristofari, C., Mattei, M., and Poggi, P. (2005). Modelling of a double-glass photovoltaic module using finite differences. Applied Thermal Engineering, 25(17):2854–2877.
Notton, G., Lazarov, V., and Stoyanov, L. (2010). Optimal sizing of a grid-connected PV system for various PV module technologies and inclinations, inverter efficiency characteristics and locations. Renewable Energy, 35(2):541–554.
Notton, G., Nivet, M.-L., Voyant, C., Paoli, C., Darras, C., Motte, F., and Fouilloy, A. (2018). Intermittent and stochastic character of renewable energy sources: Consequences, cost of intermittence and benefit of forecasting. Renewable and Sustainable Energy Reviews, 87:96–105.
Nowotarski, J. and Weron, R. (2016). On the importance of the long-term seasonal component in day-ahead electricity price forecasting. Energy Economics, 57:228–235.
Nowotarski, J. and Weron, R. (2018). Recent advances in electricity price forecasting: A review of probabilistic forecasting. Renewable and Sustainable Energy Reviews, 81:1548–1568.
Nuño, E., Koivisto, M., Cutululis, N. A., and Sørensen, P. (2018). On the simulation of aggregated solar PV forecast errors. IEEE Transactions on Sustainable Energy, 9(4):1889–1898.
Nychka, D., Furrer, R., Paige, J., and Sain, S. (2017). fields: Tools for spatial data. R package version 11.6.
Ogliari, E., Dolara, A., Manzolini, G., and Leva, S. (2017). Physical and hybrid methods comparison for the day ahead PV output power forecast. Renewable Energy, 113:11–21.
Ohmura, A., Dutton, E. G., Forgan, B., Fröhlich, C., Gilgen, H., Hegner, H., Heimo, A., König-Langlo, G., McArthur, B., Müller, G., Philipona, R., Pinker, R., Whitlock, C. H., Dehne, K., and Wild, M. (1998). Baseline Surface Radiation Network (BSRN/WCRP): New precision radiometry for climate research. Bulletin of the American Meteorological Society, 79(10):2115–2136.
Olmo, F. J., Vida, J., Foyo, I., Castro-Diez, Y., and Alados-Arboledas, L. (1999). Prediction of global irradiance on inclined surfaces from horizontal global irradiance. Energy, 24(8):689–704.
Orgill, J. F. and Hollands, K. G. T. (1977). Correlation equation for hourly diffuse radiation on a horizontal surface. Solar Energy, 19(4):357–359.
Orlanski, I. (1975). A rational subdivision of scales for atmospheric processes. Bulletin of the American Meteorological Society, 56(5):527–530.
Orwig, K. D., Ahlstrom, M. L., Banunarayanan, V., Sharp, J., Wilczak, J. M., Freedman, J., Haupt, S. E., Cline, J., Bartholomy, O., Hamann, H. F., Hodge, B.-M., Finley, C., Nakafuji, D., Peterson, J. L., Maggio, D., and Marquis, M. (2015). Recent trends in variable generation forecasting and its value to the power system. IEEE Transactions on Sustainable Energy, 6(3):924–933.
Osadciw, L. A., Yan, Y., Ye, X., Benson, G., and White, E. (2010). Wind turbine diagnostics based on power curve using particle swarm optimization. In Wang, L., Singh, C., and Kusiak, A., editors, Wind Power Systems: Applications of Computational Intelligence, pages 151–165. Springer Berlin Heidelberg, Berlin, Heidelberg.
Osterwald, C. R. (1986). Translation of device performance measurements to reference conditions. Solar Cells, 18(3):269–279.
Paletta, Q., Arbod, G., and Lasenby, J. (2021). Benchmarking of deep learning irradiance forecasting models from sky images – An in-depth analysis. Solar Energy, 224:855–867.
Panagiotelis, A., Athanasopoulos, G., Gamakumara, P., and Hyndman, R. J. (2021). Forecast reconciliation: A geometric view with new insights on bias correction. International Journal of Forecasting, 37(1):343–359.
Panamtash, H., Zhou, Q., Hong, T., Qu, Z., and Davis, K. O. (2020). A copula-based Bayesian method for probabilistic solar power forecasting. Solar Energy, 196:336–345.
Passias, D. and Källbäck, B. (1984). Shading effects in rows of solar cell panels. Solar Cells, 11(3):281–291.
Patrignani, A., Knapp, M., Redmond, C., and Santos, E. (2020). Technical overview of the Kansas Mesonet. Journal of Atmospheric and Oceanic Technology, 37(12):2167–2183.
Paulescu, E. and Blaga, R. (2019). A simple and reliable empirical model with two predictors for estimating 1-minute diffuse fraction. Solar Energy, 180:75–84.
Pavolonis, M. J., Heidinger, A. K., and Uttal, T. (2005). Daytime global cloud typing from AVHRR and VIIRS: Algorithm description, validation, and comparisons. Journal of Applied Meteorology, 44(6):804–826.
Pawlikowski, M. and Chorowska, A. (2020). Weighted ensemble of statistical models. International Journal of Forecasting, 36(1):93–97.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
Pedro, H. T. C. and Coimbra, C. F. M. (2012). Assessment of forecasting techniques for solar power production with no exogenous inputs. Solar Energy, 86(7):2017–2028.
Pedro, H. T. C. and Coimbra, C. F. M. (2015). Nearest-neighbor methodology for prediction of intra-hour global horizontal and direct normal irradiances. Renewable Energy, 80:770–782.
Pedro, H. T. C., Coimbra, C. F. M., David, M., and Lauret, P. (2018). Assessment of machine learning techniques for deterministic and probabilistic intra-hour solar forecasts. Renewable Energy, 123:191–203.
Pedro, H. T. C., Larson, D. P., and Coimbra, C. F. M. (2019). A comprehensive dataset for the accelerated development and benchmarking of solar forecasting methods. Journal of Renewable and Sustainable Energy, 11(3):036102.
Pelland, S., Galanis, G., and Kallos, G. (2013). Solar and photovoltaic forecasting through post-processing of the Global Environmental Multiscale numerical weather prediction model. Progress in Photovoltaics: Research and Applications, 21(3):284–296.
Peratikou, S. and Charalambides, A. G. (2022). Estimating clear-sky PV electricity production without exogenous data. Solar Energy Advances, 2:100015.
Pereira, S., Canhoto, P., Salgado, R., and Costa, M. J. (2019). Development of an ANN based corrective algorithm of the operational ECMWF global horizontal irradiation forecasts. Solar Energy, 185:387–405.

Pérez, E., Pérez, J., Segarra-Tamarit, J., and Beltran, H. (2021). A deep learning model for intra-day forecasting of solar irradiance using satellite-based estimations in the vicinity of a PV power plant. Solar Energy, 218:652–660.
Perez, M. (2014). A Model for Optimizing the Combination of Solar Electricity Generation, Supply Curtailment, Transmission and Storage. PhD thesis, Columbia University.
Perez, M., Perez, R., Rábago, K. R., and Putnam, M. (2019a). Overbuilding & curtailment: The cost-effective enablers of firm PV generation. Solar Energy, 180:412–422.
Perez, M. J. and Fthenakis, V. M. (2015). On the spatial decorrelation of stochastic solar resource variability at long timescales. Solar Energy, 117:46–58.
Perez, M. J., Perez, R., and Hoff, T. E. (2021). Ultra-high photovoltaic penetration: Where to deploy. Solar Energy, 224:1079–1098.
Perez, R., Cebecauer, T., and Šúri, M. (2013a). Chapter 2 - Semi-empirical satellite models. In Kleissl, J., editor, Solar Energy Forecasting and Resource Assessment, pages 319–355. Academic Press, Boston.
Perez, R., Ineichen, P., Moore, K., Kmiecik, M., Chain, C., George, R., and Vignola, F. (2002). A new operational model for satellite-derived irradiances: description and validation. Solar Energy, 73(5):307–317.
Perez, R., Ineichen, P., Seals, R., Michalsky, J., and Stewart, R. (1990). Modeling daylight availability and irradiance components from direct and global irradiance. Solar Energy, 44(5):271–289.
Perez, R., Kivalov, S., Schlemmer, J., Hemker, K., Renné, D., and Hoff, T. E. (2010a). Validation of short and medium term operational solar radiation forecasts in the US. Solar Energy, 84(12):2161–2172.
Perez, R., Kivalov, S., Zelenka, A., Schlemmer, J., and Hemker Jr., K. (2010b). Improving the performance of satellite-to-irradiance models using the satellite's infrared sensors. In ASES National Solar Conference, Phoenix, USA.
Perez, R., Lorenz, E., Pelland, S., Beauharnois, M., Van Knowe, G., Hemker, K., Heinemann, D., Remund, J., Müller, S. C., Traunmüller, W., Steinmauer, G., Pozo, D., Ruiz-Arias, J. A., Lara-Fanego, V., Ramirez-Santigosa, L., Gaston-Romero, M., and Pomares, L. M. (2013b). Comparison of numerical weather prediction solar irradiance forecasts in the US, Canada and Europe. Solar Energy, 94:305–326.
Perez, R., Moore, K., Wilcox, S., Renné, D., and Zelenka, A. (2007). Forecasting solar radiation – Preliminary evaluation of an approach based upon the national forecast database. Solar Energy, 81(6):809–812.
Perez, R., Perez, M., Pierro, M., Schlemmer, J., Kivalov, S., Dise, J., Keelin, P., Grammatico, M., Swierc, A., Ferreira, J., Foster, A., Putnam, M., and Hoff, T. (2019b). Operationally perfect solar power forecasts: A scalable strategy to lowest-cost firm solar power generation. In 2019 IEEE 46th Photovoltaic Specialists Conference (PVSC), volume 2, pages 1–6.
Perez, R., Perez, M., Schlemmer, J., Dise, J., Hoff, T. E., Swierc, A., Keelin, P., Pierro, M., and Cornaro, C. (2020). From firm solar power forecasts to firm solar power generation an effective path to ultra-high renewable penetration a New York case study. Energies, 13(17):4489.
Perez, R., Rábago, K. R., Trahan, M., Rawlings, L., Norris, B., Hoff, T., Putnam, M., and Perez, M. (2016a). Achieving very high PV penetration – The need for an effective electricity remuneration framework and a central role for grid operators. Energy Policy, 96:27–35.
Perez, R., Schlemmer, J., Hemker, K., Kivalov, S., Kankiewicz, A., and Dise, J. (2016b). Solar energy forecast validation for extended areas & economic impact of forecast accuracy. In 2016 IEEE 43rd Photovoltaic Specialists Conference (PVSC), pages 1119–1124.

Perez, R., Schlemmer, J., Kivalov, S., Dise, J., Keelin, P., Grammatico, M., Hoff, T., and Tuohy, A. (2018). A new version of the SUNY solar forecast model: A scalable approach to site-specific model training. In IEEE 45th Photovoltaic Specialists Conference, pages 1–6.
Perez, R., Seals, R., Ineichen, P., Stewart, R., and Menicucci, D. (1987). A new simplified version of the Perez diffuse irradiance model for tilted surfaces. Solar Energy, 39(3):221–231.
Perez, R., Stewart, R., Arbogast, C., Seals, R., and Scott, J. (1986). An anisotropic hourly diffuse radiation model for sloping surfaces: Description, performance validation, site dependency evaluation. Solar Energy, 36(6):481–497.
Perez, R., Stewart, R., Seals, R., and Guertin, T. (1988). The development and verification of the Perez diffuse radiation model. Technical Report SAND88-7030, Atmospheric Sciences Research Center, SUNY at Albany, Albany, NY.
Persson, C., Bacher, P., Shiga, T., and Madsen, H. (2017). Multi-site solar power forecasting using gradient boosted regression trees. Solar Energy, 150:423–436.
Peterson, J. and Vignola, F. (2020). Structure of a comprehensive solar radiation dataset. Solar Energy, 211:366–374.
Petris, G., Petrone, S., and Campagnoli, P. (2009). Dynamic Linear Models with R. Springer.
Petropoulos, F. and Makridakis, S. (2020). The M4 Competition: Bigger. Stronger. Better. International Journal of Forecasting, 36(1):3–6.
Petropoulos, F. and Svetunkov, I. (2020). A simple combination of univariate models. International Journal of Forecasting, 36(1):110–115.
Pierce, B. G., Braid, J. L., Stein, J. S., Augustyn, J., and Riley, D. (2022). Solar transposition modeling via deep neural networks with sky images. IEEE Journal of Photovoltaics, 12(1):145–151.
Pierce, D. (2019). ncdf4: Interface to Unidata netCDF (Version 4 or Earlier) Format Data Files. R package version 1.17.
Pierro, M., Bucci, F., De Felice, M., Maggioni, E., Moser, D., Perotto, A., Spada, F., and Cornaro, C. (2016). Multi-model ensemble for day ahead prediction of photovoltaic power generation. Solar Energy, 134:132–146.
Pierro, M., Gentili, D., Liolli, F. R., Cornaro, C., Moser, D., Betti, A., Moschella, M., Collino, E., Ronzio, D., and van der Meer, D. (2022). Progress in regional PV power forecasting: A sensitivity analysis on the Italian case study. Renewable Energy, 189:983–996.
Pierro, M., Perez, R., Perez, M., Prina, M. G., Moser, D., and Cornaro, C. (2021). Italian protocol for massive solar integration: From solar imbalance regulation to firm 24/365 solar generation. Renewable Energy, 169:425–436.
Pinker, R. T. and Laszlo, I. (1992). Modeling surface solar irradiance for satellite applications on a global scale. Journal of Applied Meteorology and Climatology, 31(2):194–211.
Pinson, P. (2013). Wind energy: Forecasting challenges for its operational management. Statistical Science, 28(4):564–585.
Pinson, P. and Girard, R. (2012). Evaluating the quality of scenarios of short-term wind power generation. Applied Energy, 96:12–20.
Pinson, P., Han, L., and Kazempour, J. (2022). Regression markets and application to energy forecasting. TOP, 30(3):533–573.
Pinson, P. and Kariniotakis, G. (2004). On-line assessment of prediction risk for wind power production forecasts. Wind Energy, 7(2):119–132.
Pinson, P. and Kariniotakis, G. (2010). Conditional prediction intervals of wind power generation. IEEE Transactions on Power Systems, 25(4).

Pinson, P., McSharry, P., and Madsen, H. (2010). Reliability diagrams for non-parametric density forecasts of continuous variables: Accounting for serial correlation. Quarterly Journal of the Royal Meteorological Society, 136(646):77–90.
Pinson, P., Nielsen, H. A., Madsen, H., and Kariniotakis, G. (2009). Skill forecasting from ensemble predictions of wind power. Applied Energy, 86(7):1326–1334.
Pinson, P., Nielsen, H. A., Møller, J. K., Madsen, H., and Kariniotakis, G. N. (2007). Non-parametric probabilistic forecasts of wind power: required properties and evaluation. Wind Energy, 10(6):497–516.
Pinson, P. and Tastu, J. (2014). Discussion of "Prediction Intervals for Short-Term Wind Farm Generation Forecasts" and "Combined Nonparametric Prediction Intervals for Wind Power Generation". IEEE Transactions on Sustainable Energy, 5(3):1019–1020.
Polo, J., Fernández-Peruchena, C., Salamalikis, V., Mazorra-Aguiar, L., Turpin, M., Martín-Pomares, L., Kazantzidis, A., Blanc, P., and Remund, J. (2020). Benchmarking on improvement and site-adaptation techniques for modeled solar radiation datasets. Solar Energy, 201:469–479.
Polo, J., Wilbert, S., Ruiz-Arias, J. A., Meyer, R., Gueymard, C., Šúri, M., Martín, L., Mieslinger, T., Blanc, P., Grant, I., Boland, J., Ineichen, P., Remund, J., Escobar, R., Troccoli, A., Sengupta, M., Nielsen, K. P., Renné, D., Geuder, N., and Cebecauer, T. (2016). Preliminary survey on site-adaptation techniques for satellite-derived and reanalysis solar radiation datasets. Solar Energy, 132:25–37.
Popper, K. (2014). Conjectures and Refutations: The Growth of Scientific Knowledge. Routledge.
Potts, J. M. (2012). Basic concepts. In Jolliffe, I. T. and Stephenson, D. B., editors, Forecast Verification: A Practitioner's Guide in Atmospheric Science, pages 11–30. American Meteorological Society, Boston MA, USA.
Prusty, B. R. and Jena, D. (2017). A critical review on probabilistic load flow studies in uncertainty constrained power systems with photovoltaic generation and a new approach. Renewable and Sustainable Energy Reviews, 69:1286–1302.
Pu, Z. and Kalnay, E. (2018). Numerical weather prediction basics: Models, numerical methods, and data assimilation. In Duan, Q., Pappenberger, F., Thielen, J., Wood, A., Cloke, H., and Schaake, J., editors, Handbook of Hydrometeorological Ensemble Forecasting, pages 67–97. Springer, Berlin, Heidelberg.
Qin, J., Jiang, H., Lu, N., Yao, L., and Zhou, C. (2022). Enhancing solar PV output forecast by integrating ground and satellite observations with deep learning. Renewable and Sustainable Energy Reviews, 167:112680.
Qu, Y., Xu, J., Sun, Y., and Liu, D. (2021). A temporal distributed hybrid deep learning model for day-ahead distributed PV power forecasting. Applied Energy, 304:117704.
Qu, Z., Oumbe, A., Blanc, P., Espinar, B., Gesell, G., Gschwind, B., Klüser, L., Lefèvre, M., Saboret, L., Schroedter-Homscheidt, M., and Wald, L. (2017). Fast radiative transfer parameterisation for assessing the surface solar irradiance: The Heliosat-4 method. Meteorologische Zeitschrift, 26(1):33–57.
Quan, H., Srinivasan, D., and Khosravi, A. (2014). Short-term load and wind power forecasting using neural network-based prediction intervals. IEEE Transactions on Neural Networks and Learning Systems, 25(2):303–315.
Quan, H. and Yang, D. (2020). Probabilistic solar irradiance transposition models. Renewable and Sustainable Energy Reviews, 125:109814.
Quesada-Ruiz, S., Linares-Rodríguez, A., Ruiz-Arias, J. A., Pozo-Vázquez, D., and Tovar-Pescador, J. (2015). An advanced ANN-based method to estimate hourly solar radiation from multi-spectral MSG imagery. Solar Energy, 115:494–504.

Raftery, A. E., Gneiting, T., Balabdaoui, F., and Polakowski, M. (2005). Using Bayesian model averaging to calibrate forecast ensembles. Monthly Weather Review, 133(5):1155–1174.
Ramanathan, R., Engle, R., Granger, C. W. J., Vahid-Araghi, F., and Brace, C. (1997). Short-run forecasts of electricity loads and peaks. International Journal of Forecasting, 13(2):161–174.
Ramli, M. A. M., Twaha, S., and Al-Turki, Y. A. (2015). Investigating the performance of support vector machine and artificial neural networks in predicting solar radiation on a tilted surface: Saudi Arabia case study. Energy Conversion and Management, 105:442–452.
Randles, C. A., da Silva, A. M., Buchard, V., Colarco, P. R., Darmenov, A., Govindaraju, R., Smirnov, A., Holben, B., Ferrare, R., Hair, J., Shinozuka, Y., and Flynn, C. J. (2017). The MERRA-2 aerosol reanalysis, 1980 onward. Part I: System description and data assimilation evaluation. Journal of Climate, 30(17):6823–6850.
Ranjan, R. and Gneiting, T. (2010). Combining probability forecasts. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(1):71–91.
Rasp, S., Dueben, P. D., Scher, S., Weyn, J. A., Mouatadid, S., and Thuerey, N. (2020). WeatherBench: A benchmark data set for data-driven weather forecasting. Journal of Advances in Modeling Earth Systems, 12(11):e2020MS002203.
Reda, I. and Andreas, A. (2004). Solar position algorithm for solar radiation applications. Solar Energy, 76(5):577–589.
Reda, I. and Andreas, A. (2008). Solar position algorithm for solar radiation applications. Technical Report NREL/TP-560-34302, National Renewable Energy Lab. (NREL), Golden, CO (United States).
Reikard, G. (2009). Predicting solar radiation at high resolutions: A comparison of time series forecasts. Solar Energy, 83(3):342–349.
Reindl, D. T., Beckman, W. A., and Duffie, J. A. (1990). Evaluation of hourly tilted surface radiation models. Solar Energy, 45(1):9–17.
Reiss, A. J. (1951). The accuracy, efficiency, and validity of a prediction instrument. American Journal of Sociology, 56(6):552–561.
Rekioua, D. and Matagne, E. (2012). Optimization of Photovoltaic Power Systems: Modelization, Simulation and Control. Springer Science & Business Media.
Remer, L. A., Kaufman, Y. J., Tanré, D., Mattoo, S., Chu, D. A., Martins, J. V., Li, R.-R., Ichoku, C., Levy, R. C., Kleidman, R. G., Eck, T. F., Vermote, E., and Holben, B. N. (2005). The MODIS aerosol algorithm, products, and validation. Journal of the Atmospheric Sciences, 62(4):947–973.
Remund, J., Perez, R., Perez, M., Pierro, M., and Yang, D. (2023). Firm photovoltaic power generation: Overview and economic outlook. Solar RRL, In Press.
Richardson, L. F. (1922). Weather Prediction by Numerical Process. Cambridge University Press.
Ridley, B., Boland, J., and Lauret, P. (2010). Modelling of diffuse solar fraction with multiple predictors. Renewable Energy, 35(2):478–483.
Riedel-Lyngskær, N., Ribaconka, M., Pó, M., Thorseth, A., Thorsteinsson, S., Dam-Hansen, C., and Jakobsen, M. L. (2022). The effect of spectral albedo in bifacial photovoltaic performance. Solar Energy, 231:921–935.
Rienecker, M. M., Suarez, M. J., Gelaro, R., Todling, R., Bacmeister, J., Liu, E., Bosilovich, M. G., Schubert, S. D., Takacs, L., Kim, G.-K., Bloom, S., Chen, J., Collins, D., Conaty, A., da Silva, A., Gu, W., Joiner, J., Koster, R. D., Lucchesi, R., Molod, A., Owens, T., Pawson, S., Pegion, P., Redder, C. R., Reichle, R., Robertson, F. R., Ruddick, A. G., Sienkiewicz, M., and Woollen, J. (2011). MERRA: NASA's Modern-Era Retrospective Analysis for Research and Applications. Journal of Climate, 24(14):3624–3648.

Rigby, R. A. and Stasinopoulos, D. M. (2005). Generalized additive models for location, scale and shape (with discussion). Applied Statistics, 54:507–554.
Rigby, R. A., Stasinopoulos, M. D., Heller, G. Z., and De Bastiani, F. (2019). Distributions for Modeling Location, Scale, and Shape: Using GAMLSS in R. CRC Press.
Rigollier, C., Lefèvre, M., and Wald, L. (2004). The method Heliosat-2 for deriving shortwave solar radiation from satellite images. Solar Energy, 77(2):159–169.
Rodríguez-Benítez, F. J., Arbizu-Barrena, C., Huertas-Tato, J., Aler-Mur, R., Galván-León, I., and Pozo-Vázquez, D. (2020). A short-term solar radiation forecasting system for the Iberian Peninsula. Part 1: Models description and performance assessment. Solar Energy, 195:396–412.
Rodríguez-Benítez, F. J., López-Cuesta, M., Arbizu-Barrena, C., Fernández-León, M. M., Pamos-Ureña, M. Á., Tovar-Pescador, J., Santos-Alamillos, F. J., and Pozo-Vázquez, D. (2021). Assessment of new solar radiation nowcasting methods based on sky-camera and satellite imagery. Applied Energy, 292:116838.
Rodríguez-Gallegos, C. D., Liu, H., Gandhi, O., Singh, J. P., Krishnamurthy, V., Kumar, A., Stein, J. S., Wang, S., Li, L., Reindl, T., and Peters, I. M. (2020). Global techno-economic performance of bifacial and tracking photovoltaic systems. Joule, 4(7):1514–1541.
Rodríguez-Gallegos, C. D., Vinayagam, L., Gandhi, O., Yagli, G. M., Alvarez-Alvarado, M. S., Srinivasan, D., Reindl, T., and Panda, S. K. (2021). Novel forecast-based dispatch strategy optimization for PV hybrid systems in real time. Energy, 222:119918.
Rojas, R. G., Alvarado, N., Boland, J., Escobar, R., and Castillejo-Cuberos, A. (2019). Diffuse fraction estimation using the BRL model and relationship of predictors under Chilean, Costa Rican and Australian climatic conditions. Renewable Energy, 136:1091–1106.
Ross, R. G. (1982). Flat-plate photovoltaic module and array engineering. In 1982 Annual Meeting of the American Section of the International Solar Energy Society, pages 909–914.
Roulston, M. S. and Smith, L. A. (2002). Evaluating probabilistic forecasts using information theory. Monthly Weather Review, 130(6):1653–1660.
Roulston, M. S. and Smith, L. A. (2003). Combining dynamical and statistical ensembles. Tellus A: Dynamic Meteorology and Oceanography, 55(1):16–30.
Roy, A., Hammer, A., Heinemann, D., Lünsdorf, O., and Lezaca, J. (2022). Impact of tropical convective conditions on solar irradiance forecasting based on cloud motion vectors. Environmental Research Letters, 17(10):104048.
Ruiz-Arias, J. A., Tovar-Pescador, J., Pozo-Vázquez, D., and Alsamamra, H. (2009). A comparative analysis of DEM-based models to estimate the solar radiation in mountainous terrain. International Journal of Geographical Information Science, 23(8):1049–1076.
Ruppert, D., Sheather, S. J., and Wand, M. P. (1995). An effective bandwidth selector for local least squares regression. Journal of the American Statistical Association, 90(432):1257–1270.
Russell, B. (2017). The Scientific Outlook. Routledge.
Saha, S., Moorthi, S., Pan, H.-L., Wu, X., Wang, J., Nadiga, S., Tripp, P., Kistler, R., Woollen, J., Behringer, D., Liu, H., Stokes, D., Grumbine, R., Gayno, G., Wang, J., Hou, Y.-T., Chuang, H.-Y., Juang, H.-M. H., Sela, J., Iredell, M., Treadon, R., Kleist, D., Delst, P. V., Keyser, D., Derber, J., Ek, M., Meng, J., Wei, H., Yang, R., Lord, S., van den Dool, H., Kumar, A., Wang, W., Long, C., Chelliah, M., Xue, Y., Huang, B., Schemm, J.-K., Ebisuzaki, W., Lin, R., Xie, P., Chen, M., Zhou, S., Higgins, W., Zou, C.-Z., Liu, Q., Chen, Y., Han, Y., Cucurull, L., Reynolds, R. W., Rutledge, G., and Goldberg, M. (2010). The NCEP Climate Forecast System Reanalysis. Bulletin of the American Meteorological Society, 91(8):1015–1058.

Sahu, D. K., Yang, H., and Kleissl, J. (2018). Assimilating observations to simulate marine layer stratocumulus for solar forecasting. Solar Energy, 162:454–471.
Sampson, P. D. and Guttorp, P. (1992). Nonparametric estimation of nonstationary spatial covariance structure. Journal of the American Statistical Association, 87(417):108–119.
Sanders, F. (1963). On subjective probability forecasting. Journal of Applied Meteorology and Climatology, 2(2):191–201.
Sauer, K. J., Roessler, T., and Hansen, C. W. (2015). Modeling the irradiance and temperature dependence of photovoltaic modules in PVsyst. IEEE Journal of Photovoltaics, 5(1):152–158.
Schaaf, C. B., Gao, F., Strahler, A. H., Lucht, W., Li, X., Tsang, T., Strugnell, N. C., Zhang, X., Jin, Y., Muller, J.-P., Lewis, P., Barnsley, M., Hobson, P., Disney, M., Roberts, G., Dunderdale, M., Doll, C., d'Entremont, R. P., Hu, B., Liang, S., Privette, J. L., and Roy, D. (2002). First operational BRDF, albedo nadir reflectance products from MODIS. Remote Sensing of Environment, 83(1):135–148.
Schäfer, J. and Strimmer, K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4(1):32.
Schaffer, J. (2015). What not to multiply without necessity. Australasian Journal of Philosophy, 93(4):644–664.
Scher, S. and Messori, G. (2019). Weather and climate forecasting with neural networks: using general circulation models (GCMs) with different complexity as a study ground. Geoscientific Model Development, 12(7):2797–2809.
Schinke-Nendza, A., von Loeper, F., Osinski, P., Schaumann, P., Schmidt, V., and Weber, C. (2021). Probabilistic forecasting of photovoltaic power supply – A hybrid approach using D-vine copulas to model spatial dependencies. Applied Energy, 304:117599.
Schlather, M. (2010). Some covariance models based on normal scale mixtures. Bernoulli, 16(3):780–797.
Schlick, C. (1994). An inexpensive BRDF model for physically-based rendering. Computer Graphics Forum, 13(3):233–246.
Schultz, M. G., Betancourt, C., Gong, B., Kleinert, F., Langguth, M., Leufen, L. H., Mozaffari, A., and Stadtler, S. (2021). Can deep learning beat numerical weather prediction? Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 379(2194):20200097.
Schulz, B., El Ayari, M., Lerch, S., and Baran, S. (2021). Post-processing numerical weather prediction ensembles for probabilistic solar irradiance forecasting. Solar Energy, 220:1016–1031.
Schulz, B. and Lerch, S. (2022). Machine learning methods for postprocessing ensemble forecasts of wind gusts: A systematic comparison. Monthly Weather Review, 150(1):235–257.
Schulz, J., Albert, P., Behr, H.-D., Caprion, D., Deneke, H., Dewitte, S., Dürr, B., Fuchs, P., Gratzki, A., Hechler, P., Hollmann, R., Johnston, S., Karlsson, K.-G., Manninen, T., Müller, R., Reuter, M., Riihelä, A., Roebeling, R., Selbach, N., Tetzlaff, A., Thomas, W., Werscheck, M., Wolters, E., and Zelenka, A. (2009). Operational climate monitoring from space: the EUMETSAT Satellite Application Facility on Climate Monitoring (CM-SAF). Atmospheric Chemistry and Physics, 9(5):1687–1709.
Schwartz, C. S., Kain, J. S., Weiss, S. J., Xue, M., Bright, D. R., Kong, F., Thomas, K. W., Levit, J. J., Coniglio, M. C., and Wandishin, M. S. (2010). Toward improved convection-allowing ensembles: Model physics sensitivities and optimizing probabilistic guidance with small ensemble membership. Weather and Forecasting, 25(1):263–280.

Schwarz, M., Folini, D., Yang, S., Allan, R. P., and Wild, M. (2020). Changes in atmospheric shortwave absorption as important driver of dimming and brightening. Nature Geoscience, 13(2):110–115.
Sengupta, M. and Andreas, A. (2010). Oahu Solar Measurement Grid (1-year archive): 1-second solar irradiance; Oahu, Hawaii (data). Technical Report NREL/DA-5500-56506, National Renewable Energy Lab. (NREL), Golden, CO (United States).
Sengupta, M., Habte, A., Gueymard, C., Wilbert, S., and Renné, D. (2017). Best practices handbook for the collection and use of solar resource data for solar energy applications. Technical Report NREL/TP-5D00-68886, National Renewable Energy Lab. (NREL), Golden, CO (United States).
Sengupta, M., Habte, A., Wilbert, S., Gueymard, C., and Remund, J. (2021). Best practices handbook for the collection and use of solar resource data for solar energy applications. Technical Report NREL/TP-5D00-77635, National Renewable Energy Lab. (NREL), Golden, CO (United States).
Sengupta, M., Xie, Y., Lopez, A., Habte, A., Maclaurin, G., and Shelby, J. (2018). The National Solar Radiation Data Base (NSRDB). Renewable and Sustainable Energy Reviews, 89:51–60.
Shafer, M. A., Fiebrich, C. A., Arndt, D. S., Fredrickson, S. E., and Hughes, T. W. (2000). Quality assurance procedures in the Oklahoma Mesonetwork. Journal of Atmospheric and Oceanic Technology, 17(4):474–494.
Shao, Z., Pan, Y., Diao, C., and Cai, J. (2019). Cloud detection in remote sensing images based on multiscale features-convolutional neural network. IEEE Transactions on Geoscience and Remote Sensing, 57(6):4062–4076.
Shaub, D. (2020). Fast and accurate yearly time series forecasting with forecast combinations. International Journal of Forecasting, 36(1):116–120.
Shepero, M., Munkhammar, J., and Widén, J. (2019). A generative hidden Markov model of the clear-sky index. Journal of Renewable and Sustainable Energy, 11(4):043703.
Shi, H., Li, W., Fan, X., Zhang, J., Hu, B., Husi, L., Shang, H., Han, X., Song, Z., Zhang, Y., Wang, S., Chen, H., and Xia, X. (2018a). First assessment of surface solar irradiance derived from Himawari-8 across China. Solar Energy, 174:164–170.
Shi, H., Xu, M., and Li, R. (2018b). Deep learning for household load forecasting – A novel pooling deep RNN. IEEE Transactions on Smart Grid, 9(5):5271–5280.
Shi, H., Yang, D., Wang, W., Fu, D., Gao, L., Zhang, J., Hu, B., Shan, Y., Zhang, Y., Bian, Y., Chen, H., and Xia, X. (2023). First estimation of high-resolution solar photovoltaic resource maps over China with Fengyun-4A satellite and machine learning. Renewable and Sustainable Energy Reviews, 184:113549.
Shin, Y. and Schmidt, P. (1992). The KPSS stationarity test as a unit root test. Economics Letters, 38(4):387–392.
Sieger, R. and Grobe, H. (2013). PanPlot 2 - software to visualize profiles and time series.
Singh Doorga, J. R., Dhurmea, K. R., Rughooputh, S., and Boojhawon, R. (2019). Forecasting mesoscale distribution of surface solar irradiation using a proposed hybrid approach combining satellite remote sensing and time series models. Renewable and Sustainable Energy Reviews, 104:69–85.
Siuta, D. M. and Stull, R. B. (2018). Benefits of a multimodel ensemble for hub-height wind prediction in mountainous terrain. Wind Energy, 21(9):783–800.
Sjerps-Koomen, E. A., Alsema, E. A., and Turkenburg, W. C. (1996). A simple model for PV module reflection losses under field conditions. Solar Energy, 57(6):421–432.
Skartveit, A. and Olseth, J. A. (1986). Modelling slope irradiance at high latitudes. Solar Energy, 36(4):333–344.

640

References

Skoplaki, E. and Palyvos, J. A. (2009a). On the temperature dependence of photovoltaic module electrical performance: A review of efficiency/power correlations. Solar Energy, 83(5):614–624. Skoplaki, E. and Palyvos, J. A. (2009b). Operating temperature of photovoltaic modules: A survey of pertinent correlations. Renewable Energy, 34(1):23–29. Sloughter, J. M., Gneiting, T., and Raftery, A. E. (2010). Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. Journal of the American Statistical Association, 105(489):25–35. Smith, C. J., Bright, J. M., and Crook, R. (2017). Cloud cover effect of clear-sky index distributions and differences between human and automatic cloud observations. Solar Energy, 144:10–21. Smith, J. and Wallis, K. F. (2009). A simple explanation of the forecast combination puzzle. Oxford Bulletin of Economics and Statistics, 71(3):331–355. Smith, R. (2006). Peer review: A flawed process at the heart of science and journals. Journal of the royal society of medicine, 99(4):178–182. Sobhani, M., Hong, T., and Martin, C. (2020). Temperature anomaly detection for electric load forecasting. International Journal of Forecasting, 36(2):324–333. Sobri, S., Koohi-Kamali, S., and Rahim, N. A. (2018). Solar photovoltaic generation forecasting methods: A review. Energy Conversion and Management, 156:459–497. Soden, B. J. and Held, I. M. (2006). An assessment of climate feedbacks in coupled ocean– atmosphere models. Journal of Climate, 19(14):3354–3360. Sørensen, M. L., Nystrup, P., Bjerreg˚ard, M. B., Møller, J. K., Bacher, P., and Madsen, H. (2023). Recent developments in multivariate wind and solar power forecasting. WIREs Energy and Environment, 12(2):e465. Souka, A. F. and Safwat, H. H. (1966). Determination of the optimum orientations for the double-exposure, flat-plate collector and its reflectors. Solar Energy, 10(4):170–174. Souza, A. P. d. and Escobedo, J. F. (2013). Estimates of hourly diffuse radiation on tilted surfaces in Southeast of Brazil. International Journal of Renewable Energy Research, 3(1):207–221. Spade, P. V. and Panaccio, C. (2019). William of Ockham. In Zalta, E. N., editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, Spring 2019 edition. Sperati, S., Alessandrini, S., and Delle Monache, L. (2016). An application of the ECMWF Ensemble Prediction System for short-term solar power forecasting. Solar Energy, 133:437–450. Spiegelhalter, D. (2019). The Art of Statistics: How to Learn from Data. Basic Books. Spiliotis, E., Abolghasemi, M., Hyndman, R. J., Petropoulos, F., and Assimakopoulos, V. (2021). Hierarchical forecast reconciliation with machine learning. Applied Soft Computing, 112:107756. Starke, A. R., Lemos, L. F. L., Barni, C. M., Machado, R. D., Cardemil, J. M., Boland, J., and Colle, S. (2021). Assessing one-minute diffuse fraction models based on worldwide climate features. Renewable Energy, 177:700–714. Starke, A. R., Lemos, L. F. L., Boland, J., Cardemil, J. M., and Colle, S. (2018). Resolution of the cloud enhancement problem for one-minute diffuse radiation prediction. Renewable Energy, 125:472–484. Stasinopoulos, M. D. and Rigby, R. A. (2007). Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, 23(7):1–46. Stasinopoulos, M. D. and Rigby, R. A. (2022). gamlss.dist: Distributions for Generalized Additive Models for Location Scale and Shape. R package version 6.0-3.

References

641

Stasinopoulos, M. D., Rigby, R. A., Heller, G. Z., Voudouris, V., and De Bastiani, F. (2017). Flexible Regression and Smoothing: Using GAMLSS in R. CRC Press. Stein, M. L. (2005). Space–time covariance functions. Journal of the American Statistical Association, 100(469):310–321. Stensrud, D. J. (2007). Parameterization Schemes: Keys to Understanding Numerical Weather Prediction Models. Cambridge University Press. Stephenson, D. B., Coelho, C. A. S., and Jolliffe, I. T. (2008). Two extra components in the brier score decomposition. Weather and Forecasting, 23(4):752–757. Steven, M. D. and Unsworth, M. H. (1977). Standard distributions of clear sky radiance. Quarterly Journal of the Royal Meteorological Society, 103(437):457–465. Steven, M. D. and Unsworth, M. H. (1979). The diffuse solar irradiance of slopes under cloudless skies. Quarterly Journal of the Royal Meteorological Society, 105(445):593– 602. Steven, M. D. and Unsworth, M. H. (1980). The angular distribution and interception of diffuse solar radiation below overcast skies. Quarterly Journal of the Royal Meteorological Society, 106(447):57–61. Stone, M. (1961). The opinion pool. The Annals of Mathematical Statistics, 32(4):1339–1342. Strutt, H. J. W. (1871). On the light from the sky, its polarization and colour. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 41(271):107–120. Stull, R. B. (2015). Practical Meteorology: An Algebra-based Survey of Atmospheric Science. University of British Columbia. Sun, J., Liu, J., Wang, Y., Yuan, H., and Yan, Z. (2023). Development status, policy, and market mechanisms for battery energy storage in the US, China, Australia, and the UK. Journal of Renewable and Sustainable Energy, 15(2):024101. Sun, X., Bright, J. M., Gueymard, C. A., Acord, B., Wang, P., and Engerer, N. A. (2019). Worldwide performance assessment of 75 global clear-sky irradiance models using principal component analysis. Renewable and Sustainable Energy Reviews, 111:550–570. Sun, X., Bright, J. M., Gueymard, C. A., Bai, X., Acord, B., and Wang, P. (2021a). Worldwide performance assessment of 95 direct and diffuse clear-sky irradiance models using principal component analysis. Renewable and Sustainable Energy Reviews, 135:110087. Sun, X., Gnanamuthu, S., Zagade, N., Wang, P., and Bright, J. M. (2021b). Data article: Full disk real-time Himawari-8/9 satellite AHI imagery from JAXA. Journal of Renewable and Sustainable Energy, 13(6):063702. Sun, X., Yang, D., Gueymard, C. A., Bright, J. M., and Wang, P. (2022). Effects of spatial scale of atmospheric reanalysis data on clear-sky surface radiation modeling in tropical climates: A case study for Singapore. Solar Energy, 241:525–537. Susskind, L. and Hrabovsky, G. (2014). Classical Mechanics: The Theoretical Minimum. Penguin Books. Sweeney, C., Bessa, R. J., Browell, J., and Pinson, P. (2020). The future of forecasting for renewable energy. WIREs Energy and Environment, 9(2):e365. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015). Going deeper with convolutions. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–9. Taillardat, M., Mestre, O., Zamo, M., and Naveau, P. (2016). Calibrated ensemble forecasts using quantile regression forests and ensemble model output statistics. Monthly Weather Review, 144(6):2375–2393. Talagrand, O., Vautard, R., and Strauss, B. (1997). Evaluation of probabilistic prediction systems. 
In ECMWF Workshop on Predictability, pages 1–26, Shinfield Park, Reading. ECMWF.

642

References

Taleb, N. N. (2007). The Black Swan: The Impact of the Highly Improbable, volume 2. Random house. Taleb, N. N. (2020). Foreword to the M4 Competition. International Journal of Forecasting, 36(1):1–2. TamizhMani, G., Ji, L., Tang, Y., Petacci, L., and Osterwald, C. (2003). Photovoltaic module thermal/wind performance: Long-term monitoring and model development for energy rating. Technical Report NREL/CP-520-35645, National Renewable Energy Lab.(NREL), Golden, CO (United States). Tashman, L. J. (2000). Out-of-sample tests of forecasting accuracy: an analysis and review. International Journal of Forecasting, 16(4):437–450. Tastu, J., Pinson, P., and Madsen, H. (2015). Space-time trajectories of wind power generation: Parametrized precision matrices under a Gaussian copula approach. In Antoniadis, A., Poggi, J.-M., and Brossat, X., editors, Modeling and Stochastic Learning for Forecasting in High Dimensions, pages 267–296. Springer. Taylor, J. W. and McSharry, P. E. (2007). Short-term load forecasting methods: An evaluation based on European data. IEEE Transactions on Power Systems, 22(4):2213–2219. Taylor, J. W. and McSharry, P. E. (2017). Univariate methods for short-term load forecasting. In El-Hawary, M. E., editor, Advances in Electric Power and Energy Systems, chapter 2, pages 17–40. John Wiley & Sons. Teague, K. A. and Gallicchio, N. (2017). The Evolution of Meteorology: A Look Into the Past, Present, and Future of Weather Forecasting. John Wiley & Sons. Temps, R. C. and Coulson, K. L. (1977). Solar radiation incident upon slopes of different orientations. Solar Energy, 19(2):179–184. Testa, A., De Caro, S., La Torre, R., and Scimone, T. (2012). A probabilistic approach to size step-up transformers for grid connected PV plants. Renewable Energy, 48:42–51. Thanh Nga, P. T., Ha, P. T., and Hang, V. T. (2021). Satellite-based regionalization of solar irradiation in vietnam by k-means clustering. Journal of Applied Meteorology and Climatology, 60(3):391–402. Theocharides, S., Makrides, G., Livera, A., Theristis, M., Kaimakis, P., and Georghiou, G. E. (2020). Day-ahead photovoltaic power production forecasting methodology based on machine learning and statistical post-processing. Applied Energy, 268:115023. ¨ Thomson, M. E., Pollock, A. C., Onkal, D., and G¨on¨ul, M. S. (2019). Combining forecasts: Performance and coherence. International Journal of Forecasting, 35(2):474–484. Thorarinsdottir, T. L. and Gneiting, T. (2010). Probabilistic forecasts of wind speed: ensemble model output statistics by using heteroscedastic censored regression. Journal of the Royal Statistical Society: Series A (Statistics in Society), 173(2):371–388. Thorey, J., Chaussin, C., and Mallet, V. (2018). Ensemble forecast of photovoltaic power with online CRPS learning. International Journal of Forecasting, 34(4):762–773. Thorey, J., Mallet, V., Chaussin, C., Descamps, L., and Blanc, P. (2015). Ensemble forecast of solar radiation using TIGGE weather forecasts and HelioClim database. Solar Energy, 120:232–243. Thornton, S. (2018). Karl popper. Stanford Encyclopedia of Philosophy. Tian, Y. Q., Davies-Colley, R. J., Gong, P., and Thorrold, B. W. (2001). Estimating solar radiation on slopes of arbitrary aspect. Agricultural and Forest Meteorology, 109(1):67– 74. Tibshirani, R. (2011). The lasso: some novel algorithms and applications. Dept. of Statistics, Purdue. Timmermann, A. (2006). Forecast combinations. In Elliott, G., Granger, C. W. 
J., and Timmermann, A., editors, Handbook of Economic Forecasting, volume 1, pages 135–196. Elsevier.

References

643

Toreti Scarabelot, L., Arns Rampinelli, G., and Rambo, C. R. (2021). Overirradiance effect on the electrical performance of photovoltaic systems of different inverter sizing factors. Solar Energy, 225:561–568. Torres, J. L., Blas, M. D., and Garc´ıa, A. (2006). New equations for the calculation of the horizon brightness irradiance in the model of Perez. Solar Energy, 80(7):746–750. Torres, J. L., Garc´ıa, A., de Blas, M., Gracia, A., and Illanes, R. (2010). A study of zenith radiance in Pamplona under different sky conditions. Renewable Energy, 35(4):830–838. Torres, J. L., Prieto, E., Garcia, A., De Blas, M., Ramirez, F., and De Francisco, A. (2003). Effects of the model selected for the power curve on the site effectiveness and the capacity factor of a pitch regulated wind turbine. Solar Energy, 74(2):93–102. Tuohy, A., Zack, J., Haupt, S. E., Sharp, J., Ahlstrom, M., Dise, S., Grimit, E., Mohrlen, C., Lange, M., Casado, M. G., Black, J., Marquis, M., and Collier, C. (2015). Solar forecasting: Methods, challenges, and performance. IEEE Power and Energy Magazine, 13(6):50–59. Ullah, A., Amin, A., Haider, T., Saleem, M., and Butt, N. Z. (2020). Investigation of soiling effects, dust chemistry and optimum cleaning schedule for PV modules in Lahore, Pakistan. Renewable Energy, 150:456–468. Uniejewski, B., Marcjasz, G., and Weron, R. (2019). Understanding intraday electricity markets: Variable selection and very short-term price forecasting using lasso. International Journal of Forecasting, 35(4):1533–1547. Uniejewski, B., Nowotarski, J., and Weron, R. (2016). Automated variable selection and shrinkage for day-ahead electricity price forecasting. Energies, 9(8):621. Uniejewski, B., Weron, R., and Ziel, F. (2018). Variance stabilizing transformations for electricity spot price forecasting. IEEE Transactions on Power Systems, 33(2):2219–2229. Uppala, S. M., K˚allberg, P. W., Simmons, A. J., Andrae, U., Bechtold, V. D. C., Fiorino, M., Gibson, J. K., Haseler, J., Hernandez, A., Kelly, G. A., Li, X., Onogi, K., Saarinen, S., Sokka, N., Allan, R. P., Andersson, E., Arpe, K., Balmaseda, M. A., Beljaars, A. C. M., Berg, L. V. D., Bidlot, J., Bormann, N., Caires, S., Chevallier, F., Dethof, A., Dragosavac, M., Fisher, M., Fuentes, M., Hagemann, S., H´olm, E., Hoskins, B. J., Isaksen, L., Janssen, P. A. E. M., Jenne, R., Mcnally, A. P., Mahfouf, J.-F., Morcrette, J.-J., Rayner, N. A., Saunders, R. W., Simon, P., Sterl, A., Trenberth, K. E., Untch, A., Vasiljevic, D., Viterbo, P., and Woollen, J. (2005). The ERA-40 re-analysis. Quarterly Journal of the Royal Meteorological Society, 131(612):2961–3012. Urraca, R., Sanz-Garcia, A., and Sanz-Garcia, I. (2020). BQC: A free web service to quality control solar irradiance measurements across Europe. Solar Energy, 211:1–10. Utrillas, M. P. and Martinez-Lozano, J. A. (1994). Performance evaluation of several versions of the Perez tilted diffuse irradiance model. Solar Energy, 53(2):155–162. Valerino, M., Bergin, M., Ghoroi, C., Ratnaparkhi, A., and Smestad, G. P. (2020). Low-cost solar PV soiling sensor validation and size resolved soiling impacts: A comprehensive field study in Western India. Solar Energy, 204:307–315. Valor, E., Puchades, J., Nicl`os, R., Galve Romero, J. M., Lacave, O., and Puig, P. (2023). Determination and evaluation of surface solar irradiance with the MAGIC-Heliosat method adapted to MTSAT-2/Imager and Himawari-8/AHI sensors. IEEE Transactions on Geoscience and Remote Sensing, 61:4400519. van der Meer, D. (2021). 
A benchmark for multivariate probabilistic solar irradiance forecasts. Solar Energy, 225:286–296. van der Meer, D., Yang, D., Wid´en, J., and Munkhammar, J. (2020). Clear-sky index spacetime trajectories from probabilistic solar forecasts: Comparing promising copulas. Journal of Renewable and Sustainable Energy, 12(2):026102.

644

References

Varga, N. and Mayer, M. J. (2021). Model-based analysis of shading losses in ground-mounted photovoltaic power plants. Solar Energy, 216:428–438. Verbois, H., Huva, R., Rusydi, A., and Walsh, W. (2018). Solar irradiance forecasting in the tropics using numerical weather prediction and statistical learning. Solar Energy, 162:265– 277. Verzijlbergh, R. A., Heijnen, P. W., de Roode, S. R., Los, A., and Jonker, H. J. J. (2015). Improved model output statistics of numerical weather prediction based irradiance forecasts for solar power applications. Solar Energy, 118:634–645. Vignola, F., Grover, C., Lemon, N., and McMahan, A. (2012). Building a bankable solar radiation dataset. Solar Energy, 86(8):2218–2229. Vignola, F., Michalsky, J., and Stoffel, T. (2020). Solar and infrared radiation measurements. CRC press. Vill´an, A. F. (2019). Mastering OpenCV 4 with Python: A practical guide covering topics from image processing, augmented reality to deep learning with OpenCV 4 and Python 3.7. Packt Publishing Ltd. Visser, L., AlSkaif, T., and van Sark, W. (2022). Operational day-ahead solar power forecasting for aggregated PV systems with a varying spatial distribution. Renewable Energy, 183:267–282. von Loeper, F., Schaumann, P., de Langlard, M., Hess, R., B¨asmann, R., and Schmidt, V. (2020). Probabilistic prediction of solar power supply to distribution networks, using forecasts of global horizontal irradiation. Solar Energy, 203:145–156. Voyant, C., Haurant, P., Muselli, M., Paoli, C., and Nivet, M.-L. (2014). Time series modeling and large scale global solar radiation forecasting from geostationary satellites data. Solar Energy, 102:131–142. Voyant, C., Lauret, P., Notton, G., Duchaud, J.-L., Fouilloy, A., David, M., Yaseen, Z. M., and Soubdhan, T. (2021). A monte carlo based solar radiation forecastability estimation. Journal of Renewable and Sustainable Energy, 13(2):026501. Voyant, C., Notton, G., Duchaud, J.-L., Almorox, J., and Yaseen, Z. M. (2020). Solar irradiation prediction intervals based on Box–Cox transformation and univariate representation of periodic autoregressive model. Renewable Energy Focus, 33:43–53. Voyant, C., Notton, G., Duchaud, J.-L., Guti´errez, L. A. G., Bright, J. M., and Yang, D. (2022). Benchmarks for solar radiation time series forecasting. Renewable Energy, 191:747–762. Voyant, C., Notton, G., Kalogirou, S., Nivet, M.-L., Paoli, C., Motte, F., and Fouilloy, A. (2017). Machine learning methods for solar radiation forecasting: A review. Renewable Energy, 105:569 – 582. Wallis, K. F. (2011). Combining forecasts – forty years later. Applied Financial Economics, 21(1–2):33–41. Walsh, J. W. T. (1961). The Science of Daylight. Pitman. Walther, A. and Heidinger, A. K. (2012). Implementation of the daytime cloud optical and microphysical properties algorithm (DCOMP) in PATMOS-x. Journal of Applied Meteorology and Climatology, 51(7):1371–1390. Wang, F., Xuan, Z., Zhen, Z., Li, K., Wang, T., and Shi, M. (2020a). A day-ahead PV power forecasting method based on LSTM-RNN model and time correlation modification under partial daily pattern prediction framework. Energy Conversion and Management, 212:112766. Wang, G., Kurtz, B., and Kleissl, J. (2016). Cloud base height from sky imager and cloud speed sensor. Solar Energy, 131:208–221.

References

645

Wang, G. C., Kurtz, B., Bosch, J. L., de la Parra, ´I., and Kleissl, J. (2020b). Maximum expected ramp rates using cloud speed sensor measurements. Journal of Renewable and Sustainable Energy, 12(5):056302. Wang, G. C., Ratnam, E., Haghi, H. V., and Kleissl, J. (2019a). Corrective receding horizon EV charge scheduling using short-term solar forecasting. Renewable Energy, 130:1146– 1158. Wang, J., Yuan, Q., Shen, H., Liu, T., Li, T., Yue, L., Shi, X., and Zhang, L. (2020c). Estimating snow depth by combining satellite data and ground-based observations over Alaska: A deep learning approach. Journal of Hydrology, 585:124828. Wang, L., Zhang, Z., and Chen, J. (2017a). Short-term electricity price forecasting with stacked denoising autoencoders. IEEE Transactions on Power Systems, 32(4):2673–2681. Wang, P., van Westrhenen, R., Meirink, J. F., van der Veen, S., and Knap, W. (2019b). Surface solar radiation forecasts by advecting cloud physical properties derived from Meteosat Second Generation observations. Solar Energy, 177:47–58. Wang, W., Yang, D., Hong, T., and Kleissl, J. (2022). An archived dataset from the ECMWF Ensemble Prediction System for probabilistic solar power forecasting. Solar Energy, 248:64–75. Wang, X. and Bishop, C. H. (2005). Improvement of ensemble reliability with a new dressing kernel. Quarterly Journal of the Royal Meteorological Society, 131(607):965–986. Wang, X., Smith, K., and Hyndman, R. (2006). Characteristic-based clustering for time series data. Data Mining and Knowledge Discovery, 13(3):335–364. Wang, Y., Chen, J., Chen, X., Zeng, X., Kong, Y., Sun, S., Guo, Y., and Liu, Y. (2021). Shortterm load forecasting for industrial customers based on TCN-LightGBM. IEEE Transactions on Power Systems, 36(3):1984–1997. Wang, Y., Hu, Q., Li, L., Foley, A. M., and Srinivasan, D. (2019c). Approaches to wind power curve modeling: A review and discussion. Renewable and Sustainable Energy Reviews, 116:109422. Wang, Y., Zhang, N., Chen, Q., Yang, J., Kang, C., and Huang, J. (2017b). Dependent discrete convolution based probabilistic load flow for the active distribution system. IEEE Transactions on Sustainable Energy, 8(3):1000–1009. Wang, Y., Zhang, N., Tan, Y., Hong, T., Kirschen, D. S., and Kang, C. (2019d). Combining probabilistic load forecasts. IEEE Transactions on Smart Grid, 10(4):3664–3674. Wang, Z., Koprinska, I., and Rana, M. (2017c). Solar power forecasting using pattern sequences. In Lintas, A., Rovetta, S., Verschure, P. F. M. J., and Villa, A. E. P., editors, Artificial Neural Networks and Machine Learning – ICANN 2017, pages 486–494, Cham. Springer International Publishing. Wang, Z., Sun, Q., Wang, B., and Zhang, B. (2019e). Purchasing intentions of Chinese consumers on energy-efficient appliances: Is the energy efficiency label effective? Journal of Cleaner Production, 238:117896. Warner, T. T. (2010). Numerical Weather and Climate Prediction. Cambridge University Press. Wasserman, L. (2006). All of Nonparametric Statistics. Springer Science & Business Media. Wasserman, L. (2013). All of Statistics: A Concise Course in Statistical Inference. Springer Science & Business Media. Watanabe, T. and Nohara, D. (2019). Prediction of time series for several hours of surface solar irradiance using one-granule cloud property data from satellite observations. Solar Energy, 186:113–125.

646

References

Watanabe, T., Takenaka, H., and Nohara, D. (2021). Post-processing correction method for surface solar irradiance forecast data from the numerical weather model using geostationary satellite observation data. Solar Energy, 223:202–216. Weron, R. (2014). Electricity price forecasting: A review of the state-of-the-art with a look into the future. International Journal of Forecasting, 30(4):1030–1081. Weron, R. and Ziel, F. (2019). Electricity price forecasting. In Soytas¸, U. and Sarı, R., editors, Routledge Handbook of Energy Economics. Routledge. Weyn, J. A., Durran, D. R., and Caruana, R. (2019). Can machines learn to predict weather? Using deep learning to predict gridded 500-hpa geopotential height from historical weather data. Journal of Advances in Modeling Earth Systems, 11(8):2680–2693. Wickramasuriya, S. L., Athanasopoulos, G., and Hyndman, R. J. (2019). Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. Journal of the American Statistical Association, 114(526):804–819. Wikle, C. K., Zammit-Mangion, A., and Cressie, N. (2019). Spatio-Temporal Statistics with R. Chapman and Hall/CRC. Wild, M. (2009). Global dimming and brightening: A review. Journal of Geophysical Research: Atmospheres, 114:D00D16. Wild, M. (2016). Decadal changes in radiative fluxes at land and ocean surfaces and their relevance for global warming. WIREs Climate Change, 7(1):91–107. Wilks, D. S. (2019). Chapter 9 - Forecast verification. In Wilks, D. S., editor, Statistical Methods in the Atmospheric Sciences (Fourth Edition), pages 369–483. Elsevier, fourth edition edition. Willis, H. L. (2002). Spatial Electric Load Forecasting. CRC Press. Willis, H. L. and Northcote-Green, J. E. D. (1983). Spatial electric load forecasting: A tutorial review. Proceedings of the IEEE, 71(2):232–253. Willmott, C. J. (1982). On the climatic optimization of the tilt and azimuth of flat-plate solar collectors. Solar Energy, 28(3):205–216. Wilson, D. R. and Ballard, S. P. (1999). A microphysically based precipitation scheme for the UK meteorological office unified model. Quarterly Journal of the Royal Meteorological Society, 125(557):1607–1636. Winkler, R. L. (1968). The consensus of subjective probability distributions. Management Science, 15(2):B61–B75. Winkler, R. L. (1972). A decision-theoretic approach to interval estimation. Journal of the American Statistical Association, 67(337):187–191. Winkler, R. L., Grushka-Cockayne, Y., Lichtendahl, K. C., and Jose, V. R. R. (2019). Probability forecasts and their combination: A research perspective. Decision Analysis, 16(4):239– 260. Wolff, B., K¨uhnert, J., Lorenz, E., Kramer, O., and Heinemann, D. (2016). Comparing support vector regression for PV power forecasting to a physical modeling approach using measurement, numerical weather prediction, and cloud motion data. Solar Energy, 135:197–208. Wu, E., Clemesha, R. E. S., and Kleissl, J. (2018). Coastal stratocumulus cloud edge forecasts. Solar Energy, 164:355–369. Wu, Q., Guan, F., Lv, C., and Huang, Y. (2021). Ultra-short-term multi-step wind power forecasting based on CNN-LSTM. IET Renewable Power Generation, 15(5):1019–1029. Wu, X., Zhu, X., Wu, G.-Q., and Ding, W. (2014). Data mining with big data. IEEE Transactions on Knowledge and Data Engineering, 26(1):97–107. Wu, Y.-Y., Wu, S.-Y., and Xiao, L. (2017). Numerical study on convection heat transfer from inclined PV panel under windy environment. Solar Energy, 149:1–12.

References

647

Xian, D., Zhang, P., Gao, L., Sun, R., Zhang, H., and Jia, X. (2021). Fengyun meteorological satellite products for earth system science applications. Advances in Atmospheric Sciences, 38(8):1267–1284. Xie, J., Chen, Y., Hong, T., and Laing, T. D. (2018a). Relative humidity for load forecasting models. IEEE Transactions on Smart Grid, 9(1):191–198. Xie, J. and Hong, T. (2017). Wind speed for load forecasting models. Sustainability, 9(5):795. Xie, J. and Hong, T. (2018a). Temperature scenario generation for probabilistic load forecasting. IEEE Transactions on Smart Grid, 9(3):1680–1687. Xie, J. and Hong, T. (2018b). Variable selection methods for probabilistic load forecasting: Empirical evidence from seven states of the United States. IEEE Transactions on Smart Grid, 9(6):6039–6046. Xie, Y., Sengupta, M., and Dooraghi, M. (2018b). Assessment of uncertainty in the numerical simulation of solar irradiance over inclined PV panels: New algorithms using measurements and modeling tools. Solar Energy, 165:55–64. Xie, Y., Sengupta, M., and Dudhia, J. (2016). A Fast All-sky Radiation Model for solar applications (FARMS): Algorithm and performance evaluation. Solar Energy, 135:435– 445. Xie, Y., Sengupta, M., Habte, A., and Andreas, A. (2022). The “Fresnel Equations” for Diffuse radiation on Inclined photovoltaic Surfaces (FEDIS). Renewable and Sustainable Energy Reviews, 161:112362. Xu, L., Wang, S., and Tang, R. (2019). Probabilistic load forecasting for buildings considering weather forecasting uncertainty and uncertain peak load. Applied Energy, 237:180–195. Yagli, G. M., Yang, D., Gandhi, O., and Srinivasan, D. (2020a). Can we justify producing univariate machine-learning forecasts with satellite-derived solar irradiance? Applied Energy, 259:114122. Yagli, G. M., Yang, D., and Srinivasan, D. (2019a). Automatic hourly solar forecasting using machine learning models. Renewable and Sustainable Energy Reviews, 105:487–498. Yagli, G. M., Yang, D., and Srinivasan, D. (2019b). Reconciling solar forecasts: Sequential reconciliation. Solar Energy, 179:391–397. Yagli, G. M., Yang, D., and Srinivasan, D. (2020b). Ensemble solar forecasting using data-driven models with probabilistic post-processing through GAMLSS. Solar Energy, 208:612–622. Yagli, G. M., Yang, D., and Srinivasan, D. (2020c). Reconciling solar forecasts: Probabilistic forecasting with homoscedastic Gaussian errors on a geographical hierarchy. Solar Energy, 210:59–67. Yagli, G. M., Yang, D., and Srinivasan, D. (2022). Ensemble solar forecasting and postprocessing using dropout neural network and information from neighboring satellite pixels. Renewable and Sustainable Energy Reviews, 155:111909. Yan, R., Marais, B., and Saha, T. K. (2014). Impacts of residential photovoltaic power fluctuation on on-load tap changer operation and a solution using DSTATCOM. Electric Power Systems Research, 111:185–193. Yang, D. (2016). Solar radiation on inclined surfaces: Corrections and benchmarks. Solar Energy, 136:288–302. Yang, D. (2018a). A correct validation of the National Solar Radiation Data Base (NSRDB). Renewable and Sustainable Energy Reviews, 97:152–155. Yang, D. (2018b). Kriging for NSRDB PSM version 3 satellite-derived solar irradiance. Solar Energy, 171:876–883. Yang, D. (2018c). SolarData: An R package for easy access of publicly available solar datasets. Solar Energy, 171:A3–A12.

648

References

Yang, D. (2018d). Ultra-fast preselection in lasso-type spatio-temporal solar forecasting problems. Solar Energy, 176:788–796. Yang, D. (2019a). A guideline to solar forecasting research practice: Reproducible, operational, probabilistic or physically-based, ensemble, and skill (ROPES). Journal of Renewable and Sustainable Energy, 11(2):022701. Yang, D. (2019b). Making reference solar forecasts with climatology, persistence, and their optimal convex combination. Solar Energy, 193:981–985. Yang, D. (2019c). On post-processing day-ahead NWP forecasts using Kalman filtering. Solar Energy, 182:179–181. Yang, D. (2019d). Post-processing of NWP forecasts using ground or satellite-derived data through kernel conditional density estimation. Journal of Renewable and Sustainable Energy, 11(2):026101. Yang, D. (2019e). SolarData package update v1.1: R functions for easy access of Baseline Surface Radiation Network (BSRN). Solar Energy, 188:970–975. Yang, D. (2019f). Standard of reference in operational day-ahead deterministic solar forecasting. Journal of Renewable and Sustainable Energy, 11(5):053702. Yang, D. (2019g). Ultra-fast analog ensemble using kd-tree. Journal of Renewable and Sustainable Energy, 11(5):053703. Yang, D. (2019h). A universal benchmarking method for probabilistic solar irradiance forecasting. Solar Energy, 184:410–416. Yang, D. (2020a). Choice of clear-sky model in solar forecasting. Journal of Renewable and Sustainable Energy, 12(2):026101. Yang, D. (2020b). Comment: Operational aspects of solar forecasting. Solar Energy, 210:38– 40. Yang, D. (2020c). Ensemble model output statistics as a probabilistic site-adaptation tool for satellite-derived and reanalysis solar irradiance. Journal of Renewable and Sustainable Energy, 12(1):016102. Yang, D. (2020d). Ensemble model output statistics as a probabilistic site-adaptation tool for solar irradiance: A revisit. Journal of Renewable and Sustainable Energy, 12(3):036101. Yang, D. (2020e). Quantifying the spatial scale mismatch between satellite-derived solar irradiance and in situ measurements: A case study using CERES synoptic surface shortwave flux and the Oklahoma Mesonet. Journal of Renewable and Sustainable Energy, 12(5):056104. Yang, D. (2020f). Reconciling solar forecasts: Probabilistic forecast reconciliation in a nonparametric framework. Solar Energy, 210:49–58. Yang, D. (2021a). Temporal-resolution cascade model for separation of 1-min beam and diffuse irradiance. Journal of Renewable and Sustainable Energy, 13(5):056101. Yang, D. (2021b). Validation of the 5-min irradiance from the National Solar Radiation Database (NSRDB). Journal of Renewable and Sustainable Energy, 13(1):016101. Yang, D. (2022a). Correlogram, predictability error growth, and bounds of mean square error of solar irradiance forecasts. Renewable and Sustainable Energy Reviews, 167:112736. Yang, D. (2022b). Estimating 1-min beam and diffuse irradiance from the global irradiance: A review and an extensive worldwide comparison of latest separation models at 126 stations. Renewable and Sustainable Energy Reviews, 159:112195. Yang, D. and Alessandrini, S. (2019). An ultra-fast way of searching weather analogs for renewable energy forecasting. Solar Energy, 185:255–261. Yang, D., Alessandrini, S., Antonanzas, J., Antonanzas-Torres, F., Badescu, V., Beyer, H. G., ˆ Gueymard, Blaga, R., Boland, J., Bright, J. M., Coimbra, C. F. M., David, M., Frimane, A., C. A., Hong, T., Kay, M. J., Killinger, S., Kleissl, J., Lauret, P., Lorenz, E., van der Meer,

References

649

D., Paulescu, M., Perez, R., Perpi˜na´ n-Lamigueiro, O., Peters, I. M., Reikard, G., Renn´e, D., Saint-Drenan, Y.-M., Shuai, Y., Urraca, R., Verbois, H., Vignola, F., Voyant, C., and Zhang, J. (2020a). Verification of deterministic solar forecasts. Solar Energy, 210:20–37. Yang, D. and Boland, J. (2019). Satellite-augmented diffuse solar radiation separation models. Journal of Renewable and Sustainable Energy, 11(2):023705. Yang, D. and Bright, J. M. (2020). Worldwide validation of 8 satellite-derived and reanalysis solar radiation products: A preliminary evaluation and overall metrics for hourly data over 27 years. Solar Energy, 210:3–19. Yang, D. and Dong, Z. (2018). Operational photovoltaics power forecasting using seasonal time series ensemble. Solar Energy, 166:529–541. Yang, D., Dong, Z., Lim, L. H. I., and Liu, L. (2017a). Analyzing big time series data in solar engineering using features and PCA. Solar Energy, 153:317–328. Yang, D., Dong, Z., Reindl, T., Jirutitijaroen, P., and Walsh, W. M. (2014a). Solar irradiance forecasting using spatio-temporal empirical kriging and vector autoregressive models with parameter shrinkage. Solar Energy, 103:550–562. Yang, D., Goh, G. S. W., Jiang, S., and Zhang, A. N. (2016). Spatial data dimension reduction using quadtree: A case study on satellite-derived solar radiation. In 2016 IEEE International Conference on Big Data (Big Data), pages 3807–3812. Yang, D., Gu, C., Dong, Z., Jirutitijaroen, P., Chen, N., and Walsh, W. M. (2013a). Solar irradiance forecasting using spatial-temporal covariance structures and time-forward kriging. Renewable Energy, 60:235–245. Yang, D. and Gueymard, C. A. (2019). Producing high-quality solar resource maps by integrating high- and low-accuracy measurements using Gaussian processes. Renewable and Sustainable Energy Reviews, 113:109260. Yang, D. and Gueymard, C. A. (2020). Ensemble model output statistics for the separation of direct and diffuse components from 1-min global irradiance. Solar Energy, 208:591–603. Yang, D. and Gueymard, C. A. (2021a). Probabilistic merging and verification of monthly gridded aerosol products. Atmospheric Environment, 247:118146. Yang, D. and Gueymard, C. A. (2021b). Probabilistic post-processing of gridded atmospheric variables and its application to site adaptation of shortwave solar radiation. Solar Energy, 225:427–443. Yang, D., Jirutitijaroen, P., and Walsh, W. M. (2012). Hourly solar irradiance time series forecasting using cloud cover index. Solar Energy, 86(12):3531–3543. Yang, D. and Kleissl, J. (2023). Summarizing ensemble NWP forecasts for grid operators: Consistency, elicitability, and economic value. International Journal of Forecasting, 39(4):1640–1654. Yang, D., Kleissl, J., Gueymard, C. A., Pedro, H. T. C., and Coimbra, C. F. M. (2018a). History and trends in solar irradiance and PV power forecasting: A preliminary assessment and review using text mining. Solar Energy, 168:60–101. Yang, D., Li, W., Yagli, G. M., and Srinivasan, D. (2021a). Operational solar forecasting for grid integration: Standards, challenges, and outlook. Solar Energy, 224:930–937. Yang, D. and Liu, L. (2020). Solar project financing, bankability, and resource assessment. In Gandhi, O. and Srinivasan, D., editors, Sustainable Energy Solutions for Remote Areas in the Tropics, pages 179–211. Springer. Yang, D. and Perez, R. (2019). Can we gauge forecasts using satellite-derived solar irradiance? Journal of Renewable and Sustainable Energy, 11(2):023704. Yang, D., Quan, H., Disfani, V. R., and Liu, L. 
(2017b). Reconciling solar forecasts: Geographical hierarchy. Solar Energy, 146:276–286.

650

References

Yang, D., Quan, H., Disfani, V. R., and Rodr´ıguez-Gallegos, C. D. (2017c). Reconciling solar forecasts: Temporal hierarchy. Solar Energy, 158:332–346. Yang, D., Sharma, V., Ye, Z., Lim, L. I., Zhao, L., and Aryaputera, A. W. (2015a). Forecasting of global horizontal irradiance by exponential smoothing, using decompositions. Energy, 81:111–119. Yang, D. and van der Meer, D. (2021). Post-processing in solar forecasting: Ten overarching thinking tools. Renewable and Sustainable Energy Reviews, 140:110735. Yang, D., van der Meer, D., and Munkhammar, J. (2020b). Probabilistic solar forecasting benchmarks on a standardized dataset at Folsom, California. Solar Energy, 206:628–639. Yang, D., Walsh, W. M., Dong, Z., Jirutitijaroen, P., and Reindl, T. G. (2013b). Block matching algorithms: Their applications and limitations in solar irradiance forecasting. Energy Procedia, 33:335–342. PV Asia Pacific Conference 2012. Yang, D., Wang, W., Bright, J. M., Voyant, C., Notton, G., Zhang, G., and Lyu, C. (2022a). Verifying operational intra-day solar forecasts from ECMWF and NOAA. Solar Energy, 236:743–755. Yang, D., Wang, W., Gueymard, C. A., Hong, T., Kleissl, J., Huang, J., Perez, M. J., Perez, R., Bright, J. M., Xia, X., van der Meer, D., and Peters, I. M. (2022b). A review of solar forecasting, its dependence on atmospheric sciences and implications for grid integration: Towards carbon neutrality. Renewable and Sustainable Energy Reviews, 161:112348. Yang, D., Wang, W., and Hong, T. (2022c). A historical weather forecast dataset from the European Centre for Medium-Range Weather Forecasts (ECMWF) for energy forecasting. Solar Energy, 232:263–274. Yang, D., Wang, W., and Xia, X. (2022d). A concise overview on solar resource assessment and forecasting. Advances in Atmospheric Sciences, 39(8):1239–1251. Yang, D., Wu, E., and Kleissl, J. (2019). Operational solar forecasting for the real-time market. International Journal of Forecasting, 35(4):1499–1519. Yang, D., Yagli, G. M., and Quan, H. (2018b). Quality control for solar irradiance data. In 2018 IEEE Innovative Smart Grid Technologies - Asia (ISGT Asia), pages 208–213. Yang, D., Yagli, G. M., and Srinivasan, D. (2022e). Sub-minute probabilistic solar forecasting for real-time stochastic simulations. Renewable and Sustainable Energy Reviews, 153:111736. Yang, D., Yang, G., and Liu, B. (2023a). Combining quantiles of calibrated solar forecasts from ensemble numerical weather prediction. Renewable Energy, 215:118993. Yang, D., Ye, Z., Lim, L. H. I., and Dong, Z. (2015b). Very short term irradiance forecasting using the lasso. Solar Energy, 114:314–326. Yang, D., Ye, Z., Nobre, A. M., Du, H., Walsh, W. M., Lim, L. I., and Reindl, T. (2014b). Bidirectional irradiance transposition based on the Perez model. Solar Energy, 110:768– 780. Yang, D. and Zhang, A. N. (2019). Impact of information sharing and forecast combination on fast-moving-consumer-goods demand forecast accuracy. Information, 10(8):260. Yang, G., Yang, D., Lyu, C., Wang, W., Huang, N., Kleissl, J., Perez, M. J., Perez, R., and Srinivasan, D. (2023b). Implications of future price trends and interannual resource uncertainty on firm solar power delivery with photovoltaic overbuilding and battery storage. IEEE Transactions on Sustainable Energy, 14(4):2036–2048. Yang, H. and Kleissl, J. (2016). Preprocessing WRF initial conditions for coastal stratocumulus forecasting. Solar Energy, 133:180–193. Yang, J., Zhang, Z., Wei, C., Lu, F., and Guo, Q. (2017d). 
Introducing the new generation of chinese geostationary weather satellites, Fengyun-4. Bulletin of the American Meteorological Society, 98(8):1637–1658.

References

651

Yang, P., Chua, L. H. C., Irvine, K. N., and Imberger, J. (2021b). Radiation and energy budget dynamics associated with a floating photovoltaic system. Water Research, 206:117745. Yang, X., Yang, D., Bright, J. M., Yagli, G. M., and Wang, P. (2021c). On predictability of solar irradiance. Journal of Renewable and Sustainable Energy, 13(5):056501. Yang, Y., Sun, W., Chi, Y., Yan, X., Fan, H., Yang, X., Ma, Z., Wang, Q., and Zhao, C. (2022f). Machine learning-based retrieval of day and night cloud macrophysical parameters over East Asia using Himawari-8 data. Remote Sensing of Environment, 273:112971. Yao, W., Li, Z., Zhao, Q., Lu, Y., and Lu, R. (2015). A new anisotropic diffuse radiation model. Energy Conversion and Management, 95:304–313. Ye, Y. (1987). Interior Algorithms for Linear, Quadratic, and Linearly Constrained NonLinear Programming. PhD thesis, Department of ESS, Stanford University. Yeh, C.-C. M., Zhu, Y., Ulanova, L., Begum, N., Ding, Y., Dau, H. A., Silva, D. F., Mueen, A., and Keogh, E. (2016). Matrix profile I: All pairs similarity joins for time series: A unifying view that includes motifs, discords and shapelets. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pages 1317–1322. Yelchuri, S., Rangaraj, A., Xie, Y., Habte, A., Joshi, M. C., Boopathi, K., Sengupta, M., and Balaraman, K. (2021). A short-term solar forecasting platform using a physics-based smart persistence model and data imputation method. Technical report, National Renewable Energy Lab.(NREL), Golden, CO (United States). Yeom, J.-M., Deo, R. C., Adamowski, J. F., Park, S., and Lee, C.-S. (2020). Spatial mapping of short-term solar radiation prediction incorporating geostationary satellite images coupled with deep convolutional LSTM networks for South Korea. Environmental Research Letters, 15(9):094025. Yeom, J.-M., Park, S., Chae, T., Kim, J.-Y., and Lee, C. S. (2019). Spatial assessment of solar radiation by machine learning and deep neural network models using data provided by the COMS MI geostationary satellite: A case study in South Korea. Sensors, 19(9):2082. You, S., Lim, Y. J., Dai, Y., and Wang, C.-H. (2018). On the temporal modelling of solar photovoltaic soiling: Energy and economic impacts in seven cities. Applied Energy, 228:1136–1146. Yuan, Q., Shen, H., Li, T., Li, Z., Li, S., Jiang, Y., Xu, H., Tan, W., Yang, Q., Wang, J., Gao, J., and Zhang, L. (2020). Deep learning in environmental remote sensing: Achievements and challenges. Remote Sensing of Environment, 241:111716. Yuen, R. A., Baran, S., Fraley, C., Gneiting, T., Lerch, S., Scheuerer, M., and Thorarinsdottir, T. (2018). ensembleMOS: Ensemble Model Output Statistics. R package version 0.8.2. Zagouras, A., Pedro, H. T. C., and Coimbra, C. F. M. (2015). On the role of lagged exogenous variables and spatio-temporal correlations in improving the accuracy of solar forecasting methods. Renewable Energy, 78:203–218. Zammit-Mangion, A. (2020). FRK: Fixed Rank Kriging. R package version 0.2.2.1. Zdunkowski, W., Trautmann, T., and Bott, A. (2007). Radiation in the Atmosphere: A Course in Theoretical Meteorology. Cambridge University Press. Zhang, G., Yang, D., Galanis, G., and Androulakis, E. (2022). Solar forecasting with hourly updated numerical weather prediction. Renewable and Sustainable Energy Reviews, 154:111768. Zhang, H., Kondragunta, S., Laszlo, I., and Zhou, M. (2020). Improving GOES Advanced Baseline Imager (ABI) aerosol optical depth (AOD) retrievals using an empirical bias correction algorithm. 
Atmospheric Measurement Techniques, 13(11):5955–5975. Zhang, J., Florita, A., Hodge, B.-M., Lu, S., Hamann, H. F., Banunarayanan, V., and Brockway, A. M. (2015a). A suite of metrics for assessing the performance of solar power forecasting. Solar Energy, 111:157–175.

652

References

Zhang, X., Liang, S., Wild, M., and Jiang, B. (2015b). Analysis of surface incident shortwave radiation from four satellite products. Remote Sensing of Environment, 165:186–202. Zhen, Z., Liu, J., Zhang, Z., Wang, F., Chai, H., Yu, Y., Lu, X., Wang, T., and Lin, Y. (2020). Deep learning based surface irradiance mapping model for solar PV power forecasting using sky image. IEEE Transactions on Industry Applications, 56(4):3385–3396. Zhong, X. and Kleissl, J. (2015). Clear sky irradiances using REST2 and MODIS. Solar Energy, 116:144–164. Zhu, Q., Chen, J., Zhu, L., Duan, X., and Liu, Y. (2018). Wind speed prediction with spatio–temporal correlation: A deep learning approach. Energies, 11(4):705. Ziel, F. (2016). Forecasting electricity spot prices using lasso: On capturing the autoregressive intraday structure. IEEE Transactions on Power Systems, 31(6):4977–4987. Ziel, F. and Steinert, R. (2018). Probabilistic mid- and long-term electricity price forecasting. Renewable and Sustainable Energy Reviews, 94:251–266. Ziel, F. and Weron, R. (2018). Day-ahead electricity price forecasting with high-dimensional structures: Univariate vs. multivariate modeling frameworks. Energy Economics, 70:396– 420.

Index

β-median, 309
“frozen” NWP model, 69, 176, 327
accessibility, 147, 170, 181
activation function (CNN), 280
advection, 266, 271, 275
aerosol, 241
aerosol optical depth, 148, 256
aggregation consistency, 519
aggregation inconsistency, 522
aggregation scheme, 160
air parcel, 227
albedo
  snow, 173
  surface, 135, 173
albedo enhancement, 110, 447
analog, 321
analysis, 68
ancillary service, 15
anemometer, 223
anisotropy index, 466
ARIMA family of models, 73, 330, 530
association, 382
atmospheric extinction, 104
aureole, 465
autocorrelation, 71, 152
backtrajectory, 273
bagging, 71, 76
bandwidth, 62
bankability, 170, 486
base data, 173
base learner, 76
base method, 6, 221
base rate, 381
basis function, 287
Bayesian model averaging, 344, 436
bias
  type 1 conditional, 379
  type 2 conditional, 381
  unconditional, 12, 382
bias–variance trade-off, 77
bimodality, 61
block matching, 269
Boltzmann’s constant, 490, 492
boosting, 77
bootstrapping, 71, 91, 328
  block, 71, 531
Bouguer’s law, 477
brightness constancy constraint, 268
calibration, 11, 12, 59, 297, 338, 378, 400
  exceedance, 401
  marginal, 401
  probabilistic, 338, 400
  strong, 401
carbon neutrality, 1
chaotic, 52
circularity, 82
circumsolar brightening, 465, 467
circumsolar region, 465
clear-sky index, 61, 107, 136, 260, 266, 341
clearness index, 114, 136, 453
climate change, 1
climatology, 369
  external single-valued, 369
  internal single-valued, 369
  multiple external-valued, 369
  multiple single-valued, 369
climatology–persistence combination, 152, 370
cloud condensation nuclei, 239, 241
cloud enhancement, 110, 139, 144, 455
cloud index, 260
cloud motion vector, 133, 266, 273
cloud optical depth, 255
cloud particle size, 256
cloud phase, 255
cloud-top height, 255
coherency, 519
coherent subspace, 529
combining, 10, 70, 296, 313
  probabilistic, 11, 297
complementarity, 22
conservation
  of energy, 226
  of mass, 227
  of momentum, 226
  of water vapor, 227
consistency, 12, 170, 309, 365, 400
consistency resampling, 431
constant
  Boltzmann’s, 232
  ideal gas, 228
  Planck’s, 232
  solar, 55, 104
convolution kernel (CNN), 278
correlation function, 285
correlogram, 427, 547
Courant–Friedrichs–Lewy condition, 248
covariance function, 192, 285
cross validation, 80
cultural cognition, 47
cumulative distribution function, 55
curtailment, 549
data, 169
data (analog ensemble), 323
data assimilation, 68, 221, 248, 249
  4D variational, 250
data imputation, 145
decomposition
  Brier score, 410
  CRPS, 416
decorrelation, 547
decorrelation distance, 326
deep learning, 222, 275
demand response, 2, 547
denialism, 31
diagnosis, 223
diagnostic variable, 94
diffuse fraction, 137, 453
diffusion, 275
digital elevation model, 236
discretization (NWP), 243
discrimination, 12, 381
dispatchability, 543
dispersion, 174
distribution
  Bart Simpson, 61
  beta, 109
  bimodal, 61
  binomial, 432
  exponential family of, 332
  Gaussian, 59, 414
  generalized extreme value, 424
  limiting, 401
  location–scale family of, 60
  logistic, 60
  mixture, 61, 330
  nonparametric, 62
  parametric, 59
divergence, 406
downscaling, 180, 262, 304
  aggregate-consistent, 305
  dynamical, 305
  statistical, 305
  temporal, 10, 296
duck curve, 15
dynamic range, 262
dynamical core, 68, 226
dynamical system, 51
dynamics of NWP, 225
ECMWF, 67
economic dispatch, 17
effective radius, 256
electronic charge, 490
elicitability, 10, 12, 309
empirical cumulative distribution function, 63, 403
energy internet, 547
energy meteorology, 13
energy storage, 2, 545
ensemble
  analog, 10, 296
  data, 71
  daughter, 69
  dynamical, 68, 118, 314
  hybrid, 69, 314
  parameter, 73
  persistence, 328
  poor man’s, 68, 118
  process, 72
  stochastic parameterization, 69
ensemble learning, 74
ensemble model output statistics, 339, 435
ensemble variance, 340
equation
  advection–diffusion, 241
  air density forecast, 229
  closure, 106, 134, 140, 172
  continuity, 229
  diagnostic, 229
  Fresnel, 473
  measurement, 301
  moisture forecast, 229
  momentum, 94, 227, 243
  optical flow, 268
  primitive, 22, 226
  prognostic, 229
  state, 301
  temperature forecast, 228
  thermodynamic, 228
  transposition, 114, 135, 462
  water vapor, 229
  wind forecast, 227
equation of state, 229
error
  base forecast, 524
  coherency, 524
  mean absolute, 12, 389, 414
  mean absolute percentage, 90
  mean bias, 12, 389
  reconciled forecast, 524
  reconciliation, 524
  root mean square, 12, 389
ETS family of models, 73, 330, 530
event
  nominal, 413
  ordinal, 413
expectation–maximization algorithm, 291, 344
expenditure
  capital, 3, 552
  operational, 3, 552
exploratory analysis, 169
factorization
  calibration–refinement, 378
  likelihood–base rate, 378
falsifiability, 34, 181, 364
Faraday’s law, 503
Farnebäck algorithm, 269
feature map (CNN), 278
filtering, 10, 295, 301
firm forecasting premium, 551
firm generation, 544
firm kWh premium, 551
firm power enabler, 545
first law of thermodynamics, 228
flattening (CNN), 280
flexible resources, 17
flexibly dispersive, 354
force
  apparent, 227
  centrifugal, 227
  Coriolis, 227, 246
  friction, 228
  gravitational, 239
  real, 227
forecast
  base (hierarchical forecasting), 520
  component, 67
  composite, 70
  deterministic, 9, 51
  distributional, 9
  ensemble, 9, 67
  interval, 9, 66
  judgmental, 70
  probabilistic, 9, 54
  quantile, 9, 65
  reconciled (hierarchical forecasting), 520
forecast horizon, 164
forecast lead time, 165
forecast quality, 12, 33
forecast resolution, 165
forecast span, 165
forecast update rate, 165
forecasting
  data-driven, 7, 283
  energy, 4, 21
  firm, 25, 544
  hierarchical, 25, 119, 196, 306
  load, 87
  price, 98
  satellite-based, 252
  solar, 6
  wind, 93
forward price, 98
Gaussian grid, 208, 243
geographical smoothing, 546
geostatistics, 284
grid integration, 2
ground coverage ratio, 496
hedging, 12, 408
heliocentrism, 103
Heliosat, 200
hierarchy
  temporal, 165
homoscedasticity, 299
horizon band, 465
horizon brightening, 465, 467
hygrometer, 223
hyper-parameters, 116
ideal gas law, 229
illusion of explanatory depth, 75
inception module (CNN), 280
incidence angle, 135, 452
incidence angle modifier coefficient, 475
indicator function, 63
information criterion, 74
instability, 153
internet of things, 4
intuition pump, 26
inverse problem, 305
invertibility (optical flow), 269
irradiance
  beam horizontal, 134
  beam normal, 134, 172, 257
  clear-sky, 103, 136
  clear-sky beam normal, 105
  clear-sky diffuse horizontal, 106
  clear-sky global horizontal, 106
  diffuse horizontal, 134, 172
  effective, 475
  extraterrestrial, 103, 136, 257
  global horizontal, 134, 172
  global tilted, 114, 135, 172
  plane-of-array, 135
isotropy, 462
Kalman filter, 301
kernel, 62
kernel conditional density estimation, 384
kernel density estimation, 422
Kirchhoff’s current law, 489
Köppen–Geiger climate classification, 177, 189
kriging, 131, 285
  fixed-rank, 287
large eddy simulation, 246
levelized cost of electricity, 551
likelihood, 380
limited-area model, 180
linear pool
  beta-transformed, 357
  spread-adjusted, 356
  traditional, 353
Linke turbidity factor, 148
load following, 17
local stationarity, 155
look-up table, 150, 200, 234, 253
loss
  Huber, 336, 346
  pinball, 101, 334
Lucas–Kanade method, 269
lumpiness, 153
market
  balancing, 100
  day-ahead, 4, 98, 166
  intra-day, 4
  real-time, 4, 98, 166
  two-settlement, 15
Markov assumption, 289
median filter, 304
meteorology, 222
method
  finite-difference, 243
  spectral, 245
method of dressing, 11, 69, 296
  best-member, 343
  weighted-member, 343
minimum variance unbiased estimator, 525
minimum-trace reconciliation theorem, 526
model
  clear-sky, 21, 146, 201, 256, 444
  reflection loss, 473
  relative transmittance, 473
  separation, 114, 135, 262, 453
  transposition, 135, 462
model chain, 13, 442, 450
model identification, 73
model output statistics, 299
motivated reasoning, 46
MSE decomposition
  bias–variance, 319, 382
  conditioning on forecast, 383
  conditioning on observation, 383
MSE scaling, 147
multi-modeling, 68
multiple aggregation, 165
multiple-scenario projection, 85
Murphy–Winkler framework, 57, 175, 364
nameplate power, 488
neocognitron, 275
net load, 16
neural network
  convolutional, 97, 275
  recurrent, 275
Newton’s second law, 227
numerical instability (NWP), 247
numerical weather prediction, 225
NWP
  wavenumber, 247
Occam’s broom, 37, 304
Occam’s razor, 26, 36
octahedral grid, 208, 243
open-circuit voltage, 481
opinion pool, 70, 348
optical flow, 268
  dense, 268
  sparse, 268
optimal reconciliation, 523
orography, 95
overbuilding, 549
overbuilding & proactive curtailment, 2, 549
parameterization, 22, 68, 230
persistence, 328, 369
  block, 369
  single-valued, 369
physics of NWP, 225
pinball loss, 422
planetary boundary layer, 237
polar orbiter, 173
pooling layer (CNN), 280
post-processing, 9
  deterministic-to-deterministic, 10
  deterministic-to-probabilistic, 10
  probabilistic-to-deterministic, 10
  probabilistic-to-probabilistic, 11
potential temperature, 228
power curve
  model chain, 13, 113
  regression, 13, 113
  solar, 13, 113, 441
  wind, 13, 95
power flow, 18
power system, 4
  wiring mode, 18
precipitation, 239
predictability, 95, 112, 177
predictability error growth, 367
predictive density, 57
predictive distribution, 20, 59
probabilistic coherency, 528
probability density function, 55
probability integral transform, 338, 402
probability mass function, 380
prognosis, 223
prognostic variable, 94
propriety, 12, 31, 341, 408, 409, 438
pseudoscience, 33, 34, 181
PV azimuth angle, 452
PV tilt angle, 452
pyranometer, 172
pyrheliometer, 172
quality control, 21, 138, 172
quantile, 65
quantile crossing, 347
quantile regression, 436
  lasso-penalized, 335
quantile regression forest, 336
quantile regression neural network, 335
query (analog ensemble), 323
radiation
  longwave, 232
  net, 235
  shortwave, 232
  surface, 232
radiation modeling, 453
radiative transfer, 104, 150, 199, 232, 234, 252, 253
  two-stream, 234
random forest, 336
random process, 284
random variable, 53, 54
  continuous, 56
rank histogram, 425
ranking statistics, 458
Rapoport’s rule, 30, 183
reanalysis, 72, 176
receptive field (CNN), 276
reconciliation matrix, 523
reductionism, 32, 170
reference cell, 173
reference frame
  Eulerian, 228
  Lagrangian, 228
refinement, 380
reflectance, 261
  aerosol, 258
  cloud, 258
  surface, 258
  top-of-atmosphere, 252, 254
regression, 10, 96, 116, 295, 299
  lasso, 335
  least absolute deviation, 334
  nonhomogeneous Gaussian, 339
  probabilistic, 11, 296
  regularized, 101
regulation, 17
relative air mass, 467
reliability, 378, 404, 405
reliability diagram, 428
representativeness, 22
representativeness of a dataset, 170
reproducibility, 43, 181
  ostensive, 125
  verbal, 125
resolution, 338, 380, 404, 405
resolving power, 52
resource assessment, 24
response vector (analog ensemble), 325
retrieval, 173, 253, 263
rule of iterated expectations, 365
Sampson–Guttorp method, 288
scale, 178
  microphysics, 239
  oktas, 198
  spatial, 178, 230
  subgrid, 230
  time, 178, 243
scale mismatch, 95, 179, 196
scattering
  Rayleigh, 105
scheduling
  day-ahead, 16
  intra-day, 16
scheme, 69, 230
  aerosol and atmospheric chemistry, 241
  cloud and precipitation microphysics, 239
  convection, 238
  cumulus, 239
  land surface–atmosphere, 94, 235
  planetary boundary layer, 237
  radiation, 232
score
  Brier, 409
  continuous ranked probability, 413
  half Brier, 410
  ignorance, 340, 421
  interval, 355
  logarithmic, 340, 421
  probability, 409
  quantile, 406, 422
  ranked probability, 413
  skill, 12, 364, 365
score divergence, 409
scoring function, 308, 366
scoring rule, 12, 341, 407
sensor network, 112, 184, 222
shallow-water approximation, 246
sharpness, 12, 59, 338, 400
short-circuit current, 480
similar-day method, 89
sink, 227
site adaptation, 174, 262, 512
skin effect, 504
skip connection (CNN), 281
sky camera, 111
sky’s brightness, 467
sky’s clearness, 467
smoke grenades, 7, 40, 76
smooth function (GAMLSS), 332
solar azimuth angle, 452
solar elevation angle, 455
solar positioning, 103, 148, 452
solar zenith angle, 452
source, 227
space–time trajectory, 97
spatio-temporal statistical model, 284
  component-of-variation, 284
  descriptive, 284
  dynamical, 284
  mixed-effect, 284
spectral band, 253
spectral irradiance, 232
spectrum
  blackbody, 232
  reference, 232
spot price, 98
stacking, 79
staggered grid, 245
standard test condition, 110, 472
stationarity, 152, 153
statistical functional, 308
stride (CNN), 277
summing matrix, 521
sun–earth distance, 104
sunrise hour, 162
sunset hour, 162
super learner, 79
supersaturation, 239
surface energy budget, 235
Talagrand diagram, 425
Taylor’s hypothesis, 272
temperature
  air, 235
  ambient, 481
  cell, 481
  nominal operating cell, 483
temperature coefficient, 488
temperature profile, 255
temporal alignment, 160
test
  Dickey–Fuller, 154
  Diebold–Mariano, 92, 458, 537
  extremely rare limits, 140
  k-index, 140
  Kolmogorov–Smirnov (KS), 154
  Kwiatkowski–Phillips–Schmidt–Shin, 154
  physically possible limits, 139
  three-component closure, 140
the “novel” operator, 40, 76
thermometer, 223
thinking tools, 26
three-step search algorithm, 271
threshold, 55, 348, 401
total derivative, 228
training matrix (analog ensemble), 325
transmittance, 104, 257
  clear-sky atmosphere, 258
  cloud, 258
  diffuse, 137
  direct, 137, 466
  relative, 474
transposition factor
  diffuse, 135
  ground reflection, 135
trimming (combining), 315
turbulence, 237
turbulence kinetic energy, 237
typical meteorological year, 486
unbiasedness, 524
unconstrained PV power, 546
unit commitment, 17, 99
upscaling, 236
variance deficit, 342
variance stabilization, 102, 304
variogram, 326
vector autoregressive model, 101, 290
verifiability, 34
verification, 11, 58, 95, 173, 363
  absolute, 363
  comparative, 363
  distribution-oriented, 167, 175
  measure-oriented, 167, 372
water vapor mixing ratio, 229
weak learner, 76
weighted power mean inequality, 319
winsorizing (combining), 315
zenith angle, 105, 135
zero padding (CNN), 277