Information and Communication Technologies for Development Evaluation

Written by a team of expert practitioners at the Independent Office of Evaluation of the International Fund for Agricultural Development (IFAD), this book gives an insight into the implications of new and emerging technologies in development evaluation. Growing technologies such as big data analytics, machine learning and remote sensing present new opportunities for development practitioners and development evaluators, particularly when measuring indicators of the Sustainable Development Goals. The volume provides an overview of information and communication technologies (ICTs) in the context of evaluation, looking at the theory and practice, and discussing how the landscape may unfold. It also considers concerns about privacy, ethics and inclusion, which are crucial issues for development practitioners and evaluators working in the interests of vulnerable populations across the globe. Among the contributions are case studies of seven organizations using various technologies for data collection, analysis, dissemination and learning. This valuable insight into practice will be of interest to researchers, practitioners and policymakers in development economics, development policy and ICT.

Oscar A. García is the Director of the Independent Office of Evaluation (IOE) of the International Fund for Agricultural Development (IFAD). Before joining IFAD, Oscar served as the head of advisory services at the UNEP Technology, Industry and Economics Division, Paris, providing guidance to the Partnership for Action on Green Economy. He was senior evaluation advisor for the United Nations Development Programme (UNDP) Evaluation Office, overseeing programmatic evaluations in Africa, Asia, Latin America and the Caribbean. Mr. García has more than 25 years of professional experience, combining operational and managerial practice with results-based management, strategic planning and evaluation expertise.

Prashanth Kotturi joined the IOE in October 2012 and currently works as an Evaluation Analyst. Since then, he has worked in lead and support roles on a wide range of evaluations, from project and country portfolio evaluations to corporate-level evaluations and evaluation syntheses. Before joining IOE, Prashanth worked in the financial services industry and with microfinance institutions in his home country, India.

Routledge Studies in Development Economics

142 The Service Sector and Economic Development in Africa, edited by Evelyn F. Wamboye and Peter J. Nyaronga
143 Macroeconomic Policy for Emerging Markets: Lessons from Thailand, by Bhanupong Nidhiprabha
144 Law and Development: Theory and Practice, by Yong-Shik Lee
145 Institutions, Technology and Development in Africa, by Jeffrey James
146 Urban Policy in Latin America: Towards the Sustainable Development Goals?, edited by Michael Cohen, Maria Carrizosa and Margarita Gutman
147 Unlocking SME Finance in Asia: Roles of Credit Rating and Credit Guarantee Schemes, edited by Naoyuki Yoshino and Farhad Taghizadeh-Hesary
148 Full and Productive Employment in Developing Economies: Towards the Sustainable Development Goals, by Rizwanul Islam
149 Information and Communication Technologies for Development Evaluation, edited by Oscar A. García and Prashanth Kotturi

For more information about this series, please visit www.routledge.com/series/SE0266

Information and Communication Technologies for Development Evaluation

Edited by Oscar A. García and Prashanth Kotturi

First published 2020
by Routledge
52 Vanderbilt Avenue, New York, NY 10017
and by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2020 selection and editorial matter, Oscar A. García and Prashanth Kotturi; individual chapters, the contributors

The right of Oscar A. García and Prashanth Kotturi to be identified as the authors of the editorial material, and of the authors for their individual chapters, has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record has been requested for this book

ISBN: 978-0-367-13714-4 (hbk)
ISBN: 978-0-429-02823-6 (ebk)

Typeset in Times New Roman by codeMantra

Contents

List of Figures, Tables and Boxes
List of Contributors
Acknowledgements

Introduction
Oscar A. García and Prashanth Kotturi
  The Challenge of Evaluating the Sustainable Development Goals
  ICTs with Promise for Evaluators
  Privacy, Equity and Biases
  ICT4Eval – The Publication

1 Evaluation and the Sustainable Development Goals: Opportunities and Constraints
Marco Segone
  The Greatest Opportunity and the Greatest Challenge for the Global Evaluation Community
  How Do We Assess Whether Development Interventions Are Relevant, and Are Having an Impact in Decreasing Inequality and Improving the Welfare of the Worst-Off Groups?
  How Do We Carry Out Evaluation Given the Complexity of the SDGs?
  How Can We Take Advantage of New Technologies to Address New Challenges?
  How Can We Strengthen the Capacities of Governments, Civil Society Organizations, and Parliamentarians to Evaluate Whether Interventions Are Having Equitable Outcomes for Marginalized Populations?
  Individual and Institutional Evaluation Capacities Enabled by a Supportive Environment
  Fostering Demand for and Supply of Evaluation
  Conclusion

2 Information and Communication Technologies for Evaluation (ICT4Eval): Theory and Practice
Oscar A. García, Jyrki Pulkkinen, Prashanth Kotturi et al.
  Data Collection: Faster, Cheaper, More Accurate
  Data Analysis: The Machine Learning Revolution
  Dissemination and Learning: Reaching a Global Audience
  ICTs in Practice – the Case for Cautious Optimism
  Sunk Costs vs. an Opportunity
  Do It Yourself vs. Outsourcing
  Mainstreaming ICTs into Operations
  Evaluation 2.0: Turning Dilemmas to Dividends?

3 Big Data Analytics and Development Evaluation: Optimism and Caution
Michael Bamberger
  Some Themes from the Big Data Literature
  Demystifying Big Data
  Defining Big Data and NIT
  Big Data and Data Analytics
  The Data Continuum
  The NIT Ecology and the Linkages to Development Evaluation
  Where Is the Big Data Revolution Headed?
  Does Big Data Apply to Development Evaluation? Should Evaluators Care About It?
  The Great Potential for Integrating Big Data into Development Evaluation
  Big Data and Development Evaluation: The Need for Caution
  Overcoming Barriers to Big Data Use in Evaluation
  Future Scenarios for Development Evaluation in the Age of Big Data
  New Skills Required for Evaluation Offices, Evaluators and Data Scientists

4 Technology, Biases and Ethics: Exploring the Soft Sides of Information and Communication Technologies for Evaluation (ICT4Eval)
Linda Raftree
  Factors Affecting Information Technology Access and Use Among the Most Vulnerable
  Data and Technology Alone Cannot Ensure Inclusion
  Inclusiveness of Access and Use Affect the Representativeness of Big Data
  Bias in Big Data, Artificial Intelligence and Machine Learning
  Protecting Data Subjects' Rights in Tech-Enabled, Data-Led Exercises
  Improving Data Privacy and Protection in the Development Sector

5 Technology and Its Implications for Nations and Development Partners
Oscar A. García and Prashanth Kotturi
  Structural Transition and Pathways for Economic Development
  Income - Who Has Technology Affected the Most?
  A Luddite's Nightmare or a Passing Phenomenon?
  Geography - Implications for Sustainable Rural Development
  Dealing with Disruptions and Moving Forward
  Implications for Development Partners

Conclusions
Oscar A. García and Prashanth Kotturi
  The Story This Far
  Key Takeaways
  Looking Ahead

Index

List of Figures, Tables and Boxes

Figures
1.1 National average reduction in income poverty in a hypothetical country
1.2 Reduction in income poverty in a hypothetical country, by gender
1.3 The SDGs as a system of goals and targets
1.4 Links between the SDGs through targets: an aggregated picture
1.5 Links among Sustainable Development Goal 10 (inequality) and other goals
2.1 A spatial model represents the real world as a combination of layers or themes
2.2 Globally distributed GEF-supported PAs overlaid with sites of conservation importance
2.3 Forest change data used for global analysis, an example of a PA in Mexico
2.4 Economic valuation of carbon sequestrated at each GEF-supported project site
2.5 Geospatial analysis based on satellite data shows decreasing vegetation at the mouth of the Kagera River, Uganda
2.6 In a systematic review, 20% of the search delivered 80% of the evidence
2.7 Google search trends for the term "machine learning"
2.8 Nightlights classified as low (red), medium (yellow) and high (green) in northern Nigeria
2.9 Trained vs. test models
2.10 Satellite imagery and neural networks
2.11 Predictions at 500m by 500m resolution of food insecurity for a town in Mali, based on a 2018 Emergency Food Security Assessment
3.1 The components of NIT
3.2 The four stages of the data analytics cycle
3.3 The data continuum
3.4 The big data ecosystem and the linkages to the evaluation ecosystem
5.1 Labour's percentage share of GDP has been declining in recent decades
5.2 When labour's share of GDP declines, inequality tends to rise
5.3 Share of labour by skills (as per cent of national income)
5.4 Impact of various factors on aggregate labour share (in percentage points) change by skill, 1995–2009

Tables
3.1 Comparing big data and conventional evaluation data
3.2 Kinds of big data analysis with potential applications for programme monitoring and evaluation
3.3 NIT and data analytics applications used widely in international development
3.4 Ways that big data and ICTs can strengthen programme evaluation
3.5 Big data and ICTs have been used to strengthen widely used evaluation designs
3.6 Examples of big data and data analytics approaches being used in programme evaluation
5.1 Stages of agricultural development for countries in developing Asia and the Pacific

Boxes
3.1 Using Data Analytics to Evaluate a Programme for At-Risk Youth Served by a County Child Welfare System
3.2 Using an Integrated Data Platform to Help End Modern Slavery
4.1 Factors That Affect Access and Use of NIT, Which Can Result in Biased or Incomplete Data
4.2 Using Technology for Accountability: Findings from Making All Voices Count
4.3 Data Subjects' Rights in the GDPR

List of Contributors

Hamdi Ahmedou has worked with the Independent Office of Evaluation (IOE) of the International Fund for Agricultural Development (IFAD) as an evaluation analyst.

Anupam Anand currently works as an evaluation officer at the Independent Evaluation Office of the Global Environment Facility (GEF) and uses satellite data, geographic information systems, machine learning, computational social science, field surveys and other mixed methods to evaluate environmental and development projects.

Michael Bamberger has been involved in development evaluation for 50 years and currently serves as an independent consultant. Most recently, he authored the UN Global Pulse report "Integrating Big Data into Monitoring and Evaluation of Development Programmes".

Geeta Batra is the chief evaluation officer at the Independent Evaluation Office of the GEF.

Michael Carbon currently works as a Senior Evaluation Officer with the IOE of IFAD. Mr Carbon previously worked for the UN Environment Programme, Nairobi. He has led a variety of ex-post evaluations in Africa, Asia and Latin America and numerous evaluations of environmental and rural development projects and programmes.

Paul Jasper is a deputy team leader at Oxford Policy Management and works on surveys, quantitative research, statistical analysis, and experimental and quasi-experimental quantitative evaluation methodologies.

Simone Lombardini leads Oxfam GB's impact evaluation function. His areas of expertise include designing and conducting experimental and quasi-experimental impact evaluations.

Edoardo Masset currently serves as the Deputy Director of the Centre of Excellence in Development Impact and Learning (CEDIL), a consortium established to develop and test innovative methods for evaluation and evidence synthesis. He previously served as the Deputy Director and head of the London office of the International Initiative for Impact Evaluation (3ie).

Jean-Baptiste Pasquier currently serves as a vulnerability analysis and mapping (VAM) officer in the Libya country office of the World Food Programme (WFP). Prior to that, he served as a data scientist at WFP headquarters in Rome.

Jyrki Pulkkinen serves in the Ministry of Foreign Affairs of Finland and formerly served as the Director of Development Evaluation in the ministry.

Linda Raftree is an independent consultant who supports digital strategy, programme design, policy and research for international development initiatives. She advocates ethical approaches to using information and communication technologies (ICTs) and digital data in the humanitarian and development spaces.

Lorenzo Riches currently works as a data scientist in the VAM division of the WFP.

Marco Segone currently serves as the director of the Evaluation Office at the UN Population Fund. He has authored numerous publications, including Evaluation for Equitable Development Results and How to Design and Manage Equity-Focused Evaluations.

Gaurav Singhal currently works as an independent researcher and has served as the lead data scientist in the VAM division of the WFP.

Emily Tomkys Valteri works as ICT in Programme Accountability Programme Manager at Oxfam GB. She specializes in mobile data collection and mobile case management, and coordinates Oxfam's work on how ICTs can be used in Monitoring, Evaluation, Accountability and Learning (MEAL).

Juha Ilari Uitto currently serves as the director of the Independent Evaluation Office of the GEF. He has conducted and managed a large number of programmatic and thematic evaluations of international cooperation at the global, regional and country levels, in particular related to environmental management and poverty-environment linkages.

Monica Zikusooka currently works as Regional Programme Quality and Impact Manager at Save the Children, Kenya. Her work focuses on MEAL systems building, results monitoring and evidence generation, capacity-building, cross-country learning and linking regional operations to Save the Children's global MEAL strategy.

Acknowledgements

This book was edited by Oscar A. García, Director, Independent Office of Evaluation (IOE) of the International Fund for Agricultural Development (IFAD), and Prashanth Kotturi, Evaluation Analyst, IOE of IFAD. It features chapters from Marco Segone, Director, Evaluation Office, United Nations Population Fund (UNFPA); Jyrki Pulkkinen, Evaluation Commissioner, Ministry of Foreign Affairs of Finland; Michael Bamberger, Independent Evaluation Consultant; and Linda Raftree, Independent Consultant.

The book was enriched by individual case study contributions from Juha Uitto, Director, Independent Evaluation Office, GEF; Anupam Anand, Evaluation Officer, Independent Evaluation Office, GEF; Geeta Batra, Deputy Director, Independent Evaluation Office, GEF; Monica Zikusooka, Regional Monitoring and Evaluation Specialist, Save the Children; Michael Carbon, Senior Evaluation Officer, IOE of IFAD; Hamdi Ahmedou, Former Evaluation Consultant, IOE of IFAD; Edoardo Masset, Former Deputy Director, International Initiative for Impact Evaluation (3ie); Paul Jasper, Deputy Team Leader, Oxford Policy Management; Gaurav Singhal, Former Lead Data Scientist, WFP of the United Nations; Lorenzo Riches, WFP of the United Nations; Jean-Baptiste Pasquier, WFP of the United Nations; Simone Lombardini, Global Impact Evaluation Advisor, Oxfam; and Emily Tomkys Valteri, ICT in Programme Accountability Project Manager, Oxfam GB.

The process of producing the book was ably supported by Andrew Johnston, who edited a preliminary version of the document.

Permissions

Figures 1.3, 1.4 and 1.5: from 'Towards Integration at Last? The Sustainable Development Goals as a Network of Targets', by David Le Blanc, © 2015 United Nations. Reprinted with the permission of the United Nations.

Figures 5.1, 5.2, 5.3 and 5.4: republished with permission of the International Monetary Fund, from 'Why Is Labour Receiving a Smaller Share of Global Income? Theory and Empirical Evidence', Dao, M. et al., 2017.

Introduction
Oscar A. García and Prashanth Kotturi

Monitoring and evaluation are essential to gather information from past and current activities that aim to have a positive impact on sustainable development. They are useful to track progress, identify areas for improvement and inform decision-making. The two concepts are not synonymous, however. Evaluation has come a long way since its origin as part of a joint "monitoring and evaluation" process. It is now regarded as an independent function that systematically assesses the achievement of expected and unexpected results. Evaluation is designed to enable judgements about what has worked and what has not and, most importantly, to identify the factors leading to performance, while guiding future operations and strategy.

In their most fundamental ways, monitoring and evaluation differ in three significant aspects: timing, purpose and who conducts them. In terms of timing, monitoring takes place throughout the project cycle, while evaluation assesses all or part of the project cycle and is conducted at a certain point in time. Monitoring's purpose is to track a development intervention's progress, comparing what is delivered with what was planned. Evaluation, on the other hand, reviews the achievements of a programme and considers whether the programme was the best way to achieve them; it measures both intended and unintended effects of an intervention. More importantly, evaluation attempts to establish some level of attribution of the observed effects to the intervention. Monitoring is typically conducted by programme managers, while evaluations are conducted by an independent third party who can be impartial in consulting with programme staff and stakeholders (CID, 2014).

An evaluator's role is to investigate and justify the value of an evaluand. Such investigation and justification must be supported by joining empirical facts and probative reasoning (Scriven, 1986). The responsibilities of evaluators and the expectations of evaluations have increased. Evaluation has evolved from being a self-assessment exercise for development institutions to becoming a tool of impartial scrutiny of operations, usually reporting directly to governing bodies (IOE, 2015b). This evolution requires data collection and analysis to pass a more rigorous methodological test than even that envisaged by Michael Scriven. Different methodologies and methods have comparative advantages in addressing particular concerns and needs; in the end, the question is whether the methodological approach is adequate given the available budget and capacities. Given the constraints and challenges faced, is it possible to provide the evidence that allows passing judgement on what was achieved (IOE, 2015a)?

Using a mix of methods, combining the breadth of quantitative methods with the depth of qualitative methods and "triangulating" information from different sources and approaches, evaluation can assess different facets of complex development outcomes or impacts. This yields greater validity than using one method alone and is in line with current good practice in development evaluation (UNEG, 2016).

These changes have made evaluation a powerful instrument for accountability, that is, for holding institutions responsible for delivering on their mandates. It is also a vital tool for learning, for making sure that past errors, whether of implementation or of design, are not repeated, thus strengthening the programmes and strategies of institutions and beyond. There are, however, still important challenges for the conduct of development evaluation associated with the availability and quality of data, such as lack of baseline data, sampling bias, difficulties in selecting appropriate counterfactuals, and accounting for the impact of multiple scales and contexts on development interventions. These challenges are compounded by the evolving development discourse and the movement towards the Sustainable Development Goals (SDGs).

The Challenge of Evaluating the Sustainable Development Goals

The SDGs, agreed by 193 countries in September 2015, frame the international development agenda until 2030 and benefit from a quarter of a century of evaluating efforts to achieve their predecessors, the Millennium Development Goals (MDGs) (see Chapter 1). At the heart of the SDG agenda is the need to ensure sustained and inclusive economic growth, social inclusion and environmental protection, fostering peaceful, just and inclusive societies through a new global partnership. The agenda thus covers social, economic and, most importantly, environmental sustainability. This multidimensional nature of sustainable growth requires evaluators to take a systems view of the SDGs.

However, taking multidimensionality into account is easier said than done. There are three main challenges for evaluation in taking a systems view, each flowing from the other: (i) multiple interrelated sectors are involved in achieving the SDGs, which brings a greater degree of complexity that has to be dealt with in new ways; (ii) the complexity of the SDGs also requires multiple levels of intervention (national, regional, local) and the involvement of multiple actors; and (iii) in light of the complexity involved, the data and requisite capacities, including national evaluation capacities, are inadequate, so conceptual frameworks, and the means therein for evaluating the SDGs, are scarce and in some cases do not exist (EvalPartners, 2017). The 17 SDGs are the result of intergovernmental negotiations on the priority development challenges facing the world in the 21st century and have 169 targets and 232 indicators. As of September 2017, 145 of the 232 indicators could not be evaluated due to a lack of data or of an internationally agreed methodology (UNDP, 2017).

The advent of the SDGs also brings opportunities. Emerging knowledge in evaluation has led to recognition of the complexity and interrelatedness of the SDGs and the potential they have to put sustainability at the centre of the development agenda (Nilsson, 2016). This helps evaluators look at the SDGs as parts of a single system and apply systems thinking, an integrated approach that was largely absent in the process of conceptualizing and evaluating the MDGs.

Evaluators cannot tackle the challenges of the SDGs and harness the opportunities they present using existing paradigms of data collection and analysis. The inherent complexity of the SDGs, and the lack of data for some indicators, implies that evaluators will have to come up with new sources and methods. In a rapidly evolving environment, mustering data of the scale and variety required by the number and diversity of SDG indicators calls for fresh approaches.

A decade ago, many people still lived beyond the reach of information and communication technologies (ICTs). Today, these technologies are everywhere. Satellite signals, mobile telecommunication antennas, and TV and radio frequencies cover the globe. The world has moved from 2G to the impending introduction of 5G in many countries. Enabled by the rapid spread of technology and more affordable access to the Internet and mobile networks, the availability of digital data, in new and different forms, has grown on a massive scale. Several initiatives launched in recent years have explored diverse ways to leverage these new types of digital data for more targeted, effective and efficient development interventions (GIZ, 2017).

That is why ICTs applied to development evaluation are so crucial. Technology can provide the rigour that evaluation requires to establish the attribution of programme results, beyond the progress tracking that monitoring provides. ICTs applied to development evaluation (ICT4Eval) thus encompass the potential or actual use of technology as a tool for collecting required data, turning it into information or, even better, knowledge, and communicating it to the designated audience in a manner that suitably addresses the attribution or substantial contribution of development interventions to the results achieved. To do so, evaluators now have the ability to call upon an extraordinary set of tools that can be deployed in almost any setting, from remote sensing to digital survey applications and data analysis powered by machine learning and artificial intelligence.

ICTs show great potential to improve the quality of the work that evaluators perform. They offer the potential to enhance the evaluation cycle, including its planning and execution, improving feedback loops and management response, and the sharing of knowledge with partners. For that reason, they are critical to strengthening evidence-based policymaking that relies on the evaluation of the impacts, outcomes and shortcomings of development initiatives at all levels of activity. Evaluators therefore need to keep abreast of important developments in ICTs, so they can stay at the cutting edge of innovation as they shape the future of the field of development. This need to make sense of the plethora of advancements in technology is what motivated the Independent Office of Evaluation (IOE) of the International Fund for Agricultural Development (IFAD) to organize an international conference on Information and Communication Technologies for Evaluation and to put together this publication on the same topic.

ICTs with Promise for Evaluators

The conference focused on four broad technical strands of discussion: wireless communication, remote sensing, machine learning and big data, as well as exploring cross-cutting issues such as ethics and privacy. These technologies are not new; most have existed for decades. Artificial intelligence and machine learning originated in the 1950s with Alan Turing's "Turing test". Remote sensing was developed during the space race of the Cold War era. Wireless communication and the Internet grew out of technologies developed for military applications decades ago. Thus, the technologies discussed in this book might not be novel by themselves. What is new is their proliferation, accessibility and relative affordability. They have evolved over time to lend themselves to use in the field of development in general and evaluation more specifically.

Remote sensing. Remote sensing is the process of detecting and monitoring the physical characteristics of an area by measuring its reflected and emitted radiation at a distance (USGS, 2018). Special cameras and sensors that collect images of the earth remotely may be attached to a variety of platforms such as ships, aircraft, drones and satellites. Newer, more accurate and higher-resolution sensors are being introduced at low or no cost to end users. As an example, the European Space Agency's Sentinel constellation of satellites promises to provide multispectral images down to a 10-metre resolution at weekly intervals. In addition, satellite imagery has come to be recognized as a global public good, which has led to satellite images being made available free to the public at large (Borowitz, 2017). When combined with other methods of collection and analysis, this rich trove of global data can offer evaluators a reliable source for numerous indicators.
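To make this concrete, the sketch below computes a normalized difference vegetation index (NDVI), one of the most common measures derived from the red and near-infrared bands of such imagery. This is a minimal, self-contained illustration: the two reflectance arrays are fabricated, whereas a real analysis would read the bands from a product such as a Sentinel-2 scene.

```python
import numpy as np

# Fabricated red and near-infrared (NIR) surface-reflectance tiles; in practice
# these would be read from a multispectral image (e.g. a Sentinel-2 scene).
red = np.array([[0.10, 0.12],
                [0.30, 0.25]])
nir = np.array([[0.60, 0.55],
                [0.35, 0.28]])

# NDVI = (NIR - red) / (NIR + red): values near +1 indicate dense vegetation,
# values near 0 bare soil, and negative values typically indicate water.
ndvi = (nir - red) / (nir + red)

# A simple vegetation mask whose share can be tracked across acquisition dates,
# e.g. to detect the kind of vegetation loss discussed in Chapter 2.
vegetated = ndvi > 0.4
print(ndvi.round(2))
print(f"Vegetated share of pixels: {vegetated.mean():.0%}")
```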

Wireless communication and devices. Wireless communication technologies have proliferated into many parts of life with the aid not only of personal computers but also of devices such as smartphones, which have merged numerous functions, from alarm clock and calculator to camera and personal digital assistant. Such assimilation of functions in increasingly smaller devices has implications for the way data can be collected in the field and the pace at which they can be analysed. The proliferation of telecom data networks makes it increasingly simple to transfer large volumes of data from almost any location. It also makes it possible for evaluators to integrate their field data collection seamlessly and efficiently with cloud hosting. This raises the real possibility for development practitioners in general, and evaluators in particular, to create real-time data systems, reducing the time taken to close the feedback loop from data collection to analysis and its use for decision-making (GIZ, 2017).

Big data. The exponential growth of big data and smart data analytics provides information and analytical capacity that would have been unimaginable even a few years ago. Digital data offer a trove of real-time information on a huge range of issues, such as trends in food prices, availability of jobs, access to health care, quality of education and reports of natural disasters (see Chapter 3). With the help of technology, anything from human DNA to the earth's terrestrial features can now be turned into trillions of data points. Technology has also provided new ways to analyse such data, through radical advances in cloud and distributed computing and in machine learning.

Machine learning. Machine learning is "the science of getting computers to learn and act like humans do, and improve their learning over time in autonomous fashion, by feeding them data and information in the form of observations and real-world interactions" (Faggella, 2017). Machine learning and artificial intelligence have been the buzzwords of the technology sphere since 2016, and for good reason. The basic idea behind machine learning goes back nearly seven decades. What has led to the burst of enthusiasm and activity in this field in the past five years is the drastic reduction in the cost of computing power driven by the advent of cloud and distributed computing and storage (Parloff, 2016). This transformation has fuelled both the big data revolution and machine learning, which have reinforced each other. Now virtually anyone can access, store and analyse large amounts of data; convert them into invaluable information; and build models that can predict anything from the distribution of poverty to medical conditions and consumer behaviour.
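As a concrete illustration of that last point, the sketch below trains a classifier to flag likely-poor localities from a handful of remotely observable features. Everything here is simulated and the feature set (nightlight radiance, distance to a road, mobile coverage) is hypothetical; a real poverty-mapping exercise would join survey-based labels with satellite and telecom covariates, but the train-then-test workflow shown is the standard one.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Simulated features for 1,000 villages: nightlight radiance, distance to the
# nearest road (km) and mobile-network coverage (0/1). Real exercises would
# derive these from satellite imagery and telecom records.
X = np.column_stack([
    rng.uniform(0, 60, 1000),
    rng.uniform(0, 50, 1000),
    rng.integers(0, 2, 1000),
])
# Simulated "poor / not poor" label, loosely tied to darkness and remoteness.
y = (X[:, 0] < 15) & (X[:, 1] > 20)

# Hold out a test set so performance is judged on data the model never saw.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

print(f"Held-out accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")
```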

Privacy, Equity and Biases

While technology can bring the world to us, it can also lay bare our lives and magnify existing inequalities. Privacy, equity of access and biases are concerns that have evolved alongside the broader leaps in technology. The use of technology is accompanied by a certain lack of control over one's data and how they are used (van den Hoven et al., 2014). Such loss of privacy comes with strong ethical concerns about the purposes for which personal information can be used. This is especially true because the social consensus on technology, and as a result the legislative framework, has not been able to keep pace with technological innovation and its application (Wadhwa, 2014). Without a social consensus, it cannot be ascertained what is unethical; without a legislative framework, it cannot be ascertained what is legal. Development workers deal with some of the poorest and most vulnerable sections of society, who do not necessarily have the legislative protection or awareness needed to safeguard themselves. The emphasis on privacy and ethics thus becomes that much more important in development.

A moral and ethical imperative underlies the work of the development sector, so the concepts of data privacy and data protection are not difficult to extend to its general ethos and ways of working. However, the increasing complexity of big data, artificial intelligence, data privacy, protection and security, and of the legal and regulatory environment, requires a focused effort to make these topics part of everyday language and actions and to build or acquire the necessary skills and expertise to prevent harm to the most vulnerable (see Chapter 4).

The distribution of technology is not uniform, and access to it is not equitable. Larry Irving Jr., a former US assistant secretary of commerce, coined the term "digital divide" in the 1990s to describe this gap. The digital divide refers to unequal patterns of material access to, usage capabilities of, and benefits from computer-based information and communication technologies, caused by stratification processes that produce classes of winners and losers of the information society, and by unequal participation in the institutions governing ICTs and society (Fuchs and Horak, 2007). The host of technologies emerging today in machine learning and big data are based on using data from numerous existing technological platforms, and these platforms have differing levels of accessibility and coverage. Systematic adoption of different technologies will therefore have to be accompanied by deliberation on who is being reached and whose voice is being heard. Increased use of technology could deepen the marginalization of those on the other side of the digital divide. Using new technologies thus entails keeping the spirit of "leave no one behind" in play when conducting evaluations.

The above inevitably gives rise to concerns about biases creeping into evaluations. Many of the algorithms that provide new ways of making sense of data were developed in the private sector. A lack of algorithmic transparency can be especially worrisome when poorly constructed algorithms are underpinned by biased processes and data, which lead to biased results and decisions (GIZ, 2017). This is especially the case if we rely on machine learning algorithms and big data, whose predictive models can extrapolate existing biases born of inequities. Technology can also amplify the voice of the vulnerable and marginalized if used appropriately; evaluators will have to be aware of both sides of the coin. As Marco Segone, director of evaluation for the UN Population Fund, said in his closing remarks at the ICT4Eval conference, "Evaluators should neither embrace technology nor reject it. We, as evaluators, should shape it".

ICT4Eval – The Publication

The discussions at the ICT4Eval conference highlighted the need for further deliberation on specific topics. This book is an endeavour in that regard, to cover further ground on selected topics from the conference. The book comprises five chapters and seven individual case studies by 19 authors, who elaborate on their experiences of using ICT tools. The first chapter deals with the extensive challenges that lie ahead in assessing progress on the SDGs. The second chapter deals with the host of technologies that evaluators could use, followed by examples of how such technologies have been deployed and with what results. Contributors of these cases hail from United Nations agencies, governments, non-governmental organizations, private consulting firms, academia and the world of freelance professionals. The third chapter deals with the broader paradigm of big data and how different technologies feed into it, and elaborates on the practical issues that evaluators could face in trying to use existing avenues of big data. The fourth chapter deals with ethics, privacy and biases in using technology for monitoring as well as evaluation. The final chapter deals with the broader implications of technology for economic development and for countries' development trajectories.

This book offers a starting point for deliberating on the use of an increasingly complex set of ICT tools in development in general and evaluation in particular. It is the first step in a long iterative process of introducing innovations, learning from them and adapting to changing times. The book has been written by practitioners, for practitioners, to explore ways of harnessing technology for their work, ranging from simple mobile-based tools to cutting-edge neural networks in deep learning and artificial intelligence. It seeks to demonstrate, by example, the frontiers that can be breached in ICT4Eval. It assimilates a lifetime of experience and work by accomplished practitioners from a wide range of backgrounds. It is an invitation to open new avenues for evaluation to address the pressing development challenges of our time.

References

Borowitz, M.J. (2017), Open Space: The Global Effort for Open Access to Environmental Satellite Data, MIT Press, Cambridge, Massachusetts.
CID (2014), Monitoring versus evaluation, Council for International Development, Wellington, New Zealand, www.cid.org.nz/assets/Key-issues/Good-Development-Practice/Factsheet-17-Monitoring-versus-evaluation.pdf.
EC (2017), Introduction to monitoring and evaluation using the logical framework approach, European Commission Civil Society Fund in Ethiopia, Addis Ababa, http://eeas.europa.eu/archives/delegations/ethiopia/documents/eu_ethiopia/ressources/m_e_manual_en.pdf.
EvalPartners (2017), Evaluating the sustainable development goals, EvalPartners, New York.
Faggella, D. (2017), "What is Machine Learning?", TechEmergence, San Francisco, www.techemergence.com/what-is-machine-learning/.
Fuchs, C. and E. Horak (2007), "Informational Capitalism and the Digital Divide in Africa", Masaryk University Journal of Law and Technology, Masaryk University, Brno, Czech Republic, https://journals.muni.cz/mujlt/article/view/2504/2068.
GIZ (2017), Data for development: what's next?, Deutsche Gesellschaft für Internationale Zusammenarbeit, Bonn, http://webfoundation.org/docs/2018/01/Final_Data-for-development_Whats-next_Studie_EN.pdf.
IOE (2015a), Evaluation manual, second edition, Independent Office of Evaluation of the International Fund for Agricultural Development, Rome, www.uneval.org/.
IOE (2015b), The evolution of the independent evaluation function at IFAD, Independent Office of Evaluation of the International Fund for Agricultural Development, Rome.
Nilsson, M. (2016), Understanding and mapping important interactions among SDGs, United Nations High-Level Political Forum on Sustainable Development, Vienna.
Parloff, R. (2016), "Why Deep Learning is Suddenly Changing your Life", Fortune, New York, http://fortune.com/ai-artificial-intelligence-deep-machine-learning/.
Scriven, M. (1986), "New Frontiers of Evaluation", Evaluation Practice.
UNDP (2017), Guidance note: data for implementation and monitoring of the 2030 agenda for sustainable development, UN Development Programme, New York, www.undp.org/content/dam/undp/library/Sustainable%20Development/Guidance_Note_Data%20for%20SDGs.pdf.
UNEG (2016), Norms and standards for evaluation, United Nations Evaluation Group, New York.
USGS (2018), What is remote sensing and what is it used for?, United States Geological Survey, Washington, DC, www.usgs.gov/faqs/what-remote-sensing-and-what-itused-0.
Van den Hoven, J. et al. (2014), "Privacy and Information Technology", The Stanford Encyclopedia of Philosophy (Summer 2018 Edition), Edward N. Zalta (ed.), Stanford University, https://plato.stanford.edu/archives/sum2018/entries/it-privacy/.
Wadhwa, V. (2014), "Laws and Ethics can't Keep Pace with Technology", MIT Technology Review, Massachusetts Institute of Technology, Cambridge, Massachusetts.

1 Evaluation and the Sustainable Development Goals: Opportunities and Constraints
Marco Segone

We live in a world where a massive concentration of wealth and privilege exists in the hands of a few: the richest 1% of the population owns 40% of the world's wealth, while the poorest 50% of the population owns only 1% of the world's wealth. Human development indicators show that 793 million people are still malnourished and that one in three women will be beaten, raped, abused or mutilated in their lifetime. The world is already witnessing the impact of climate change on natural systems. Climate change is also projected to undermine food security, exacerbate existing health threats, reduce water availability and increase the displacement of populations. Unsustainable patterns of economic development have led to an unequal distribution of the fruits of economic growth and exacerbated concerns about environmental sustainability.

Is this the world we want? Or would we like to live in a world in which inequalities have been banished for everyone, everywhere, all the time? Most people would agree this is a common goal. So how do we get there? The good news is that the 193 countries that endorsed the 2030 Agenda for Sustainable Development recognize the importance of long-term, equitable and sustainable development, and more and more countries are implementing social and public policies to try to decrease the gap between those with the most and those with the least (Segone and Tateossian, 2017).

The ambitious 2030 Agenda for Sustainable Development, adopted in September 2015 by world leaders at a historic UN summit, calls for a global transformation that focuses on ending poverty, protecting the planet and ensuring prosperity for all. In January 2016, the 17 Sustainable Development Goals (SDGs) intended to implement this agenda came into force. These new goals, built on the success and the unfinished agenda of the Millennium Development Goals (MDGs), call on all countries to mobilize efforts to end all forms of poverty, fight inequalities and tackle climate change, while ensuring that "no one is left behind". The innovative and transformational process and content of the SDGs increase our chances of reaching the goals. There are five fundamental differences between the SDGs and the MDGs.

First, the SDGs were formulated through a broad, inclusive process. For more than two years, governments, civil society groups, the private sector and thought leaders from around the world negotiated and discussed the SDGs. For the first time, eight million people voted on which of the global goals were most important to them. This inclusive and participatory process has also encouraged each country to adapt the SDGs to its own national context, increasing the sense of ownership of the goals (UNDG, 2016).

Second, the SDGs are universal. Unlike the MDGs, which had a strong focus on developing countries (with seven of the eight goals devoted to them), the SDGs are relevant to every country (Osborne, Cutter and Ullah, 2015). Rob D. van den Berg, president of the International Development Evaluation Association, has reminded us that "from the perspective of the SDGs, all countries are developing countries".

Third, the SDGs are comprehensive and integrated. While the large number of goals (17) has led some to express concern, it also encourages sweeping transformation across a broad range of areas and encourages the use of partnerships to accomplish these goals. To improve communication and ensure that people understand the ultimate intent of the SDGs and Agenda 2030, the United Nations has clustered them into "five Ps": people (human development), prosperity (inclusive economic development), planet (environment and climate change), peace (a key component of all development) and partnership (one of the few ways to achieve such sweeping transformation).

Fourth, the principle of "no one left behind" is the key principle informing every SDG and is mainstreamed throughout the structure of Agenda 2030. Achieving gender equality and reducing inequalities among and within countries are both stand-alone goals, and both are mainstreamed through all the SDGs.

Fifth, Agenda 2030 and the SDGs include a follow-up and review mechanism operating at the national, regional and global levels. The principles for this mechanism are that it be voluntary and country-owned; open, inclusive and transparent; supportive of the participation of all people and all stakeholders; built on existing platforms and processes; designed to avoid duplication; responsive to national circumstances; and rigorous and evidence-based, informed by data that are timely, reliable and disaggregated. Most important for those in the evaluation community, the follow-up and review mechanism is expected to be informed by country-led evaluations. Consequently, the 2030 Agenda calls for strengthening national evaluation capacities, echoing the UN General Assembly resolution adopted in December 2014 on the same subject (UN, 2015a).

The Greatest Opportunity and the Greatest Challenge for the Global Evaluation Community This is the first time in the history of international development that the world’s heads of state have committed to follow up and review mechanisms

Evaluation and SDGs  11 to assess the implementation of global goals. This assessment takes the form of voluntary national reviews (VNRs) to be undertaken by national governments of their progress on SDGs. One hundred and eleven VNRs have been presented at the UN High-Level Political Forum on Sustainable Development (HLPF) since 2016, with a further 51 due to be presented in 2019 (UNDESA, 2018). This high-level and far-reaching commitment could enable a surge in the demand for country-led evaluation. Key policymakers will hopefully demand their own national evaluation systems, so that they can produce high-quality evaluations to inform the national SDG reviews that countries will be presenting at the UN HLPF. This is therefore an unprecedented opportunity for the evaluation community. On the other hand, evaluation of these broad-reaching goals with a central focus on “no one left behind” presents several challenges: • •

• •

How do we assess whether development interventions are relevant, and are having an impact in decreasing inequality and improving the welfare of the worst-off groups? How do we carry out evaluation given the complexity of the SDGs? Are we going to evaluate complex and inter-dynamic environments with the traditional linear, simple and static logical framework (logframe) approach? How can we take advantage of new technologies to address the challenges above? Most importantly, how can we strengthen the capacities of governments, civil society organizations (CSOs) and parliamentarians to evaluate whether interventions are producing equitable outcomes for marginalized populations?

Below are some suggestions about how to address these challenges while capitalizing on the great opportunity the SDGs provide to all of us.

How Do We Assess Whether Development Interventions Are Relevant, and Are Having an Impact in Decreasing Inequality and Improving the Welfare of the Worst-Off Groups? The 2030 Agenda made a commitment to ensure a systematic follow-up and review of the SDGs that would be “robust, voluntary, effective, participatory, transparent and integrated”, and that would “make a vital contribution to implementation and will help countries to maximize and track progress in implementing the 2030 Agenda in order to ensure that no one is left behind” (UN, 2015b). Country-led evaluations could play a central role in informing SDG reviews and, together with strong monitoring data, supporting national policy decision-making. Gender equality and reducing inequalities between and among countries are central to the SDG principle of leaving no one behind (UN, 2017b). That

12  Marco Segone means going beyond aggregate indicators, which only estimate the proportion of the population who have benefited from a particular intervention and can conceal the fact that some marginal or vulnerable groups are being left behind. In this context, the goal of the SDGs in reducing inequalities includes • • •

identifying groups that have been left behind, understanding why this has happened, identifying strategies to promote more inclusive approaches that will include these groups.

Strengthening national statistical systems is of paramount importance in order to produce disaggregated data that go beyond national averages. A data availability assessment, the establishment of national SDG indicators and benchmarks and a data ecosystem assessment are all elements that would provide the building blocks for the data inputs for the VNRs (ODI, 2018). Evaluators also have to explain why certain groups have been left behind and how this can be corrected. This is why equity-focused and gender-­ responsive evaluation is vital. UN Women, the UN entity for gender equality and women’s empowerment, defines gender-responsive evaluation as having two essential elements: what the evaluation examines, and how it is undertaken. Gender-­ responsive evaluation assesses the degree to which gender and power relationships – including structural and other causes that give rise to inequalities, discrimination, and unfair power relations – change as a result of an intervention. This process is inclusive, participatory, and respectful of all stakeholders (rights holders and duty bearers). Gender-responsive evaluation promotes accountability regarding the level of commitment to gender equality, human rights, and women’s empowerment by providing information on the way in which development programmes are affecting women and men differently and contributing to the achievement of these commitments. It is applicable to all types of development programming, not just gender-specific work (UN Women Independent Evaluation Office, 2015). UNICEF, the UN agency for children, defines equity-focused evaluation as a judgement of the relevance, effectiveness, efficiency, impact and sustainability of policies, programmes and projects that are concerned with achieving equitable development results (Bamberger and Segone, 2011). This approach involves using rigorous, systematic and objective processes in the design, analysis and interpretation of information in order to answer specific questions, including those of concern to the worst-off groups. It assesses what does work to reduce inequalities, and what does not, and it highlights the intended and unintended results for the worst-off groups, as well as the gaps between the best-off, average, and worst-off groups.

Evaluation and SDGs  13 It provides strategic lessons to guide decision makers and to inform stakeholders (Bamberger and Segone, 2011). The UN Evaluation Group, the professional network of evaluation offices of UN agencies, provides a valuable resource for all stages of the formulation, design, implementation, dissemination and use of the human rights and gender-responsive-focused evaluations (UNEG, 2014). Equity-focused and gender-responsive evaluations use existing evaluation methods but bring a crucial perspective to how interventions are evaluated. In Figure 1.1, if the national policy to reduce poverty in a hypothetical country is evaluated, the main finding would be that the policy had a positive impact, reducing the percentage of the population living in poverty from 55% to 30% in 10 years. Therefore, the evaluation would recommend that the policy be continued. However, if the same policy were evaluated with an equity-focused and gender-responsive approach, the findings and recommendations would be different. Data would be disaggregated by gender1 as in Figure 1.2, and while the evaluation would acknowledge that the policy increased the national average, it would also find that the gap between males and females increased. The evaluation would also find that while the policy had an important positive effect in reducing poverty among males, it had a less positive effect on women. Therefore, the recommendation would be that the policy should be revised to decrease the inequality gap and have a more positive effect on the worst-off group.

Figure 1.1  National average reduction in income poverty in a hypothetical country (income poverty, total population, 2005–2015). Source: Segone (2017).

Figure 1.2  Reduction in income poverty in a hypothetical country, by gender (total, female and male, 2005–2015). Source: Segone (2017).
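For readers who want to reproduce this kind of disaggregation, the following sketch in Python with pandas shows how one dataset supports both the average-only and the gender-disaggregated reading. All numbers are invented to approximate Figures 1.1 and 1.2, and the equal-group-size assumption is purely illustrative.

import pandas as pd

# Illustrative poverty headcount rates (%); the values are invented to
# mirror Figures 1.1 and 1.2, not taken from any real survey.
df = pd.DataFrame({
    "year": [2005, 2010, 2015],
    "male": [55, 38, 18],
    "female": [55, 46, 42],
})
df["total"] = (df["male"] + df["female"]) / 2  # assumes equal group sizes

# Average-only view: poverty falls from 55% to 30%, so the policy "works".
print(df[["year", "total"]])

# Disaggregated view: the male-female gap widens from 0 to 24 points.
df["gap"] = df["female"] - df["male"]
print(df[["year", "gap"]])

The same data yields opposite recommendations depending on whether the gap column is ever computed, which is the crux of the equity-focused argument above.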

How Do We Carry Out Evaluation Given the Complexity of the SDGs?

As mentioned above, the SDGs are interrelated and interlinked, which adds to their complexity but also to their dynamic interaction and transformational impact. As the map of the SDGs produced by Le Blanc (2015) clearly illustrates, the SDGs and their targets can be seen as a system in which the goals are linked through targets that refer to multiple goals. The map shown in Figure 1.3 represents the first 16 SDGs as larger circles of differing colours, while targets are represented by smaller circles in the colour of the goal under which they figure. The map conveys a clear sense that the SDGs are a system, with goals and targets interlinked. An additional perspective shows the strength of the links among the goals (Figure 1.4). The thicker the link between two goals, the more targets link the two goals, directly or through a third goal. The thickest links are between gender and education (SDGs 4 and 5), and between poverty and inequality (SDGs 1 and 10), demonstrating once again the centrality of the principle of "leaving no one behind". There are also strong connections between SDG 10 and SDG 16, on peaceful and inclusive societies. Many targets referencing inequality are listed under other goals (Figure 1.5). Of note is the strong link between inequality and peaceful and inclusive societies (SDG 16), with no fewer than six targets explicitly linking the two, including two from SDG 5 on gender. As can be seen in Figure 1.4, the largest number of links (9) is with the poverty goal.
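A rough sketch of how such a goal-target network can be represented computationally is shown below, in Python with the networkx library. The handful of target-goal links is an illustrative subset chosen for the example, not Le Blanc's full dataset.

import networkx as nx

# Each target is linked to every goal it references (illustrative subset).
target_goal_links = [
    ("4.5", 4), ("4.5", 5),      # an education target that references gender
    ("10.2", 10), ("10.2", 16),  # an inequality target on inclusion
    ("1.4", 1), ("1.4", 10),     # a poverty target that references inequality
    ("10.1", 10), ("10.1", 1),   # income growth for the bottom 40%
]

B = nx.Graph()
for target, goal in target_goal_links:
    B.add_edge(("target", target), ("goal", goal))

# Project the bipartite graph onto goals: two goals become linked when a
# target references both, and edge weights count the linking targets:
# the "thickness" of the links in Figure 1.4.
goals = [n for n in B if n[0] == "goal"]
G = nx.bipartite.weighted_projected_graph(B, goals)
for u, v, d in G.edges(data=True):
    print(f"SDG {u[1]} <-> SDG {v[1]}: {d['weight']} shared target(s)")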

Figure 1.3  The SDGs as a system of goals and targets. Note: SDG 17, partnerships for the goals, applies equally to all goals and hence is not shown. Source: Le Blanc (2015).

Figure 1.4  Links between the SDGs through targets: an aggregated picture. Source: Le Blanc (2015).

Figure 1.5  Links among Sustainable Development Goal 10 (inequality) and other goals. Source: Le Blanc (2015).

Le Blanc (2015) suggests that because of these connections, the structure of the SDGs makes policy integration and coherence across sectors more important than ever. Targets relating to many of the thematic areas covered by the SDGs are found not only under their namesake goal (when it exists) but across a range of other goals as well. In designing, monitoring and evaluating their work, institutions concerned with a specific goal (e.g. education, health, economic growth) will have to take into account targets that refer to other goals. Given the normative clout of the SDGs for development work, their interconnectedness may provide stronger incentives than in the past for cross-sector, integrated work (UN ECOSOC, 2016). Similarly, governments concerned with monitoring and evaluating progress towards the goals will need to look at multiple goals. The United Nations faces the same new requirement. That is why UN Secretary-General António Guterres, in his report "Repositioning the UN Development System to Deliver on the 2030 Agenda: Our Promise for Dignity, Prosperity and Peace on a Healthy Planet", calls for the establishment of a UN-wide evaluation function focusing on strategic, cross-cutting issues related to the UN system's support of the SDGs globally (UN, 2017a). Similarly, the new Strategic Plan 2018–2021 of the United Nations Development Programme positions the organization as a "policy integrator" (UNDP, 2017). Given the integrated nature of the SDGs and the need to look at them as a system, the global evaluation community should adopt new approaches that go beyond simply updating the widely used set of evaluation criteria established by the Development Assistance Committee of the Organisation for Economic Co-operation and Development (OECD-DAC): relevance, effectiveness, efficiency, impact and sustainability. A paradigm shift is needed to move from the linear, hierarchical and static logframe to a more complex, horizontal and dynamic approach using systems thinking. The evaluation community must be challenged to use systems approaches to evaluate interventions, by understanding the confluence of three concepts: interrelationships, perspectives and boundaries (Reynolds and Williams, 2012).

How Can We Take Advantage of New Technologies to Address New Challenges?

As discussed earlier in this chapter, tracking the progress of the SDGs requires a way of working that is faster, more collaborative and more sensitive to complexity. Technology offers real possibilities for evaluators to rise to the challenges that the SDGs present: faster feedback loops, collaboration on a global scale and the dissection of complexity.

Faster feedback loops. Technology, and ICTs in particular, offers the evaluation community unprecedented opportunities for more real-time, effective and efficient evaluation, even in humanitarian contexts. Satellite images of land and forests can provide finely detailed data on climate change. Drones or geo-referenced photos can make it possible to evaluate interventions in areas that are not accessible for security reasons. Data analysis of social media or radio broadcasts can make available billions of data points on cultural beliefs and social norms. And artificial intelligence may be able to analyse billions of data points in an increasingly reliable manner – perhaps even producing evaluation reports instead of human evaluators! Evaluators might also be able to go beyond simply disseminating evaluation messages: they could create a lasting impact among populations and policymakers through near real-life simulations using virtual reality. All of this means that evaluations can close the loop – from data collection to analysis to the creation of knowledge and feedback into SDG monitoring – much faster. This is especially important in light of the SDGs' ambitious timeline.

Collaboration on a global scale and ease of partnerships. Just as achieving the SDGs requires comprehensive and global partnerships because of their interrelatedness, tracking progress towards their achievement requires similar collaboration. ICTs make such collaboration much easier to facilitate. The near ubiquity of the Internet and the advent of cloud storage and cloud-based applications mean that development partners across the world can communicate, collaborate seamlessly and report on progress accordingly. The prevalence of application programming interfaces (APIs) enables datasets of various kinds, produced by various actors around the world, to be used to track numerous indicators while triangulating analysis and making it more rigorous.

Dissecting complexity. Advances in computing power have made it easier and cheaper to process massive amounts of data. Advances in machine learning and neural networks, as seen in Chapter 2, have made it possible to identify in troves of data underlying trends that hitherto were not obvious or were time-consuming and expensive to detect. Machine algorithms can spot correlations among a wide variety of variables on the scale demanded by the SDGs. In addition, the ICTs available today can help evaluators collect and analyse a wide variety of data. This in turn can help evaluators triangulate existing evidence and explore related evidence on other indicators. This is just one of the ways in which evaluators can dissect complexity and put into action a "systems view" of evaluation.

However, ICT4Eval comes with risks. There are ethical risks, since privacy and data security must not be violated, and the use of ICT – especially via social media – can magnify biased perspectives, including hate speech and even fake news. There are risks of leaving those already behind, who do not have access to mobile phones, even further behind and without voices. And there are risks of not having sufficient capacities to embrace ICT. Should we embrace ICT4Eval, to ensure ICT is used to make evaluation more real-time, effective and efficient? Or should we resist it, to avoid leaving the poorest and worst-off groups without voices and to avoid major ethical risks? Evaluators should neither embrace nor resist ICT4Eval. We should instead shape it, harnessing its potential while making sure that ethical standards and human rights values are not undermined but magnified.
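As a concrete illustration of the API point above, the sketch below, in Python with the requests library, pulls indicator values from a hypothetical JSON endpoint for triangulation with other sources. The URL, parameters and response fields are placeholders, not a real service.

import requests

# Hypothetical indicator API; the endpoint and fields are placeholders.
resp = requests.get(
    "https://example.org/api/v1/indicators",
    params={"indicator": "1.1.1", "country": "KEN"},
    timeout=30,
)
resp.raise_for_status()

# Print one (year, value) pair per record, ready to compare against
# values from another dataset covering the same indicator.
for row in resp.json()["data"]:
    print(row["year"], row["value"])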

How Can We Strengthen the Capacities of Governments, Civil Society Organizations, and Parliamentarians to Evaluate Whether Interventions Are Having Equitable Outcomes for Marginalized Populations?

The key bodies responsible for implementing country-led evaluations within each country are national governments. Since national SDG reviews are voluntary, the commitment of governments is critical, particularly as they must decide how to divide their limited financial and technical resources among many different development priorities, each supported by different groups of international and national stakeholders. Given the broad scope of the SDGs, almost all government agencies could be involved, with the national government playing an important coordinating role. At the national level, donor agencies, UN agencies, CSOs, academia, the scientific community, advocacy groups and foundations can all contribute to the evaluation agenda. However, the interconnectedness of the SDG system means that it is vital to avoid the "silo mentality" of many MDG monitoring and evaluation activities, in which each donor agency, CSO and UN agency conducted its own study, often with limited coordination, significant duplication and little comparability of data between entities. The Inter-Agency Expert Group on SDG Indicators (IAEG-SDGs) is seeking to avoid these dangers by advocating for a global indicator framework for the SDGs agreed upon by all member states, with national and subnational indicators used for more localized policy interventions. CSOs, including voluntary organizations for professional evaluation (VOPEs), should play an important role in country-led evaluations at both the national and local levels (ACSC, 2016). Their contribution will be critical in ensuring a truly inclusive, consultative and participatory approach. While many governments collect data on local communities and are willing to involve these communities in the data collection process, government agencies are often less willing to involve them in interpreting the findings and discussing policy implications. CSOs, particularly human rights groups, are well placed to ensure that the voices of local communities and marginalized groups are heard (ACSC, 2016).

Strengthening National Evaluation Capacity for the SDGs

Using country-led evaluations to inform the SDG follow-up and review mechanisms goes hand in hand with strengthening national evaluation capacities. UN General Assembly resolution 69/237, adopted in December 2014, underlines the importance of building countries' capacities to evaluate development activities (UN, 2015a). It calls for interaction and cooperation among all relevant partners, including UN agencies as well as national and international stakeholders, to coordinate efforts to strengthen member state

capacities for evaluation. More importantly, the resolution emphasizes that national ownership and national priorities form a strong base for building countries' capacities to manage and oversee evaluations. National evaluation capacity development is a complex field in which different stakeholders have different roles to play and different contributions to make. To develop national evaluation capacity in the SDG era, countries face four ongoing and interrelated challenges: developing a National Evaluation Policy (NEP), building sufficient individual evaluation capacity, ensuring that institutions and processes are in place, and adequately engaging partners. These challenges are dynamic and affect both the supply of and demand for relevant and useful evidence that can inform national plans and policies. This complexity encourages the use of a systems approach, while recognizing that each country has its own context and realities. That means looking not only at individual actors and institutions, at different levels and across sectors, but also, crucially, at the network of relationships or connections between them in each country (IIED, 2016). From such a viewpoint, it is clear that weaknesses in capacity at any level, or with any key actor, will affect the capacity of the whole system to deal with a problem in order to achieve a goal. Therefore, a country-specific systems approach to national evaluation capacity development is needed, particularly when addressing evaluation capacities for country-led evaluations of the SDGs with a "no one left behind" lens.

Individual and Institutional Evaluation Capacities Enabled by a Supportive Environment

In the past, evaluation capacity development focused on strengthening the knowledge and skills of individuals through training programmes. Now, it is clear that capacity development should be based on a systems approach that takes into account three major levels (individual, institutional and the external enabling environment) and two components (demand and supply), and should be tailored to the specific context of each country. The enabling environment for evaluation is determined by whether there is a strong culture of learning and accountability. How much information is sought about past performance? And how much of a drive is there to continuously improve, and to be responsible or accountable for actions taken, resources spent and results achieved? Such a culture is embedded in tacit norms of behaviour, in an understanding of what can and should – or should not – be done, and, in many cases, in behaviours role-modelled by leaders. An enabling environment is also supported or created through governance structures that demand independent evaluation, be it through parliaments or governing bodies. Such an environment can be further enhanced through VOPEs that set standards and strive towards greater professionalism in evaluation. Therefore, VOPEs should be supported, so that they can

foster indigenous demand for and supply of evaluation, including by setting national evaluation standards and norms. There are also examples of governments soliciting the advice and involvement of VOPEs, not only in formulating evaluation policies and systems, but also in implementing evaluations that are consistent with those policies. The institutional framework for evaluation ensures that a system exists that can implement and safeguard the independence, credibility and utility of evaluation within an organization. Such an institutional framework needs to:

• include a system of peer review, or assurance that the evaluation function is set up to safeguard and implement the principles of independence, credibility and utility;
• establish safeguards to protect individual evaluators, evaluation managers and the heads of evaluation functions;
• ensure that a multidisciplinary evaluation team is in place that can guarantee the credibility of evaluation by understanding multiple dimensions of evaluation subjects and by combining the necessary technical skills;
• secure the independent funding of evaluations at an adequate level, to ensure that the necessary evaluations are carried out and that budget holders do not exercise inappropriate influence or control over what is evaluated and how;
• combine measures for impartial or purposeful selection of evaluation subjects to ensure impartiality, on the one hand, and increased utility, on the other, by making deliberate choices linked to decision-making processes;
• set out a system to plan, undertake and report evaluation findings in an independent, credible and useful way (to increase objectivity in the planning and conduct of evaluations, systems that increase the rigour, transparency and predictability of evaluation processes and products are needed);
• institute measures that increase the usefulness of evaluations, including the sharing of findings and lessons that can be applied to other subjects.

An enabling evaluation environment is essential to support country-led evaluations of the SDGs. The UN resolution on capacity building for evaluation at the country level and the strong commitment of the evaluation community to support the follow-up and review of the SDGs are key drivers to enhance evidence-based policymaking to achieve the SDGs. At the individual level, a capacity development strategy should strengthen the ability of senior managers to plan evaluations, identify key evaluation questions, ensure the independence and credibility of evaluations, and use evaluation results effectively. It is crucial to identify and support leaders or natural champions who have the ability to influence, inspire and motivate

others to design and implement effective evaluation systems (Mackay, 2007). Leadership is not necessarily synonymous with a position of authority; it can also be informal and can be exercised at many levels. That is why an evaluation capacity development strategy should, especially in its initial stages, identify and support national and local leaders in public administration and intergovernmental monitoring, as well as in evaluation groups and national VOPEs. It should also be linked to the national processes that focus on the country-level review of the SDGs. By giving national M&E departments or agencies responsibility for SDG follow-up and review, evaluation can become a key source of support for these national reviews. On the supply side, a capacity development strategy should enhance behavioural independence: independence of mind; integrity; and knowledge of and respect for evaluation standards, processes and products. It should also foster professional competencies through formal education, specialized training, professional conferences and meetings, on-the-job training such as joint country-led evaluations, and communities of practice and networking such as VOPEs.

Fostering Demand for and Supply of Evaluation

A distinction should be made between the capacity of policymakers and advisors to use evidence, and the capacity of evaluation professionals to provide sound evidence. While it may be unrealistic to expect policymakers and advisors to become experts in evaluation, it is both reasonable and necessary for such professionals to be able to understand and use the evidence produced by evaluation systems in their policies and practices (Rutter, 2012). Integrating evidence into practice is a central feature of policymaking processes, and in this case it means integrating evidence into the follow-up and review mechanisms of the SDGs. An increasingly necessary skill for professional policymakers and advisors is to know what kinds of evidence are available, how to gain access to them and how to critically appraise them. Without such knowledge and understanding, it is difficult to see how a strong demand for evidence can be established and, hence, how its practical application can be enhanced. It is also important to realize that the national SDG review process is a political process, informed by evidence. The use of evidence in national SDG reviews depends not only on the capacity to provide quality and trustworthy evidence, but also on the willingness and capacity of policymakers to use that evidence. On the supply side, evaluations need to be timely, rigorous, focused and clear in their messaging, and they need to foster learning partnerships with national-level stakeholders (World Bank, 2009). Similarly, demand for evaluations may depend on a multiplicity of factors, such as the policy environment (the existence of policy forums, the belief systems of policymakers), the skill of policymakers in using evidence, and the political culture of the country (Witter et al., 2017).

One of the best-known examples of the uptake of evaluation for a policy decision is PROGRESA (Programa de Educación, Salud y Alimentación), a government conditional cash transfer programme in Mexico. The evaluation of the programme was methodologically sound, was demanded by political actors within the country and was constantly followed up by political decision makers. Thanks in part to the evidence provided by the evaluation, the programme was extended to urban areas and continued under a new incoming government. In addition, it facilitated the spread of conditional cash transfer programmes to other countries and popularized policy evaluations (World Bank, 2009). To strengthen an enabling policy environment, policymakers may also need to provide incentives that encourage other policymakers and advisors to use the available evidence. These can include mechanisms to increase the "pull" for evidence, such as requiring spending bids to be supported by an analysis of the existing evidence base, as well as mechanisms to promote the use of evidence, such as integrating analytical staff at all stages of policy implementation. CSOs, including VOPEs, can play a major role in advocating for the use of evidence in policy implementation. Think tanks, with the help of mass media, can also make evidence available to citizens, and citizens can demand that policymakers make more use of it.

Conclusion

To achieve the aim of Agenda 2030 to leave no one behind, we need new evaluation approaches and technologies that take into account the complexity of the SDGs. Given this complexity and the accompanying challenges, we also need strong partnerships. Evaluation is a powerful change agent, and policymakers around the world could use evaluations to help make Agenda 2030 a reality. Evaluators, commissioners of evaluation, policymakers and parliamentarians all need to be ambassadors of evaluation within their departments, organizations and countries.

Note

1 To keep the example simple, disaggregation here is done by gender only. In a real-world situation, disaggregation should also be done by other social determinants of inequality.

References

ACSC (2016), The roles of civil society in localizing the sustainable development goals, African Civil Society Circle, South Africa, www.acordinternational.org/silo/files/the-roles-of-civil-society-in-localizing-the-sdgs.pdf.

Bamberger, M. and M. Segone (2011), How to design and manage equity-focused evaluations, UNICEF, New York, https://mymande.org/sites/default/files/EWP5_Equity_focused_evaluations.pdf (accessed 29 May 2018).
IIED (2016), Developing national evaluation capacities in the sustainable development era: four key challenges, International Institute for Environment and Development, London, http://pubs.iied.org/pdfs/17396IIED.pdf.
Le Blanc, D. (2015), "Towards Integration at Last? The Sustainable Development Goals as a Network of Targets", DESA Working Paper No. 141, United Nations Department of Economic and Social Affairs, New York, www.un.org/esa/desa/papers/2015/wp141_2015.pdf (accessed 29 May 2018).
Mackay, K. (2007), How to build M&E systems to support better government, Independent Evaluation Group of the World Bank, Washington, DC, https://openknowledge.worldbank.org/handle/10986/6851 (accessed 29 May 2018).
ODI (2018), What do analyses of voluntary national reviews for sustainable development goals tell us about "leave no one behind"?, Overseas Development Institute, London.
Osborne, D., A. Cutter, and F. Ullah (2015), Universal sustainable development goals: understanding the transformational challenge for developed countries, Stakeholder Forum, https://sustainabledevelopment.un.org/content/documents/1684SF_-_SDG_Universality_Report_-_May_2015.pdf.
Reynolds, M. and B. Williams (2012), "Systems Thinking and Equity-Focused Evaluations", in Segone, M. (ed.), Evaluation for equitable development results, UNICEF, New York, www.evalpartners.org/library/evaluation-for-equitable-development-results (accessed 29 May 2018).
Rutter, J. (2012), Evidence and evaluation in policy making, Institute for Government, London, www.instituteforgovernment.org.uk/sites/default/files/publications/evidence%20and%20evaluation%20in%20template_final_0.pdf.
Segone, M. (2017), Keynote speech at the EvalMENA Conference, Amman, Jordan.
Segone, M. and F. Tateossian (2017), "No One Left Behind – A Focus on Gender and Social Equity", in van den Berg, R., I. Naidoo, and S. Tamondong (eds.), Evaluation for Agenda 2030, UN Development Programme and International Development Evaluation Association, Exeter, United Kingdom, http://web.undp.org/evaluation/documents/Books/Evaluation_for_Agenda_2030.pdf (accessed 29 May 2018).
UN (2015a), "Building Capacity for the Evaluation of Development Activities at the Country Level: Resolution Adopted by the General Assembly on 19 December 2014", A/RES/69/237, United Nations, New York.
UN (2015b), "Transforming Our World: The 2030 Agenda for Sustainable Development: Resolution Adopted by the General Assembly on 25 September 2015", A/RES/70/1, United Nations, New York.
UN (2017a), "Repositioning the UN Development System to Deliver on the 2030 Agenda: Our Promise for Dignity, Prosperity and Peace on a Healthy Planet – Report of the Secretary-General", United Nations, New York, http://undocs.org/A/72/684 (accessed 29 May 2018).
UN (2017b), "Leaving No One Behind: The United Nations System Shared Framework for Action", United Nations System Chief Executives Board for Coordination, New York, www.unsceb.org/CEBPublicFiles/CEB%20equality%20frameworkA4-web-rev3.pdf.

UN Women Independent Evaluation Office (2015), How to manage gender-responsive evaluation: evaluation handbook, UN Women, New York, https://genderevaluation.unwomen.org/en/evaluation-handbook (accessed 29 May 2018).
UNDESA (2018), Handbook for the preparation of voluntary national reviews, the 2019 Edition, United Nations Department of Economic and Social Affairs, New York, https://sustainabledevelopment.un.org/content/documents/20872VNR_hanbook_2019_Edition_v2.pdf.
UNDG (2016), The sustainable development goals are coming to life, United Nations Development Group, New York, https://undg.org/wp-content/uploads/2016/12/SDGs-are-Coming-to-Life-UNDG-1.pdf.
UN ECOSOC (2016), Breaking the silos: cross sectoral partnerships for advancing the sustainable development goals, United Nations Economic and Social Council, New York, www.un.org/ecosoc/sites/www.un.org.ecosoc/files/files/en/2016doc/partnership-forum-issue-note1.pdf.
UNDP (2017), UNDP strategic plan 2018–2021, United Nations Development Programme, New York, http://undocs.org/DP/2017/38 (accessed 29 May 2018).
UNEG (2014), Integrating human rights and gender equality in evaluations, United Nations Evaluation Group, New York, www.unevaluation.org/document/download/2107 (accessed 29 May 2018).
Witter, S. et al. (2017), "Generating Demand for and Use of Evaluation Evidence in Government Health Ministries", Health Research Policy and Systems, Vol. 15/86, Springer Nature, London, https://pdfs.semanticscholar.org/98cd/e586e450810b4133a20c3239e6f0202930a9.pdf.
World Bank (2009), Making smart policy: using impact evaluation for policy making, Thematic Group on Poverty Analysis, Monitoring and Impact Evaluation, World Bank, Washington, DC, http://siteresources.worldbank.org/INTISPMA/Resources/383704-1146752240884/Doing_ie_series_14.pdf.

2 Information and Communication Technologies for Evaluation (ICT4Eval): Theory and Practice

Oscar A. García, Jyrki Pulkkinen, Prashanth Kotturi et al.

The conference on ICT for Evaluation, organized by the International Fund for Agricultural Development (IFAD) in 2017, aimed to identify ways in which technology could bridge the gaps in data for the Sustainable Development Goals (SDGs). The conference had four main topics: data collection, data analysis, data dissemination and cross-cutting issues. This classification was conceived as a way to visualize the step-by-step process that evaluators use to handle data and to explore how technology could make the process more efficient and more rigorous. One of the chief lessons of the conference was that technology is increasingly integrating the three processes of collection, analysis and dissemination, and making that integration more seamless.

Data Collection: Faster, Cheaper, More Accurate

Evaluations have historically depended on document reviews, observations and interviews during field visits for qualitative data. In some cases, surveys are also conducted to collect quantitative data. In recent years, new tools and sources for data collection have emerged that build on the greater availability of three main technological advances: remote sensing systems, wireless technology, and cloud storage and computing. Some of these advances are increasing the automation and integration of data collection and analysis (Kipf et al., 2015), thus making them more accurate, faster and less expensive. Many organizations have been using such data collection tools for years, including the specialized agencies of the United Nations, multilateral development banks, international foundations, academics and the private sector.

Remote sensing: The field of remote sensing is undergoing rapid change with the introduction of ever more accurate and high-resolution sensors at low cost or for free (Rocchini et al., 2017). The Landsat system of the US National Aeronautics and Space Administration (NASA) has historically been the biggest provider of remote sensing imagery across the globe, at a resolution of 30 metres. Improvements in the sensors of the European Space

Agency's Sentinel constellation of satellites promise to provide multispectral images at resolutions as fine as 10 metres with a temporal frequency of one week (Harvey et al., 2018). The open data movement has compelled these taxpayer-funded initiatives to be made public, so the imagery is now publicly available. Rapid advances in drones have made it much easier to mount sensors for more specific purposes. Today, drones are being used in the remotest and most dangerous of environments, including humanitarian and post-disaster contexts such as in Haiti and in the Philippines after Typhoon Haiyan (UN OCHA, 2014). Advances in remote sensing have allowed evaluators to collect data at the global level on indicators that were hitherto nearly impossible or very costly to capture. The convenience of measuring forest cover or vegetation profile in a given area while sitting at one's desk could hardly have been imagined until recently. Remote sensing has added another source of data to triangulate findings from traditional sources of data collection in evaluations (see Case Study 1).

Wireless devices and communication have proliferated in daily life in the past decade. In some of the remotest and most underdeveloped parts of the world, wireless communication has leapfrogged an entire generation of fixed-line devices and other modes of communication. Wireless devices have spawned an entire ecosystem of applications that are indirectly affecting numerous fields, giving rise, for example, to the sharing economy. In the development sector, wireless devices are increasingly being used to collect monitoring and evaluation data (World Bank, 2013). The existence of the open-source Open Data Kit ecosystem means that designing, or even just using, existing data collection applications has become cost-effective and easy. Applications used to collect monitoring and evaluation data from the field – both free and proprietary – piggyback on the near-ubiquitous presence of mobile networks and widespread Internet connectivity (World Bank, 2013). The combination of wireless devices and cloud hosting means that data collection can be seamlessly integrated with basic data analysis and visualization.

Cloud computing refers to the delivery of computing services – servers, storage, databases, networking, software, analytics and more – over the Internet ("the cloud"), enabling shared access to resources that have hitherto been costly or unavailable due to geographical constraints (Microsoft Azure, 2018). Access to data storage and analytical tools in a shared and distributed manner enables even small organizations to operate effortlessly across geographical and functional areas and focus on their core business (Misra and Mondal, 2011). Cloud computing has been the bedrock of advances in machine learning and data analytics by enabling organizations to avoid heavy upfront capital expenditure on computing resources. It has also played a key role in ensuring real-time integration of data collection, analysis and reporting (Lin and Chen, 2012).
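To make the "measuring forest cover from one's desk" point concrete, here is a minimal sketch in Python with rasterio and numpy. The GeoTIFF file name and the land-cover class code (forest = 1) are hypothetical placeholders.

import numpy as np
import rasterio

# Hypothetical land-cover raster; class code 1 is assumed to mean forest.
with rasterio.open("landcover_tile.tif") as src:
    band = src.read(1)  # first band as a 2-D numpy array

# Share of pixels classified as forest in the tile.
forest_share = float(np.mean(band == 1))
print(f"Forest cover: {forest_share:.1%} of pixels in the tile")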


Data Analysis: The Machine Learning Revolution

Evaluators have used a wide range of tools to analyse qualitative and quantitative data, from Excel-based quantitative analysis to qualitative comparative analysis (QCA). Software tools such as ATLAS.ti and NVivo, among others, have digitized qualitative analysis. With the emergence of machine learning, the paradigm of qualitative analysis is rapidly undergoing another change. In the realm of qualitative data analysis, the initial set of tools focused on word counts (text analysis), then on relationships between concepts, and finally on understanding grammar. In recent years, computer-assisted data analysis has seen a surge in efficacy and potential. Leaps in computing power, distributed computing and complex algorithms have enabled applications that incorporate machine learning and natural language processing to interpret vast amounts of qualitative data from virtually unlimited sources in different formats (Evers, 2018). They also have the potential to automate part of the analysis and increase its efficiency (see Case Study 4).

Machine learning is also having a profound impact on the way quantitative data is analysed (Boelaert and Ollion, 2018). Languages such as R and Python have increased the scope and speed of quantitative analysis that can be carried out on an ordinary desktop computer (Mullainathan and Spiess, 2017), and the availability of large amounts of data to train machine learning algorithms has fuelled the growth of predictive modelling with an unprecedented level of rigour (see Case Study 5). The ability to train algorithms on data derived from a wide variety of sources, including the Internet, remote sensing and primary surveys, has made machine learning a tool that can be applied in a wide variety of contexts. The emergence of powerful yet accessible languages such as Python and the availability of application programming interfaces (APIs) mean that quantitative data from any source can be combined with other sources to build predictive models.
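A minimal sketch of the predictive-modelling workflow described here, in Python with scikit-learn, might look as follows. The data is synthetic, standing in for survey or remote sensing features; no real dataset or model specification is implied.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic features standing in for, e.g., assets, rainfall and distance
# to market; the outcome is a noisy combination of the first three.
X = rng.normal(size=(500, 4))
y = X @ np.array([0.5, -0.3, 0.2, 0.0]) + rng.normal(scale=0.1, size=500)

# Hold out a test set, fit a random forest, and report predictive accuracy.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("Held-out R^2:", round(r2_score(y_test, model.predict(X_test)), 3))

The held-out evaluation is the crucial design choice: predictive claims are only as rigorous as the data the model has never seen.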

Dissemination and Learning: Reaching a Global Audience

The achievement of the SDGs will mainly take place at the country level. Evaluations can contribute to improving the quality of decision-making (evidence-based policymaking) by providing timely feedback and rigorous evidence on the effectiveness of development programmes. In doing so, the approach to the dissemination of evaluation findings, conclusions and recommendations needs to be adjusted so that the particular experiences under evaluation can serve as lessons learned more broadly. For evaluators, dissemination and learning tend to raise two major questions: how can the results of evaluation be disseminated more broadly, and to whom should they be disseminated? Historically, dissemination has meant sharing evaluation reports through websites and mailing groups, but evaluators can no longer be restricted to this paradigm alone.

The advent of social media, the blogosphere, the Internet and the network effects of such media have meant that a few billion people are now on a common platform. This provides vast potential for evaluators to communicate with a wide range of stakeholders, including policymakers, national institutions and donors. Social media also requires a fresh communication outlook, because it calls for short and impactful messages and because it is overloaded with information from myriad sources (Holton and Chyi, 2012). The emergence of virtual reality is also changing the way in which audiences are engaged and influenced. Virtual reality lets users transcend physical and temporal barriers to experience stories rather than read them. The United Nations system has only just started experimenting on this front, with the recent production of the virtual reality film Clouds Over Sidra.

In terms of the audiences to whom they should be disseminating their findings, evaluators interact with numerous stakeholders, and each audience may require a different means of communication. One of the most difficult tasks in the dissemination process is communicating with the public at large, or what development jargon calls end beneficiaries. This inherent difficulty has led to criticism of evaluation as an extractive process that fails to inform beneficiaries about the results of the data taken from them. However, the proliferation of technology holds the promise of solving this persistent problem (see Case Study 7).

The best way to convey the range of possibilities that emerging technologies offer to evaluators is to look at concrete examples from frontrunners who have capitalized on the existing and new generation of information and communication technology (ICT) tools in their evaluation work. These range from tools as ubiquitous as Skype and Google Maps to cutting-edge remote sensing algorithms. This chapter offers seven examples from institutions ranging from international financial institutions to private-sector firms and international non-governmental organizations.

Case Study 1: Geospatial Analysis in Environmental Evaluation

By Juha I. Uitto, Anupam Anand and Geeta Batra, Independent Evaluation Office, Global Environment Facility

Geospatial analysis is a method of combining, visualizing, mapping and analysing various types of geographically referenced data about socio-economic and physical attributes (Bailey and Gatrell, 1995). It is based on a spatial model that attempts to emulate reality, in which the user views the real world through the medium of a database representing different data categories, also referred to as data layers or themes (Figure 2.1). By integrating data from multiple


Figure 2.1  A spatial model represents the real world as a combination of layers or themes. Source: GEF IEO, 2016a.

sources, filtering and applying statistical methods, geospatial analysis enables the examination of relationships, trends, patterns and influences that may not be visible in maps generated from a single data source. Geospatial methods, in particular the application of satellite remote sensing, have been used in the monitoring and assessment of environmental processes for the past four decades, thanks to their ability to provide synoptic, time series data for various earth system processes (Awange and Kyalo Kiema, 2013; Melesse et al., 2007; Spitzer, 1986). However, their application in the field of evaluation to assess environmental programmes has gained traction only in the last two decades. In early applications in evaluation, geographic information systems (GIS) were mainly used for visualization. Renger et al. (2002) describe how GIS data and analysis can be used for visualization, change detection and in conjunction with other evaluation data. Evaluators have also discussed the utility of GIS as a data source, for measuring baselines and programme outcomes and for observing the results of interventions over time (Azzam, 2013; Azzam and Robinson, 2013). Quasi-experimental designs leveraging geospatial data have been used in impact evaluations of forestry and biodiversity interventions (Andam et al., 2008; Buchanan et al., 2014; Ferraro and Pattanayak, 2006). Recently, geospatial analysis has also been used in randomized control trials (Jayachandran et al., 2017).

At the Independent Evaluation Office of the Global Environment Facility (GEF), geospatial methods are increasingly being used to complement other methods in answering evaluation questions about the relevance, efficiency and effectiveness of GEF interventions. In addition to presenting trends in environmental outcomes, these methods help to identify the drivers of environmental degradation and provide deeper insights into the conditions and factors that influence outcomes. At the Global Environment Facility Independent Evaluation Office (GEF IEO), geospatial methods have been applied in three thematic areas: biodiversity, land degradation and international waters. The three cases presented below illustrate how geospatial approaches can help to overcome methodological challenges in traditional evaluation methods, such as lack of baseline data, sampling bias, difficulties in selecting appropriate counterfactuals, and accounting for multiple scales and contexts.

1. Impact Evaluation of GEF Support to Protected Areas and Protected Area Systems

In the Impact Evaluation of GEF Support to Protected Areas and Protected Area Systems, geospatial analysis was applied to assess the relevance and impact of GEF interventions at global, country and site levels (GEF IEO, 2016a). A spatial overlay analysis was carried out to assess the biodiversity significance of GEF-supported protected areas (PAs). The analysis showed that 58% were located in Key Biodiversity Areas, while 31% met other conservation designations such as Ramsar or UN heritage site, and 11% were important from a national perspective (IUCN, 2016) (Figure 2.2). At the global level, the impact evaluation made use of billions of observations from satellite data to compare forest loss in PAs with that in the adjacent non-protected buffer areas. The PAs experienced less forest loss than their surrounding 10 and 25 km buffer areas. Forest change analysis for 37,000 PAs in 147 countries was conducted for the impact evaluation, using the most accurate global dataset derived from satellite data analysis (Figure 2.3). A more robust quasi-experimental research design used geospatial data to perform a propensity matching analysis, comparing GEF-supported PAs with PAs that did not receive GEF support. Mexico was chosen for this case study, as it had been receiving sustained GEF support for nearly 25 years. The results showed that from 2001 to 2012, GEF-supported PAs in Mexico avoided up to 23% more forest loss than PAs that did not directly receive GEF funding.
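A minimal sketch of the overlay step, in Python with geopandas, might look like the following. The file names, the pa_id column and the class of geometries are hypothetical placeholders, not the GEF IEO's actual data or workflow; the predicate keyword assumes a recent geopandas version.

import geopandas as gpd

# Hypothetical inputs: GEF-supported PA polygons and KBA polygons.
pas = gpd.read_file("gef_protected_areas.shp")
kbas = gpd.read_file("key_biodiversity_areas.shp")

# Bring both layers into the same coordinate reference system, then keep
# every PA that intersects at least one Key Biodiversity Area.
kbas = kbas.to_crs(pas.crs)
overlap = gpd.sjoin(pas, kbas, how="inner", predicate="intersects")

share = overlap["pa_id"].nunique() / pas["pa_id"].nunique()
print(f"{share:.0%} of PAs intersect a Key Biodiversity Area")

A spatial join rather than a raw geometric intersection is used here so that each PA is counted once, however many conservation layers it touches.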

Figure 2.2  Globally distributed GEF-supported PAs overlaid with sites of conservation importance (GEF-supported PAs shown against Biodiversity Hotspots, Important Bird Areas, Areas of Zero Extinction and Key Biodiversity Areas).

Figure 2.3  Forest change data used for global analysis, an example of a PA in Mexico (Cumbres de Monterrey): (a) the PA and its 10 km and 25 km buffers; (b) percent tree cover in 2000; (c) decadal forest cover, gain and loss (2000–2012); (d) yearly percent forest loss (2000–2012).

2. Evaluation of GEF Support to Land Degradation Interventions

In an evaluation study of GEF interventions in land degradation, geospatial methods were used to measure environmental changes in three biophysical indicators; assess the factors associated with the outcomes; and estimate the co-benefits in terms of one ecosystem service, carbon sequestration (GEF IEO, 2016b). The results showed that GEF interventions had reduced forest loss and landscape fragmentation, and increased vegetation productivity. Factors such as access to electricity, initial conditions and project duration influenced these outcomes. Furthermore, using a value transfer approach, the study assessed the amount of carbon sequestered and estimated that the carbon benefit per dollar invested was $1.08 (Figure 2.4).¹

Figure 2.4  Economic valuation of carbon sequestered at each GEF-supported project site. Source: GEF IEO, 2016b.

3. Evaluation of GEF Support to International Waters Focal Area

Geospatial methods were also used to measure the impact of GEF's long-term engagement in the Lake Victoria region (GEF IEO, 2016c). The GEF has supported environmentally sound management of the Lake Victoria ecosystem through its International Waters Focal Area since 1996. The overall objective of the GEF projects was to address major threats facing the lake ecosystem, particularly by lessening the nutrient load and clearing the water hyacinth on site. Remote sensing methods were used to observe changes in hyacinth invasion (Figure 2.5). They showed that the overall vegetation in Lake Victoria entered a declining phase in 2008, as measured by the Normalized Difference Vegetation Index (NDVI). By the end of 2016, vegetation productivity had fallen by about 25% from its 2007–2008 peak.

Figure 2.5  Geospatial analysis based on satellite data shows decreasing vegetation (NDVI, 1981–2017) at the mouth of the Kagera River, Uganda.
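The NDVI measure behind Figure 2.5 is simple to compute once band reflectances are in hand. The sketch below, in Python with numpy, shows the calculation on synthetic band values; real work would read the red and near-infrared bands from satellite imagery rather than invent them.

import numpy as np

# Synthetic red and near-infrared reflectances for four observation dates.
red = np.array([0.20, 0.21, 0.25, 0.30])
nir = np.array([0.60, 0.58, 0.50, 0.42])

# NDVI = (NIR - Red) / (NIR + Red); values near 1 mean dense vegetation.
ndvi = (nir - red) / (nir + red)
print(ndvi.round(2))  # a declining series, like the post-2008 trend in Figure 2.5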

Lessons Learned in Applying Geospatial Approaches

Geospatial tools have enhanced the GEF evaluation team's ability to undertake analysis efficiently and cost-effectively at project, national, global and ecologically meaningful scales. Big data from satellites revealed patterns that had not initially been apparent. The results were reproducible and generated an objective evidence base, while complementing other methods. Finally, geospatial tools are also being used to disseminate evaluation results through both static and interactive dynamic maps and visualizations. With the increase in the type, volume and availability of data, geospatial tools have clearly created new opportunities to capture the essential variables of environment and development interventions and analyse their impacts. To use these tools effectively, however, technical skills, high computing capacity and multidisciplinary expertise are needed to analyse and interpret data and results. The accuracy and reliability of contextual variables often vary widely across countries and sites, so geospatial data needs to be complemented by field verification and other appropriate methods to answer the "how" and "why" questions. Some of the data and infrastructure constraints may be eased by collaborating with institutions that have access to big data and the infrastructure for using it. GEF's evaluation team has formed partnerships with NASA, the University of Maryland and AidData. To enhance the usefulness and efficiency of its evaluations, the team is continuously exploring innovative technologies such as blockchain, artificial intelligence, deep learning, the Internet of things and computational social science, alongside traditional evaluation methods. These cases also demonstrate the advantages of leveraging state-of-the-art datasets, methodologies and computing platforms in evaluating environmental interventions. In the PA evaluation using forest cover data (Hansen et al., 2013), GEF IEO conducted the analysis on Google's cloud computing platform, which reduced processing time significantly. In the land degradation evaluation, the team applied machine learning algorithms driven by geospatial data and econometric analysis, which made it possible to work with the high volume of data and provided insights into the factors associated with the outcomes. In evaluating the GEF interventions around Lake Victoria, the evaluation team used decades of dense time series data to evaluate the outcomes and sustainability of a series of interventions.

Conclusion

The role of geospatial science is increasingly being recognized by international organizations and major environmental conventions as countries move towards more evidence-based policy decisions and practice. The United Nations Convention to Combat Desertification (UNCCD) has endorsed the use of indicators obtained from remote sensing to monitor progress towards reversing and halting the degradation and desertification of land (Minelli et al., 2017). The United Nations Framework Convention on Climate Change (UNFCCC) and the Convention on Biological Diversity (CBD) also endorse the use of objective indicators, many of which are derived through geospatial methods (Karl et al., 2010; Scholes et al., 2008; Stephenson et al., 2015). Geospatial data and analysis offer an efficient and complementary approach to monitoring and evaluating environmental outcomes.


Case Study 2: Simulated Field Visits in Fragile and Conflict Environments: Reaching the Most Insecure Areas of Somalia Virtually

By Monica Zikusooka, Save the Children

Two decades after the central government in Somalia collapsed, several parts of the country are still consistently viewed as among the world's most dangerous environments for aid workers, despite efforts to support stabilization. Against the backdrop of a volatile security situation is a chronic humanitarian crisis in which food prices, livestock survival, and water and food availability are constantly under stress from drought and armed conflict. Between October 2010 and April 2012, Somalia was at the heart of a drought crisis in the Horn of Africa, which affected 13 million people and caused an estimated 258,000 deaths (Checchi and Robinson, 2013). In 2017, Somalia was on the edge of another famine following severe drought conditions, rising prices and access limitations. In January 2018, the Food Security and Nutrition Analysis Unit for Somalia reported that while the risk of famine was declining, 2.7 million people still faced crisis and emergency, with about 301,000 children acutely malnourished (FSNAU, 2018). According to the UN Office for the Coordination of Humanitarian Affairs, Somalia has received over US$600 million every year since 2012 in response to its humanitarian and development needs. However, it is difficult to tell how effective this assistance has been, because monitoring and evaluation systems are poor and unevenly implemented as a result of the fragile, conflict-ridden and violent setting in the country. Save the Children endeavours to implement adaptive monitoring and evaluation systems in different parts of Somalia to enable learning and accountability. Overall, Save the Children has longstanding programmes to prevent and treat acute malnutrition, provide health care, improve water and sanitation, and enhance household food security and livelihood options for communities. Due to the very high rates of acute malnutrition, Save the Children runs Community-based Management of Acute Malnutrition (CMAM) programmes across the country. CMAM programmes include Outpatient Treatment Programme (OTP) centres, where acutely malnourished children without medical complications are treated as outpatients. Children with complications are referred to the nearest health centre for more advanced care and management. OTPs are run by trained Somali national staff with routine supervision from senior Somali national programme managers. International nutrition technical advisors conduct regular support visits where security allows. While all Save the Children staff have direct access to programme areas in Puntland and Somaliland, access to Central and South

Somalia is limited, especially for non-Somali staff, and in some cases even for Somali staff who are not from the specific implementation area and ethnic group. This makes it difficult to monitor and evaluate programmes effectively. At the same time, if technical staff are unable to make field support visits, this can lower the motivation of local staff and deny them opportunities for learning. Given that 65% of official development assistance (OECD, 2018) is invested in fragile contexts, most of which are conflict-ridden, the World Bank underscores the need for innovative methods for monitoring and evaluating results (World Bank, 2013). In this regard, it is critical that the chosen options are based on a good understanding of the state of conflict and fragility, that they are diverse and allow triangulation of findings, that they follow the "do no harm" principle, that they are gender-sensitive, that they take account of local capacity for data collection and analysis, and that they follow ethical procedures (World Bank, 2013). From collecting routine programme activity data to harnessing big data to monitor and measure results, technology offers unprecedented opportunities in fragile and conflict-affected settings. Save the Children has explored the use of technology in remotely monitoring and supporting field teams. The result is known as a simulated field visit (SFV). The SFV was developed to enable monitoring of nutrition programmes, including assessing programme performance against established standards; to identify bottlenecks and challenges; and to maintain links with field teams to foster motivation and capacity development. The first SFV was conducted in 2013 as part of a review of a nutrition programme in Puntland and Hiran by headquarters and regional staff before a donor audit. The review of the Puntland programme was conducted without much difficulty of access, but insecurity prevented direct access to Hiran. This led to the development of a method to remotely review the Hiran programme that later became the SFV. The method was standardized and applied in three main steps: (1) documentation: field teams follow specific guidance on the documents to scan and the photographs to take; (2) technical and joint review of documents: submitted documents are reviewed against quality standards² and the outcomes jointly discussed with field teams via Skype, with a virtual interface enabled by screen sharing; and (3) feedback and action planning: the review team prepares a brief report that includes findings and the actions agreed upon for the field team to implement. Subsequent SFVs compare their findings with previous visits to track improvements and address recurring issues.
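As an illustration of what the document-review step can automate once records are digitized, the sketch below in Python with pandas flags common checklist issues. The file name, column names and the plausibility range are hypothetical placeholders, not Save the Children's actual checklist.

import pandas as pd

# Hypothetical export of OTP admission records from scanned registers.
records = pd.read_csv("otp_admissions.csv")

# Flag records that would fail a quality-standards review.
issues = pd.DataFrame({
    "missing_measles_status": records["measles_vaccinated"].isna(),
    "missing_discharge_details": records["discharge_outcome"].isna(),
    "implausible_muac": ~records["muac_mm"].between(60, 200),
})

# Issue counts become the agenda for the Skype debrief with the field team.
print(issues.sum())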

treatment protocols, completeness in the service package (especially inclusion of measles vaccination) and documentation of discharge details. The process was appreciated by field teams in Hiran and technical staff based in Nairobi, as it allowed collaboration on improving programme quality despite the limited face-to-face interaction in the field. The SFV applies principles of monitoring and evaluation in fragile and conflict-affected settings, adaptation to the conflict context, triangulation of findings and sensitivity to local capacity, and has considerable room for gender mainstreaming.

The SFV is by no means a stand-alone, one-size-fits-all monitoring and evaluation approach, but rather one of the options that could be used where external access to programmes is limited. While the SFV process has its strengths, it is time-consuming, as field staff need to do a lot of preparation. The review team has to set aside time to conduct the visit, just as they would for an actual field visit that includes travel, but this was often difficult to achieve. Documents used in SFVs are provided by field teams, which could introduce selection bias, making it possible to create a false impression of the programme. While there is an effort to mitigate this challenge by taking Global Positioning System (GPS) coordinates of programme locations, this in itself poses security risks to the local population if digital data is intercepted by third parties (Dette, Steets and Sagmeister, 2016). Despite these challenges, it was felt that the effort involved in carrying out an SFV would stimulate a better understanding of quality expectations and translate into programme improvements. Where Internet connections are strong, this can be a participatory approach, which aids acceptance of the final recommendations. Where there isn't a strong Internet connection, the process is still possible, but documents and photos need to be physically transferred to the review team, so additional time must be factored into the process.

Save the Children predominantly used the SFV method between 2013 and 2016, when access to several field sites in Somalia was very limited. Access to many sites has now improved, allowing technical specialists at different levels to visit. The use of technology in monitoring has nevertheless been further improved, with the adoption of KoBo Collect (for data collection) and Tableau (for visualization). These platforms allow collection and visualization of programme performance data, site locations and pictures of work in progress. Programme teams in the field collect data using tablets or mobile phones and upload it to a platform where it is processed in real time and visually presented to support decision-making at different levels. Technology has added significant value in the monitoring and evaluation of Save the Children's programmes in Somalia by enabling verification of the existence of programmes,
access to real-time data that supports decision-making at all levels, and improvement in data quality and integrity, as these can be ensured remotely. Technology has also improved the cost-effectiveness of data collection and analysis, facilitated learning across sites and locations as performance data is pooled into centralized dashboards, and improved engagement with field teams in remote locations. These benefits are similar to those documented in development and humanitarian programmes similar to Save the Children's (Corlazzoli, 2014). The SFV, with enhanced technology for data management, is a practical approach for monitoring and evaluating programmes in places that are difficult to access, especially in fragile and conflict-affected environments.

Case Study 3: Analysing Stories of Change: Engaging Beneficiaries to Make Sense of Data

By Michael Carbon and Hamdi Ahmedou, Independent Office of Evaluation, International Fund for Agricultural Development

An evaluation is meant to be a two-way process between the evaluator and the evaluand. In the development sector, the evaluand is typically the end beneficiary of the programme intervention. The nature of engagement between evaluator and evaluand can be of four kinds: (a) one-way feedback to beneficiaries; (b) one-way feedback from beneficiaries; (c) two-way feedback with interactive conversation between beneficiaries and evaluators, but with the evaluation team retaining independence and power; and (d) two-way feedback through participatory evaluation, with beneficiaries as part of the evaluation team (DFID, 2015). Evaluation is practised in various forms, such as self-assessment, stakeholder evaluation, internal evaluation and joint evaluation. In addition, it can include individual storytelling, participatory social mapping, causal linkage and trend and change diagramming, scoring, and brainstorming on programme strengths and weaknesses (Sette, 2018). Some argue that beneficiary participation in evaluation, especially in the assessment of results, helps in dealing with the complexity inherent in development projects (Casella et al., 2014), a theme underlined in Chapter 1. The value of seeking feedback from beneficiaries is widely accepted and is reflected, albeit not explicitly, in wider strategies to improve downward accountability, participation and community evaluation of aid projects (DFID, 2015).

As part of the Country Strategy and Programme Evaluation (CSPE) in Cameroon, the Independent Office of Evaluation of IFAD piloted a participatory narrative-based approach to help identify causal linkages between the support provided by projects and changes in living conditions in the beneficiary households. The study also aimed at providing a deeper understanding of contextual factors affecting change and identifying any unexpected project effects. The study targeted two value chain projects in Cameroon financed by IFAD between 2004 and 2017.3

Methodology. The study utilized the SenseMaker® methodology and software suite.4 This methodology is rooted in storytelling and participatory research. It consists of eliciting stories (narratives) from a large number of stakeholders and giving the storytellers a determinant role in analysing and giving sense to the stories. Narrative-based interviewing techniques, as opposed to direct questioning, are particularly appropriate when exploring complex, qualitative topics in an evolving context, and when it is difficult to determine exactly what sorts of questions might lead to useful answers. Interviewing through stories is often more natural and tends to provide more contextual and unexpected information. Also, a well-constructed story elicitation results in fewer refusals or false responses compared to direct questioning.5 Instead of being interpreted and analysed by the evaluation research team, the stories are interpreted by the storytellers themselves through a self-signification questionnaire. This lends the categorization and analysis of the stories greater legitimacy by reducing the bias associated with an external researcher's interpretation of the data. Because a large number of stories (a few hundred up to a few thousand) are collected and self-signified, it is possible to conduct quantitative analysis of recurrent themes, perspectives and feelings narrated in the stories. By combining elements of qualitative and quantitative research, the approach helps to make sense of complex and evolving realities.

The study in Cameroon focused on members of rice, onion and cassava producers' organizations (POs) and interviewed 590 individuals. Respondents were first invited to answer an opening question, which was the same for everyone in the POs and was designed to elicit a significant experience in their lives, indirectly related to the assistance received from the IFAD-funded projects: Since you have become a member of the PO, can you tell us about an important positive or negative change related to the production, processing or selling of your crop (onion, rice or cassava) and how this has affected you and your family? Please describe what happened.

Once the interviewer had recorded the story, the respondent was asked to interpret the story through a predefined self-signification questionnaire. This questionnaire was developed on the basis of the project's theory of change, which was reconstructed by the evaluation team based on project documentation. The theory of change showed the hypothetical causal linkages between support provided by the projects to POs, services provided by POs to their members, and changes measured in key IFAD impact domains at the household level (income and assets, agricultural productivity and food security). The questionnaire had different sections: basic data about the PO member (name of the PO, gender, age), evolution of PO service provision (relevance, quality and inclusiveness), changes in crop production and marketing, and perceived impact on the household. The questionnaire contained a mix of different question types called "signifiers": multiple-choice questions, dyads, triads and "stone" diagrams. These allowed the respondents to place their story in predefined categories and to position the narrated events and changes in tension fields between multiple options. The stories and responses to the self-signification questionnaire were captured directly on tablets using the SenseMaker® Collector application.

Software and participatory analysis. The responses were analysed using the SenseMaker® Explorer software, which produces quantitative data and uncovers the trends embedded in the stories by positioning the stories on specially designed charts. The software analysis raised several additional questions, which were followed up during the main evaluation mission through four participatory workshops with a sample of surveyed PO members. Topics further explored during the workshops included, among others, the cost implications of adopting new agricultural practices promoted by the project, the use of crop income, factors affecting area extension and yield increases, and risk mitigation strategies. These topics were explored separately for different gender and age groups.

Main strengths of the methodology. The methodology's participatory approach has made it possible to engage the beneficiaries in the evaluation by gathering their opinions and perceptions of the various forms of support provided by their POs and, indirectly through their POs, by the project. The 590 stories provided a solid qualitative base of evidence, while software analysis and graphic representations added a quantitative dimension. The collection of stories and signification responses on tablets, directly in the SenseMaker® Collector application, was easy and practical, and it allowed for real-time monitoring of data collection progress. The data analysis tools embedded in the companion SenseMaker® Explorer software made it very easy to identify trends and outliers in the stories and allowed for a much more targeted and efficient reading of the stories.

The stories were used to confirm key impact pathways of the projects' reconstructed theory of change, and helped identify contextual constraints and bottlenecks affecting the respondents' crop-growing activities that were not reflected in the theory of change. The participatory approach was not limited to eliciting a perspective from the respondents but included their effective involvement in interpreting the data through a questionnaire to "give meaning" to their experience. The participatory workshops proved critical to clarify several issues arising from the data analysis and added another level of self-interpretation by project beneficiaries.

Main lessons learned. The use of SenseMaker® in the Cameroon CSPE was a pilot experiment that generated several lessons:

• The SenseMaker® approach is complementary to other data collection and analysis tools. The approach was appropriate for capturing a large and varied set of qualitative perspectives, feelings and context, but was less useful for collecting hard facts and quantitative evidence. It was therefore important to use the SenseMaker® study within a broader mix of complementary methods, including in-depth desk review, key informant interviews and direct field observations. Besides, the study itself was designed and used with the support of information gathered and analysed through other methods.6

• Put sufficient effort into reconstructing and validating the intervention's theory of change. The opening question to elicit the story and the signifiers were designed on the basis of a draft reconstructed theory of change of the project. This draft theory of change was reconstructed on the basis of desk review, early in the evaluation process, and contained several flaws. These flaws were carried through into the signifiers. For instance, some planned project support was not delivered and certain PO services were not yet provided; signifiers related to these services were therefore irrelevant. On the other hand, some impact pathways were missing from the draft theory of change, and the signifiers related to those pathways were therefore not included in the questionnaire. This limited the analysis that could be done concerning those impact pathways. While the SenseMaker® study certainly proved useful in identifying those flaws in the reconstructed theory of change, more effort could have been put into getting it right before the study, for example through a more careful desk review of project reports and knowledge management products, and by sharing and discussing the draft theory of change with project stakeholders.

• Keep the self-signification questionnaire short and simple. The SenseMaker® method is demanding for interviewers and interviewees
alike. Responding to triads, dyads and "stone" diagrams requires more reflection and time than a typical closed questionnaire. In our Cameroon study, the self-signification questionnaire included eight triads, five dyads and three stones, plus a number of multiple-choice questions. Each interview lasted 40–60 minutes, depending on the respondent's level of understanding; this duration was challenging for some respondents and might have affected the reliability of responses towards the end of the questionnaire. Interviewers also expressed doubts about the respondents' understanding of the more complex signifiers, the triads and stone diagrams in particular, which might have led to misleading responses. It would therefore have been wiser to replace the triad and stone questions as much as possible with multiple-choice or dyad signifiers.

• Test not only the questionnaire but also the data analysis. The interview team trialled the questionnaire in one cassava PO, collecting a story and self-signification responses from 18 producers. This was sufficient to identify and fix language and interpretation issues in the questionnaire, but not to get a feel for response trends on the different signifiers. It later emerged during data analysis that some dyad signifiers encouraged respondents to position their response at the extremes, while some triad and stone questions received mostly neutral responses, suggesting that respondents might have misunderstood them or been fatigued by the signifiers. Also, several signifiers were duplicative, not providing any additional insights. A larger number of stories should have been collected on a trial basis – perhaps around 50 – to conduct a preliminary analysis and detect possible misinterpretations and less useful or duplicative signifiers.

In conclusion, like any evaluation data collection and analysis method, SenseMaker® has its strengths and limits when used for evaluation. If well designed and used within a broader set of mixed methods, its participatory narrative-based approach helps to make sense of important qualitative aspects within complex, evolving realities, where survey methods based on direct questioning would be less appropriate. Furthermore, the approach gives a voice to project beneficiaries by letting them narrate what they deem important and by involving them effectively in analysing their own experiences, first through self-signification and later through group discussions and analysis. This is likely to promote learning through self-reflection and to increase the evaluation's legitimacy in the eyes of project beneficiaries. Given the success of the approach, it is being replicated in an impact evaluation in Niger.


Case Study 4: Is There a Role for Machine Learning in the Systematic Review of Evidence?

By Edoardo Masset, International Initiative for Impact Evaluation

The amount of evidence on the effectiveness of development interventions is increasing, but for it to be useful to policymakers it needs to be summarized and synthesized. This is normally done through systematic reviews of evidence. Following the lead of the Cochrane Collaboration in medical research, where syntheses of evidence are produced to inform public health policies, the Campbell Collaboration, 3ie and other organizations have recently promoted the production and use of systematic reviews in international development. Systematic reviews are produced by large teams of researchers who scan all the available literature on a specific topic and filter the evidence through a process of search and selection, with the goal of summarizing the existing reliable and relevant evidence on the effectiveness of a specific intervention.

Systematic reviews are currently conducted by researchers, often research assistants, who use databases to retrieve large numbers of published and unpublished studies, select the evidence relevant to the topic of the systematic review, appraise the quality of the evidence and finally summarize it. This is tedious and mechanical work that takes several months to complete. A standard systematic review takes 12–24 months and requires a lot of low-skill effort. Not only is the process taxing, but reviews take so long to produce results that they often become outdated soon after being completed, and consequently fail to inform policies in a timely manner. The whole process of producing systematic reviews is becoming more and more demanding because the number of impact evaluations is rising at an exponential rate and more databases are becoming available. Much of the time spent in conducting systematic reviews is absorbed by the process of searching, screening and evaluating the available literature, often using word-recognition devices, while little time is left for activities requiring higher-order cognitive functions, such as synthesizing and evaluating the evidence.

Much of researchers' effort could be spared if we were willing to abandon any attempt to systematically screen all the literature, and if we accept that (a) studies can be classified by a machine algorithm with a relevance score that represents their probability of being relevant for the review, and (b) a reasonable margin of error in screening is tolerable, which will result in some relevant studies not being included in the review.


[Figure 2.6 near here: the percentage of included studies (vertical axis) plotted against the percentage of screened studies (horizontal axis) for each search database, including Africa-Wide, ASSIA, SocIndex, WHO GHL, PsychInfo, CAB Abstracts, ERIC, Global Health, IBSS, Econlit and Web of Science.]

Figure 2.6 shows how wasteful systematic reviews are and hence the potential for improvement. The figure shows the percentage of studies included in a review (vertical axis) as a function of the percentage of screened studies in each search database (horizontal axis). In this particular systematic review, 20% of the search delivered about 80% of the studies eventually included. This means that if we had been prepared to miss 20% of the evidence, we could have conducted the search in a fifth of the time it normally requires, with great savings in effort and time.


Figure 2.6  In a systematic review, 20% of the search delivered 80% of the evidence. Source: Author’s elaboration.

The solution is not that simple, however. The problem is that researchers do not know how many studies will be included from each database before conducting the search. Can machines support researchers in performing this task? The answer is yes. Much research and several projects are under way that use machine learning algorithms to help researchers conduct systematic reviews (O’Mara-Eves et al., 2015; Tsafnat et al., 2014).

In these trials, researchers screen a subset of the studies. The result of the screening process is fed into a machine that develops a rule for including or excluding a given study based on the information provided by the researchers. This is normally performed by a logistic regression in which the dependent variable is the inclusion or exclusion of the study and the explanatory variables are words and combinations of words in the studies reviewed. The inclusion rule is then applied to a new subset of the data, and the selection performed by the computer algorithm is returned to the researchers. The researchers at this point can perform an additional screening on the results of the search conducted by the computer, which can be fed back again to the machine to improve and refine the inclusion process in successive rounds. In this way, the machine iteratively learns to include the studies using the criteria followed by the researchers. It has been estimated that this exercise can save 10%–90% of screening time.

Whether machine algorithms can be applied to other aspects of systematic reviews, such as appraising the quality of the evidence and synthesizing it, is more questionable. Some have argued that systematic reviews will eventually be performed entirely by machines at the push of a button (Tsafnat et al., 2013). Computer algorithms can certainly help researchers assess the quality of studies when this appraisal is based on the recognition of key associations of words (Millard, Flach and Higgins, 2016). Computers can also be instructed to extract the desired data and summarize them in a quantitative way (Jonnalagadda, Goyal and Huffman, 2015). It is more questionable whether computers are able to evaluate evidence from a body of work. The ability of machines to learn should not be underestimated, but the evidence used in international development poses particular problems. Evaluations of development interventions are often a mixture of qualitative and quantitative analyses, and the quantitative analysis is often not as simple or transparent as the randomized controlled trials normally employed in medicine. It is therefore unlikely that systematic reviews of development interventions will be entirely produced by machines, at least in the near future. However, efforts to support researchers in completing tedious and time-consuming tasks should be promoted, as the technology to do this is available and easily applied.
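To make the screen–train–rescreen loop described above concrete, the sketch below implements one iteration using a logistic regression over word features, assuming scikit-learn; the abstracts, labels and inclusion decisions are toy placeholders rather than data from any actual review.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for retrieved abstracts (placeholders, not real studies).
abstracts = [
    "randomized controlled trial of cash transfers on child nutrition",
    "qualitative study of local governance reform processes",
    "impact evaluation of school feeding on primary attendance",
    "editorial commentary on trends in development finance",
]

# Researchers hand-screen an initial subset (1 = include, 0 = exclude).
screened_idx, screened_labels = [0, 1], [1, 0]

# Words and word combinations (unigrams and bigrams) become the
# explanatory variables of the inclusion rule.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(abstracts)

# Fit the inclusion rule on the hand-screened subset.
rule = LogisticRegression().fit(X[screened_idx], screened_labels)

# Score the remaining studies with a relevance probability. In practice,
# the highest-scoring (or most uncertain) studies go back to the
# researchers, and their decisions enlarge the training set for the
# next iteration of the loop.
unscreened_idx = [2, 3]
for i, p in zip(unscreened_idx, rule.predict_proba(X[unscreened_idx])[:, 1]):
    print(f"study {i}: P(include) = {p:.2f}")
```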


Case Study 5: Using Machine Learning to Improve Impact Evaluations

By Paul Jasper, Oxford Policy Management

Interest in applying machine learning to a variety of contexts is growing rapidly around the world, including in international development (Figure 2.7). The World Bank, for example, has organized a competition for data scientists to try to predict poverty using machine learning approaches (Driven Data, 2018). Similarly, UN Global Pulse is running several projects to test how machine learning can support development work (UNGP, 2018). Can machine learning also be used in impact evaluations? A 2016 UN Global Pulse report on using big data in monitoring and evaluation mentions machine learning as a technique related to predictive analysis with great potential, but states that "not much progress has been made on causal analysis" (UNGP, 2016). In recent years, economists and social scientists have tackled this issue and investigated how machine learning can be used to improve causal inference and programme evaluation, resulting in a nascent literature in this field. For example, Hal Varian's 2014 article reviews a set of machine learning techniques that could be used by economists (Varian, 2014). Athey and Imbens (2017) and Mullainathan and Spiess (2017) specifically review how machine learning methods could help to solve econometric problems of causal inference. Most recently, Susan Athey has provided a comprehensive overview of the current state of research in this area (Athey, 2018). Oxford Policy Management (OPM) is experimenting with integrating some of these approaches into its impact evaluation work.


Figure 2.7 Google search trends for the term "machine learning".

What Is Machine Learning?

Machine learning can be defined as "a field that develops algorithms designed to be applied to datasets, the main areas of focus being prediction […], classification, and clustering or grouping tasks" (Athey, 2018). A large number of machine learning applications deal mainly with prediction problems: what is the best way to predict a certain outcome (such as consumption poverty of households) using a set of observed explanatory variables (such as background characteristics of those households)? Machine learning algorithms can be remarkably good at selecting and combining a set of variables in flexible ways in order to solve such prediction problems consistently (James et al., 2013). Yet this is fundamentally different from answering questions about causality, which impact evaluations are designed to tackle. It is possible, for example, that the type of roofing of a home is a very good predictor of consumption behaviour and the related poverty status of the household living there. This would not mean, however, that installing a different roof would automatically increase this household's consumption. Instead, it is plausible to assume that causality might run the other way: with increasing wealth, a household might choose to change its dwelling's roof. The challenge is to devise ways to use the strength of machine learning to improve causal inference and impact evaluation methods. OPM has focused on experimenting with applying machine learning to situations where randomized evaluations are not possible, that is, quasi-experimental impact evaluations (QIEs).7

What Constraint Does Machine Learning Address in QIEs?

To identify the causal effects of a policy intervention, researchers ideally devise a policy experiment in which participation or targeting is randomly assigned to participants and non-participants, and outcomes of interest are then compared between the two groups. Due to the random assignment, these two groups are – on average – equivalent on all dimensions except for the intervention. Hence, differences in outcomes between the two groups after the intervention can be causally attributed to the policy. In practice, however, such randomized experiments are often not possible. This means that many impact evaluations are quasi-experiments that rely on – in some way – controlling for all relevant characteristics of participants and non-participants that make them non-comparable, thereby creating valid comparisons of outcomes that can serve to estimate the causal effects of a policy.8 In other words, such methods rely on the assumption that once all relevant characteristics are controlled for appropriately, assignment to the participant or non-participant group is virtually random.9

Selecting the right set of covariates and combining them in the right way to control for any systematic differences between participants and non-participants is therefore a core step in implementing many QIEs. This step is not trivial, though, given that researchers are often faced with many possible covariates to select from and many ways of controlling for them. For instance, survey questionnaires used in impact evaluations often collect data on several hundred variables that can be combined in all sorts of ways. It is here that machine learning methods can help researchers add robustness, given that they are very good at variable and model selection tasks.

How Did OPM Use Machine Learning in Its QIEs?

OPM has experimented with two closely related methods: "double selection" (DS) and "double machine learning" (DML).10 Both approaches build on the idea that, when the ultimate objective is to estimate causal effects, good covariate selection means building a model with variables that are related to participation status and to the outcome of interest. Otherwise, differences in outcomes between participants and non-participants might be due to those variables and not to the policy being evaluated. Hence, in a first stage, machine learning algorithms should be applied to two separate estimation problems: how covariates are related to the treatment assignment variable, and how they are related to the outcome variable. In a second stage, the results of these first two analyses can be used to estimate the impact of the policy. The key insight here is that the first stage in this approach can be interpreted as a prediction problem that machine learning can help to address.

DS does so by employing machine learning-driven regularized regression methods, such as LASSO, to automatically select a limited set of variables that turn out to be related to both participation status and the outcome variable, and then feeding those variables into causally interpreted second-stage estimations. Simply put, machine learning in the first stage is used for variable selection alone.

DML exploits the predictive power of machine learning approaches more comprehensively. In the first stage, a flexible set of machine learning methods (e.g. random forests or neural networks) can be used to separately model the relationships between the covariates and the outcome variable, and between the covariates and the participation assignment variable. These models are used to predict both outcomes and participation status. Prediction errors, that is, residuals, are recorded and used in the second stage to estimate the effect of the policy. The intuition behind this approach is that if the relationships between covariates and the outcome, on the one hand, and between covariates and the participation assignment, on the other, are modelled well in the first stage, the remaining errors will capture information that cannot be explained by the covariates controlled for. Hence, this information should reveal whether, once covariates are taken into account, participation assignment can explain the remaining variation in the outcome variable. In other words, once machine learning is used to explain all variation that is due to the background characteristics of participants and non-participants, one can assess whether differences in outcomes between participants and non-participants persist. In contrast to DS, DML allows for more flexibility in the machine learning method used in the first stage and hence makes it possible to explore algorithms that model these relationships in highly complex ways.

OPM has been exploring the use of algorithms for covariate selection, using both DS and DML, in several impact evaluations for which it has collected survey data (at baseline and endline) on both participants and non-participants of the programmes to be evaluated. In addition to outcome indicators, the resulting datasets generally include a large set of additional variables that could be potential confounders to be controlled for. In an education evaluation, for example, key outcome indicators generally include learning outcome measures of pupils. Sets of covariates include geographical, school-level, teacher-level and individual pupil-level background characteristics. Each of those sets covers a variety of dimensions. At the pupil level, for example, background characteristics can include aspects of a pupil's parents (age, education status, economic activities), of physical dwellings (access to water, electricity, infrastructure), of consumption patterns and of household composition. This list of covariates can easily be extended. All of these covariates could also be transformed, for example by interacting them with each other, in order to allow for flexible relationships when controlling for them, which increases the set of potential covariates even further.

Conventionally, such a situation would be approached by giving ex ante theoretical justifications for certain baseline covariates to be included in the estimation process. Moreover, results from several different estimations, controlling for varying sets of covariates, would normally be presented in order to show whether results are robust to varying estimation models. OPM followed these established procedures but also employed algorithmic covariate selection to show that estimations are robust to such data-driven approaches, including DML and DS applied to a full set of simple and transformed covariates.11
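To illustrate the two-stage logic described above, here is a minimal sketch of DML-style partialling out on simulated data, assuming scikit-learn and NumPy. It is an illustration of the idea rather than OPM's actual estimation code; the data-generating process, model choices and cross-fitting folds are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 20))                      # baseline covariates

# Non-random participation: assignment depends on covariates ...
d = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 0).astype(int)
# ... and the outcome depends on covariates plus a true effect of 2.0.
y = 2.0 * d + X[:, 0] ** 2 + X[:, 1] + rng.normal(size=n)

# First stage (cross-fitted to limit overfitting bias): predict the
# outcome and the participation status from the covariates.
y_hat = cross_val_predict(RandomForestRegressor(n_estimators=200), X, y, cv=5)
d_hat = cross_val_predict(RandomForestClassifier(n_estimators=200), X, d,
                          cv=5, method="predict_proba")[:, 1]

# Residuals: the variation the covariates cannot explain.
res_y = y - y_hat
res_d = d - d_hat

# Second stage: regress outcome residuals on participation residuals.
# The slope is the estimated policy effect (close to 2.0 here).
effect = (res_d @ res_y) / (res_d @ res_d)
print(f"estimated effect: {effect:.2f}")
```

A DS variant would instead use LASSO in both first-stage problems and carry the union of selected covariates into an ordinary second-stage regression.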

What Was the Benefit of Using Machine Learning in QIEs?

There are two main benefits of using machine learning for variable selection and modelling purposes in QIEs. First, employing such algorithms allows for a full systematic search over the set of baseline covariates – including their transformations and interactions – to identify and control for relationships in the data that might be biasing raw comparisons of outcomes between participant and non-participant groups. Hence, assuming that these machine learning algorithms are employed correctly, this adds substantive robustness to the underlying assumption in many QIEs that all relevant covariates driving systematic differences between the two groups are controlled for appropriately, and hence that treatment is as good as random.

Second, employing these methods removes researcher discretion from the process of covariate selection and modelling, thereby increasing the rigour of QIE estimation processes. Even though, as described above, this process is conventionally pre-specified and theory driven, disagreement often still persists among researchers about the exact specifications to choose in practice. Machine learning allows researchers to be systematic about this process:

There are many disadvantages to the traditional process, including but not limited to the fact that researchers would find it difficult to be systematic or comprehensive in checking alternative specifications (…). The regularisation and systematic model selection have many advantages over traditional approaches, and for this reason will become a standard part of empirical practice in economics. (Athey, 2018)

Opportunities and Constraints in Using the Tool

Research on and applications of machine learning in policy or programme evaluations are still recent and rare but will soon become much more common. Variable and model selection is just one area where machine learning can be of use. Other areas of research include using machine learning to estimate heterogeneous treatment effects, to implement supplementary analyses, and to produce robustness measures (Athey, 2018). The fact that machine learning approaches are new in this area of research, however, also points to two key constraints on their application. First, evaluators still have little capacity and a lot of uncertainty about how to implement machine learning and combine it with traditional estimation procedures. Second, machine learning-driven modelling can be perceived as an exercise that reveals little of its internal workings and produces highly non-linear estimation models that are difficult to interpret. Presenting results to audiences unfamiliar with machine learning can hence be a complex task. OPM therefore generally compares results from machine learning-driven approaches with results from more traditional estimations. It is likely that both of these constraints will ease as machine learning approaches become more commonly used and tested.

It is important to emphasize, finally, that machine learning does not absolve the researcher from thinking carefully about the design and assumptions needed to rigorously estimate the causal effect of policies and programmes to be evaluated. While machine learning can help to improve the robustness and rigour of estimation processes, inferring causality in the context of impact evaluations still fundamentally relies on creating credible comparison groups or counterfactuals via careful research design and data analysis.

Case Study 6: Using Geospatial Data to Produce Fine-Scale Humanitarian Maps

By Gaurav Singhal, Lorenzo Riches and Jean Baptiste Pasquier, World Food Programme of the United Nations

To reduce hunger, it is vital to be able to monitor food security. That is why the World Food Programme (WFP) continuously conducts household surveys. Collecting face-to-face data in remote or unsafe areas is difficult and expensive, however, so WFP's estimates are only representative at a low resolution – usually at regional or district level. To allocate resources more efficiently, WFP and other humanitarian actors need more detailed maps.

WFP's Vulnerability Analysis and Mapping (VAM) unit aimed to leverage open geospatial data for use in WFP and other humanitarian sector assessments, and to make it accessible to a broad range of users. For WFP, this means enabling users to produce fine-scale food security maps. VAM's approach builds on recent findings by a research group from Stanford on how machine learning and high-resolution satellite images can be used in combination with survey data to predict poverty indicators for small areas (Head et al., 2017; Jean et al., 2016). VAM tested its approach in a variety of country case studies based on survey data from the World Bank as well as from WFP. By combining further valuable open-source data, such as OpenStreetMap information and nightlights, with the satellite data-based image recognition, and weighting it by population data, VAM is able to further refine prediction results for poverty indicators. By applying the model to food security indicators, VAM is able to broaden the use of high-resolution mapping by humanitarian actors. VAM is also increasing the usefulness of research results by making them easily accessible to policymakers through a web-based application.

Approaches to High-Resolution Spatial Mapping

Several approaches have been used to map socio-economic indicators at a high spatial resolution. These approaches generally rely on survey data as ground truth to estimate and evaluate statistical models, and they mostly focus on poverty indicators. However, the set of covariates and the techniques employed to extract these covariates vary broadly. The most common statistical technique is referred to as small-area estimation (SAE) (Elbers, Lanjouw and Lanjouw, 2003). It consists of fitting a model linking the target variable of interest with a set of relevant covariates collected through survey data, and applying this model to the same set of covariates in census data, where the target variable was not measured (a bare-bones sketch of this fit-and-project logic appears after the numbered list below). Because census population counts are usually available at a very small administrative level, the resulting predictions downscale the survey results. WFP, for example, has produced small-area estimations of food security indicators in Bangladesh (Jones and Haslett, 2003), Nepal (Jones and Haslett, 2006, 2014) and Cambodia (Haslett, Jones and Sefton, 2013). This technique is difficult to apply in Sub-Saharan Africa, however, as it requires recent and reliable census data. It also relies on the assumption that the set of covariates was measured in the same way in both the survey and the census data.

Where no census data is available, researchers have been using geospatial data to make predictions at a high spatial resolution. Geospatial data is gridded data of diverse covariates, such as climate, accessibility, environment or topographic features. These gridded covariates are often derived from satellite imagery, so they are available globally at a high resolution. WorldPop researchers have found that these geospatial covariates are able to predict the geographic distribution of population, and of population characteristics such as age and births, rather well. In particular, models trained on census population counts were able to map population distribution at a resolution of 100 metres in the majority of countries in the world (Alegana et al., 2015; Weber et al., 2018). Geospatial data has also been used to model gender-disaggregated development indicators in four African countries, as well as poverty indicators in a variety of countries. The ground truth data comes from the Demographic and Health Surveys Program (DHS) and the World Bank's Living Standards Measurement Study (LSMS). Their assessments are linked to geographic covariates with the survey GPS coordinates available at the cluster level. This method is referred to as the bottom-up approach. In Bangladesh, mobile-phone metadata features available at the cell-phone tower level were also used as additional covariates to predict poverty (Steele et al., 2017).

Finally, a group at Stanford University has used image recognition machine learning models to extract features from high-resolution satellite images (Jean et al., 2016). They predicted poverty indicators (wealth index and expenditures) in five African countries with the extracted features. This technique, known as transfer learning, is complex: convolutional neural networks (CNNs) are trained to classify nightlight intensity from the satellite images, and the features that are extracted from the networks are then fed into a traditional linear model. The ground truth data is also LSMS and DHS data aggregated at the survey cluster location. This approach has since been replicated with a broader set of indicators and countries (Head et al., 2017).

VAM's current work aims to unify the different approaches in an attempt to create an automatic process that would apply to the various food security indicators collected by WFP. VAM has selected data sources that are globally and programmatically available and built a pipeline to process data and extract covariates. More specifically, VAM has built on the current approaches by

1 testing high-resolution mapping on WFP core food security indicators (Food Consumption Score (FCS) and Food Expenditures),
2 combining transfer learning with geographic gridded data to extract a broader set of features that yield more stable results,
3 adapting transfer learning to use different sources of imagery available more frequently (from the European Space Agency's Sentinel 2 satellites) and training lighter neural networks to increase computational efficiency,
4 adapting "geographic data models" to programmatically extract gridded covariates from various open data sources with a given set of parameters,
5 building a prediction model that accounts for both uncertainty in training data and spatial autocorrelation, to both predict and provide mathematically consistent estimates of model uncertainty for individual predictions,
6 developing an open-source modular code-base to replicate research and allow new features and models to be added,
7 creating a user-friendly web application that allows users to evaluate the method on their own geo-located dataset and automatically map their prediction results.
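As a point of reference for the pipeline above, the following is the bare-bones sketch of the small-area estimation logic described earlier: fit on the survey, project onto the census, aggregate by area. It assumes scikit-learn and pandas; the variable names and toy data are hypothetical, and the sketch omits the error-component modelling that full SAE methods such as that of Elbers, Lanjouw and Lanjouw include.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Survey: the target indicator was measured alongside shared covariates.
n_survey = 500
survey = pd.DataFrame({
    "hh_size": rng.integers(1, 10, n_survey),
    "roof_improved": rng.integers(0, 2, n_survey),
})
survey["log_consumption"] = (8.0 - 0.1 * survey["hh_size"]
                             + 0.4 * survey["roof_improved"]
                             + rng.normal(0, 0.3, n_survey))

# Step 1: fit a model linking the target to the shared covariates.
model = LinearRegression().fit(
    survey[["hh_size", "roof_improved"]], survey["log_consumption"])

# Census: the target is unmeasured, but the same covariates exist for
# every household, together with a small-area identifier.
n_census = 5000
census = pd.DataFrame({
    "area_id": rng.integers(0, 50, n_census),
    "hh_size": rng.integers(1, 10, n_census),
    "roof_improved": rng.integers(0, 2, n_census),
})

# Step 2: project the survey model onto the census and aggregate the
# household-level predictions up to small areas.
census["pred"] = model.predict(census[["hh_size", "roof_improved"]])
print(census.groupby("area_id")["pred"].mean().head())
```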

Deep Learning to Extract Covariates from Satellite Images

For the purpose of this case, VAM focused on only one data source used to extract features from satellite images. These images comprise the red, green and blue (RGB) bands from Sentinel 2 and Google Maps for the area of interest (AOI). The processing is inspired by the transfer learning approach developed by Jean et al.: CNNs are trained to predict nightlight intensities (as a proxy of poverty) from the aforementioned images. "Convolutional" refers to the way these networks share weights across image locations, which makes them largely invariant to the translation of objects within an image; hence, CNNs are commonly used for image recognition. In the process of learning how to predict nightlight intensity, the CNN creates features from the satellite imagery. These features are extracted (hence the term "transfer learning") to be used as covariates by the prediction model of step (5) in the pipeline mentioned above. Step (3) of the pipeline trains two models: one for Google images and the other for Sentinel images.

The nightlight data is classified into three categories according to luminosity: low, medium and high. These values become the labels for training the models. Nightlight data from four countries has been used so far to train the model: Malawi, Nigeria, Senegal and Uganda. To reduce class imbalance, the nightlights are masked with the European Space Agency's land use product so that luminosity is taken only from populated areas. Google and Sentinel images are then downloaded (Figure 2.8).

Figure 2.8 Nightlights classified as low (red), medium (yellow) and high (green) in northern Nigeria.

ICTs for Evaluation  59 The images and the nightlight classes are then fed to CNNs. Compared with the transfer learning approaches mentioned above, VAM’s method uses a much smaller model architecture for two reasons: first, it relies upon fewer images for training, and second, VAM desired a light model that would be able to score images fast. The final accuracy on validation sets ranges from 60% to 70% (Figure 2.9).

Figure 2.9 Trained vs. test models.
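The following is a minimal PyTorch sketch of this training-and-extraction step. The three nightlight classes follow the text; the tile size, network architecture and toy batch are illustrative assumptions, not VAM's actual models.

```python
import torch
import torch.nn as nn

N_CLASSES = 3  # low, medium, high nightlight intensity

class TinyCNN(nn.Module):
    """A deliberately small CNN: classify tiles, then reuse its features."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, N_CLASSES)

    def forward(self, x):
        return self.head(self.backbone(x))

    def features(self, x):
        # Penultimate-layer activations: the covariates handed to the
        # fine-scale prediction model (step 5 of the pipeline).
        return self.backbone(x)

model = TinyCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step on a toy batch of RGB tiles with nightlight-class
# labels; in practice this loops over the full tile dataset.
tiles = torch.randn(8, 3, 64, 64)
labels = torch.randint(0, N_CLASSES, (8,))
optimizer.zero_grad()
loss_fn(model(tiles), labels).backward()
optimizer.step()

# After training, score cluster tiles and keep only the features.
with torch.no_grad():
    covariates = model.features(tiles)  # shape: (8, 32)
```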

The trained models are then used to extract features from the satellite images relevant to the AOIs, usually clusters that either have been surveyed (as part of the training data) or that VAM wants to predict (Figures 2.10 and 2.11).


Figure 2.10  Satellite imagery and neural networks.

Figure 2.11 Predictions at 500m by 500m resolution of food insecurity for a town in Mali, based on a 2018 Emergency Food Security Assessment.

Innovations, Limitations and a Path Forward

VAM's approach implements three innovations in the field of high-resolution spatial mapping for development indicators. First, VAM combines satellite image-based feature extraction with further valuable open-source data, such as OpenStreetMap information and nightlights, and weights it by population estimates. This extends predictions to food-insecurity indicators while increasing overall prediction accuracy for development indicators. Second, VAM incorporates both spatial autocorrelation and training-data uncertainty into the modelling process to produce predictions accompanied by mathematically consistent estimates of error. This serves two purposes. It allows VAM to extend its survey data sources from traditional high-quality assessments to the numerous low- to medium-quality assessments performed during emergencies. It also provides decision makers with a "complete picture", highlighting AOIs where estimates can be presumed to be reasonably trustworthy. Finally, the automation of the feature extraction process means that, with a small amount of human training, it can be put into the hands of field practitioners and made widely available to the humanitarian community.

However, fundamental limitations remain. The seminal project that VAM's work builds on achieved a modicum of success in predicting asset wealth and total expenditures at a fine resolution throughout several Sub-Saharan countries (Xie et al., 2016). Unfortunately, this approach did not yield satisfactory results for more dynamic, highly seasonal indicators such as food security or vaccination coverage. By using additional features and a more complex estimation procedure, VAM's method is in many cases able to achieve satisfactory results, as defined by prediction accuracy and uncertainty (VAM has yet to be able to ground-truth results), but certainly not in all cases. Fundamentally, the modelling framework cannot account for time, while food security is by its very nature a temporally dynamic indicator. For this reason, VAM's method is best suited to downscaling results from a current, traditional, face-to-face survey. It is not capable of truly predicting food insecurity given historical survey results and current satellite imagery.

In the light of the innovations VAM has been able to implement, however, this limitation should be viewed as a feature and not a bug. In an ideal world, surveys would be designed from the start to use VAM's high-resolution estimation method to make predictions about places where access is difficult or high-resolution estimates are programmatically necessary, while allocating samples to places where prediction uncertainty is greatest or the socio-economic variance of the population is high. Such a union between survey design and model-based estimates would represent a sea change not only for WFP but also for the humanitarian community at large, enabling timely, high-resolution estimates of known accuracy whenever it is simply not possible to undertake a traditional, comprehensive survey assessment.


Case Study 7: Using and Sharing Real-Time Data During Fieldwork

By Simone Lombardini and Emily Tomkys Valteri, Oxfam GB

There is general recognition in the evaluation sector of the need to communicate evaluation findings effectively to key stakeholders (Bamberger, Rugh and Mabry, 2012). Gertler et al. (2011) argue that programme participants should also be included in dissemination efforts. Heinemann, Van Hemelrijck and Guijt (2017) and Van Hemelrijck (2017) provide examples of impact evaluations in which feedback and dialogue with key constituents and beneficiaries is part of the evaluation design. USAID (2018) suggests that helping stakeholders understand their data can lead to higher-quality data, more robust analysis and improved adaptation. McGee et al. (2018) recognize that the right to information is critical for accountability and that data needs to be made available in ways that are accessible, usable and actionable for the user. In short, sharing evaluation findings with survey respondents and targeted audiences is widely seen as valuable in theory, and yet it is not common practice in the evaluation sector.

Bamberger, Raftree and Olazabal (2016) make the case that ICTs are being rapidly introduced into development evaluations, bringing new opportunities for participation, as well as ethical, political and methodological challenges that evaluation needs to address. Holland (2013) argues that ICT has blown wide open the opportunities for participatory statistics. This short case study presents Oxfam GB's experience in using ICTs to share real-time data during fieldwork, including practical learning and considerations of when and how this can be done as part of a survey data collection process.

Since 2015, Oxfam GB's team of impact evaluation advisers has been using digital devices in more than 18 countries to conduct individual and household surveys for Oxfam's effectiveness reviews (ex-post QIEs) (Oxfam, 2016; Tomkys and Lombardini, 2015). In many of these data collection processes, Oxfam took advantage of features provided by digital devices to increase data quality, knowledge dissemination, community participation and engagement. One of these features is the ability to process data in real time, while data collection is still under way.

Using digital devices has many clear advantages over paper-based data collection. It lowers costs, reduces the time required to complete interviews and clean the data, and increases data accuracy and data security. Beyond these advantages, which are already well recognized in the sector, appropriate use of ICTs offers ways of improving the process of conducting household surveys itself. First, household surveys can be long and tedious, which may mean respondents feel little connection with their aims and have limited motivation to take part in them. This is likely to affect the response rate, and consequently the representativeness of the sample and the quality of the data. Second, household survey respondents dedicate time to providing detailed information that researchers and evaluators will use to answer evaluation and research questions, which will then help the respondents' communities. To reinforce this loop, researchers and evaluators should also return to the same communities to share the results with the individuals who provided the data. Unfortunately, this is not yet standard practice in the sector, and where it does happen, researchers and evaluators often visit a long time after the data was collected. Third, research and evaluation questions are formulated and determined by different interests, which may only indirectly reflect community needs. Using ICTs for data collection enables researchers to process survey data in real time, even while fieldwork is still being conducted. It allows them to share summary survey data with surveyed communities almost immediately after data collection, which can increase engagement and participation.

Given the constraints of the traditional process and the opportunities created by digital data collection, Oxfam GB's Impact Evaluation team piloted sharing and using real-time data during fieldwork in three locations: Thailand, Armenia and Zambia.

The objective of sharing real-time data during the Thailand evaluation was to increase engagement in the survey and enable information-sharing among leaders of agricultural groups (Vigneri and Lombardini, 2015). Data was uploaded nightly and sent to the research team in the field, who presented the findings to the leaders of the agricultural groups. As soon as data collection ended in the region, the field research team held four presentations, inviting all leaders of the agricultural groups involved in the survey. This provided an opportunity to increase engagement and collect valuable additional information, which was used when interpreting the results.
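As a rough illustration of what the nightly step in such a pilot might look like, the sketch below pulls the day's submissions from a mobile data collection server and prints a simple breakdown for the next morning's presentation. It assumes, purely for illustration, a KoBoToolbox-style REST API (the platform mentioned in Case Study 2); the endpoint, token and field names are placeholders, not Oxfam's actual setup.

```python
import requests
import pandas as pd

# Hypothetical endpoint and token; adapt to the actual form deployment
# and keep credentials out of source control.
API_URL = "https://kf.kobotoolbox.org/api/v2/assets/<asset_uid>/data/"
TOKEN = "<api-token>"

resp = requests.get(API_URL,
                    headers={"Authorization": f"Token {TOKEN}"},
                    params={"format": "json"},
                    timeout=30)
resp.raise_for_status()
records = pd.DataFrame(resp.json()["results"])

# Field names ("group_name", "irrigation_used") are placeholders for
# whatever the questionnaire actually collects.
summary = (records.groupby("group_name")["irrigation_used"]
           .value_counts(normalize=True)
           .rename("share"))
print(summary)
```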

In Armenia, the data collection was designed to better integrate the quantitative survey data with qualitative data collection methods (Lombardini, 2017). The evaluation had two components: an individual quantitative survey, and a qualitative component consisting of focus group discussions and follow-up individual qualitative interviews. Respondents for the qualitative interviews were selected from the quantitative survey sample and were asked to elaborate on the reasons why they answered the quantitative interview in a certain way. Using digital devices allowed the qualitative team to follow up the quantitative survey with qualitative interviews and focus group discussions using real-time responses collected from the individual surveys. This enabled the qualitative and quantitative data collection processes to take place in parallel and to complement one another.

In Zambia, the objective was to share information collected on water sources, agricultural techniques and early warning systems with representatives on disaster management committees (DMCs), as well as to increase the engagement and participation of project participants in the data collection process (Fuller and Lain, 2018). One meeting was held to share preliminary findings with the DMCs' representatives, and three group meetings were held with survey participants and their communities. In the meeting with the DMCs' representatives, information was shared through a standard PowerPoint presentation, while in the group meetings, statistical information was presented using physical objects such as cassavas. The entire process generated positive feedback from the participants. Those in attendance were surprised by some results, such as the agricultural techniques employed in neighbouring communities, which they were eager to apply in their own work.

These pilots provided some important lessons:

• Not all household surveys can or should be conducted with digital devices, particularly when there are physical or digital security considerations. If, after careful analysis, a context is deemed not suitable for using digital tools, then the length of time it takes to receive and analyse the data can hinder the sharing of results with the community.

• There is a data protection risk in sharing data, so precautions should be taken when sharing household-level and personal data. For example, in Zambia, no information on ownership of livestock, assets or household goods was shared in the socialization session, as it was deemed too sensitive.

• Results should be shared with the whole community. Researchers should present relevant data in a way that can be understood by all, and the session should be open to all demographic groups in the community, including those who did and who did not participate in the research. It can take time to understand how best to present findings, and a strong understanding is required of any community dynamics that may lead to the exclusion of some groups.

• Evaluators or researchers need to be trained in the use of digital devices and feel comfortable using them to conduct household surveys. They also need presentation and facilitation skills, so they can convey technical concepts in an accessible and non-technical way. When combining mixed methods, the person responsible needs to understand the relative value of all methods involved and be able to bring these together.

Capturing information and acting on it: Survey participants often provide information that is valuable for evaluators. Researchers and evaluators should be able to identify which information is relevant and ensure that it is properly documented and used as evaluation or research material.

Oxfam GB has found value in sharing and using real-time survey data during or immediately after survey data collection, so the impact evaluation team will make this a more established practice where appropriate and feasible. Sharing information and enhancing engagement with respondents and communities counter the perception that household surveys are purely extractive. At the same time, integrating qualitative and quantitative data collection methods strengthens understanding of a project's impact.
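The mechanics behind such same-day sharing are straightforward once data is collected digitally. The following minimal sketch, in Python, assumes that each evening's submissions are exported as a CSV file from the data-collection platform; the file name and column names are hypothetical. In line with the data-protection lesson above, it prints only community-level aggregates, never household-level records.

    # summarize_survey.py - turn one day's digital survey submissions into
    # community-level summaries that a field team can present the next day.
    # Assumed (hypothetical) CSV columns: community,
    # water_source_improved (0/1), uses_conservation_farming (0/1).
    import csv
    from collections import defaultdict

    def summarize(path):
        totals = defaultdict(lambda: {"n": 0, "water": 0, "farming": 0})
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                t = totals[row["community"]]
                t["n"] += 1
                t["water"] += int(row["water_source_improved"])
                t["farming"] += int(row["uses_conservation_farming"])
        return totals

    if __name__ == "__main__":
        for community, t in sorted(summarize("submissions_today.csv").items()):
            print(f"{community}: {t['n']} households; "
                  f"{100 * t['water'] / t['n']:.0f}% improved water source; "
                  f"{100 * t['farming'] / t['n']:.0f}% conservation farming")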

ICTs in Practice – the Case for Cautious Optimism

The case studies presented so far should give evaluators cause for optimism: using both simple and complex ICTs is not merely a theoretical possibility but is being actively practised in a wide variety of organizations. They might even inspire a few organizations to make their own efforts to use technology in their work. However, using ICTs may be neither cost free nor capacity neutral. The costs may take the form of money, staff time or the opportunity costs of alternative, perhaps older, ways of working. The relative benefits will have to dictate whether such costs and capacity building are justified by the ends they serve. In light of these trade-offs and costs, organizations will need to make several decisions in order to mainstream ICTs into evaluation work.

Sunk Costs vs. an Opportunity

Any new suite of ICTs comes with upfront monetary costs. Devices for data collection, servers for data storage and the training of data collectors all come at a cost to the organization. For larger organizations, these costs may not amount to much. However, evaluations are conducted by evaluation outfits of all sizes and profiles, and in increasingly resource-constrained environments, where budgets are being pushed down while expectations of evaluations are rising. Cost considerations are further complicated by the nature of evaluation and the kinds of data to be collected. Many organizations function in very diverse environments with a variety of development projects; hence, the data needs of every evaluation might be different.

Investing in any technology involves two kinds of costs: sunk costs and recurring costs. Sunk costs are incurred through investments in buying and setting up a technology, while recurring costs are the transaction and incremental costs of running it each time. The perceived incremental benefits may differ from one organization to another. Justifying the sunk costs requires a certain level of scope and scale within evaluations.
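The trade-off can be made concrete with simple break-even arithmetic, as in the sketch below; all figures are invented for illustration. An organization compares the one-off sunk cost of a data-collection suite, plus its recurring cost per evaluation, against the per-evaluation cost of its current process, and asks how many evaluations are needed before the investment pays for itself.

    # Illustrative break-even calculation for an ICT investment.
    # All monetary figures are hypothetical.
    import math

    sunk_cost = 40_000          # devices, servers, initial training
    digital_per_eval = 3_000    # recurring cost per evaluation with the suite
    current_per_eval = 8_000    # cost per evaluation with the existing process

    saving_per_eval = current_per_eval - digital_per_eval
    break_even = math.ceil(sunk_cost / saving_per_eval)
    print(f"Saving per evaluation: {saving_per_eval}")
    print(f"Evaluations needed to recover the sunk cost: {break_even}")  # -> 8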

Do It Yourself vs. Outsourcing

Dedicated capacity is needed to use new technology tools – capacity that evaluators may not have. Many ICT tools require substantial and specific skills and knowledge. This is especially true of emerging technologies such as machine learning, remote sensing and geographic information systems (RS&GIS), and data analytics. Organizations therefore face a dilemma: whether to build the capacity in-house or to outsource such functions to third-party vendors with specialist capacities. Internal capacity building requires time, organizational orientation and the will to build capacity in an area with which few senior managers are familiar. In addition, development organizations and evaluators work in a resource-constrained environment: an additional resource person for a special function is one person fewer for the core function of evaluation. Outsourcing comes with its own cost implications, less institution-specific application of technology, and uncertainty about expected outcomes. To address such cost and capacity constraints, organizations need to answer some questions about their operations and how a technology tool might help them (a rough way of weighing the answers is sketched below):

a What is the nature of the organization's operations at large?
b What are the most commonly measured indicators in the organization's evaluations?
c Does this new tool help the organization answer its most frequently asked questions?
d How often would the organization need the functions of the tool?
e How easy is it to train staff on the tool?
f Are the costs of outsourcing lower than the transaction costs of training staff?

For some organizations, these questions will give clear answers on whether to build certain capabilities in-house or outsource them; for others, the results may remain uncertain (Box 2.1). As an example, the Independent Evaluation Office of the GEF might be inclined towards building in-house capacity for collecting and analysing remote sensing data, given the GEF's focused environmental work and its recurrent need to measure certain environmental indicators (see Case Study 1).
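The weighing sketch promised above is shown here. It is purely illustrative: the criteria paraphrase questions (c) to (f), and the scores and threshold are invented; no such formula substitutes for managerial judgement.

    # Hypothetical scoring aid for the build-in-house vs. outsource decision.
    # Score each criterion from 0 (favours outsourcing) to 5 (favours in-house).
    criteria = {
        "answers our most frequently asked evaluation questions": 4,  # (c)
        "functions needed several times a year":                  5,  # (d)
        "staff can be trained quickly and cheaply":               2,  # (e)
        "in-house running costs beat outsourcing quotes":         1,  # (f)
    }

    score = sum(criteria.values()) / (5 * len(criteria))
    print(f"In-house suitability: {score:.0%}")  # 60% with these scores
    print("lean towards in-house" if score >= 0.6 else "lean towards outsourcing")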


Box 2.1. Overcoming Data Constraints: In-House or Outsource?

The Ministry of Foreign Affairs of Finland planned to evaluate Finland's interventions promoting gender equality, an important thematic priority for the ministry. At the outset, the interventions were found to lack primary data: the evaluability study revealed that the only database available covered the financing of the projects. To get around these data constraints, the evaluation commissioner decided to explore three options: satellite imagery, Internet-based sourcing of data and data analytics. Proxy indicators for gender equality were laid out that could be measured using Internet-sourced data and satellite imagery.

When researching gender-related trends in Africa using the Google Trends database, it was found that Google's data is still limited in Africa. In many cases, no country-specific data are available, perhaps because country-specific Internet exchange points (IXPs) are still under development in many African countries and Internet penetration remains low. In addition, it proved difficult to attribute outcomes to the project's activities alone. Satellite imagery was also considered for indicators such as access to natural resources. Where country- and local-level data were sufficient, the analysis would have required building highly complex models, demanding far more advanced skills than the ministry possessed. Any external resources would have had budget implications for the evaluation, and finding people with a working knowledge of the requisite technologies, gender issues and evaluation methods was a major stumbling block. In the end, using these tools was judged too risky and time-consuming, with uncertain outcomes. Deliberations on training in-house staff were not fruitful, given the costs and the perceived non-recurring nature of the indicators laid out.
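For readers wanting to try the first of the three options, the sketch below shows the kind of query such a team might run using pytrends, an unofficial third-party Python client for Google Trends; the keyword, time frame and country code are illustrative. Sparse or empty results for a given country are precisely the coverage gap the evaluability study encountered.

    # Querying Google Trends via the unofficial pytrends library
    # (pip install pytrends). Keyword and geography are illustrative.
    from pytrends.request import TrendReq

    pytrends = TrendReq(hl="en-US", tz=0)
    pytrends.build_payload(["gender equality"],
                           timeframe="2014-01-01 2018-01-01",
                           geo="KE")  # Kenya; many countries return little data
    trends = pytrends.interest_over_time()
    print(trends.head())
    # An empty or heavily zero-filled frame signals the data gaps noted above.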

Mainstreaming ICTs into Operations

Beyond the choices facing individual evaluators and the evaluation function, organizations face a much wider imperative. Evaluation is an integral part of an organization's operations, as much as the design, quality assurance and supervision of projects. The introduction of ICTs therefore cannot be seen as a standalone exercise but as part of an organization-wide process: ICTs will have to be mainstreamed into wider organizational operations, including planning, monitoring and evaluation, and self-assessment processes. This is important for three main reasons.

First, data does not exist in a vacuum for evaluators to collect and analyse. Much of the data that evaluators use comes from self-evaluation systems. Technology-enabled data collection and analysis should therefore be integrated into wider organizational data systems. This will enable the use of technology without hiccups at the ex post evaluation stage, and a seamless flow of data from self-evaluation systems into ex post evaluations. Case 1 on the GEF, earlier in this chapter, alludes to such integration: to use remote sensing in evaluation work, it is imperative that project intervention sites are geo-referenced and that the coordinates are available in M&E databases. Similarly, as noted in Case 7 by Oxfam, evaluation processes use the data architecture that already exists in the organization.

Second, technology is fungible, and ICT tools can be shared as a distributed resource if the use of cutting-edge technology becomes an organizational endeavour. This will enable a wider scope and scale of ICT use within organizations, making it easier to justify the upfront sunk costs. It will also enable better sharing of in-house expertise and capacity, and help resolve the dilemma of building capacity vs. outsourcing.

Third, beyond evaluation itself, an organization-wide endeavour to introduce ICTs contributes to building an organization-wide culture of innovation.
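In practice, the integration described under the first reason can start very modestly, for example by requiring that every intervention site in the M&E database carries valid coordinates before it is handed to a remote-sensing analyst. The sketch below checks an M&E export for this; the file and column names are hypothetical.

    # Check that project sites in an M&E export are usable for remote sensing:
    # every record needs a valid latitude/longitude pair.
    import csv

    def sites_missing_coordinates(path):
        missing = []
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                try:
                    lat, lon = float(row["latitude"]), float(row["longitude"])
                    valid = -90 <= lat <= 90 and -180 <= lon <= 180
                except (KeyError, ValueError):
                    valid = False
                if not valid:
                    missing.append(row.get("site_id", "<unknown>"))
        return missing

    bad = sites_missing_coordinates("intervention_sites.csv")
    print(f"{len(bad)} site(s) lack valid coordinates:", bad[:10])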

Evaluation 2.0: Turning Dilemmas to Dividends?

The understanding of evaluation has been shifting from straightforwardly logical project evaluations to more complex policy-level evaluations of social change and development. At the same time, linear results-chain thinking in evaluation is being replaced by thinking in terms of complex interactive systems, where results can be unpredictable and impacts sometimes beyond comprehension. This discussion has produced the idea of Evaluation 2.0, a paradigm shift in evaluation in which ICT may play a substantial role in making complexity more understandable and in conducting more sophisticated analyses of a complex reality. Complexity does not necessarily assume simple causalities, but rather dynamic interrelations, interactions and trade-offs. Understanding the dynamic nature of development calls for the use of ICT in evaluation, but not necessarily in traditional ways. Big data and statistics may help us understand the big picture and the context of the evaluation subject, but ICT tools designed for interaction, learning and the interpretation of meaning may be needed even more than before (refer to Case 3). Tools such as SenseMaker may help us better interpret and navigate this complexity at the micro level.

The Millennium Development Goals and the logical structure of programme design also led us to develop evaluations that fitted well within that global framework. The new framework, Agenda 2030 and the SDGs, has fundamentally changed our understanding of global and local development. It brings complexity and interactive dynamics to our evaluation frameworks. The evaluation community is still in the process of understanding what Agenda 2030 means for changing evaluation frameworks, although there have been many conferences on the topic.

What we have started to learn about Agenda 2030 and evaluation also calls for Evaluation 2.0, with innovative applications of ICT as an integral part of the evaluation process, from planning to implementation and dissemination. As this requires more resources, evaluation commissioning should be extended to include research activities and research funding. This would ensure that new applications of ICTs are developed alongside new evaluation methods, and that evaluation can continue to serve its purpose with new tools in the changing framework of global development.

Different organizations possess varying mandates and are geared towards particular goals, requiring different ICT competencies. Inter-organizational partnerships that take advantage of the competencies of individual organizations can be a way to overcome the cost and capacity constraints involved in mainstreaming technology. The spectrum of such organizations can extend beyond development organizations to encompass academic institutions, NGOs and the private sector. This is especially true for ICTs with high entry barriers, such as machine learning and big data analytics. Partnership for development is itself one of the SDGs; by pooling ICT capacities and competencies, development evaluators would be playing their own part in contributing to this goal.

Notes

1 The global analysis was complemented by field work, beneficiary surveys and collection of GIS data using smartphones. The evaluation adopted a mixed-method approach, used geospatial data and analysis to measure the environmental impacts, and carried out a qualitative study to understand the socio-economic factors enabling the impacts.
2 Quality standards include standards for site set-up, equipment and supplies, storage of therapeutic foods and medicines, and water sanitation facilities.
3 These projects are the Roots and Tubers Market-Driven Development Programme (PNDRT, which concluded in 2012) and the Commodity Value-Chain Development Support Project (PADFA, which concluded in 2017).
4 The SenseMaker methodology was developed by Dave Snowden and his team at Cognitive Edge. Information on the approach and examples of practical applications in a development context are available online. For example, http://cognitive-edge.com/sensemaker/ and https://www.sitra.fi/en/articles/sensemaker-tool-decision-making-new-kind-world/ provide good introductions to SenseMaker. Case studies can inter alia be found at http://www.undp.org/content/undp/en/home/blog/2015/10/2/Collecting-stories-from-chaos.html, https://www.rikolto.org/en/project/inclusive-business-scan, https://senseguide.nl/en/case-studies/ and https://www.tandfonline.com/doi/full/10.1080/16549716.2017.1362792.
5 See Kurtz (2014) for detailed, practical guidance on participatory narrative-based inquiry methodology.
6 The questionnaire was derived from the projects' theory of change, reconstructed on the basis of desk review, and story analysis was informed by the evaluation team's knowledge of project performance and contextual aspects, gained through desk review, interviews and field observations.

7 See Khandker, Koolwal and Samad (2010) for an overview of impact evaluation methods.
8 See Khandker, Koolwal and Samad (2010) for a slightly more technical description of the unconfoundedness assumption underlying such designs.
9 Note that controlling for such covariates usually needs to happen at baseline, that is, with covariates that cannot have been influenced by the policy that is being evaluated. Otherwise, the researcher might be re-introducing bias by controlling for variables that have themselves been affected by the policy.
10 See Belloni, Chernozhukov and Hansen (2014) for an overview of "double selection" and Chernozhukov (2018) for a technical discussion of "double machine learning".
11 OPM implemented DML and DS using the hdm package in R (https://cran.r-project.org/web/packages/hdm/index.html) and other standard ML packages.

References

Alegana, V.A. et al. (2015), "Fine Resolution Mapping of Population Age-Structures for Health and Development Applications", Journal of the Royal Society Interface, Vol. 12, p. 20150073.
Andam, K.S. et al. (2008), "Measuring the Effectiveness of Protected Area Networks in Reducing Deforestation", Proceedings of the National Academy of Sciences, Vol. 105/42, National Academy of Sciences, Washington, DC, pp. 16089–16094.
Athey, S. (2018), The impact of machine learning on economics, National Bureau of Economic Research, Cambridge, Massachusetts, www.nber.org/chapters/c14009.pdf, p. 4 (accessed 1 June 2018).
Athey, S. and G.W. Imbens (2017), "The State of Applied Econometrics: Causality and Policy Evaluation", Journal of Economic Perspectives, Vol. 31/2, American Economic Association, Pittsburgh, Pennsylvania, www.aeaweb.org/articles/pdf/doi/10.1257/jep.31.2.3 (accessed 1 June 2018).
Awange, J.L. and J.B. Kyalo Kiema (2013), "Environmental Monitoring and Management", in Awange, J.L. and J.B. Kyalo Kiema (eds.), Environmental Geoinformatics: Monitoring and Management, Springer, Berlin, pp. 3–16.
Azzam, T. (2013), "Mapping Data, Geographic Information Systems", New Directions for Evaluation, Vol. 2013/140, Wiley, Hoboken, pp. 69–84.
Azzam, T. and D. Robinson (2013), "GIS in Evaluation: Utilizing the Power of Geographic Information Systems to Represent Evaluation Data", American Journal of Evaluation, Vol. 34/2, Sage, Newbury Park, California, pp. 207–224.
Bailey, T.C. and A.C. Gatrell (1995), Interactive Spatial Data Analysis, Vol. 413, Longman Scientific & Technical, Harlow, Essex.
Bamberger, M., J. Rugh, and L. Mabry (2012), Real world evaluation: working under budget, time, data, and political constraints, Sage, Newbury Park, California.
Bamberger, M., L. Raftree, and V. Olazabal (2016), "The Role of New Information and Communication Technologies in Equity-Focused Evaluation: Opportunities and Challenges", Evaluation, Vol. 22/2, Sage, Newbury Park, California.
Belloni, A., V. Chernozhukov, and C. Hansen (2014), "High-Dimensional Methods and Inference on Structural and Treatment Effects", Journal of Economic Perspectives, Vol. 28/2, American Economic Association, Pittsburgh, Pennsylvania, https://pubs.aeaweb.org/doi/pdfplus/10.1257/jep.28.2.29 (accessed 1 June 2018).
Boelaert, J. and E. Ollion (2018), "The Great Regression. Machine Learning, Econometrics, and the Future of Quantitative Social Sciences", Revue française de sociologie, Centre National de la Recherche Scientifique, https://hal.archives-ouvertes.fr/hal-01841413/file/GreatRegression-HAL.pdf.

Buchanan, G.M. et al. (2014), "Do World Bank Development Projects Lead to Improved Biodiversity Conservation Outcomes?", Conservation Letters, Wiley, Hoboken.
Casella, D. et al. (2014), The Triple-S Project Sensemaker® experience: a method tested and rejected, IRC, The Hague, www.ircwash.org/sites/default/files/workingpaper9sensemaker.pdf.
Checchi, F. and W.C. Robinson (2013), Mortality among populations of southern and central Somalia affected by severe food insecurity and famine during 2010–2012, UN Food and Agriculture Organization, Rome and Washington, DC, www.fsnau.org/downloads/Somalia_Mortality_Estimates_Final_Report_8May2013_upload.pdf (accessed 1 June 2018).
Chernozhukov, V. (2018), "Double/Debiased Machine Learning for Treatment and Structural Parameters", Econometrics Journal, Vol. 21, Wiley, Malden, Massachusetts, http://onlinelibrary.wiley.com/doi/10.1111/ectj.12097/epdf (accessed 1 June 2018).
Corlazzoli, V. (2014), "ICTs for Monitoring and Evaluation of Peacebuilding Programmes", Department for International Development, London, www.sfcg.org/wp-content/uploads/2014/05/CCVRI-SSP-_ICTand-ME-_Final.pdf.
Dette, R., J. Steets and E. Sagmeister (2016), "Technologies for Monitoring in Insecure Environments", Global Public Policy Institute, Berlin, www.gppi.net/media/SAVE__2016__Toolkit_on_Technologies_for_Monitoring_in_Insecure_Environments.pdf.
DFID (2015), Beneficiary feedback in evaluation, Department for International Development, London, https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/428382/Beneficiary-Feedback-Feb15a.pdf.
Driven Data (2018), "Predicting Poverty – World Bank", Driven Data, Denver, Colorado, www.drivendata.org/competitions/50/worldbank-poverty-prediction/page/97/ (accessed 1 June 2018).
Elbers, C., J.O. Lanjouw, and P. Lanjouw (2003), "Micro-Level Estimation of Poverty and Inequality", Econometrica, Vol. 71/1, pp. 355–364.
Evers, J.C. (2018), "Current Issues in Qualitative Data Analysis Software (QDAS): A User and Developer Perspective", The Qualitative Report, Vol. 23(13), pp. 61–73, https://nsuworks.nova.edu/tqr/vol23/iss13/5.
Ferraro, P.J. and S.K. Pattanayak (2006), "Money for Nothing? A Call for Empirical Evaluation of Biodiversity Conservation Investments", PLOS Biology, Vol. 4/4, PLOS, San Francisco.
FSNAU (2018), "FSNAU-FEWS NET Technical Release, January 29, 2018", Food Security and Nutrition Analysis Unit – Somalia, UN Food and Agriculture Organization, Nairobi, http://fsnau.org/in-focus/fsnau-fews-net-technical-release-january-29-2018 (accessed 1 June 2018).
Fuller, R. and J. Lain (2018), Resilience in Zambia: Impact Evaluation of the "Citizen Participation in Adaptation to Climate Change" Project, Oxfam GB, Oxford, https://policy-practice.oxfam.org.uk/publications/resilience-in-zambia-impact-evaluation-of-the-citizen-participation-in-adaptati-620475 (accessed 1 June 2018).
GEF IEO (2016a), Impact evaluation of GEF support to protected areas and protected area systems, Global Environment Facility Independent Evaluation Office, Washington, DC, www.gefieo.org/sites/default/files/ieo/evaluations/files/BioImpactSupportPAs-2016.pdf (accessed 1 June 2018).

GEF IEO (2016b), Value for money analysis for the land degradation projects of the GEF, Global Environment Facility Independent Evaluation Office, Washington, DC, www.thegef.org/council-meeting-documents/value-money-analysis-land-degradation-projects-gef (accessed 1 June 2018).
GEF IEO (2016c), International waters focal area study, Global Environment Facility Independent Evaluation Office, Washington, DC, www.gefieo.org/evaluations/international-waters-iw-focal-area-study-2016 (accessed 1 June 2018).
Gertler, P.J. et al. (2011), Impact evaluation in practice, World Bank, Washington, DC.
Hansen, M.C. et al. (2013), "High-Resolution Global Maps of 21st-Century Forest Cover Change", Science, Vol. 342/6160, American Association for the Advancement of Science, Washington, DC, pp. 850–853.
Harvey, T.E. et al. (2018), Literature review of remote sensing technologies for coastal chlorophyll-a observations and vegetation coverage, Aarhus University, Denmark, https://dce2.au.dk/pub/TR112.pdf.
Haslett, S.J., G. Jones, and A. Sefton (2013), Small-area Estimation of Poverty and Malnutrition in Cambodia, National Institute of Statistics, Ministry of Planning, Royal Government of Cambodia and the United Nations World Food Programme, Phnom Penh.
Head, A. et al. (2017), Can human development be measured with satellite imagery?, Ninth International Conference on Information and Communication Technologies and Development, ACM Digital Library, New York.
Heinemann, E., A. Van Hemelrijck, and I. Guijt (2017), Getting the most out of impact evaluation for learning, reporting and influence, IFAD Research Series, IFAD, Rome, www.ifad.org/documents/38714170/39317790/Res.+Series+Issue+16+Getting+the+most+out+of+impact.pdf/c76ba037-0195-420f-a290-8e8350749f0f.
Holland, J. (2013), Who counts? The power of participatory statistics, Practical Action Publishing, Rugby, https://opendocs.ids.ac.uk/opendocs/bitstream/handle/123456789/13452/RR_Synth_Online_final.pdf.
Holton, A. and H. Chyi (2012), "News and Overloaded Consumer: Factors Influencing Information Overload Among News Consumers", Cyberpsychology, Behavior, and Social Networking, doi: 10.1089/cyber.2011.0610.
IUCN (2016), A global standard for the identification of key biodiversity areas: Version 1.0, International Union for the Conservation of Nature, Gland, Switzerland, https://portals.iucn.org/library/node/46259 (accessed 1 June 2018).
James, G. et al. (2013), An introduction to statistical learning with applications in R, Springer, New York.
Jayachandran, S. et al. (2017), "Cash for Carbon: A Randomized Trial of Payments for Ecosystem Services to Reduce Deforestation", Science, Vol. 357/6348, American Association for the Advancement of Science, Washington, DC, pp. 267–273.
Jean, N. et al. (2016), "Combining Satellite Imagery and Machine Learning to Predict Poverty", Science, Vol. 353/6301, American Association for the Advancement of Science, Washington, DC, pp. 790–794.
Jones, G. and S. Haslett (2003), Local estimation of poverty and malnutrition in Bangladesh, Ministry of Planning, Bangladesh Bureau of Statistics and the United Nations World Food Programme, Dhaka.
Jones, G. and S. Haslett (2006), Small area estimation of poverty, caloric intake and malnutrition in Nepal, Nepal Central Bureau of Statistics, United Nations World Food Programme and World Bank, Kathmandu.

Jones, G. and S. Haslett (2014), Small area estimation of food insecurity and undernutrition in Nepal, Nepal Central Bureau of Statistics, United Nations World Food Programme and World Bank, Kathmandu.
Jonnalagadda, S.R., P. Goyal, and M.D. Huffman (2015), "Automating Data Extraction in Systematic Reviews: A Systematic Review", Systematic Reviews, Vol. 4/78, BioMed Central, London, https://systematicreviewsjournal.biomedcentral.com/articles/10.1186/s13643-015-0066-7 (accessed 1 June 2018).
Karl, T.R. et al. (2010), "Observation Needs for Climate Information, Prediction and Application: Capabilities of Existing and Future Observing Systems", Procedia Environmental Sciences, Vol. 1, Elsevier, Amsterdam, pp. 192–205.
Khandker, S.R., G.B. Koolwal, and H.A. Samad (2010), Handbook on impact evaluation: quantitative methods and practices, https://openknowledge.worldbank.org/handle/10986/2693 (accessed 1 June 2018).
Kipf, A. et al. (2015), A proposed integrated data collection, analysis and sharing platform for impact evaluation, Development Engineering, USA, https://ac.els-cdn.com/S2352728515300014/1-s2.0-S2352728515300014-main.pdf?_tid=42563dbc-d0e6-4338-8e60-20dd259194b7&acdnat=1543510441_aa36d80e3c66ecb6977fce354095b9c7.
Kurtz, C. (2014), Working with stories in your community or organization: participatory narrative inquiry, Third Edition, Kurtz-Fernhout Publishing, New York.
Lin, A. and N. Chen (2012), "Cloud Computing as an Innovation: Perception, Attitude, and Adoption", International Journal of Information Management, Vol. 32(6), pp. 533–540.
Lombardini, S. (2017), Women's Empowerment in Armenia: Impact evaluation of the women's economic empowerment project in rural communities in Vayots Dzor region, Oxfam GB, Oxford, https://policy-practice.oxfam.org.uk/publications/womens-empowerment-in-armenia-impact-evaluation-of-the-womens-economic-empowerm-620210.
McGee, R. et al. (2018), Appropriating technology for accountability: messages from Making All Voices Count, Institute of Development Studies, Brighton.
Melesse, A.M. et al. (2007), "Remote Sensing Sensors and Applications in Environmental Resources Mapping and Modelling", Sensors, Vol. 7/12, Basel, Switzerland, pp. 3209–3241.
Microsoft Azure (2018), "What is Cloud Computing? A Beginner's Guide", Microsoft, Seattle, https://azure.microsoft.com/en-in/overview/what-is-cloud-computing/ (accessed 1 June 2018).
Millard, L.A.C., P.A. Flach, and J.P.T. Higgins (2016), "Machine Learning to Assist Risk-of-Bias Assessment in Systematic Reviews", International Journal of Epidemiology, Vol. 45/1, Oxford University Press, Oxford, https://academic.oup.com/ije/article/45/1/266/2363602 (accessed 1 June 2018).
Minelli, S., A. Erlewein, and V. Castillo (2017), "Land Degradation Neutrality and the UNCCD: From Political Vision to Measurable Targets", International Yearbook of Soil Law and Policy 2016, Springer, Cham, Switzerland, pp. 85–104.
Misra, S. and A. Mondal (2011), "Identification of a Company's Suitability for the Adoption of Cloud Computing and Modelling its Corresponding Return on Investment", Mathematical and Computer Modelling, https://ac.els-cdn.com/S089571771000155X/1-s2.0-S089571771000155X-main.pdf?_tid=5ade43f0-4ac4-46b7-ab90-4a6692bce8b0&acdnat=1543536708_e0dd3dc9754224ff1bdf4db73084b205.

Mullainathan, S. and J. Spiess (2017), "Machine Learning: An Applied Econometric Approach", Journal of Economic Perspectives, Vol. 31/2, American Economic Association, Pittsburgh, Pennsylvania, www.aeaweb.org/articles/pdf/doi/10.1257/jep.31.2.87 (accessed 1 June 2018).
OECD (2018), "States of Fragility 2018", Organisation for Economic Co-operation and Development, Paris, France, www.oecd-ilibrary.org/docserver/9789264302075-en.pdf?expires=1557836254&id=id&accname=ocid195767&checksum=A5532AC41EBA722C360F1830CE6B298E.
O'Mara-Eves, A. et al. (2015), "Using Text Mining for Study Identification in Systematic Reviews: A Systematic Review of Current Approaches", Systematic Reviews, Vol. 4/5, BioMed Central, London, pp. 1–22, https://systematicreviewsjournal.biomedcentral.com/articles/10.1186/2046-4053-4-5 (accessed 1 June 2018).
Oxfam (2016), How are Effectiveness Reviews Carried Out?, Oxfam GB, Oxford, https://policy-practice.oxfam.org.uk/publications/how-are-effectiveness-reviews-carried-out-594353 (accessed 1 June 2018).
Renger, R., A. Cimetta, S. Pettygrove, and S. Rogan (2002), "Geographic Information Systems (GIS) as an Evaluation Tool", American Journal of Evaluation, Vol. 23/4, pp. 469–479, https://doi.org/10.1177/109821400202300407.
Rocchini, D. et al. (2017), "Open Data and Open Source for Remote Sensing Training in Ecology", Ecological Informatics, https://reader.elsevier.com/reader/sd/pii/S1574954117300237?token=55476A9DF19354ACE6A8A5B049D29664C7C7F1F0962F375105C7C13C1DE3BED5010A9736AC7DB2D1DBFF2179DE48C90B.
Scholes, R.J. et al. (2008), "Toward a Global Biodiversity Observing System", Science, Vol. 321/5892, American Association for the Advancement of Science, Washington, DC, pp. 1044–1045.
Sette (2018), Participatory evaluation, BetterEvaluation, www.betterevaluation.org/en/plan/approach/participatory_evaluation (accessed 4 November 2018).
Spitzer, D. (1986), "On Applications of Remote Sensing for Environmental Monitoring", Environmental Monitoring and Assessment, Vol. 7/3, Springer, Cham, Switzerland, pp. 263–271.
Steele, J.E. et al. (2017), "Mapping Poverty Using Mobile Phone and Satellite Data", Journal of the Royal Society Interface, Vol. 14, p. 20160690.
Stephenson, P.J. et al. (2015), "Overcoming the Challenges to Conservation Monitoring: Integrating Data from in-situ Reporting and Global Data Sets to Measure Impact and Performance", Biodiversity, Vol. 16/2–3, Taylor and Francis, London, pp. 68–85.
Tomkys, E. and S. Lombardini (2015), Going digital: using digital technology to conduct Oxfam's effectiveness reviews, Oxfam GB, Oxford, https://policy-practice.oxfam.org.uk/publications/going-digital-using-digital-technology-to-conduct-oxfams-effectiveness-reviews-578816 (accessed 1 June 2018).
Tsafnat, G. et al. (2013), "The Automation of Systematic Reviews", British Medical Journal, Vol. 346, BMJ Publishing Group, London.
Tsafnat, G. et al. (2014), "Systematic Review Automation Technologies", Systematic Reviews, Vol. 3/74, BioMed Central, London, https://systematicreviewsjournal.biomedcentral.com/articles/10.1186/2046-4053-3-74 (accessed 1 June 2018).
UNGP (2016), Integrating big data into the monitoring and evaluation of development programmes, UN Global Pulse, New York, http://unglobalpulse.org/sites/default/files/IntegratingBigData_intoMEDP_web_UNGP.pdf (accessed 1 June 2018).

UNGP (2018), Projects, UN Global Pulse, New York, www.unglobalpulse.org/projects (accessed 1 June 2018).
UN OCHA (2014), Unmanned aerial vehicles in humanitarian response, United Nations Office for the Coordination of Humanitarian Affairs, New York, www.unocha.org/sites/unocha/files/Unmanned%20Aerial%20Vehicles%20in%20Humanitarian%20Response%20OCHA%20July%202014.pdf.
USAID (2018), Discussion Note: Adaptive Management, USAID Bureau for Policy, Planning and Learning, Washington, DC, https://usaidlearninglab.org/library/discussion-note-adaptive-management.
Van Hemelrijck, A. (2017), Governance in Myanmar: Evaluation of the "Building equitable and resilient livelihoods in the Dry Zone" project, Oxfam GB, Oxford, https://policy-practice.oxfam.org.uk/publications/governance-in-myanmar-evaluation-of-the-building-equitable-and-resilient-liveli-620177.
Varian, H. (2014), "Big Data: New Tricks for Econometrics", Journal of Economic Perspectives, Vol. 28/2, American Economic Association, Pittsburgh, Pennsylvania, www.aeaweb.org/articles?id=10.1257/jep.28.2.3 (accessed 1 June 2018).
Vigneri, M. and S. Lombardini (2015), Resilience in Thailand: Impact evaluation of the climate change community-based adaptation model for food security project, Oxfam GB, Oxford, https://policy-practice.oxfam.org.uk/publications/resilience-in-thailand-impact-evaluation-of-the-climate-change-community-based-583400 (accessed 1 June 2018).
Weber, E.M. et al. (2018), "Census-Independent Population Mapping in Northern Nigeria", Remote Sensing of Environment, Vol. 204, pp. 786–798, https://reader.elsevier.com/reader/sd/pii/S0034425717304364?token=B7C59F5C89994A4A9FB1640ED64731AD80EB2A222E9BFEC870C7FE56069C10C02F59916CBA06BAD700F2E0831CE40CDD.
World Bank (2013), ICT for data collection and monitoring & evaluation, World Bank, Washington, DC.
Xie, M. et al. (2016), "Transfer Learning from Deep Features for Remote Sensing and Poverty Mapping", Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence, Menlo Park, California.

3 Big Data Analytics and Development Evaluation
Optimism and Caution
Michael Bamberger

New information technology (NIT), comprising big data, information and communication technologies (ICTs) and the Internet of things (IoT), is transforming every aspect of life in both industrial and developing nations. NIT denotes the interaction among these three components. The emergence of big data is closely linked to advances in ICTs. In today's hyper-connected digital world, people and things leave digital footprints in many different forms and through ever-increasing data flows. They originate from commercial transactions; records that companies and governments collect about their clients and citizens; user-generated online content such as photos, videos, tweets and other messages; and traces left by the IoT, that is, by uniquely identifiable objects whose activity can be tracked (ITU, 2014).

The IoT consists of devices connected to the Internet anytime and anywhere. In its most technical sense, it consists of integrating sensors and devices into everyday objects that are connected to the Internet over fixed and wireless networks. The fact that the Internet is present everywhere at once makes mass adoption of this technology more feasible. Given their size and cost, sensors can easily be integrated into homes, workplaces and public places. In this way, any object can be connected and can "manifest itself" over the Internet. Furthermore, in the IoT, any object can be a data source (Accenture and Bankinter, 2011).

Any discussion of ICT4Eval is incomplete without NIT. The interaction between the new generation of ICT tools, such as sensors, devices and software, the voluminous data they produce and process, and their potential uses lies at the heart of this book's endeavour. ICTs encompass the various technologies and accompanying devices that exist; the IoT signifies the new ways in which these technologies and devices interact with one another; and big data is what is produced in the process of using the first two. Given that evaluations are concerned with data and its conversion into information and knowledge, this chapter will focus predominantly on big data, its generation and usage, with occasional references to NIT.

There is enormous potential for integrating NIT into the design of development evaluation, but evaluation offices have been much slower to adopt NIT than their operations colleagues.

The methodological, organizational and political reasons for this slow adoption need to be understood and overcome so that development evaluation can adapt to the rapidly evolving NIT ecosystem. The wide range of available NIT tools and techniques can also be used to address the main challenges affecting conventional evaluation methodologies and to strengthen evaluation designs.

Some Themes from the Big Data Literature

While not attempting a comprehensive literature review, this section identifies some of the evolving themes in the big data and development literature that are discussed in this chapter. Over the past five years, the business community, the media and the international development community have discovered "big data". Popular publications such as Marr's (2015) "Big data: Using SMART big data, analytics and metrics to make better decisions and improve performance" and Siegel's (2013) "Predictive analytics: The power to predict who will click, buy, lie or die" promote the powerful new tools of big data and data analytics to the business world. Around the same time, an article on the Bloomberg website (2015) illustrated how specific big data technologies such as satellite images could be harnessed as tools for business analysis.

One of the first broad discussions of the implications for international development was the 2012 White Paper "Big Data for Development", published by UN Global Pulse and edited by Emmanuel Letouzé. While UN Global Pulse had already conducted a number of proof-of-concept studies on the applications of big data in many areas of development (unglobalpulse.org), these studies were still known only to a relatively small audience. The White Paper introduced the development community to the concepts of big data and its many applications in development. Letouzé, Areias and Jackson published an updated version of the paper in 2016, which covered the genesis of big data, the big data ecosystem, theoretical considerations and controversies, institutional and cultural considerations, and practical applications.

While these publications introduced big data to the official development aid sector, Patrick Meier's (2015) "Digital humanitarians: How big data is changing the face of humanitarian response" became one of the early reference sources on the dramatic ways in which big data was being adopted and adapted by humanitarian non-profit agencies. Beginning with the Haitian earthquake, Meier presents a series of well-documented cases on the role of big data in emergency relief, in monitoring and criticizing political regimes, and as a tool to promote social and political change. There are now increasing numbers of websites and conferences dedicated to the applications of big data, and particularly ICT, in development, the latter driven by the dramatic advances in wireless technology.1 These are complemented by exponentially increasing numbers of free and commercially available apps covering every aspect of development.

The multilateral and bilateral aid agencies are also becoming active in the big data field. For example, the subject of the 2016 World Development Report (World Bank, 2016) was "Digital Dividends", which offered a broad overview of the potential benefits and challenges of the new digital economy. The World Bank also launched a Big Data Innovation Challenge in 2016, and the top entries illustrated the wide range of applications of big data. The United States Agency for International Development (USAID) is developing a number of guidance notes and "how-to" manuals on different applications of big data in programme design and management, and many UN agencies are also entering the field. Online outlets such as Big Data and Society provide opportunities for more detailed discussion of a wide range of technical and research aspects of big data and data analytics. The monitoring framework for the Sustainable Development Goals, with its challenge of harvesting and analysing huge quantities of data from every country in the world, is also generating extensive reports and publications on potential applications of big data (e.g. the Data Revolution Group's (2014) "A world that counts: Mobilizing the data revolution for sustainable development").

One of the themes of this chapter is that most of the very extensive literature on applications of big data in development has paid very little attention to how big data analytics can be applied to evaluation (as opposed to programme planning, design, monitoring and emergency relief). In 2016, the Rockefeller Foundation and UN Global Pulse jointly commissioned a report on the "Integration of big data into the monitoring and evaluation of development programs" (Bamberger, 2016), which confirmed the limited attention to how big data could be applied to programme evaluation. This report also identified a number of organizational and methodological factors constraining the integration of data science and evaluation. The paper in Evaluation by Bamberger, Raftree and Olazabal (2016) elaborated on a number of these themes and also focused on some important ethical and equity issues that are frequently not addressed in discussions of big data and development.

Finally, as the media have moved from fascination with the almost magical powers of big data to transform our world to concern about some of its potential downsides, a number of influential publications have recently focused on the dark side of big data. One theme concerns the dangers of the widespread application of decision-making apps based on algorithms that are not fully understood by the banks, universities, company recruiting offices and police departments that use them to streamline student selection, mortgage applications, personnel recruitment and crime detection. Both "Weapons of math destruction: How big data increases inequality and threatens democracy" (O'Neil, 2016) and "Automating inequality: How high-tech tools profile, police and punish the poor" (Eubanks, 2018) focus on the United States rather than on developing countries, but they reflect concerns about the use of powerful, but not fully understood, high-tech decision-making tools.

Privacy, and the ways in which citizens' thoughts, communications, movements, and social and political activities can be monitored and sanctioned by governments, raises a range of important ethical issues. These are covered extensively in Chapter 4.

Demystifying Big Data

The world is more connected, interdependent and data-rich than ever. Exponential growth in the volume of data produced globally means that 90% of all the data in existence today has been generated during the past two years. The explosion of digital services over the past decade has allowed many to become producers, owners and consumers of data. Between 2005 and 2015, the number of Internet users more than tripled, from 1 billion to 3.2 billion. More households now own a mobile phone than have access to electricity or clean water (World Bank, 2016).

The exponential growth of big data and data analytics provides information and analytical capacity that would have been unimaginable even a few years ago. These "digital dividends" include a trove of real-time information on many issues, such as trends in food prices, availability of jobs, access to health care, quality of education and reports of natural disasters. Alongside these benefits, the digital divide continues to pose challenges (World Bank, 2016).

All of these developments have important applications for research and planning. Biometric data generated by the IoT has produced a rapidly evolving research area on the "quantified self" (Wolf, 2010). Sentiment analysis, sociometric analysis, digital spatial analysis and other tools have made possible research on the "quantified community" (Kontokosta, 2016). Satellite images, social media analysis, cell-phone data on traffic patterns and many other sources have contributed to new fields of research on city planning, urban development, migration and poverty analysis (Ashton, Weber and Zook, 2017). Satellite images and remote sensors have greatly advanced climate research.

Defining Big Data and NIT

It is useful to distinguish between big data and the broader concept of NIT, which combines big data, ICT and the rapidly growing field of the IoT (Figure 3.1). It is also important to distinguish between primary big data (e.g. the original satellite images, Twitter feeds or electronic records of phone calls and financial transfers), which can involve many millions of digital records, and the processed versions of these original data. Whereas handling primary big data requires access to massive computing facilities, data processing transforms the original data into formats that can be accessed and used on personal computers and handheld devices. When it is stated that big data is fast and economical to use, this refers to the processed data, which may have been very expensive to produce from the primary big data.

Figure 3.1 The components of NIT: big data (satellite images and remote sensors, social media feeds, Internet searches, phone records, electronic transactions), where primary big data requires large-scale computing capacity while processed big data has been formatted so that it can be analysed on personal computers; information and communication technologies (mobile phones, tablets and other hand-held devices, portable computers, the Internet, remote sensors); and the Internet of things ("wearable" biometric monitoring devices, remote sensors).

Big data is often defined in terms of the "3 Vs": the velocity with which big data can be collected, analysed and disseminated; the volume of data that can be generated; and the variety of data covering different topics and using different formats. The continued rapid growth of information technology means that definitions based on each of these dimensions rapidly become outdated. Data that only a few years ago would have required a large data centre to process are now easily downloadable on smartphones and personal computers. A fuller definition of big data must include its differences from conventional survey data in terms of costs and computing requirements, coverage of population and granularity, breadth of contextual data, potential sample bias, ease of dissemination and access, relevance to a particular purpose, use of time series data, integration of multiple data sources, and creation of qualitative data (Table 3.1).

Big Data and Data Analytics

For big datasets to be useful, new procedures are required to analyse the data and present them in a form that can be understood by non-specialist audiences. These new procedures, known as data analytics, have been made possible by rapid increases in computational capacity and in the diversity of computational tools. A complete data analytics strategy usually includes three or four stages, but many applications will only use one or two (Figure 3.2; Table 3.2).

Table 3.1 Comparing big data and conventional evaluation data

1. Costs and computing requirements of data collection and analysis. Big data: While the initial costs of collecting and analysing primary big data can be high, processed data will often be available to researchers and evaluators (end users) at a relatively low cost. Conventional evaluation survey data: Costs of data collection and analysis are high.

2. Speed of collection and analysis. Big data: Fast. Conventional: Time-consuming.

3. Coverage of population and ease of disaggregated analysis (granularity). Big data: Data often covers the whole population; low data costs permit disaggregated analysis down to the level of the individual. Conventional: Data collection costs produce pressures to keep sample size as small as possible, consistent with power analysis requirements.

4. Collecting contextual data covering a broader population. Big data: Can collect and synthesize a broad set of variables that cover regions, or national and international data, and multiple types of information. Conventional: Most evaluations only collect a limited amount of contextual data, as it is expensive and difficult to collect.

5. Potential sample bias. Big data: Many indicators only cover people who have and use a phone or app, or who use ATMs; these are usually a biased sample of the total population. Procedures exist for estimating and controlling bias in small phone surveys. Conventional: Surveys have procedures for controlling sample selection bias or response bias.

6. Ease of dissemination and accessibility. Big data: Fast, sometimes real-time dissemination to users who have access to the required digital technology. Conventional: Expensive and time-consuming to produce and distribute evaluation findings; often only distributed to priority stakeholders.

7. Relevance and appropriateness for a particular evaluation purpose. Big data: Most big data were collected for a different purpose and are assumed to be appropriate proxy indicators. Conventional: Data collection instruments and questions are designed for a specific evaluation purpose.

8. Longitudinal and time series data. Big data: Longitudinal datasets already exist, and the number is increasing; some already cover more than a decade. Conventional: Longitudinal data is difficult and expensive to collect.

9. Combining multiple data sources into integrated data platforms. Big data: Integrated data platforms can be developed and are very powerful tools; however, they are very time-consuming to construct. Conventional: Difficult to construct integrated data platforms. Mixed methods are widely used, but usually they have to be analysed separately and compared manually.

10. Creation and analysis of qualitative and behavioural data. Big data: The technology for fast and economical analysis of qualitative data such as narrative text, and video and audio files, is rapidly improving. Conventional: Qualitative data is expensive and time-consuming to analyse. Case studies and in-depth interviews can still generate lived experience and depth that is difficult to achieve with big data.

Figure 3.2 The four stages of the data analytics cycle: (1) description and exploration (documenting what is happening, creating integrated data platforms, identifying new patterns, data visualization); (2) prediction, supported by predictive modelling and machine learning (what is likely to happen, which groups are likely to succeed and fail, data visualization); (3) detection (identifying outliers and groups likely to fail, providing actionable information, often in real time, data visualization); and (4) evaluation/prescription, supported by experimental designs (explaining why things happen, recommending how to improve performance, data visualization).

Table 3.2 Kinds of big data analysis with potential applications for programme monitoring and evaluation

Stage 1. Descriptive and exploratory: Documenting and conveying what is happening, often in real time, and seeking previously unidentified patterns. Collects larger volumes of data than conventional data collection methods. Identifies patterns that were previously difficult to identify. Collects real-time data that can be continually updated, offering the benefits of speed in rapidly changing circumstances. Can incorporate more contextual factors and capture broader trends. Often involves merging organizational datasets that were previously not linked. Enables dynamic monitoring and the generation of actionable data on project problems and new opportunities. Is valuable in emergencies and dynamic situations such as rapid urban growth or population movements. Provides early warning. Can integrate multiple sources of data. Can process unstructured data (free text documents, images, motion pictures, sound recordings, physical objects). Permits systems mapping, sociometric analysis and the analysis of complex adaptive systems. Can identify potential ethnic, work-related and other kinds of conflict. Permits the use of mixed methods. Is valuable for the analysis of qualitative data, including large volumes of agency reports and operational documents. Provides tools for monitoring changes in communities or other kinds of organization, and the interactions among different parts of a system.

Stage 2. Prediction: What is likely to happen, who is most at risk, who might drop out? Permits analysis of datasets too large and complex to be processed using conventional methods, including social media posts. Predicts opportunities (groups likely to succeed) and groups at risk, and the probability of success and failure for different groups. A limitation: it predicts what is likely to happen but without any underlying theory, so it is not possible to explain "why" it will happen, and it does not usually identify and test the underlying assumptions of the model.

Stage 3. Detection: Focuses on anomalies and outliers, and tracks issues identified by descriptive analysis. Tracks outliers and groups at risk. Builds on prediction and develops ways to track different groups, often in real time. Identifies unintended outcomes, including those difficult to detect with conventional evaluations.

Stage 4. Evaluation (diagnostic/analytic/prescriptive): Shedding light on why things happen. Conducts powerful data analytics beyond the capacity of conventional computing systems and analyses complex adaptive systems. Uses advanced analytics to enable predictive modelling and pattern matching. Enables data mining. Displays and disseminates large datasets.

Source: Adapted by the author from Letouzé, Areias and Jackson (2016), Peng and Matsui (2016).

Stage 1. Descriptive and Exploratory Analysis: Documenting What Is Happening, Often in Real Time

Descriptive and exploratory analysis describes the characteristics of a programme or intervention and the context within which it operates. This approach has benefited organizations that have potentially useful data that has never been analysed. In a common situation, different departments or units of an organization each use datasets directly relevant to their particular activities, but no one has ever integrated the data to find patterns.

Such analysis may find, for example, that there are big differences in how well programmes operate in different regions or when working with groups that have different socio-economic characteristics. The analysis of modern slavery among Filipinos working overseas illustrates how an integrated data platform can be created to merge multiple sources of data and detect relationships that were not previously understood (refer to Box 3.2). Exploratory analysis will often identify questions that require the more sophisticated kinds of analysis in stages 2–4.

Stage 2. Predictive Analysis: What Is Likely to Happen

Predictive analysis uses patterns of association among variables to predict future trends. The predictive models are sometimes based on Bayesian statistics and identify the probability distributions for different outcomes. Other approaches draw on the rapidly evolving field of machine learning (Alpaydin, 2016). When real-time data is available, predictions can be continuously updated. Predictive analytics can use social media data. In Indonesia, for example, although Internet penetration is lower than in other Southeast Asian countries (18% in 2014), analysis of tweets has been used to provide a rapid, economical way to assess communicable disease incidence and control (UN Global Pulse, 2015).

In the United States, predictive analytics are currently used by commercial organizations and government agencies to predict outcomes such as which online advertisements customers are likely to click on, which mortgage holders will prepay within 90 days, which employees will quit within the next year, which female customers are most likely to have a baby in the near future and which voters will be persuaded by political campaign contacts (Siegel, 2013). A key feature of many of these applications is that the client is only interested in the outcome (how to increase click-rates for online advertising) without needing to know "why" it happens. In contrast, it is critical for development agencies to understand the factors that determine where, why and how outcomes occur, and where and how successful outcomes can be replicated in future programmes. There is thus a crucial distinction between generating millions of correlations and methods that determine attribution and causality.

Typical public-sector applications include predicting the most likely locations of future crimes ("crime hot-spots") in a city, which soon-to-be-released prisoners are likely to be recidivists and which are likely to be successfully re-integrated into society, which vulnerable youth are most likely to have future reported incidents at home or at school (Siegel, 2013), and better outcomes for children in psychiatric residential treatment (Gay and York, 2018).

Big Data Analytics & Development Evaluation  85 (Gay and York, 2018). Box 3.1 provides another example of how predictive analytics was used to predict which groups of children within a child welfare system are most likely to have future reported incidents of abuse or neglect in the home (Schwartz et al., 2017). For all of these kinds of analysis, it is essential to understand the causal mechanisms determining the outcomes, and correlations, which only identify associations, without explaining the causal mechanisms, are not sufficient. Section H in Table 3.6 explains how predictive analytics can be used to predict which groups of troubled youth are most likely to have reported incidents of problems at home or at school. While predictive analytics are well developed, much less progress has been made on causal (attribution) analysis. Commercial predictive analytics tends to focus on what happened or is predicted to happen (e.g. click-­ rates on websites), with much less attention to why outcomes change in response to variations in inputs (e.g. the wording or visual presentation of an online message). From the evaluation perspective, a limitation of predictive analysis is that it is not normally based on a theoretical framework, such as a theory of change, which explains the process through which outcomes are expected to be achieved. There is great potential for collaboration between big data analytics and current impact evaluation methodologies. Stage 3.  Detection: Tracking Who Is Likely to Succeed and Who Will Fail Descriptive analysis in stage 1 is not usually available to identify and target specific problem groups, such as youth most likely to drop out of programmes. In stage 2, it is usually possible to generate more detailed data on groups who are likely to perform well and those likely to drop out. In stage 3, data from the two previous stages are combined to identify specific individuals and small groups most likely to fail or to succeed in programmes such as those for troubled youth, for people recently released from jail or for people wanting to start a small business. Stage 4.  Evaluation and Data Diagnostics: Explaining How Outcomes Were Achieved and Providing Recommendations on How to Improve Programme Performance Big data analytics using techniques such as data mining, machine learning and natural language analysis can manage large, complex datasets and identify unseen patterns. Data analytics can examine unstructured textual, sound and video material, and bring together different kinds of data (Marr, 2015). These techniques are used to help understand how outcomes were achieved.
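The mechanics of stages 2 and 3 can be conveyed in a few lines of code. The following is a minimal sketch of training a classifier on historical case records to score the probability of a repeat incident; the file name and column names (case_records.csv, age, prior_referrals, school_absences, repeat_incident) are hypothetical placeholders, not the variables used in any of the studies cited above.

# A minimal sketch of stage 2/3 predictive analysis: score each case for the
# risk of a future incident using historical administrative data.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

cases = pd.read_csv("case_records.csv")           # hypothetical historical data
features = ["age", "prior_referrals", "school_absences"]
X, y = cases[features], cases["repeat_incident"]  # 1 = a later incident occurred

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Out-of-sample check: how well do predicted risks rank the actual outcomes?
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# Score all cases so that high-risk individuals and subgroups can be flagged.
cases["predicted_risk"] = model.predict_proba(X)[:, 1]

Note that a model of this kind only ranks risk; as argued above, it says nothing by itself about why the outcomes occur.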


Box 3.1 Using Data Analytics to Evaluate a Programme for At-Risk Youth Served by a County Child Welfare System

The study used a data analytics model to increase the efficiency of the systems used to decide the appropriate actions that a County Sheriff's office should take on incidents reported to child welfare services, affecting at-risk youth at home or school. The model, which uses machine learning to integrate data from case workers and child welfare organizations, involved three steps.

• Step 1: Determining the odds that a reported incident will be substantiated through the investigative process.
• Step 2: Estimating the odds that a case will receive different types and intensities of services based on its history, background, incident and substantiation.
• Step 3: Final prescriptive model: identifying which services are most likely to prevent a case from having another incident of abuse.

This approach has several important differences from conventional evaluation approaches.

• First, while conventional evaluation is retrospective, reporting on what happened in the past, data analytics builds on existing data to predict what is likely to happen in the future. Bayesian predictive models draw on all available sources to provide the reference point (the "prior distribution") to predict future outcomes for each group or individual (2014). In addition to available surveys and administrative data, the prior can also take into consideration expert judgement and even the beliefs of key stakeholders.
• Second, while conventional "frequentist" approaches estimate the average change for the total population, data analytics analyses the expected outcomes for each subgroup that shares certain characteristics. Predictions can also be made for individuals (in this case, at-risk youth).
• Third, the predictions are continually revised and improved as the model is refined based on feedback on how well the predictions performed in the previous round.
• Fourth, the data analytics model is prescriptive in that it provides recommendations on the future actions that should be taken for each individual reported incident.

Testing of the prescriptive model for this project found that when decisions are made correctly, based on strong evidence (data) from an effective investigative process, the child welfare system could reduce its rate of return (children who have a future referral/incident) from 25% to 18%.

(Source: Schwartz et al., 2017)
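The Bayesian updating described in the box can be illustrated with a minimal sketch. The counts below are invented for illustration only; they are not figures from the study, and the Beta-Binomial model shown is a textbook stand-in for the richer models the project would have used.

# A minimal sketch of Bayesian updating: a prior belief about the
# substantiation rate is revised as new incidents are investigated.
from scipy import stats

# Prior: expert judgement suggests roughly 1 in 4 reports are substantiated.
prior_alpha, prior_beta = 25, 75            # Beta(25, 75) has mean 0.25

# New round of feedback: 120 investigated reports, 40 substantiated.
substantiated, not_substantiated = 40, 80

# The posterior combines the prior with the new evidence (Beta-Binomial).
posterior = stats.beta(prior_alpha + substantiated,
                       prior_beta + not_substantiated)

print("Posterior mean substantiation rate:", round(posterior.mean(), 3))
print("95% credible interval:",
      [round(q, 3) for q in posterior.interval(0.95)])

Each new round of investigations can be fed back in the same way, which is what allows the predictions to be "continually revised and improved".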

Some promising analytical tools are being developed to process massive datasets to model complex emergencies and other humanitarian situations, such as the migration of refugees, the complex dynamics of slavery and human trafficking, and forced population movements as a result of massive forest fires. Systems analysis is also being used to help programmes identify the most effective combination of interventions in complex systems (e.g. identifying the most effective options or combinations of options to reduce malnutrition and stunting in a particular region). Big data can also be used to present complex data in maps and graphs that are easily understandable to managers and local communities, and that permit users to focus on specific geographical locations or topics of interest.

Box 3.1 illustrates how the four stages of the data analytics cycle were used to improve the performance of a child welfare programme for youth facing problems at home or school. The programme used existing administrative data to assess the effectiveness of current procedures for handling referrals relating to problems at home or school, and used this analysis to develop predictive models, which could then be used to provide recommendations on how each referral should be treated. The implementation of the model would be able to significantly reduce the proportion of future referrals. The use of predictive analytics made it possible to provide specific recommendations for each youth, rather than just providing general guidelines for all youth, as was the previous practice using frequentist (experimental) analysis. Causal modelling was also used to reduce or mitigate selection bias.

The Data Continuum

When discussing development evaluation strategies, it is useful to distinguish between big data, "large data" (including large surveys, monitoring data and administrative data such as agency reports) and "small data" (the kinds of data generated from most qualitative and in-depth case study evaluations and supervision reports). At the same time, the borders between the three categories are flexible: there is a continuum of data rather than three sharply defined categories (Figure 3.3). For example, for a small NGO, a beneficiary survey covering only several hundred beneficiaries would be considered small data, whereas in a country such as India or China, surveys could cover hundreds of thousands of respondents. A further complication arises from the fact that several small datasets might be merged into an integrated data platform, so that the integrated dataset might become large.

There is a similar continuum of data analysis. While many kinds of data analytics were developed to analyse big data, they can also be used to analyse large or even small datasets. Mixed method strategies refer to the combining of different kinds of data and of different kinds of analysis.

Figure 3.3 The data continuum. (The figure relates data sources to levels of analysis: big data is analysed with big data analytics, large data with computer-based statistical analysis, and small data with analysis combining quantitative and qualitative methods. The levels of analysis are merged through mixed methods that complement big data analytics with qualitative (small) data analysis, through triangulation, and through data visualization and dissemination, while multiple sources of small and large data are combined to create integrated data platforms.)

Consequently, data analytics approaches that are designed for large datasets can also be used to analyse smaller datasets, while qualitative analysis methods, designed originally for small datasets, can also be applied to big data and large data. For example, a national analysis of national and international migration in response to drought (big or large data) might select a few areas of origin for the preparation of descriptive, largely qualitative case studies.

The NIT Ecology and the Linkages to Development Evaluation

The main elements and actors in the NIT ecosystem are the following (Figure 3.4). The main actors are the data producers (producers of primary data and organizations that produce the processed data), the data analysts, app developers and marketers, the data users and the data regulators. The data users include large institutional users (government agencies, UN agencies, development banks, universities and research institutions) and small users (NGOs, government agencies, development agencies and researchers). Most small users do not have the capacity to work directly with the primary data but must collaborate with a large user. As the scale of digital communication grows, the role of government and industry regulators increases, as illustrated by the battles over security, privacy, net neutrality and how to deal with "fake news", hate speech and online advertising. Regulation also plays a key role in ensuring accessibility and bridging the digital divide.

Data generation involves the generation of new sources of primary data (such as satellite images, phone messages, social media, Internet messages

and data generated by cell phones, including GPS location data) and the production of processed data that is accessible to users. Data analytics concerns the digitalization, integration, analysis and dissemination of big data. The rapid advances in data analytics are providing users with an increasingly sophisticated range of analytical tools, such as software for developing integrated data platforms (e.g. Tableau).

Affected populations are the individuals and groups who are affected by how information about them is collected and used. They are an important part of the NIT ecosystem that is often forgotten and sometimes exploited. In Figure 3.4, they are linked by a dotted line to indicate that they are often not well integrated into the ecosystem. For example, people often do not know what information is being collected about them or how it is used. This is a potentially important issue in development programmes, as big data makes it possible to collect information on communities without their knowledge, and this information is often used to make decisions affecting their lives.

Figure 3.4 also includes a simplified representation of the development evaluation ecosystem that identifies the main actors: evaluation offices, clients and stakeholders, and evaluation consultants. There is also an important link to the target populations affected by development programmes, and to other populations that can be affected by development programmes and by how they are evaluated. The relationship between the NIT ecosystem and development evaluation is often not well defined and needs to be strengthened.

Where Is the Big Data Revolution Headed?

There is a steady move away from the generation and control of digital data by a few large agencies with access to massive computing capacity towards a universal ability to generate, access and use data. However, these trends are taking place within different political and commercial frameworks, in which decisions on public and commercial access are regulated by governments, regulators (e.g. the European Union) and, in many cases, a few major commercial interests. It is important to take into consideration the consequences of commercial control of some big data. Many apps are proprietary, and their users frequently do not have information on how algorithms are defined or used. Consequently, there is concern about potential bias against the poor, minorities or other vulnerable groups (O'Neil, 2016).

Among the most important trends, the "quantified self" refers to biometric, health and behavioural data that can give people more control over their health and lifestyle, and give planners and marketers more information for the design of communities, social programmes or marketing techniques. The "quantified community" makes it possible for planners to design communities and cities that respond to or mould preferences and behaviour, using data from the quantified self plus other sources such as mobility and how services are accessed and used.

Figure 3.4 The big data ecosystem and the linkages to the evaluation ecosystem. (The big data ecosystem comprises the main actors: data producers, data analysts, data users (large institutional users and small users) and data regulators; data generation, the generation of new sources of data; data analytics, the organization, integration, analysis and dissemination of big data; and the affected populations, the individuals and groups who are affected by how big data about them is used. The evaluation ecosystem comprises evaluation offices, clients and stakeholders, and evaluation consultants. Solid arrows indicate strong linkages and dotted arrows indicate weak linkages.)

Like many other trends, this has both positive and potentially negative effects.

The World Bank argues that digital technology promotes three kinds of "digital dividends" for development: social inclusion (through search and information), increased efficiency of the public and private sectors (through automation and coordination) and innovation (through scale economies and information platforms). However, there is a danger that without strong human oversight, the benefits could turn into risks through increased control

(information without accountability), inequality (increased automation without increased worker skills) and concentration (scale without competition) (World Bank, 2016).

Does Big Data Apply to Development Evaluation? Should Evaluators Care About It?

Although the capacity for data collection and data analytics is more limited in many developing countries, there is a similar expansion of the applications of big data and ICT. Mobile phones, digital banking, satellite imagery and remote sensors are examples of digital technologies that are spreading rapidly. The development community are users rather than drivers of most of these initiatives, which have largely been promoted by the private sector. This is beginning to change, however, as more development agencies and NGOs begin to customize digital tools for their specific needs (Meier, 2015).2

Over the past decade, big data and ICT have played an increasingly important role in international development (Table 3.3), although there continue to be digital divides between urban and rural areas and among groups and countries of different economic levels.

Table 3.3 NIT and data analytics applications used widely in international development

• Early warning systems for natural and man-made disasters: analysis of Twitter, Facebook and other social media; analysis of radio call-in programmes; satellite images and remote sensors; electronic transaction records (ATM, online purchases).
• Emergency relief: GPS mapping and tracking; crowd-sourcing; satellite images.
• Dissemination of information to smallholder farmers, mothers, fishing communities and traders: mobile phones; Internet.
• Feedback from marginal and vulnerable groups and on sensitive topics: crowd-sourcing; secure handheld devices (e.g. UNICEF's "U-Report" devices).
• Rapid analysis of poverty and identification of low-income groups: analysis of phone records; social media analysis; satellite images (e.g. using thatched roofs as a proxy indicator of low-income households); electronic transaction records.
• Creation of an integrated database synthesizing all the multiple sources of data on a development topic: for example, national water resources, human trafficking or agricultural conditions in a particular region.

Sources: Meier (2015), Letouzé (2012), UN Global Pulse website, Bamberger (2016).
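Several of the applications in Table 3.3, such as early warning and disease surveillance, rest on counting topic keywords in streams of social media or radio call-in transcripts. The following is a minimal sketch of that idea; the messages and keyword list are invented, and a real exercise would stream posts from a platform API rather than use a hand-typed list.

# A minimal sketch of keyword-frequency monitoring for early warning.
from collections import Counter

messages = [
    "clinic full again, fever and rash everywhere",
    "another fever case in our village",
    "market flooded after last night's rain",
]
keywords = {"fever", "rash", "cholera", "flood", "flooded"}

counts = Counter(
    word.strip(",.!?").lower()
    for msg in messages
    for word in msg.split()
    if word.strip(",.!?").lower() in keywords
)
print(counts)  # e.g. Counter({'fever': 2, 'rash': 1, 'flooded': 1})

Tracking how such counts change over time and across locations is what turns raw message streams into a usable early warning signal.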

Agriculture, health and education are often cited as the development sectors where most progress has been made in the use of NIT. On a broader level, the increased ability to quantify and explain the dynamics of poverty is one of the areas where NIT can potentially offer the greatest contribution to human well-being (Table 3.3).

An important consequence of big data for development is that more data is becoming available on difficult-to-access populations. One example is the recent census conducted in Afghanistan by combining an ongoing demographic survey, satellite imagery, other remote sensing data, urban data and geographic information system (GIS) statistical modelling. Data analytics were used to integrate the different data sources into a common platform, which was then used to generate information on the country's population (UNFPA, 2016, cited in Bamberger, 2016).

It is likely that many of the data sources used for programme M&E will soon be generated or synthesized using NITs rather than stand-alone M&E studies. While each individual dataset (monitoring data, sample surveys, management reports) would be considered large (or even small) on its own, when combined into an integrated database, they could be considered to constitute a big dataset, as analysing them will often require a large computational capacity. Data analytics have great potential to bring together and analyse multiple sources of data on a complex policy question (Box 3.2).

Future M&E systems are likely to be closely linked to new types of management information systems that integrate programme identification, design, management, monitoring and evaluation into a single system. Development evaluation may gradually become one of several outputs of such an integrated management information system rather than remaining a separate function that collects and analyses specially generated data from a range of quantitative and qualitative, primary and secondary sources. If so, many evaluations will be based on data that was not collected specifically for evaluation, and evaluations will often be designed and analysed by data scientists rather than by conventional evaluators. While some of the data analysts may be familiar with conventional evaluation methods, many will not; moreover, many of the evaluations will use methods such as the integration of multiple data sources, dashboards, data mining, predictive Bayesian analytics and machine learning. There are opportunities here for conventional evaluation to play a lead role in designing the systems for data collection, synthesis and analysis.
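At its simplest, building an integrated data platform of the kind described above means joining datasets on a shared key. The following is a minimal sketch; the file names, the district_id key and the column comments are hypothetical placeholders for whatever identifiers the programme actually shares across sources.

# A minimal sketch of an integrated data platform: monitoring data, survey
# rounds and satellite-derived indicators merged on a shared district code.
import pandas as pd

monitoring = pd.read_csv("programme_monitoring.csv")   # district_id, outputs
survey = pd.read_csv("household_survey.csv")           # district_id, welfare
satellite = pd.read_csv("satellite_indicators.csv")    # district_id, ndvi

platform = (
    monitoring
    .merge(survey, on="district_id", how="outer")
    .merge(satellite, on="district_id", how="outer")
)

# Gaps left by the merge show where the sources do not cover the same places.
print(platform.isna().mean().sort_values(ascending=False).head())

In practice, most of the effort goes into reconciling identifiers, units and reference periods before a merge like this is meaningful, which is why the platforms described in this chapter take months rather than minutes to build.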

The Great Potential for Integrating Big Data into Development Evaluation

In the rapidly evolving and increasingly complex world of international development, evaluation methodologies face several challenges, many of which can be at least partly addressed by NIT. These challenges can be broken into three broad categories: design, data collection and data analysis challenges.


Box 3.2 Using an Integrated Data Platform to Help End Modern Slavery

The massive migration of overseas foreign workers (OFWs) from the Philippines leaves many workers vulnerable to exploitation in conditions that can be considered slavery. Social Impact, in collaboration with Novametrics LLC, used data analytics modelling to create an integrated data platform combining all available data relating to Philippine OFWs and the different ways in which they had been exploited. The data were mainly drawn from Philippine government databases and published reports. The data included socio-economic indicators on the characteristics of the vulnerable populations and data on their limited social safeguards (labour laws and their enforcement, social safety nets, institutional mechanisms for access to money and land), as well as OFW perceptions and knowledge of the risks they were facing. Information was available on the social and economic characteristics of the different regions of the Philippines from which OFWs came and to which they returned. While many data sources provided aggregated data, information was also available from both government and NGOs on the migration history and earnings of individual OFWs in the main sectors in which they worked (e.g. clothing manufacturing and domestic services).

An algorithm was created and trained using the service delivery register of a Philippine non-profit that tracks the situation of overseas workers, particularly those who have returned to the Philippines as victims of slavery. Using the training data (which consisted of over 10,000 observations), the team was able to characterize the most vulnerable populations geographically and socio-economically. Around six months of work was required to clean and organize the data and to use machine learning to create an integrated data platform.

The analysis of the data platform provided policymakers with a better understanding of the supply, demand and contextual factors leading OFWs to emigrate, the areas of the greatest concentration of servitude and slavery, and some of the policy options. The findings were quite complex, as there were many interacting factors, and many widely accepted assumptions were found not to be true or to be more nuanced. For example, the majority of workers who were victimized did not come from the poorest areas of the Philippines, and not all OFWs were earning significantly more than they had before they migrated. This project shows that data analytics have great potential for addressing complex policy issues, but also that significant time and resources need to be invested to create the data platform and to conduct the analytics.

Source: Bruce and van der Wink, 2017.
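The kind of supervised learning described in Box 3.2, learning which characteristics mark the most vulnerable groups from a labelled register, can be sketched in a few lines. Everything below is a hypothetical stand-in: the file ofw_register.csv, its columns and the choice of a random forest are illustrative assumptions, not a description of the actual Novametrics pipeline.

# A minimal sketch: learn which characteristics best separate vulnerable
# from non-vulnerable workers, using a labelled register as training data.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

register = pd.read_csv("ofw_register.csv")   # e.g. ~10,000 labelled records
features = ["region_poverty_rate", "sector", "prior_migrations", "wage_gap"]
X = pd.get_dummies(register[features])       # encode categorical fields
y = register["victimized"]                   # 1 = returned as a victim

model = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

# Feature importances give a first, rough characterization of vulnerability.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))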

Design challenges include:

• developing theories of change that adequately describe emergent and complex programmes. Most current theories of change are based on assumptions of linear causal linkages between a limited range of inputs and a limited range of outcomes. These fail to capture the fact that most development programmes have several inputs that are mediated through multiple intervening variables and contribute to multiple outcomes that result from different combinations of factors in different locations. Furthermore, most theories of change fail to take into account the multiple contextual variables (political, economic, socio-cultural and ecological), with the result that programmes implemented in identical ways produce significantly different outcomes in different locations.
• identifying a credible counterfactual for non-experimental evaluation designs (i.e. most evaluation designs). This requires identifying and collecting a set of variables that allow the project and comparison groups to be matched.
• identifying unintended outcomes. Many theories of change, and evaluation designs in general, only seek to estimate the extent to which intended outcomes have been achieved, and either do not seek to, or find it difficult to, monitor unintended outcomes not identified in the programme and evaluation designs.
• developing complexity-responsive evaluation designs. The nature of complexity means that programme outcomes are affected by at least four dimensions: the nature of the intervention itself, the context in which the programme operates, the dynamics of the interactions among stakeholders and processes of non-linear causality. Each of these dimensions is very difficult to measure with conventional evaluation designs, and capturing the interactions among all four is beyond most evaluation designs.

Data collection challenges include:

• high cost and time required for data collection. Data collection is the largest expense in most evaluations, so there is great pressure to reduce sample sizes to the minimum required to obtain a statistically acceptable power for aggregate comparisons at the level of the total sample. This severely limits the ability to make granular comparisons between subgroups.
• difficult-to-reach groups. Many evaluations fail to collect representative data, or in many cases any data, on groups that are difficult or expensive to reach. In other cases, groups are excluded because of the dangers of reaching or interviewing them.
• monitoring project implementation and processes of behavioural change. Most evaluations find it difficult to collect behavioural data in contexts that are difficult for interviewers to reach, such as interactions among household members or within groups such as gangs or drug-users.
• collecting qualitative data. Qualitative data, such as interactions between people or within or among groups, or audio-visual data, are difficult to collect or code.
• integrating different sources of data. Often the evaluation could have access to multiple sources of administrative data on a project, as well as a range of secondary data, but as each dataset has a different structure and often different metrics (units of measurement), it is too difficult, expensive and time-consuming to combine these different sources in a way that makes comparisons possible.
• enhancing quality control of data collection and analysis. The process of ensuring that the right individuals or households are selected is expensive and time-consuming, as is checking on the accuracy of the data collected. Furthermore, by the time that inconsistencies or missing data are identified, it is usually too late to return to the respondent to rectify the errors, so statistical adjustments have to be made in the analysis.
• collecting information on the temporal dimensions of programmes. It is often only possible to collect data over the period during which a project is being implemented. Usually funding for the evaluation is only available during project implementation, so it is very difficult to measure sustainability over time. It is even more difficult to collect data on the period before a project officially begins.
• sample design. For the reasons discussed above, there are frequently limitations on the kinds of data available to construct a satisfactory comparison group.

Data analysis challenges include:

• incorporating multiple sources of administrative, survey and other data, which often have different formats and metrics, into an integrated data platform. It is even more difficult to integrate audio-visual and other non-numerical data.
• applying systems analysis to dynamic and changing programmes. Complexity-responsive evaluations often require the use of complexity science techniques such as systems mapping and social network analysis, which are difficult to apply with conventional evaluation tools (see the sketch after this list).
• analysis of big and large data sources. Many computers are not powerful enough to analyse the huge volumes of data that are generated from, for example, millions of phone records, satellite images or social media sites.
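As a concrete illustration of the social network analysis mentioned in the list above, the following minimal sketch maps which organizations in a programme exchange information and which are most central to the system. The edge list is invented for illustration; in a real evaluation it would come from interviews, reporting lines or communication records.

# A minimal sketch of social network analysis for a programme ecosystem.
import networkx as nx

ties = [("NGO_A", "ministry"), ("NGO_A", "NGO_B"),
        ("NGO_B", "district_office"), ("ministry", "district_office"),
        ("district_office", "community_group")]

network = nx.Graph(ties)

# Betweenness centrality highlights brokers that hold the system together.
centrality = nx.betweenness_centrality(network)
for actor, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{actor}: {score:.2f}")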

Big data, ICTs and the IoT can help address these evaluation challenges at each stage of a typical programme evaluation cycle: identification and appraisal, design, implementation, mid-term review, project completion report and the evaluation of programme sustainability (Table 3.4). Although evaluators have often been slow to adopt NIT approaches, there are examples where big data has been incorporated into most of the widely used evaluation methods (Table 3.5). UN Global Pulse has also conducted more than 100 proof-of-concept projects in cooperation with national development agencies, which show that a wide range of big data techniques for data collection, analysis and dissemination can be applied in developing countries (UNGlobalPulse.org/projects). Many of these techniques are already being used in other development fields such as emergency relief, early warning systems and development research. Consequently, there is an extensive range of NIT techniques that have already demonstrated their viability in development contexts. It is now necessary to find ways to encourage evaluators to make fuller use of these techniques. Table 3.6 presents ten examples of big data techniques that are already being used in development evaluations. The examples briefly cover the use of satellites and drones, remote sensors, mobile phones, social media analysis, big data analytics and predictive modelling, and the creation of integrated data platforms.

Big Data and Development Evaluation: The Need for Caution

While recognizing the tremendous potential contribution that big data can make to development evaluation, it is also necessary to exercise caution with respect to the purpose, strengths and limitations of big data. The cost, ease and speed of big data analysis and dissemination are very appealing to managers, funders and policymakers. Consequently, they may overlook some of the drawbacks of remote data collection. Project supervision visits often provide a fuller understanding of the situation on the ground than can be obtained from remote big data collection. These visits may also serve to motivate field staff. And face-to-face data collection may also ensure that local communities have a way to communicate with programme management.

Dangers of top-down extractive approaches. By eliminating the need to visit projects, big data collection can introduce a top-down decision-making process whereby decisions on priorities or project performance are made without consulting local communities.

Equity and exclusion. Many sources of big data include a selection bias, as they only collect information from people who, for example, use mobile phones or ATMs. The users of these services are more often urban, better-off, younger and male, and consequently there is a danger that the opinions or situation of poorer, rural, older and female groups may be excluded or underestimated.

Table 3.4 Ways that big data and ICTs can strengthen programme evaluation

1. Project identification and appraisal
• Initial diagnostic studies and defining affected populations. Big data: satellite images; analysis of social media and Internet queries to identify potential issues and problems. ICT: crowd-sourcing; mobile-phone surveys; sociometric analysis.

2. Project planning and design
• Developing a theory of change. ICT: online theory of change.
• Selecting the evaluation design: identifying potential big data contributions to each design option; building in mixed methods. ICT: using smartphones to incorporate both quantitative and qualitative methods.
• Evaluating complex programmes. Big data: predictive analytics and systems analysis to model complex systems and causal pathways.

3. Project implementation
• Developing early warning systems. Big data: Twitter, cell-phone records and satellites. ICT: smartphone and SMS messages.
• Mapping and data visualization.
• Data collection. Big data and ICT: see Figure 3.1.
• Process analysis. Big data: real-time feedback on project implementation (dynamic data platforms); satellite tracking of population movements and the growth of human settlements. ICT: video and audio recording during meetings, work groups, etc.; web-based M&E platforms provide better documentation of processes.
• Qualitative data. Big data: analysis of text-based data. ICT: audio and video recordings.
• Collecting contextual data. Big data: satellite images can track physical changes over large areas; crowd-sourcing provides feedback on natural disasters, political protests and the spread of disease.
• Quality control of data collection. ICT: GPS-enabled phones/tablets can check the location of interviewers and provide internal consistency checks; randomly activated audio recorders can listen in to interviews.
• Monitoring behavioural change. Big data: Twitter and social media; phone and financial transaction records; large surveys of household purchases (using smartphones to record food labels, etc.). ICT: video and audio recordings can monitor behaviour directly; sociometric analysis through smartphones.
• Sample selection. Big data: satellite images for area sampling and to select samples based on housing conditions; calibrating satellite images with ground data. ICT: random routes; automatically dialled samples (combined with human follow-up).

4. Mid-term review
• Data analysis and data visualization. Big data: big data analytics; data visualization. ICT: rapid data analysis and feedback with smartphones.

5. Project completion
• Data analysis and interpretation. Big data: management of multidimensional datasets; smart big data analytics; analysis of complex qualitative comparative case studies; dissemination through data visualization.

6. Planning and implementing a sustainability strategy
• Evaluating sustainability. Big data: longitudinal big datasets. ICT: periodic cell-phone and Internet surveys.

Table 3.5 Big data and ICTs have been used to strengthen widely used evaluation designs

1: Experimental and quasi-experimental designs. Using high-frequency metering data for high-quality information about energy consumption and demand in rural solar micro-grids in India.
1A. Randomized control trial. Tablet-based financial education in Colombia, using savings and transaction data combined with survey and telemetric tablet data.
1B. Strong quasi-experimental design. The Global Environment Facility (GEF) has used quasi-experimental designs to assess the impact of its programmes to protect forest cover and mangrove swamps. Satellite images and remote sensor data were combined with conventional survey and secondary data to construct comparison groups using propensity score matching. Time series data can be obtained from satellite images, permitting the use of longitudinal analysis.
1C. Natural experiment. Using changes in search query volume to assess the effects of a major government tax increase on cigarette smoking in the United States. Canada, which did not have a similar increase, was used as the comparison group.
2: Statistical modelling. Evaluating causal interactions between labour market shocks and internal mobility; understanding labour market shocks using mobile-phone data.
3: Theory-based evaluation. The Robert Wood Johnson Foundation evaluated its ten-year programme to improve health and safety in distressed US cities. This combined a quasi-experimental design, including comparison cities, with a theory of change. Given the size, complexity and duration of the programme, very large datasets had to be managed.
4: Case-based evaluation. QCA country-level data assessing factors determining the impacts of women's economic empowerment programmes at the national level.
5: Participatory evaluation. The World Bank India Social Observatory uses a participatory approach to involve women in the identification of the key questions that should be included in large community surveys to identify priority development issues. Community women are then involved in conducting the surveys and in the interpretation of findings. The surveys have been administered to over 800,000 households, so data analytics are required for the analysis and synthesis of the findings.
6: Review and synthesis approaches. A review and synthesis study was conducted to assess the effects of micro-credit on women's empowerment. The study used data analytic search mechanisms with customized key-word sequences to cover academic databases and online portals.

Source: Adapted from Bamberger (2016).
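The propensity score matching used in design 1B above can be sketched in a few lines: covariates (here, satellite-derived indicators) predict the probability of being in a protected area, and each treated site is matched to the comparison site with the closest score. This is a minimal sketch, not the GEF's actual pipeline; the file and column names are hypothetical.

# A minimal sketch of propensity score matching with satellite covariates.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

sites = pd.read_csv("forest_sites.csv")   # one row per site (hypothetical)
covariates = ["slope", "distance_to_road", "moisture", "baseline_cover"]

ps_model = LogisticRegression(max_iter=1000).fit(
    sites[covariates], sites["protected"])
sites["pscore"] = ps_model.predict_proba(sites[covariates])[:, 1]

treated = sites[sites["protected"] == 1]
control = sites[sites["protected"] == 0]

# Nearest-neighbour match on the propensity score.
nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
_, idx = nn.kneighbors(treated[["pscore"]])
matched_control = control.iloc[idx.ravel()]

# Compare outcomes between treated sites and their matched comparisons.
effect = (treated["forest_cover_change"].mean()
          - matched_control["forest_cover_change"].mean())
print("Matched difference in forest cover change:", round(effect, 3))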

Table 3.6 Examples of big data and data analytics approaches being used in programme evaluation

A. Using satellite images and drones to strengthen the evaluation of an international programme to protect forest cover
Satellite images and drones were able to significantly increase the number of indicators that could be used in the construction of propensity score matching designs for evaluating environmental protection programmes, including a programme to maintain forest cover in protected forest areas. New indicators included moisture content; slope and accessibility of land; distance to the nearest road, settlement and market; and indicators of illegal human activities such as logging, cattle-raising and illegal tourism. Satellite images also permitted the generation of time series data covering up to 20 years. [Source: Global Environment Facility, 2015]

B. A digital evaluation design to evaluate the effects of attitudes to race on online purchases
An advertisement for an iPod was posted on an online market-place. Six versions of the ad were created and randomly displayed. Different versions of the ad showed the iPod being held in a white hand, a dark-skinned hand and with and without a tattoo. The number of follow-up clicks was used as an indicator of the effect of race on response rates. [Source: Doleac and Stein, 2013, Figure 3.1; cited in Salganik, 2018: 175–176]

C. Combining digital and analogue data to evaluate programmes to reduce electricity consumption
A number of researchers have taken advantage of the fact that domestic electricity consumption is continually monitored to assess the effects of different messages on electricity use. In one series of experiments, a sample of households received tips on ways to reduce energy consumption and also information about how their energy consumption compared with that of other families in the neighbourhood. In the pilot study, the information was hung on the front door, while in subsequent larger studies the information was mailed. In subsequent rounds refinements were included, such as adding a smiling emoticon for households with below-average consumption and a frowning emoticon for households with above-average consumption. The impact of the messages on energy consumption was measured after different periods of time (e.g. one and three weeks). Subsequent researchers added refinements such as a message stressing the importance to the planet of reducing energy consumption. Due to the low cost of conducting the experiments, some studies were able to include millions of consumers. [Source: Schultz et al., 2007, cited in Salganik, 2018: 158–167]. A similar evaluation was conducted to assess supply and demand for rural solar energy micro-grids in India. [Source: Poverty Action Lab-A]

D. A natural experiment to assess the effects of a government tax increase on tobacco
A government tax increase on smoking was being introduced in the United States. Changes in Internet search volume on themes relating to smoking were compared between the United States and Canada, where no tax increase was introduced, to assess the effects of the tax increase. It was fully recognized that this is only an approximate indicator of the effects of the tax. [Source: Ayers, Ribisl and Brownstein, 2011; cited in Letouzé et al., 2016: 237–238]

E. Evaluating technical support and financial incentives to 1 million rice farmers to reduce carbon emissions and increase productivity
The "Sticky Rice Block-Chain" project, which will involve 1 million smallholder rice farmers in South East Asia, is designed to encourage farmers to take measures to reduce carbon emissions, such as carefully scheduled watering and use of fertilizers. Farmers who comply will receive credits that are controlled through a private block-chain. Farmers in each village will be randomly assigned to the treatment and control groups. The project will involve a large team of extension workers providing support and collecting data, so that the evaluation will combine digital data from satellites, drones and remote sensors (used to monitor compliance with the required procedures) with block-chain reports, field survey data (collected through smartphones) and extension reports. Social media analysis will also be conducted on a dedicated Facebook account and possibly on messages on public social media sites. A multimethod pre-test/post-test randomized control trial design will combine data from satellites, drones, field surveys and block-chain with a formative evaluation design using data from extension workers. Audio-visual data may also be collected and analysed (on farming activities and group meetings) through smartphones. A time series design will also be incorporated, as satellite, drone and remote sensor data will continue to be collected (at a very low cost) over a much longer time period than is normally possible with conventional evaluation methods. [Source: Bamberger and Gandhi, 2018]

F. Evaluating the effectiveness of social media campaigns to promote voter registration
Social media campaigns were conducted in Mexico and Pakistan to encourage women to register to vote. In Mexico, the social media analysis was conducted on Twitter, which was identified as the site most widely used by women, while in Pakistan the analysis focused on radio call-in programmes. The Mexican study spent several months identifying the hash tags (#) with the most extensive discussion of election-related issues. These included sites promoted by UN Women. An analysis was made of the frequency of references to registration to vote and of the sentiments (positive and negative) towards this theme at different points in time to examine trends. [Source: UN Women – source details to be included] Another study in South Africa assessed the impact of an online advocacy campaign on the decision of young people to register to vote. [Source: details to be included]

G. Using data on cell-phone airtime purchases ("top-ups") and cell-phone activity to identify phone insecurity and to develop multidimensional poverty indicators
This study assessed the potential use of mobile-phone data as a proxy for food security and poverty indicators. Data extracted from airtime credit purchases (or "top-ups") and mobile-phone activity in an East African country was compared to a nationwide household survey conducted by WFP at the same time. Results showed high correlations between airtime credit purchases and survey results referring to consumption of several food items, such as vitamin-rich vegetables, meat or cereals. These findings demonstrated that airtime credit purchases could serve as a proxy indicator for food spending in market-dependent households. In addition, models based on anonymized mobile-phone calling patterns and airtime credit purchases were shown to accurately estimate multidimensional poverty indicators. [Source: www.unglobalpulse.org/projects/mobile-CDRs-food-security]

H. Using data analytics to evaluate a programme for at-risk youth in Florida [possible title]
The study used a data analytics model to increase the efficiency of the systems used to decide the appropriate actions that a County Sheriff's Office should take on incidents at home or school reported to child welfare services. Using machine learning to integrate data from case workers and child welfare organizations, the model was able to provide recommendations on how to deal with each individual referral, rather than only providing general guidance on how to treat all cases, as had been the previous practice. The prescriptive model found that when decisions are made correctly, based on strong evidence (data) from an effective investigative process, the child welfare system could reduce its rate of return (children who have a future referral/incident) from 25% to 18%. [Source: Schwartz et al., 2017]

I. Using NIT to conduct a national census in Afghanistan where data collection was constrained by security concerns
Security concerns have prevented the Islamic Republic of Afghanistan from holding a census since 1979. The United Nations Population Fund (UNFPA), in collaboration with the non-profit data group Flowminder, was able to generate maps of the population by combining an ongoing demographic survey, satellite imagery, other remote sensing data, urban data and GIS statistical modelling. [Source: Flowminder, 2016, cited in UNFPA, 2016]

Studies have also shown that even when women use mobile phones, there may be restrictions on use (e.g. they purchase less airtime) or their use of the phones may be regulated by male partners or parents.

The challenges of supply-driven app development. Most apps used by development agencies and evaluators have been developed by tech companies based on their perceptions of what will sell. Often these apps do not provide some of the features that evaluators require; for example, there are no mechanisms for users to provide feedback. An increasing number of development agencies are now designing their own apps or working closely with developers, but there is still a supply-driven bias.

The dangers of proprietary algorithms and how they are marketed. Many widely used apps may include (usually unintended) biases against minorities or the poor. For example, several widely used apps for screening job applicants, college entrants or credit applicants in the United States use zip codes as one of their screening criteria. As zip codes are associated with economic status and race, this will often screen out low-income and minority applicants. As the algorithms are considered proprietary and hence confidential, the client (the university, hiring company, government agency or credit company) will often not be aware of the inclusion of this selection criterion (Eubanks, 2018; O'Neil, 2016).

Understanding the underlying assumptions of big data analytics. Data analytics are based on a series of assumptions and approaches that are different from those used in most evaluations. Hence, it is important for managers and other users of evaluation studies to understand these differences and how they affect the way that findings and recommendations should be interpreted. One important difference is that data analytics are based on correlations (associations between different variables) and do not seek to explain causality. Data analytics makes predictions about the effects of different interventions without explaining why or how these effects are produced. While this works well for the analysis of online marketing strategies that are only concerned with increasing click-rates ("if you change the colour and font size, more people will click on the link to your advertisement"), it is not sufficient for development agencies. Agencies need to know why and how an intervention produced a certain outcome and whether the same result will be obtained in other contexts and with different groups.

Privacy and security concerns are discussed in Chapter 4.

Overcoming Barriers to Big Data Use in Evaluation

Several challenges must be addressed when considering whether and how to incorporate NIT into development evaluation. Some arise from the different contexts (or ecosystems) in which evaluators and data scientists work.

First, many evaluation designs tend to make evaluators conservative in their approach. Considerable time and resources are invested in developing and testing sampling frames and data collection instruments that will be replicated over time in pre-test/post-test comparison designs.

The logic of these designs requires that the same data collection instruments and the same sampling frame be applied at two or more points in time. Efforts are made to ensure data quality and reliability and to avoid selection bias. So, inevitably, there is resistance to changing methods of collecting data and selecting samples (Bamberger, 2016). In contrast, big data technologies are dynamic. Given data analysts' access to real-time data (that is being constantly updated) and to very large samples, their approach to issues such as data quality and selection bias is different from that of evaluators. Many evaluators feel that data analysts do not take these issues seriously, and this affects the attitude of many evaluators to data science.

Second, most development programmes have a much longer decision-making cycle than is often the case for the situations in which data analysts work. For example, programmes that are providing infrastructure such as houses, water supply or sanitation services often have construction cycles of at least six months and often several years. That means they can make only limited use of real-time data, as even agile programmes cannot make short-term adjustments (Bamberger, 2016). However, some kinds of development programmes do have the flexibility to adapt on the basis of real-time data, such as programmes that use social media for education, information sharing, awareness raising and empowerment. Emergency relief programmes are another example.

There is also a concern among some evaluators about potential competition for funds with data centres. It is still too early to judge how real the concern is, but it certainly affects the thinking of some evaluators.

Evaluators and data scientists use different frameworks and analytical tools, and many people in each group have limited understanding of the approaches of the other. Data analytics makes extensive use of real-time data, for example, while evaluators are more familiar with data generated from surveys, project records and official statistics. Also, the two groups have different approaches to issues of bias, data quality and construct validity.

The two groups differ with respect to the role of theory. Most evaluators use a theory-based approach to evaluation design, but the role of theory is less clear for many data scientists. While some argue that high-speed data mining and iterative correlation analysis of very large datasets eliminate the need for theory, others argue that any kind of data mining must be based on an implicit theory. The perceived lack of a theoretical framework, and the resulting danger of reliance on possibly spurious correlations, is one of the criticisms that evaluators often level at data scientists.

A related issue concerns attribution and the difference between correlation and causality. Experimental, quasi-experimental and several other evaluation approaches are designed to assess causal relationships between project design and outcomes by controlling for other factors that could explain the outcomes (rival hypotheses). Evaluators argue that correlation-based methods cannot explain causality, which seriously limits their practical utility to policymakers and programme managers.

However, data analysts argue that with sufficiently large datasets covering a much wider range of variables, and with constant updating of the data, it is possible through techniques such as predictive analytics to identify, for example, the groups most at risk and those who are likely to respond to different kinds of intervention. In contrast, experimental designs seek to explain causality at a point in time five or more years in the past (when the project began). In a rapidly changing world, such historical data may have limited use. There is clearly a need to find ways to combine experimental designs and predictive analytics.

One of the potential benefits of big data analytics is that it is often possible to work with the total population, while most evaluation designs usually work with small samples because of the cost and time involved in collecting data. However, evaluators argue that data analysts' claims of working with total population data can be misleading, as it is often difficult to ensure complete population coverage, so the issue of selection bias must be addressed. Also, survey researchers spend considerable time cleaning data and trying to ensure a high level of data quality. This kind of data quality assurance is usually not possible with big data, so the quality of the big data may be questionable. And much big data may be generated through proprietary algorithms, so it is often not possible to check how the data was generated.

A criticism of some data analysis, particularly media analysis, is that it processes extensive information covering a very short period of time but may ignore the historical context. For example, publicly available Twitter feeds normally only cover one week (Felt, 2016). In contrast, many evaluations try to capture the historical context, recognizing that this is important for interpreting data covering the present.

Future Scenarios for Development Evaluation in the Age of Big Data

Incorporating appropriate NIT data collection and analysis tools can strengthen many development evaluations by addressing the design, data collection and analysis challenges that they face. NIT also opens up enormous opportunities for access to new sources of information and knowledge, and to new and wider social, political and economic networks. And NIT provides citizens with new access to centres of power and influence, which could give grass-roots communities and vulnerable groups ways to make their voices heard.

While NIT are being widely adopted in development programmes, evaluators and evaluation offices have been much slower to adopt these new technologies. This is due to several factors.

Institutional links between big data centres and evaluation offices are weak. Many development agencies have begun to establish data development centres. These centres often do not work closely with the evaluation office, however. The data centre tends to be staffed by professionals with a background in data science or mathematics, and little training in evaluation methods. Many evaluation offices, for their part, are not very familiar with the work of data centres. Often evaluation management has not seen the need to encourage the two offices to work together more closely.

There is even a concern among evaluators that part of their budgets may be transferred to the recently established data centres.

Access and use of big data are limited. While some kinds of big data are becoming increasingly accessible to a broader range of organizations and users without a background in data analytics, access to many kinds of big data is still limited by cost and expertise, as well as by political and proprietary considerations. For example, satellite data, phone records, digital transaction data such as that from ATMs, large-scale social media data such as Twitter, and data from many apps are difficult or expensive to access. Many small-scale users will have to meet conditions imposed by a large-scale institutional user through which they can gain access, such as a UN agency, bilateral donor or university.

Evaluation capacity development does not normally include an introduction to big data, and vice versa. Most training programmes for evaluators tend not to include an outline of big data and the tools and techniques of data analytics. For example, 2017 was the first year that the American Evaluation Association offered professional development workshops on an introduction to big data. Data scientists come from a range of different professional backgrounds, but few will have training in programme evaluation.

There is a gap in outlook towards data and its usage. Data scientists and evaluators do not always see eye to eye on issues such as the role of theory, approaches to data mining, and prediction versus the analysis of causality. Evaluators have several concerns about the commercial, political and ethical nature of how data are controlled and used by data scientists. All of these issues are exacerbated by the fact that most evaluators are not very familiar with data analytics, while many data scientists have not been trained in conventional evaluation methodology, as mentioned earlier.

At the same time, many development agencies are beginning to create integrated information systems that bring together the different sets of data generated within development programmes but which until now have not been extensively used because of the difficulties of collecting and analysing these large datasets. As these integrated datasets become more widely used and easier to analyse, it is likely that they will start to be used to assess programme performance and the achievement of outputs and outcomes, in some cases replacing conventional evaluation. If this starts to happen, data centres may begin to take over some of the assessments and activities that were traditionally conducted by evaluators, given the lack of coordination between data centres and evaluation offices.

Another area of concern is the ease with which information can be obtained remotely without having to interact with target populations. This makes it possible for planners and policymakers to design projects and to monitor progress in a top-down way that prevents beneficiaries from participating. While these trends are still at an early stage, it is vital to monitor the evolution of data science and its links with evaluation.

New Skills Required for Evaluation Offices, Evaluators and Data Scientists

Given the speed at which big data and NIT are evolving, it is essential to include the basic principles of big data and data analytics in the core curriculum for evaluators. Current evaluators also need to be brought up to speed on these approaches through workshops, the more systematic use of social media analysis, and pilot collaborative programmes with in-house data centres and external big data programmes. Skills development for evaluators should cover the following, most of which have been covered in detail earlier in the chapter:

•	types of big data and their potential applications to evaluation;
•	steps in the creation and utilization of big data;
•	social media analysis (including social network analysis). The analysis of data from Twitter, Facebook and other social media sites is one of the areas where most progress has been made on integrating big data and evaluation. The analysis of radio call-in programmes has also shown promise (UN Global Pulse citation) but to date has been used less frequently, as the data is not so easily accessible and more work is required;
•	introduction to data analytics and the data analytics cycle;
•	introduction to Bayesian statistics and predictive analytics. A basic understanding of these techniques is essential, as they are a key element of data analytics and one with which most evaluators are not familiar;
•	introduction to machine learning. This is another fundamental underpinning of data analytics with which most evaluators are not familiar. One of the key concepts that all evaluators should understand is how any kind of data (text, numbers, audio-visual data, satellite images) can be transformed into a string of 0s and 1s so that all kinds of data can be combined and jointly analysed (a minimal sketch of this idea follows this list);
•	creation and use of integrated databases;
•	introduction to evaluation-related apps for smartphones and other ICT devices;
•	collaboration with app developers to define the purpose and requirements for apps (to avoid the current situation where the market is supply driven and most development agencies do not have the knowledge or influence to specify the kinds of apps that they really need);
•	identifying opportunities for integrating big data tools and techniques;
•	how data scientists view development evaluation, and their areas of scepticism and concern.
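A minimal sketch, in standard Python, of the 0s-and-1s concept flagged above: every digital object is stored as bytes, so text and numbers (and, by extension, image or audio files) reduce to the same binary representation and can, in principle, be combined in one analysis.

```python
# Toy illustration: any digital object is ultimately a sequence of bytes,
# so very different kinds of data share one common representation.
text = "hunger".encode("utf-8")      # text -> bytes
number = (2024).to_bytes(2, "big")   # an integer -> bytes

# Each byte is eight 0s and 1s; concatenating them yields one bit string.
bits = "".join(f"{b:08b}" for b in text + number)
print(bits)                          # a single string of 0s and 1s
```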

Many discussions are based on the assumption that evaluation is lagging behind in the application of new technologies and needs to catch up. However, there are a number of issues and potential weaknesses in current approaches to using big data in evaluation, where data scientists could benefit from a better understanding of some of the foundational principles of evaluation:

•	introduction to the basic tools and techniques of development evaluation;
•	the principles of evaluation surveys, including the need to work with small samples. There are many situations where access to big data is not easily available, and it is important for data scientists to understand these real-world challenges and how creative sources of big data, for example new mobile phone apps, may be able to help;
•	theory-based evaluation and the theory of change, and the importance of starting with the definition of evaluation questions. There is continuing debate on the role of theory in the analysis of large datasets, and many evaluators argue that atheoretical approaches such as data mining may overlook fundamental issues such as social exclusion (among others). Data scientists argue that methods such as machine learning can overcome these concerns, but the debate is far from resolved, and each side could benefit from a deeper understanding of the approaches used by the other;
•	the value and challenges of collecting and analysing in-depth qualitative data;
•	the main challenges facing development evaluation and how big data could contribute;
•	understanding how evaluators view big data, and evaluators’ areas of scepticism and concern.

While it is useful to begin by incorporating new perspectives into the separate skills development activities of evaluators and data scientists, these activities should only be considered first steps, and it is essential to find ways to bring the two sides together. Some of the ways this could be done include:

•	joint workshops to discuss some of the areas of difference and how they can be overcome;
•	joint skills development programmes;
•	pilot research projects where the two sides can jointly work on collaborative methodologies;
•	assigning evaluators to work in data centres and vice versa;
•	requiring agency data centres to collaborate with evaluation offices and vice versa.

As we move forward, data scientists and evaluators will have to work together, understand the concepts behind each other’s work, and explore how big data might help close some of the data gaps we currently experience.

Ultimately, ICTs are the source of, as well as the means for processing, big data. As ever newer technologies emerge, the nature and scope of big data might change, and with them the technical parameters of the use and utility of big data in evaluations. However, the human element of partnering and capitalizing on complementary competencies will endure. As the first chapter so amply emphasized, the necessity for partnerships remains absolute.

Notes

1 MERLTech (the applications of technology for monitoring, evaluation, research and learning) (MERLTech.org), ICTworks and Data for Social Good are just three examples.
2 ICTworks (www.ictworks.org) provides an extensive resource on new applications of NIT in development programmes.

References

Accenture and Bankinter (2011), “The Internet of Things: In a Connected World of Smart Objects”, Accenture and Bankinter Foundation of Innovation, Madrid, www.fundacionbankinter.org/documents/20183/137558/Publicacion+PDF+IN+FTF_IOT.pdf/2783707e-b729-45b2-98eb-1ba52b652b37.
Alpaydin, E. (2016), Machine learning: the new AI, MIT Press, Cambridge, Massachusetts.
Ashton, P., R. Weber, and M. Zook (2017), “The Cloud, the Crowd, and the City: How New Data Practices Reconfigure Urban Governance”, Big Data & Society, Vol. 16 May, Sage, London, http://journals.sagepub.com/doi/full/10.1177/2053951717706718 (accessed 29 May 2018).
Ayers, J. W., K. Ribisl, and J. S. Brownstein (2011), “Using Search Query Surveillance to Monitor Tax Avoidance and Smoking Cessation following the United States’ 2009 ‘SCHIP’ Cigarette Tax Increase”, PLoS ONE, Vol. 6/3, p. e16777, https://doi.org/10.1371/journal.pone.0016777.
Bamberger, M. (2016), Integrating big data into the monitoring and evaluation of development programmes, UN Global Pulse, New York, www.unglobalpulse.org/big-data-monitoring-and-evaluation-report (accessed 29 May 2018).
Bamberger, M. and V. Gandhi (2018), “Using Big Data to Strengthen the Evaluation of the Block Chain Carbon Reduction Program for Small Rice Farmers: Exploring Potential Evaluation Designs”, presentation at the European Evaluation Society Annual Conference, Thessalonica, Greece, October 2018 (unpublished).
Bamberger, M., L. Raftree, and V. Olazabal (2016), “The Role of New Information and Communication Technologies in Equity-Focused Evaluation: Opportunities and Challenges”, Evaluation, Vol. 22/2, pp. 228–244.
Bruce, K. and G. van der Wink (2017), “How Data Analytics Can Help Us to End Modern Slavery: Examples from the Philippines”, professional development workshop, American Evaluation Association, Washington, DC, November 2017.
Doleac, J. L. and L. Stein (2013), “The Visible Hand: Race and Online Market Outcomes”, Economic Journal, Vol. 123/572, https://doi.org/10.1111/ecoj.12082.

Eubanks, V. (2018), Automating inequality: how high-tech tools profile, police and punish the poor, St. Martin’s Press, New York.
Felt, M. (2016), “Social Media and the Social Sciences: How Researchers Employ Big Data Analytics”, Big Data & Society, Vol. 29 April, Sage, London, http://journals.sagepub.com/doi/abs/10.1177/2053951716645828 (accessed 29 May 2018).
Flowminder (2016), “Remote Sensing Technology Complementing Official Statistics: High-Resolution Population Mapping in Afghanistan”, United Nations Population Fund, www.unfpa.org/sites/default/files/event-pdf/FINAL-Afghanistan_RS_Project_EB_mtg_2_Feb_17-ajt.pdf.
Gay, K.E.M. and P. York (2018), “A New Way to Use Data: Precision Care for Better Outcomes in Psychiatric Residential Treatment for Children”, Scattergood Behavioral Health Foundation, February 2018, www.scattergoodfoundation.org/wp-content/uploads/yumpu_files/A_New_Way_to_Use_Data.pdf.
Global Environment Facility (2015), “Impact Evaluation of GEF Support to Protected Areas and Protected Area Systems”, 49th GEF Council Meeting, October 20–22, 2015, Washington, DC.
ITU (2014), Measuring the Information Society Report 2014, International Telecommunication Union, Geneva, www.itu.int/en/ITU-D/Statistics/Documents/publications/mis2014/MIS2014_without_Annex_4.pdf.
Kontokosta, C.E. (2016), “The Quantified Community and Neighborhood Labs: A Framework for Computational Urban Science and Civic Technology Innovation”, Journal of Urban Technology, Vol. 23/4, Taylor and Francis, London, www.tandfonline.com/doi/full/10.1080/10630732.2016.1177260 (accessed 29 May 2018).
Letouzé, E. (2012), Big data for development: Challenges & opportunities, UN Global Pulse, New York, www.unglobalpulse.org/sites/default/files/BigDataforDevelopment-UNGlobalPulseMay2012.pdf (accessed 1 June 2018).
Letouzé, E., A. Areais, and S. Jackson (2016), “The Evaluation of Complex Development Interventions in the Age of Big Data”, in Bamberger, Vaessen and Raimondo (eds.), (op. cit.).
Marr, B. (2015), Big data: Using SMART big data analytics and metrics to make better decisions and improve performance, Wiley, Chichester.
Meier, P. (2015), Digital humanitarians: How big data is changing the face of humanitarian response, CRC Press, Boca Raton, Florida.
O’Neil, C. (2016), Weapons of math destruction: How big data increases inequality and threatens democracy, Crown, New York.
Peng, R. and E. Matsui (2016), The art of data science: A guide for anyone who works with data, Skybrude Consulting, http://bedford-computing.co.uk/learning/wp-content/uploads/2016/09/artofdatascience.pdf.
Siegel, E. (2013), Predictive analytics: The power to predict who will click, buy, lie or die, Wiley, Hoboken, New Jersey.
Schwartz, I., P. York, E. Nowakowski-Sims, and A. Ramos-Hernandez (2017), “Predictive and Prescriptive Analytics, Machine Learning and Child Welfare Risk Assessment: The Broward County Experience”, Children and Youth Services Review, Vol. 81(C), pp. 309–320.
UN Global Pulse (2015), “Understanding Immunisation Awareness and Sentiment Through Social and Mainstream Media”, UN Global Pulse, Jakarta, www.unglobalpulse.org/sites/default/files/UNGP_ProjectSeries_Perception_Immunisation_2014_0.pdf (accessed 29 May 2018).

UNFPA (United Nations Population Fund) (2016), Big data and the sustainable development goals, background paper prepared for the joint meeting of the executive boards of UNDP, UNFPA, UNOPS, UNICEF, UN Women and WFP, 3 June 2016 (unpublished).
Wolf, G. (2010), “The Quantified Self”, TED@Cannes (conference), www.ted.com/talks/gary_wolf_the_quantified_self (accessed 29 May 2018).
World Bank (2016), World Development Report 2016: Digital Dividends, World Bank, Washington, DC, www.worldbank.org/en/publication/wdr2016 (accessed 29 May 2018).

4 Technology, Biases and Ethics
Exploring the Soft Sides of Information and Communication Technologies for Evaluation (ICT4Eval)

Linda Raftree

There has been extensive discussion thus far of various technologies, the data they produce, collect and analyse, and how they might help fulfil the ambitions of the SDGs. However, as the sector moves into this era of “digital development”, many large-scale efforts are emerging that aim to improve how we use data for development. More specifically, there is a move towards understanding the best practices to learn from and the pitfalls to avoid. The Global Partnership for Sustainable Development Data (GPSDD), for example, has over 280 partners who are endeavouring to learn from each other and develop common mechanisms that would lead to a coherent data ecosystem. As part of the process, the GPSDD initiated the Data Roadmaps effort, which aims to overcome some of the major barriers to the use of data for the Sustainable Development Goals (SDGs).

Those barriers are numerous, inclusiveness of data being the most important. Data on entire groups and key issues are not always available, or not in the form that is needed. Data are not dynamic or disaggregated. Quality of data is poor. The data that are available are seldom useable, and when they are useable, they are frequently not open or accessible. When data are accessible, they are often not used effectively (Khagram and Agrawal, 2016). A gender data gap analysis revealed that data on health, education, economic opportunities, political participation and human security are missing or not regularly collected in all countries. In about half of these areas, data cannot be compared, they lack complexity, or they lack the granularity and detail that would allow for disaggregation (Data2x, 2017). Without solid and inclusive data, the inclusiveness and impact of the SDGs will be difficult to track and measure, and many could be “left behind” in their representation.

Running counter to this effort to obtain more data – granular, individualized and geo-located data on the most vulnerable – is a growing movement calling for a more ethical approach to data and greater attention to the rights and privacy of data subjects, especially the most vulnerable. New information technology (NIT) and big data have been covered in detail in the previous chapter. One concern is that data collected through NITs and big data approaches reflect existing inequalities in technology use and access, meaning that they carry respondent bias.

Additionally, analysis of big data tends to carry with it the unconscious assumptions and biases of those who create the algorithms that produce its conclusions. Because big data collection and analysis are normally carried out far from those whose data are being analysed, contextual and cultural awareness may be missing, leading to biased interpretation. This can be exacerbated when data are collected remotely and/or without the explicit knowledge of the individuals providing the data. As a recent paper on big data and evaluation notes, elite capture, restricted use and ambient sexism lead to biases and exclusions in conclusions or recommendations, which may perpetuate existing social inequalities, including gender inequalities:

Before we decide to embark on an evaluation based on big data, we need to ask ourselves: Who is not heard? Whose realities are not reflected in the data? What is the impact of these exclusions when using data to inform programmes, advocacy or policy change?
(Abreu Lopes, Bailur and Barton-Owen, 2018)

There are also concerns about the privacy and safety of the data on the most vulnerable. The unequal power dynamics between programme participants (beneficiaries) and agencies, researchers and evaluators mean that consent may be forced or coerced. This is not a new problem, but the use of biometric ID systems and mobile-phone numbers for benefit tracking has brought the question into new light. A 2018 review of the use of biometrics, for example, concluded that the potential risks for humanitarian agencies of holding vast amounts of immutable biometric data – legal, operational and reputational, combined with the potential risks to beneficiaries – far outweighed the potential benefits in almost all cases (The Engine Room and Oxfam, 2018).

Additionally, digital data are easy to replicate and share widely. Once data are “opened” or shared in digital format, there is very little control over how, by whom and for what they are used. This raises the question of whether consent for use can truly be given, especially when there can be no assurance about where a person’s data will end up. This chapter covers ethical aspects related to inclusion, bias and privacy in NITs, data science and digital data, and explores how they affect evaluation and programmatic decision-making in the area of development.

Factors Affecting Information Technology Access and Use Among the Most Vulnerable

The relatively low cost and simplicity of mobile phones, and the widespread excitement about these technologies, have made them seem ubiquitous. Data on the sheer number of phones or SIM cards mask nuances related to access, however, and they do not account for multiple phones or SIM cards being owned by the same person.

The truth is that access to and use of information and communication technologies (ICTs) such as phones are complicated. They depend on many factors, including gender and gender identity, age, location, sexual preference, economic status, refugee or citizenship status, disability, health, education, race, religion, ethnicity and political leanings. The intersection of factors that lead to exclusion can increase marginalization as well as limit access to and use of technology. Even if more marginalized people and groups can access mobile phones or the Internet, other factors affect how and when this happens and whether people share information or express themselves via that channel. That is why it is critical to understand context and culture when designing tech-enabled data collection for evaluation (Box 4.1).

Box 4.1. Factors That Affect Access and Use of NIT, Which Can Result in Biased or Incomplete Data

Access – What direct access do different groups have to ICTs? Do individuals own a device via which they can receive/share or connect to information? Do they share one? With whom? (And how does this affect how they use the device?) Can they easily borrow a mobile or computer? How often? Do some members of the family or community have more access than others?

Age – What age group is involved or targeted? Does information need to be adapted to certain age groups? Do the very young and/or very old have access? Do they have resources to cover the costs of accessing information or communicating via ICTs? How does age affect privacy?

Capacity – What skills are needed to access and use a device? Does the target population have these skills?

Conflict and emergencies – Will conditions of conflict or emergency affect access and use? How will they affect willingness to share information, or the consequences of doing so?

Connectivity – Is there a network? Is it reliable? Is it steady or intermittent? Slow or fast? How do the connectivity speed and device type affect what information can be accessed or shared? How does this shape the design of a data gathering exercise? How does it influence whose “data exhaust” – the trail of information left behind as a result of a person’s digital activity – might be available for big data analysis?

Cost – How much does it cost to own a device? To borrow one? To use one? To access information on one? How does cost affect access and use? Who is left out because of cost?

Disability – Do ICTs hinder or facilitate those with a disability in accessing or using a device or the Internet, or in participating in an information gathering exercise? Can ICTs help make information more accessible to those with a disability?

Economic status – Will those with greater economic capacity have more of an opportunity to communicate their points of view and influence a programme or the outcomes of an evaluation? How might their disproportionate input bias the data?

Language – How does content in a particular language create data bias? Who is left out or ignored due to language?

Literacy – What are the levels of literacy of the target population? How do they vary, and who is left out because of this? If literacy is low, what alternatives have been considered? Voice? Radio?

Power – Will the more powerful community members be privileged because of access, literacy or the language of the information shared? Who is being left out by a particular choice of data gathering method or tool? How will this be addressed in the evaluation design, and who will keep an eye on it during data collection and analysis?

Protection – Does access to a device or to content put people at risk in any way? Does risk arise because of the value of the device, the information they may access via the device, or the fact that others can reach them through the device? Or does it spring from perceptions about their having a device, or access to particular information or communication channels? How does this introduce bias into responses or into big data being used to draw conclusions about a group or geographic area?

Privacy – What are people’s perceptions of privacy as they relate to a particular device or application, or to your organization? How will this affect responses to an information gathering exercise or their overall Internet use?

Security – Is there any perception (or actual incident) of a digital data breach? How might that affect willingness to share information with an evaluator or on a digital platform? Is consent language plain and clear?

Trust – Do people trust your organization, the platform and/or information source, or the entity that generated the request to participate in an information gathering exercise? Are you able to detect instances where mistrust is affecting responses or response rates without a physical presence?

Adapted from Raftree, Appel and Ganness (2013)

In the case of marginalized adolescent girls living in urban and semi-urban areas, for example, Girl Effect and 2CV found that whether a girl owned or borrowed a phone, whom she borrowed it from, and whether her use was free or supervised influenced what she used the phone for, the amount of time spent on the phone, the number of sites she visited and applications she used, and how free she felt to express herself on services like WhatsApp or Facebook (Girl Effect and 2CV, unpublished). On the other hand, one organization reported that in mobile-based surveys about male rape in the Democratic Republic of the Congo, participants seemed to be more comfortable answering questions and reporting sexual assault when the survey was conducted via mobile phone, because the respondent could answer in private (Raftree, 2014).

Data and Technology Alone Cannot Ensure Inclusion

Research from the multi-agency programme Making All Voices Count showed that, beyond availability of information and access to technology, several factors influenced whether an individual engaged in efforts related to government transparency and accountability, such as reporting corruption or participating in activities focused on governance. Some of the core messages from the research are particularly relevant to inclusion and the role of technology in evaluation, because evaluation is often aimed at offering programme participants a space to voice their opinions about the actions of an entity (government or non-government) that holds power (Box 4.2).

Box 4.2. Using Technology for Accountability: Findings from Making All Voices Count

•	Not all voices can be expressed via technologies.
•	Transparency, information or open data are not sufficient to generate accountability.
•	Technologies can support social mobilization and collective action by connecting citizens.
•	Technologies can create new spaces for engagement between citizen and state.
•	Technologies can help to empower citizens and strengthen their engagement.
•	The kinds of democratic deliberation needed to challenge a systemic lack of accountability are rarely well supported by technologies.
•	Technologies alone do not foster the trusting relationships needed between governments and citizens, and within each group of actors.
•	The capacities needed to transform governance relationships are developed offline and in social and political processes, rather than through technologies.
•	Technologies cannot overturn the social norms that underpin many accountability gaps and that silence some voices.
•	A deepening digital divide risks compounding existing exclusions.
•	New technologies expand the possibilities for surveillance, repression and the manufacturing of consent.
•	Uncritical attitudes towards new technologies, data and the online space risk narrowing the frame of necessary debates about accountable governance.

Source: McGee et al. (2018)

Some of the potential benefits of using technology include social mobilization, collective action, the opening of new spaces for participation and potentially stronger engagement with actors such as the state (or, in the context of this book, with programme implementers). When using new technology to capture data and feedback, evaluators must consider the variety of barriers to participation and how these barriers can influence representation. Additionally, technology alone does not foster trusting relationships, nor can it overturn the social norms that underpin accountability gaps and silence some voices. These same concerns apply when seeking input from vulnerable populations about development programming. The digital divide exacerbates existing exclusions.

Inclusiveness of Access and Use Affect the Representativeness of Big Data

Access to and use of mobiles and the Internet affect not only more traditional surveys that use these technologies but also the representativeness of data captured for big data analytics. Data captured may not reflect what a person actually thinks and does. A person may not have a data trail or data exhaust, or may be sharing or borrowing a phone or an account listed under someone else’s name. Additionally, different big data sources represent different people and groups. For example, Twitter users tend to be younger, wealthier, more educated and more likely to live in urban areas than Facebook users, and the Twitter platform only represents a small proportion of the population, especially in low-income countries (Abreu Lopes, Bailur and Barton-Owen, 2018). If evaluators use big data, they need to use evaluation methods that ensure that the most marginalized or vulnerable are fairly represented.
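One concrete technique for working towards fairer representation is post-stratification: reweighting a platform-skewed sample against known population shares, for example from a census. The Python sketch below is a minimal illustration added here, not a method prescribed by the chapter; the age groups, shares and the food-insecurity variable are all fabricated assumptions.

```python
import pandas as pd

# Fabricated example: a Twitter-sourced sample that over-represents the young.
survey = pd.DataFrame({
    "age_group":     ["15-29", "15-29", "30-49", "50+"],
    "food_insecure": [1, 0, 1, 1],
})
# Known population shares, e.g. from a census.
population_share = {"15-29": 0.35, "30-49": 0.40, "50+": 0.25}

sample_share = survey["age_group"].value_counts(normalize=True)
survey["weight"] = survey["age_group"].map(
    lambda g: population_share[g] / sample_share[g]
)

# The weighted estimate partly corrects for the platform's skewed user base;
# it cannot recover groups that are absent from the data altogether.
estimate = (survey["food_insecure"] * survey["weight"]).sum() / survey["weight"].sum()
print(f"weighted share food-insecure: {estimate:.2f}")
```

Note the caveat in the final comment: reweighting can only rebalance voices that appear in the data at least somewhat; it does nothing for groups with no digital footprint at all, which is why the mixed-method designs discussed below remain important.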

In addition to considering individual access and use of mobiles and the Internet, it may be helpful to think of data as coming from four different “buckets”. This way, the type and source of data can be reviewed to determine whether the data are inclusive and, if not, who is missing and how those voices can be included (Raftree, 2017). Different kinds of data present more or less stark choices for the organizations using them and for the end evaluands and users.

1 Traditional data. In this case, researchers, evaluators and/or enumerators are in control of the process. They design a questionnaire or data gathering process and go out and collect qualitative or quantitative data; they send out a survey and request feedback; they conduct focus group discussions or interviews; they collect data on digital devices or on paper and digitize it later for analysis and decision-making. The sampling process is tightly controlled and is deliberately constructed to fit a predetermined criterion of quality. However, such control of the quality and soundness of the data means that it is resource-intensive and of limited size. This kind of data represents the voices of those precisely selected by the agency – those who are intended to be heard for the purpose of the evaluation.

2 Found data. The Internet, digital data and open data have made it easier to find, share and reuse datasets collected by others. These tend to be datasets collected in traditional ways, such as by governments or agencies. The open data movement has also advocated that datasets created using public money should be made freely available to the public at large. If datasets are digitized, have proper descriptions and clear provenance, if consent has been obtained for use/reuse and care has been taken to make them anonymous, this can eliminate the need to collect the same data over again. Data hubs are springing up that aim to collect and organize these datasets to make them easier to find and use. These data may not follow the sampling frame of the organizations that intend to use them, however. Hence, it is vital to pay attention to the nature of the agency that collected the primary data, the nature of the survey and its sampling frame. Given that most datasets of this kind contain a clear methodology note, evaluators may control for, or at least be aware of, any biases and exclusions therein.

3 Seamless data. Development and humanitarian agencies are increasingly using digital applications and platforms, whether bespoke or commercially available. Users of these platforms can provide quantitative and qualitative data that help answer specific questions about their behaviours. This data is normally used to improve applications, platforms, interfaces and content but can also provide clues about a host of other online and offline behaviours, including knowledge, attitudes and practices. Such data raise more concerns about privacy than about inclusion, given that agencies are able to clearly track the users of such applications. Agencies need to be aware that because this data is collected seamlessly, users may not realize that they are generating data or understand the degree to which their behaviours are being tracked and used for monitoring, evaluation, research and learning (MERL), even if they have checked “I agree” on the terms and conditions. Organizations should therefore consider whether they need to take further measures to protect privacy and obtain consent. This is especially important now that the European Union’s General Data Protection Regulation (GDPR) has come into effect. The commercial sector is sophisticated at this type of data analysis, but development agencies are only just starting out.

4 Big data. In addition to big data that are generated “seamlessly” by platforms and applications, there are also “big data” sets that can be acquired (such as cell-phone data records) and data on the Internet that can be “harvested” with the right techniques and tools (see Chapter 3). Development and humanitarian organizations are only just starting to understand concepts around big data and how it might be used for MERL. Such data raise issues of both privacy and inclusion: data subjects have no agency over their data, and the organization using the data cannot know with full certainty to whom it is listening.

Mixed-method evaluation designs, which combine face-to-face data gathering with technology-enabled data collection methods and larger-scale data analytics, can help to address concerns about inclusiveness. For example, remote sensing and satellites offer new data sources that are being used in monitoring as well as in evaluation (see Chapter 2). Evaluators have created multidimensional indicators of poverty by analysing changes in the volume of withdrawals from ATMs, records of electronic purchases of agricultural inputs, satellite images showing the number of trucks travelling to and from markets, and the frequency of tweets with words such as hunger and sickness. These can be combined with more traditional, participatory exercises that make sense of what the big data patterns are showing and what might be missing from the big data sources, and that ensure important contextual clues are not missed (Bamberger, Raftree and Olazabal, 2016).
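As a purely illustrative sketch of such a multidimensional indicator – all districts, figures, column names and sign conventions below are assumptions, not data from any cited study – one could standardize each signal and average them:

```python
import pandas as pd

# Fabricated district-level signals of the kind listed above.
df = pd.DataFrame({
    "district":        ["A", "B", "C"],
    "atm_withdrawals": [120.0, 80.0, 60.0],  # index vs. a baseline period
    "market_trucks":   [100.0, 90.0, 55.0],  # satellite-counted, indexed
    "hunger_tweets":   [10.0, 25.0, 60.0],   # per 10,000 geolocated tweets
}).set_index("district")

# Standardize each signal, then flip those where HIGHER means LESS hardship.
z = (df - df.mean()) / df.std()
z[["atm_withdrawals", "market_trucks"]] *= -1

# A simple unweighted composite; real work would validate the weights and
# sign assumptions against ground-truthed survey data.
df["hardship_index"] = z.mean(axis=1)
print(df["hardship_index"].sort_values(ascending=False))
```

The participatory exercises described above play exactly this validation role: they test whether the proxies and sign conventions chosen by the analyst actually track hardship as communities experience it.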

Bias in Big Data, Artificial Intelligence and Machine Learning

The application of big data and big data analytics to development evaluation is still in its infancy. It is only in the past ten years that development and UN agencies have begun thinking about these data sources and their predictive potential, and even more recently that their role in evaluation has been examined (see Chapter 3). Though impressive capacity to process data exists, this capacity has advanced far more quickly than the human capacity to understand its implications, and ethical and legal frameworks have not yet caught up.

In her book Weapons of Math Destruction, Cathy O’Neil details several recent cases in which big data algorithms have directly caused harm, including in the financial crash of the late 2000s, school rankings, private universities and policing. Though some big data algorithms can be healthy – baseball managers use them to devise plays – O’Neil says this is only possible if algorithms are open and transparently created, if they can be scrutinized and unpacked, if unintended consequences are tracked and adjusted for when they are negative, and if the algorithms are not causing damage or harm. Unfortunately, in many cases, those creating algorithms purposefully target and/or take advantage of more vulnerable people. In the case of development, assuming that there is good intent, the question becomes one of the unintended consequences that could arise from creating algorithms where there is insufficient data. O’Neil notes that proxy indicators often stand in where hard data are absent, and can lead to perverse incentives and to harmful distortion of monitoring and evaluation systems. If algorithms are not continuously tested and adjusted using fresh data, they can easily become stale. And if they are created by people with little contextual or cultural awareness of how a system actually works, they may well be based on the wrong indicators or proxies (O’Neil, 2016).

Predictive capabilities could go a long way towards improving development approaches and outcomes. But humans design the algorithms used to make these predictions, so the algorithms contain persistent and historical biases. The attractive claim of big data is that it can turn the qualitative into the quantitative. Yet claims of objectivity and accuracy are misleading. As Boyd and Crawford (2011) note, “working with Big Data is still subjective, and what it quantifies does not necessarily have a closer claim on objective truth – particularly when considering messages from social media sites”.

Bias can arise in big data algorithms if those who develop them lose control of them as machines start learning. Many algorithms are created and maintained by the private sector, and they are not transparent or open to scrutiny; thus, the biases in them are difficult to pinpoint and correct, even when they appear to exist. Many are calling for “algorithmic accountability” in the face of this opacity. In the United States, racial discrimination in algorithms has influenced criminal justice, law enforcement, hiring and financial lending. Though big data and artificial intelligence are thought to help humans make complicated decisions, big data predictive work has often harmed more vulnerable populations and people of colour. COMPAS, for example, is a widely used algorithm that aims to predict whether defendants and convicts are likely to commit crimes in the future. It generates a risk score that influences sentencing, bail and parole decisions. COMPAS was found to mislabel black defendants almost twice as often as it mislabelled white defendants, raising questions about whether the data fed into COMPAS were reinforcing systemic biases and inequalities (Hudson, 2017).
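The disparity reported for COMPAS is, at bottom, a difference in false-positive rates across groups – something that is straightforward to audit when the underlying data are available. The Python sketch below uses fabricated records purely to show the computation; it is not the COMPAS dataset or methodology.

```python
import pandas as pd

# Fabricated records: high_risk = algorithm's label, reoffended = outcome.
df = pd.DataFrame({
    "group":      ["a", "a", "a", "a", "b", "b", "b", "b"],
    "high_risk":  [1, 1, 1, 0, 1, 0, 0, 0],
    "reoffended": [0, 0, 1, 0, 0, 0, 1, 0],
})

def false_positive_rate(g: pd.DataFrame) -> float:
    # Share of people who did NOT reoffend but were still labelled high risk.
    did_not_reoffend = g[g["reoffended"] == 0]
    return did_not_reoffend["high_risk"].mean()

# In this toy data, group a is flagged twice as often among non-reoffenders.
print(df.groupby("group")[["high_risk", "reoffended"]].apply(false_positive_rate))
```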

A similar bias has been detected in credit scoring algorithms. In 2017, there was a major outcry when Stanford researchers aimed to use facial recognition software to identify whether a person was homosexual (Murphy, 2017). Chapter 2 cites cases in which machine learning algorithms have been deployed to support evaluations; evaluators, too, will need to be aware of the biases that might seep in.

Another challenge with big data and its bias is that the “data exhaust” found on social media sites is not representative of the wider population, much less of the “marginalized” populations with which development agencies work, because of those populations’ limited access and because it is not always possible to know who the account owner is (Bamberger, Raftree and Olazabal, 2016). Big data can also marginalize smaller, local organizations that may not have the expertise or funds required to obtain and analyse it, skewing research and power towards large, well-funded academic or international organizations.

Inclusivity and representation matter not only in how evaluators think about data subjects and data. They also matter in how evaluation and big data teams are composed and how technology tools and data collection activities are designed. Technology itself is not neutral. As evaluators know, the design of a survey affects how it is answered. The same is true of technology-enabled data collection and analysis.

Biases that arise in the course of using ICTs in general, and data science in particular, can be tackled using social as well as technological measures. In terms of social measures, evaluation teams should strive to bring in diverse stakeholder perspectives and incorporate them into their use of ICTs. NIT and data science are fields with more men than women, so there is often an inherent bias towards a male perspective. “Researchers must be able to account for the biases in their interpretation of the data. To do so requires recognizing that one’s identity and perspective informs one’s analysis” (Behar and Gordon, 1996). That means evaluation teams should consider their own inclusivity. When designing tools or surveys, they should observe how various people and groups of people respond (or do not respond) in order to test for inclusiveness. Teams with a diverse set of backgrounds and experiences tend to be better at identifying conscious and unconscious bias. Partners with greater knowledge of context and culture can also help counter such bias.

In terms of using technology to combat biases in datasets and machine learning models, new tools such as IBM’s AI Fairness 360 are being introduced. Such tools can check for biases at several points along the machine learning pipeline, using the bias metric appropriate to the circumstances. They can also provide bias mitigation techniques that enable the developer or data scientist to reduce any discovered bias.
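As a sketch of how such a toolkit is typically used – assuming the open-source aif360 Python package and its standard interfaces, with a tiny fabricated dataset – one can compute a group-fairness metric and then apply a mitigation step such as reweighing:

```python
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

# Fabricated data: 'sex' is the protected attribute (1 = privileged group),
# 'label' is the outcome (1 = favourable).
df = pd.DataFrame({"sex":   [1, 1, 1, 0, 0, 0],
                   "label": [1, 1, 0, 1, 0, 0]})
data = BinaryLabelDataset(df=df, label_names=["label"],
                          protected_attribute_names=["sex"])

priv, unpriv = [{"sex": 1}], [{"sex": 0}]
metric = BinaryLabelDatasetMetric(data, privileged_groups=priv,
                                  unprivileged_groups=unpriv)
# Ratio of favourable-outcome rates across groups; 1.0 would mean parity.
print("disparate impact:", metric.disparate_impact())

# One mitigation technique: reweigh examples so favourable outcomes are
# balanced across groups before any model is trained on the data.
rw = Reweighing(unprivileged_groups=unpriv, privileged_groups=priv)
repaired = rw.fit_transform(data)
```

Such metrics only detect the disparities an analyst thinks to measure, which is why the social measures described above – diverse teams and contextually aware partners – remain the first line of defence.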


Protecting Data Subjects’ Rights in Tech-Enabled, Data-Led Exercises

As development agencies incorporate more digital tools and data processes into their work, data ownership, protection, privacy and security have come to the forefront. The development sector has increasingly adopted “Responsible Data” as an umbrella term for data practices that manage the tensions between privacy protection and data security, data for decision-making and data transparency. Responsible data practices prioritize the use of data for the benefit of data subjects and proactively analyse the benefits and risks to those who provide data or whose data are “found” and used in big data approaches (The Engine Room, 2014; Raftree et al., 2016).

The development sector witnessed the risks in 2017, when security problems were identified in Red Rose, a software platform that several aid agencies – including Oxfam, CARE, the Norwegian Refugee Council, the International Organization for Migration, the International Committee of the Red Cross and Catholic Relief Services – use to store cash transfer information on vulnerable individuals. These security flaws left people open to having their identities and locations exposed and their aid entitlements faked or manipulated. The case brought to light the major risks that programme participants face when agencies collect their names, photos, fingerprints, physical addresses, ID numbers and/or iris scans to track and monitor aid benefits (Parker, 2017).

In response to the breach, some called for a common standard and guidelines for how humanitarians can ethically and safely hold and secure digital data. “Critical incidents – such as breaches of platforms and networks, weaponization of humanitarian data to aid attacks on vulnerable populations, and exploitation of humanitarian systems against responders and beneficiaries – may already be occurring and causing grievous harm without public accountability,” noted some observers, who have called for an independent ombudsman to investigate such incidents (Raymond, Scarnecchia and Campo, 2017).

In early 2017, the US Agency for International Development began developing its guidance on Responsible Data Practices, noting that development actors need to protect the privacy and security of the individuals and communities that they serve and whose data they collect, use, share and hold (USAID, 2018). Other development agencies have recently issued guidelines for ethical data use that address data ownership, protection and security, including the International Organization for Migration (IOM, 2010), the Cash Learning Partnership (Cash Learning Partnership, 2013), the International Committee of the Red Cross (ICRC, 2013, 2016), Oxfam (Hastie and O’Donnell, 2017; Waugaman, 2016), Girl Effect (Girl Effect, 2016), UN Global Pulse (UNGP, 2016), the World Food Programme (WFP, 2016) and the Harvard Humanitarian Initiative (Greenwood et al., 2017).
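As a very small illustration of one safeguard in the direction such guidelines point – a sketch using the widely available cryptography Python package, not a recommendation drawn from any of the policies cited – sensitive fields can be encrypted at rest so that a leaked file does not directly expose identities or locations:

```python
from cryptography.fernet import Fernet

# In practice the key must live in a secrets manager with access control
# and rotation; co-locating it with the data defeats the purpose.
key = Fernet.generate_key()
cipher = Fernet(key)

record = {"beneficiary_id": "B-1042", "location": "village X"}
encrypted = {k: cipher.encrypt(v.encode()) for k, v in record.items()}

# Only holders of the key can recover the plaintext fields.
restored = {k: cipher.decrypt(v).decode() for k, v in encrypted.items()}
assert restored == record
```

Encryption addresses only one failure mode (data at rest being exposed); the Red Rose case shows that platform-level access controls and accountability matter just as much.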

The development sector’s efforts to responsibly manage data have been bolstered by the European Union’s GDPR, which came into effect on May 25, 2018 (EU, 2016). The GDPR was designed to “harmonize data privacy laws across Europe, to protect and empower EU citizens’ data privacy and to reshape the way organizations across the region approach data privacy”. It aims to ensure that data subjects are protected from unethical data practices and have greater rights over their own data (Box 4.3). It requires a lawful basis for collecting, processing and holding personal or sensitive data, and privacy notices written in clear language for those who provide data. These notices must state what data are collected and held, how they are collected, with whom they are shared, for what purpose they are collected and used, and for how long they will be stored. Data subjects also have the right to access their data and to ask for them to be corrected or deleted, meaning that organizations need systems that can identify and find data. Data subjects’ right to complain is also enshrined in the GDPR, as is the right not to be profiled or to restrict the use of one’s data for profiling.

In addition to these general aspects of how data privacy must be maintained, the GDPR brings new guidance on how consent is sought, recorded and managed:

Consent must be freely given, specific, informed and unambiguous. There must be a positive opt-in – consent cannot be inferred from silence, pre-ticked boxes or inactivity. It must also be separate from other terms and conditions, and you will need to have simple ways for people to withdraw consent. Public authorities and employers will need to take particular care. Consent has to be verifiable and individuals generally have more rights where you rely on consent to process their data.
(Information Commissioner’s Office, 2017)
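A minimal sketch – all names are illustrative assumptions, not a reference design – of a consent record that makes several of these obligations operational at once: consent is specific to a purpose, timestamped so it is verifiable, and as easy to withdraw as to grant.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ConsentRecord:
    subject_id: str
    purpose: str                        # specific, not bundled into general T&Cs
    granted_at: datetime                # makes the consent verifiable
    withdrawn_at: Optional[datetime] = None

    def withdraw(self) -> None:
        # Withdrawal should be as simple as granting consent.
        self.withdrawn_at = datetime.now(timezone.utc)

    @property
    def active(self) -> bool:
        return self.withdrawn_at is None

consent = ConsentRecord("subject-001",
                        "analysis of evaluation survey responses",
                        datetime.now(timezone.utc))
consent.withdraw()
assert not consent.active   # processing under this consent must now stop
```

Keeping such records indexed by subject is also what makes access, rectification and erasure requests (Box 4.3) answerable in practice, since the organization can find every dataset a person’s consent covers.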

Box 4.3. Data Subjects’ Rights in the GDPR

•	the right to be informed;
•	the right of access;
•	the right to rectification;
•	the right to erasure;
•	the right to restrict processing;
•	the right to data portability;
•	the right to object;
•	the right not to be subject to automated decision-making, including profiling.

The GDPR also strengthens data protection for children, requiring parental consent for data collection and processing for anyone under the age of 16 (13 in some EU countries). Data breaches are considered a serious violation under the GDPR: national data authorities (and/or the individuals affected) must be notified within 72 hours of certain types of personal data breaches. The GDPR recommends mapping out data flows to determine where cross-border data transmission or processing is happening, and identifying the local data protection supervisory authorities in those countries. It allows significant fines – up to 4% of annual revenue – to be placed on those responsible for data breaches. It also requires organizations to designate someone within their structure to take responsibility for data protection compliance if they carry out regular and systematic monitoring of individuals on a large scale, or large-scale processing of special categories of data, such as health records or data on children (Information Commissioner’s Office, 2017).

Many organizations around the world are now working to improve their data practices, whether because they believe data subjects, regardless of their social, economic or citizenship status, deserve to be protected or because they expect many other countries to follow the European Union’s example and introduce their own data protection regulations. As the United Nations Conference on Trade and Development (UNCTAD) noted in 2016, personal data

have become the fuel driving much of the global economy, with huge volumes of information collected, transmitted and stored every day around the globe. As more and more economic and social activities move online, the importance of data protection and privacy is increasingly recognized, not least in the context of international trade. At the same time, the current system for data protection is highly fragmented, with diverging global, regional and national regulatory approaches.
(UNCTAD, 2016)

Improving Data Privacy and Protection in the Development Sector

Development and humanitarian agencies can use the advent of the GDPR as an impetus to take greater care with data, developing policies and practices to ensure that their handling of beneficiary data does not put vulnerable people at risk, and ensuring that sufficient investment is made in this area. The GDPR guidance and the various existing policies developed by other agencies should serve as a basis. Rather than reinventing the wheel, each agency should invest time and money in:

•	translating, localizing, contextualizing and/or adapting existing guidance;
•	engaging the wider organization and sector in understanding the importance of data ethics, privacy and security;
•	training staff, researchers, evaluators and partners so that they are well informed on how to implement strong data protection policies;
•	hiring and covering the salaries of data protection officers and a network of point persons across agencies and teams;
•	including funds to ensure data privacy and protection are addressed in evaluations and in programmes in general;
•	conducting risk-harms-benefits analyses on data initiatives (Polonetsky, Tene and Jerome, 2014);
•	determining accountability chains for data privacy and protection.

Additionally, when it comes to big data in the development sector, evaluators and commissioners of big data efforts need to pay close attention to aspects highlighted earlier in this chapter: ensuring the representativeness of data and the appropriateness of proxy indicators; and avoiding bias in algorithms, the tendency of proxy indicators to create perverse incentives and unintended consequences, the replication of systemic bias and discrimination, and harm to poor people.

In 2016, the Ford and MacArthur Foundations co-funded research on the risks related to new ways of using data that fall outside existing regulatory, legal and best-practice frameworks. This research showed that data-intensive research approaches carry potential risks for vulnerable populations. These newer approaches are not covered by the existing data management regimes that guide research on human subjects, and they lack a structured way of considering risks. The research also warned that many social-sector projects depend on data that reflect patterns of bias or discrimination against vulnerable groups, which means that researchers need to be careful not to reinforce existing disparities. Most development agencies, however, do not have the in-house advanced mathematics and data science skills needed to evaluate the statistical models and algorithms proposed in this type of research. Another concern is that both big data and the capacity required to analyse big data are increasingly concentrated in the private sector (Robinson and Bogen, 2016). Private-sector statistical models, even those applied to social-sector problems, are typically considered proprietary (O’Neil, 2016). Even if development agencies were allowed to access these “black box” algorithms, they lack the skills to examine, assess and question them.

The development sector’s work is underpinned by a moral and ethical imperative, so it is not difficult for the sector to extend its general ethos and ways of working to include the concepts of data privacy and protection. However, big data, artificial intelligence, data privacy, data protection, data security, and the legal and regulatory environment are all highly complex. That means the sector needs a focused effort to make these topics part of everyday language and actions, and to build or acquire the skills and expertise needed to prevent harm to the most vulnerable.

It is common to hear people in the development sector say, “I am not technical”. This is a myth that needs to be overcome. “We are all technical now. Just because you can’t write a line of code doesn’t mean you don’t understand technology. We all have something important to contribute to the discussion” (Telford, 2018). The sector needs to create spaces for critical conversations and reflection in the face of NIT and big data “hype” so that we do not replicate the types of harm that have plagued the private sector’s efforts in this area.

References

Abreu Lopes, C., S. Bailur, and G. Barton-Owen (2018), Can big data be used for evaluation? UN Women, New York, www.unwomen.org/en/digital-library/publications/2018/4/can-big-data-be-used-for-evaluation (accessed 17 September 2018).
Bamberger, M., L. Raftree, and V. Olazabal (2016), “The Role of New Information and Communication Technologies in Equity-Focused Evaluation: Opportunities and Challenges”, Evaluation, Vol. 22/2, Sage, London.
Behar, R. and D. Gordon (1996), Women writing culture, University of California Press, Berkeley, California.
Boyd, D. and K. Crawford (2011), “Six Provocations for Big Data”, paper presented at “A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society” at the Oxford Internet Institute, 21 September 2011, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1926431 (accessed 31 May 2018).
Cash Learning Partnership (2013), Protecting beneficiary privacy: Principles and operational standards for the secure use of personal data in cash and e-transfer programmes, Cash Learning Partnership, Oxford, http://cashlearning.org/downloads/calp-beneficiary-privacy-web.pdf (accessed 9 May 2017).
Data2x (2017), Gender Data Gaps, Data2x, Washington, DC, www.data2x.org/what-is-gender-data/gender-data-gaps/ (accessed 27 January 2018).
EU (2016), EU General Data Protection Regulation, European Union, Brussels, www.eugdpr.org/ (accessed 28 January 2018).
Girl Effect (2016), Girl safeguarding policy: Digital privacy, security and safety principles and guidelines, Girl Effect, London, www.girleffect.org/media/3052/gem-girl-safeguarding-policys_19-05-16.pdf (accessed 4 May 2017).
Girl Effect and 2CV (unpublished), Research with girls in India, Indonesia, Bangladesh, The Philippines, South Africa and Nigeria.
Greenwood, F. et al. (2017), The signal code: A human rights approach to information during crisis, Harvard Humanitarian Initiative, Cambridge, Massachusetts, https://signalcodeorg.files.wordpress.com/2017/01/signalcode_final7.pdf (accessed 9 February 2017).
Hastie, R. and A. O’Donnell (2017), Responsible data management training pack, Oxfam, Oxford, http://policy-practice.oxfam.org.uk/publications/responsible-data-management-training-pack-620235 (accessed 4 May 2017).
Hudson, L. (2017), Technology Is Biased Too. How Do We Fix It? FiveThirtyEight, New York, https://perma.cc/3KAC-7CQV (accessed 27 January 2018).
ICRC (2013), Professional Standards for Protection Work, International Committee of the Red Cross, Geneva, www.icrc.org/eng/assets/files/other/icrc-002-0999.pdf (accessed 6 May 2017).

ICRC (2016), ICRC Rules on Personal Data Protection, International Committee of the Red Cross, Geneva, www.icrc.org/en/publication/4261-icrc-rules-on-personal-data-protection (accessed 6 May 2017).
Information Commissioner’s Office (2017), Preparing for the General Data Protection Regulation (GDPR): 12 steps to take now, UK Information Commissioner’s Office, London, https://ico.org.uk/media/1624219/preparing-for-the-gdpr-12-steps.pdf (accessed 28 January 2018).
IOM (2010), IOM data protection manual, International Organization for Migration, Geneva, http://publications.iom.int/system/files/pdf/iomdataprotection_web.pdf (accessed 6 May 2017).
Khagram, S. and A. Agrawal (2016), Overview: Data roadmaps for sustainable development, Global Partnership for Sustainable Development Data, https://drive.google.com/file/d/0B9NgUfNqOEoqRXhCYUppNjk1a3M/view (accessed 27 January 2018).
McGee, R. et al. (2018), Appropriating technology for accountability: Messages from Making All Voices Count, Institute for Development Studies, Brighton, https://opendocs.ids.ac.uk/opendocs/bitstream/handle/123456789/13452/RR_Synth_Online_final.pdf (accessed 31 May 2018).
Murphy, H. (2017), “Why Stanford Researchers Tried to Create a ‘Gaydar’ Machine”, The New York Times, New York, www.nytimes.com/2017/10/09/science/stanford-sexual-orientation-study.html (accessed 27 January 2018).
O’Neil, C. (2016), Weapons of math destruction: How big data increases inequality and threatens democracy, Crown, New York.
Parker, B. (2017), “Security Lapses at Aid Agency Leave Beneficiary Data at Risk”, IRIN News, Geneva, www.irinnews.org/investigations/2017/11/27/security-lapses-aid-agency-leave-beneficiary-data-risk (accessed 28 January 2018).
Polonetsky, J., O. Tene, and J. Jerome (2014), Benefit-risk analysis for big data projects, Future of Privacy Forum, Washington, DC, https://fpf.org/wp-content/uploads/FPF_DataBenefitAnalysis_FINAL.pdf (accessed 28 January 2018).
Raftree, L. (2014), ICTs and M&E at the African Evaluators’ Conference (Part 2), Wait… What?, https://lindaraftree.com/2014/03/17/icts-and-me-at-the-african-evaluators-conference-part-2/ (accessed 27 January 2017).
Raftree, L. (2017), “Buckets of Data for MERL”, MERL Tech, http://merltech.org/buckets-of-data-for-merl/ (accessed 27 January 2017).
Raftree, L. et al. (2016), Developing and operationalizing responsible data policies, Wait… What?, https://lindaraftree.com/2016/10/31/developing-and-operationalizing-responsible-data-policies/ (accessed 28 January 2018).
Raftree, L., K. Appel, and A. Ganness (2013), Modern mobility: The role of ICTs in child and youth migration, Plan International USA, Washington, DC, https://resourcecentre.savethechildren.net/sites/default/files/documents/modern_mobility.pdf (accessed 27 January 2017).
Raymond, N., D. Scarnecchia, and S. Campo (2017), “Humanitarian Data Breaches: The Real Scandal is Our Collective Inaction”, IRIN News, Geneva, www.irinnews.org/opinion/2017/12/08/humanitarian-data-breaches-real-scandal-our-collective-inaction (accessed 28 January 2018).
Robinson, D. and M. Bogen (2016), Data ethics: Investing wisely in data at scale, Upturn, Washington, DC, www.teamupturn.org/reports/2016/data-ethics (accessed 28 January 2018).

Telford, S. (2018), “Opinion: Humanitarian World is Full of Data Myths. Here Are the Most Popular”, Devex, Washington, DC, www.devex.com/news/opinion-humanitarian-world-is-full-of-data-myths-here-are-the-most-popular-91959 (accessed 28 January 2018).
The Engine Room (2014), Responsible Data, Responsible Data Forum, https://responsibledata.io/ (accessed 28 January 2018).
The Engine Room and Oxfam (2018), Biometrics in the humanitarian sector, Oxfam Great Britain, Oxford, UK, www.theengineroom.org/wp-content/uploads/2018/05/Oxfam-Report-May2018.pdf (accessed 17 September 2018).
UNCTAD (2016), Data Protection Regulations and International Data Flows: Implications for Trade and Development, United Nations Conference on Trade and Development, Geneva, http://unctad.org/en/PublicationsLibrary/dtlstict2016d1_en.pdf (accessed 28 January 2018).
UNGP (2016), Privacy and data protection principles, United Nations Global Pulse, New York, www.unglobalpulse.org/privacy-and-data-protection-principles (accessed 5 May 2017).
USAID (2018), An introduction to USAID’s work on responsible data, United States Agency for International Development, Washington, DC, www.usaid.gov/sites/default/files/documents/15396/USAID-ResponsibleData-Introduction.pdf (accessed 28 January 2018).
Waugaman, A. (2016), From principle to practice: Implementing the Principles for digital development, The Principles for Digital Development Working Group, Washington, DC, https://digitalprinciples.org/wp-content/uploads/From_Principle_to_Practice_v5.pdf (accessed 1 May 2017).
WFP (2016), WFP guide to personal data protection and privacy, World Food Programme, Rome, https://docs.wfp.org/api/documents/e8d24e70cc11448383495caca154cb97/download/ (accessed 5 May 2017).

5 Technology and Its Implications for Nations and Development Partners

Oscar A. García and Prashanth Kotturi

Any deliberation on information and communication technologies (ICTs) applied to development evaluation would be incomplete without understanding the economic context in which evaluation operates and the profound impact that technology is having on societies, economies and countries. In recent decades, technology has created immense wealth, whether directly through the growth of information technology and allied sectors or through the catalytic role technology has played in the enormous efficiency gains seen in nearly all sectors of the economy.

The benefits of technology for development are extensively documented. Sectors such as education and health are undergoing discernible shifts with the advent of Internet connectivity. The rapid spread of open-source knowledge through massive open online courses (MOOCs) has served as a great equalizer in education: people in the remotest of areas can now access high-quality education at little or no cost. Similarly, e-medicine has removed the spatial constraints on access to high-quality medical services. E-governance is transforming the manner in which governments interact with their citizens and has enabled an unprecedented level of transparency in policymaking.

Beyond these particular sectors or fields, however, technology is affecting our society and lives at a much broader level. It is changing the assumptions on which economic growth models were historically based and the way in which countries will develop in the future. This chapter examines how technology might affect the future of societies and economies and how the development paradigm might change. Such a macro-level view can inform the debate that development and evaluation practitioners need to have about whether we are prepared for the times to come.

Structural Transition and Pathways for Economic Development

The beginning of industrialization in 19th-century Europe heralded a new age of mass employment and structural change that continues today.

The trajectories of developed countries provide us with clues about what to expect from the process of economic growth in developing countries today, including changes in the three main sectors of the economy (agriculture, manufacturing and services). The process of economic growth classically entails (i) a falling share of agriculture in economic output and employment, (ii) a rising share of urban economic activity in industry and modern services, (iii) migration of rural workers to urban settings and (iv) a demographic transition in birth and death rates (Timmer and Akkus, 2008). In brief, any structural transformation involves movement across three dimensions: sector, income and geography. The potential impact of technology on each of these three will be covered in the next sections of this chapter.

Asia illustrates the structural transformation process well, as it is home to a wide variety of nations whose income status has changed, from developed countries such as South Korea and Singapore to upper-middle-income countries such as Malaysia and China, lower-middle-income countries such as India and low-income countries such as Nepal (Table 5.1). At the start of the development trajectory, there is a substantial gap between the share of the labour force employed in agriculture and the share of gross domestic product (GDP) generated by that workforce. As economic growth takes place and industrialization materializes, that gap shrinks, pointing to increased productivity in agriculture as well as in overall labour productivity in the economy (a worked illustration follows Table 5.1). However, the actual relationship between the two shares depends critically on the pace of change in manufacturing and services, and on the labour intensity of those sectors. As labour gradually moves from agriculture to manufacturing and later to services, labour productivity and wages in these sectors converge (Timmer and Akkus, 2008). Historically, increased productivity has been accompanied by an increase in real wages, leading to an increase in the real GDP of a country. In light of rapid recent strides in technology, however, the pathway for economic development is becoming increasingly uncertain and unpredictable. At the same time, the concentration of wealth is increasing, and levels of inequality are rising.

Sector - Industry's changing dynamic. Over the past two centuries, nearly every country that has gone through the process of industrialization has followed a broadly similar path, from being a largely agrarian society to an economy based on industry and later on services. Thus, countries went through a cycle of industrialization followed by deindustrialization, with the shift to services (Ghani and O'Connell, 2014). Manufacturing has been an escalator for poor countries for four important reasons. First, there tend to be incremental increases in productivity in many manufacturing industries (Rodrik, 2013); countries typically start with simpler manufacturing sectors – such as garments – and then move to more sophisticated industries. Second, manufacturing is a tradable sector. This means that successful manufacturing industries can expand almost indefinitely, by gaining market share in world markets, without running into demand constraints. Third, manufacturing is a great absorber of unskilled labour, a low-income country's most plentiful resource (Rodrik, 2015). Fourth, productivity convergence with developed countries seems to be considerably easier to achieve in manufacturing than in other sectors such as traditional agriculture or most services (Rodrik, 2014).

Table 5.1  Stages of agricultural development for countries in developing Asia and the Pacific

Phase of development: Beginning
Countries in 1980: Bangladesh, Nepal, Cambodia, Viet Nam
Countries in 2010: Nepal
Description: Low-income countries; agricultural labour productivity only $240. Agriculture's output share is 37%, and employment share is 66%.

Phase of development: Agricultural surplus – early
Countries in 1980: Bhutan, India, Kyrgyz Republic, Indonesia, Lao PDR, Pakistan, Papua New Guinea, China, Sri Lanka, Samoa, Uzbekistan
Countries in 2010: Bangladesh, Cambodia, Kyrgyz Republic, Lao PDR, Pakistan, Papua New Guinea, Tajikistan
Description: Low-income countries; agriculture's output share ranges from 19% (Bangladesh) to 36% (Cambodia); employment share ranges from 33% (Kyrgyz Republic) to 85% (Lao PDR). Agricultural labour productivity ranges from $434 (Cambodia) to $947 (Pakistan).

Phase of development: Agricultural surplus – middle
Countries in 1980: Armenia, Philippines, Tajikistan, Thailand, Vanuatu
Countries in 2010: India, Indonesia, China, Philippines, Thailand, Sri Lanka, Viet Nam
Description: Middle-income countries. Agriculture's labour share ranges from 33% (Sri Lanka) to 52% (Viet Nam); output share ranges from 10% (PRC) to 21% (Viet Nam). Agricultural labour productivity as low as $367 (Viet Nam), up to $1,100 (Philippines).

Phase of development: Agricultural surplus – late
Countries in 1980: Georgia, Malaysia, Republic of Korea
Countries in 2010: Georgia, Samoa, Uzbekistan, Vanuatu
Description: Middle-income countries. Agricultural labour productivity is $2,800; output share is 20%; employment share is 38%.

Phase of development: Integration
Countries in 1980: –
Countries in 2010: Armenia, Malaysia
Description: Middle-income countries. Agricultural labour productivity approaching $7,000; employment share of agriculture is 14%; output share is 10%.

Phase of development: Industrialized
Countries in 1980: Japan
Countries in 2010: Republic of Korea, Japan
Description: High-income countries. Agricultural labour productivity ranges from $6,423 to $76,830 (median of $33,450). Output share ranges from 0% to 3.9% (median of 1.9%); employment share ranges from 1.0% to 10.9% (median of 2.9%).

Source: ADB (2013).
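The productivity gap implied by these shares can be made concrete with a back-of-envelope ratio: agriculture's output share divided by its employment share gives agriculture's labour productivity relative to the economy-wide average. The sketch below (Python, illustrative only, using shares from Table 5.1) shows how structural transformation narrows that gap.

```python
def relative_productivity(output_share: float, employment_share: float) -> float:
    """Agriculture's labour productivity relative to the economy-wide average.

    A value below 1.0 means the average farm worker produces less than the
    average worker in the economy; the gap narrows as transformation proceeds.
    """
    return output_share / employment_share

# Beginning phase, e.g. Nepal in 2010: output share 37%, employment share 66%
print(round(relative_productivity(0.37, 0.66), 2))  # 0.56

# Integration phase in 2010: output share 10%, employment share 14%
print(round(relative_productivity(0.10, 0.14), 2))  # 0.71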

This paradigm is changing rapidly, however, in favour of unknown pathways and effects. Increased automation in manufacturing and services is altering the path that countries will take. As automation takes over functions that people carried out in industry and services, these sectors will be able to absorb less labour. The problem is more acute in industry, where functions can more easily be broken down into routine and repetitive tasks in highly structured and predictable environments – conditions ripe for automation. As it happens, the manufacturing industries most amenable to automation are the simpler, less complex ones (Chui et al., 2017).

On a broader scale, most of the manufacturing processes susceptible to change today are those in developing countries. About 64% of the work hours spent on manufacturing activities can be automated, based on currently demonstrable technologies (Chui et al., 2017). Developing countries account for 81% of the automatable hours and 49% of the value of labour. This change is already being witnessed on a global scale. Manufacturing's peak share of employment has historically been high: in the United Kingdom just before World War I, for example, it was 45%. In today's newly emerging and industrializing economies such as India and Brazil, the share has already peaked at only 15% of the workforce (Oxford Martin School and Citi GPS, 2016). Manufacturing's loss of its labour-intensive nature is now referred to as "premature deindustrialization" (Rodrik, 2016). Advances in technology have reduced the threshold for deindustrialization from a per capita income of US$34,000 in 1970 to US$9,000 in 2010 (Mehta, Felipe and Rhee, 2015). This presents a conundrum for developing countries, where the increase in incomes accompanying a manufacturing boom leads to increasing substitution of labour with technology. In concrete terms, developing countries will be more susceptible to being caught in a middle-income trap,1 as deindustrialization sets in earlier (Yusuf, 2017). They may not be able to harness their most important competitive advantage, low-cost labour, for very long in their growth trajectory. The reduction in the deindustrialization threshold is driven by a combination of factors, including falling costs of technology, globalization and the increasing integration of countries into global value chains (Alviarez, 2019; Dao et al., 2017).

Upcoming changes in industry. While current technologies are already altering the dynamic of mass manufacturing, an essential engine of growth, emerging technologies are expected to provide further impetus to such

change. Mass manufacturing, a key employment generator in many emerging and developing countries, is being challenged by 3D printing/additive manufacturing, which constructs objects by layering materials such as plastic, metal or concrete. Developing and emerging economies have often used their low labour costs to attract mass manufacturing enterprises. However, the usual economies of scale barely apply to 3D printers: their easy-to-change software means they can turn out one-off items with the same equipment and materials needed to make thousands, enabling a high level of customization. That alters the nature of manufacturing (Lakkad et al., 2013). Given that 3D printing requires little semi-skilled, low-cost labour, it makes manufacturing much less sensitive to labour costs, thus potentially reducing the concentration of manufacturing in developing countries. Moving production back to developed countries enables producers to take advantage of a shorter time to market as well as highly customized production, which is currently not possible with cost-driven mass manufacturing. 3D printing could also usher in an age of highly efficient and environmentally sustainable manufacturing, given the minimal waste from the process. It holds real potential to promote the utopian yet much-needed idea of a circular economy and to partly decouple production from natural resource exploitation (Despeisse et al., 2017). This could address issues at the core of the sustainability agenda of the SDGs, ushering in the doughnut economy that Kate Raworth (2017) so elegantly described in her work.

Choices for policymakers. For policymakers and development partners, technology presents a worrying scenario of disruption to the sustainable growth trajectories of rapidly developing countries and to the potential future pathways of other developing countries, especially those in Sub-Saharan Africa that have yet to take off on the industrialization pathway. The labour force will find it increasingly difficult to move out of agriculture and into more productive sectors. Countries will have to chart new pathways for growth and development. For developing countries, a model based more on the services sector may need to be devised. These alternative models come with their own challenges. The constraints on a services-led structural transformation are the sector's inherent tradability (or lack of it), the scale of productivity enhancement it offers and the comparative skill sets in the economy, with interconnections running between the three. A leap from agriculture to services requires a shift to high-end services that can be highly productive (Amirapu and Subramanian, 2015). Traditional services (social and personal services, and hotels and restaurants) do not possess the kind of productivity required for rapid income increases (Ghani and O'Connell, 2014). High-end services require tradability beyond national borders. Trade, and exports in particular, provide a source of unconstrained demand for the expanding sector. In turn, the tradability of services is being facilitated by the rapid spread of technology, thus removing the

age-old dilemma of a transition into a low-productivity services sector. Increased tradability also makes the services sector better able to absorb labour. While high-wage services might substitute for increasingly automated manufacturing, the skill sets and comparative advantages required in each sector are different. To ensure that expansion occurs and the benefits of fast-growing sectors are widely shared across the labour force, there should be a match between the skill requirements of the expanding sector and the skill endowment of the country. Only if it meets these conditions will the services sector be able to fulfil manufacturing's traditional role as an "escalator" out of poverty (Amirapu and Subramanian, 2015).

Income - Who Has Technology Affected the Most?

Global wealth in terms of GDP and per capita income is at record levels, and development indices for basic indicators such as health and education seem to be improving in most developing countries. The structural transformation of recent decades has lifted hundreds of millions out of poverty and created a strong middle class in many developing countries, which serves as an engine of further economic growth. However, while incomes between countries have started converging, incomes within countries have been diverging. The fruits of economic growth have not been uniformly distributed. The World Inequality Report 2018 found that the top 1% of the world's population captured twice as much real income growth as the bottom 50% between 1980 and 2016. The same report showed that income inequality (the share of total national income accounted for by the top 10% of earners) has increased across the world, from China, India and Sub-Saharan Africa to the United States and Europe, with varying magnitudes. A wide variety of reasons have been offered for such increasing inequality, including the changing distribution of political power (Acemoglu and Robinson, 2008), the presence and strength of political and economic institutions (Acemoglu and Robinson, 2014), fiscal and social policy (Dervis and Qureshi, 2016), and globalization and technology (Dabla-Norris et al., 2015).

To understand how technology is affecting inequality, it is useful to consider the three ways in which household income distribution is interpreted (UNDP, 2013). Primary income distribution is the distribution of household incomes consisting of the (sometimes cumulated) different factor incomes, such as rent, wages and profits, as determined by markets and market institutions for the different factors of production within each household, before taxes and subsidies. Secondary income distribution is the distribution of household incomes after deduction of taxes and inclusion of transfer payments (i.e. as determined by fiscal policies). Tertiary income distribution is the distribution of household incomes when imputed benefits from public expenditure are added to household income, after taxes and subsidies. This interpretation of household income is

particularly relevant for countries where government services are provided free or below market prices. Until recently, the inequality debate has focused more on secondary and tertiary income distributions and less on primary distribution. The link between the three is organic. Secondary and tertiary income distributions might be where medium-term solutions lie for addressing growing inequalities, even those caused by imbalances in primary income distribution. Primary income distribution, in turn, is profoundly affected by growth-led structural transformation and the embedding of technology within it.

Labour share in national income and technology. As technology and automation have advanced over the past four decades, the share of labour (or wages) in national income has consistently declined, while the share of capital (or profits) has been increasing (Figure 5.1). Historically, the share of labour in national income has been lower in developing countries to start with. The recent decline in the share of labour has been attributed to several factors, including the decline of the labour union movement, globalization, trade agreements and technology. The International Monetary Fund (IMF) attributes about half of the fall in labour share in advanced economies to technology, with marginal effects

[Figure 5.1  Labour's percentage share of GDP has been declining in recent decades. Line chart, 1970–2015, showing the labour share of GDP (per cent) for advanced economies and for emerging markets and developing economies. Source: IMF (2017).]

in developing and emerging economies (Dao et al., 2017). Many developing countries have inherent labour cost advantages, which blunt the effects of automation. But developing countries still face the risk that technology could affect labour share, because the costs of technology are falling. This risk is reinforced by the proportion of jobs that can be automated in developing countries and by the early peak in these countries' share of employment in manufacturing (premature deindustrialization). Falling costs of capital investment will only make automation more likely in developing countries.

The fall in labour share, driven partly by technology, underlines a more worrying trend of widening inequality. Owners of capital tend to be richer than those who can render only their labour. So, if labour's share of national income falls and capital's share rises, the poorer sections of any country lose out and the richer sections gain, thus increasing inequality. In terms of empirical evidence, Figure 5.2 depicts the effect of a declining share of labour on income inequality (as measured by the Gini coefficient) within countries: lower labour share is strongly associated with higher inequality. Seen in combination with the previous graph, the declining share

[Figure 5.2  When labour's share of GDP declines, inequality tends to rise. Scatter plot of the Gini coefficient against the labour share, with fitted lines for gross income (y = -34.83x + 62.24, R² = 0.108) and net disposable income (y = -38.01x + 50.31, R² = 0.128). Source: IMF (2017).]
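To give a feel for the magnitudes involved, the sketch below evaluates the two fitted lines reported in Figure 5.2. The slopes and intercepts are read off the figure; since these are cross-country fits with low R², the numbers are indicative only, not a reproduction of the IMF analysis.

```python
# Fitted lines from Figure 5.2 (IMF, 2017); illustrative use only.
def gini_gross(labour_share: float) -> float:
    """Predicted gross-income Gini for a given labour share (R^2 = 0.108)."""
    return -34.83 * labour_share + 62.24

def gini_net(labour_share: float) -> float:
    """Predicted net disposable-income Gini (R^2 = 0.128)."""
    return -38.01 * labour_share + 50.31

# A five-percentage-point fall in the labour share, from 0.50 to 0.45:
print(gini_gross(0.50), gini_gross(0.45))  # ~44.8 -> ~46.6, about +1.7 Gini points
print(gini_net(0.50), gini_net(0.45))      # ~31.3 -> ~33.2, about +1.9 Gini points
```

On these fits, even a modest technology-driven fall in the labour share is associated with a measurable rise in within-country inequality.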

of labour might provide one among a host of plausible explanations for the rising inequality within countries.

Intra-labour market polarization. Inequality is increasing not only between owners of labour and capital, however, but also within the labour force, especially in developed countries. The share of national income going to middle- and low-skilled workers has been falling in developed as well as developing countries, while the income share of highly skilled workers has risen (Figure 5.3).2 Such changes within labour markets have led to a sharp polarization, as a small number of earners command large compensation packages, while large sections of the labour force suffer from stagnant or falling wages. Many factors influence these movements, including globalization, trade, global value chains and technology (Dao et al., 2017). The prevalence of free trade and integration into global value chains is itself partly enabled by technology, which can make services tradable, cut transportation times and facilitate the seamless integration of global operations in multinational companies.

Hollowing out the middle. The labour share of low- and middle-skilled workers has been affected the most by this myriad of factors (Figure 5.4). Interestingly, technology disproportionately affects the incomes of middle-skilled workers, while having little effect on low-skilled and highly skilled workers. This has given rise to a phenomenon known as "hollowing out the middle". The path to development has rested, and will continue to rest, on the establishment of a large middle class, and creating middle-skilled jobs is an important part of the process. However, technology will generate new challenges to the creation of middle-skilled jobs. The price of technological capital has been falling in real terms in recent decades, and the skill complementarity of technological capital is higher with higher skills. Hence, the skill premium is expected to increase. In other words, the intra-labour market polarization could become sharper (Krusell et al., 2000).

[Figure 5.3  Share of labour by skills (as per cent of national income). Two panels, 1995–2009, each showing the income shares of high-, middle- and low-skilled labour. Source: IMF (2017).]

[Figure 5.4  Impact of various factors (technology, global value chain participation, financial integration, skill supply and other composition shifts) on the change in aggregate labour share by skill, in percentage points, 1995–2009. Source: IMF (2017).]

In summary, the increasing polarization of incomes between capital and labour, and within labour markets, correlates directly with the increase in inequality within countries. Even within the labour force, economic growth increasingly benefits the highly skilled part of the workforce. The drivers of such inequality are, inter alia, trade, globalization and technology.

A Luddite's Nightmare or a Passing Phenomenon?

Technology has brought about immense changes in society, for better or for worse. The narrative so far might suggest a Luddite's hand behind it.3

Is technology really going to create a true Keynesian "world of leisure", in which technology replaces humans and humans enjoy more leisure time? There is a spectrum of opinions on whether technology will displace or complement humans. For the more optimistic, the present is not so different from the past. Throughout history, new technologies have initially been accompanied by stagnant wages and rising inequality. This was true during the Industrial Revolution in the early 19th century and during the wave of electrification that began at the end of the 19th century. After some decades, however, these patterns reversed; large numbers of ordinary workers eventually enjoyed robust wage growth thanks to new technology (Bessen, 2015). The steam engine took over from horse-drawn carriages, giving new meaning to the word mobility. In the process, some lost their jobs. But an entire transportation industry was born, which subsequently employed many more people and fuelled industrialization in Europe and around the world.

Whether one takes an optimistic or a cynical view of technology and the permanence of its effects on society and the economy, the paradigm of economic development at large will be in a state of flux and uncertainty for some time to come. The rewards of the second machine age will no doubt be large for some. However, the key to real development in the coming years will be to distribute those rewards more equitably. This will require conscious intervention on the part of governments, societies, corporations and individuals. Whether the pace of change will be gradual or rapid is also an open question. If the growth of automation and artificial intelligence in just the last five years is any indication, however, the pace of change will accelerate. The disruptions already caused by technology, and its capacity to cause more, should make us question the historical assumptions and pathways on which the economic development efforts of governments and development partners alike are based. The past may not provide sufficient guidance on this.

Geography - Implications for Sustainable Rural Development

In terms of geographic movement, the historically known pathway involved a rapid process of urbanization, as a natural consequence of the movement towards manufacturing and services. Paradoxically, rural development in the context of structural transformation has also been accompanied by a concentration of economic activity in urban areas. If the historical assumptions behind development pathways no longer hold true, and if the preceding analysis of income distribution and cross-sectoral movements materializes, the existing rural-urban divide will be exacerbated further. The changing fundamentals of sustainable structural transformation, including premature deindustrialization, call for attention to specific aspects of rural development. First, the rural non-farm economy will become more important in absorbing excess labour from agriculture (Barret et al., 2015). The rural non-farm sector provides a crucial

bridge between commodity-based agriculture and livelihoods earned in the modern industrial and service sectors in urban centres (Haggblade et al., 2007). Many factors determine the nature and scope of rural populations' participation in the non-farm economy, including asset endowments (such as land, livestock and real estate), the quality of human resources and skills, the quality of local governance, and linkages with urban markets (Davis and Bezemer, 2004; Haggblade et al., 2007). These are areas of intervention that require systematic policy-level efforts to calibrate focus away from an inherent urban bias in development planning towards a more nuanced and balanced rural focus (Bates, 1981). Second, the shift of rural labour into the services sector will have to be geared more systematically towards high-productivity sectors (MoF, 2018; UNRISD, 2010). Such sectors will inevitably carry a skill premium in favour of high-skill workers. There is data to suggest that services can indeed substitute for manufacturing as the "escalator" out of poverty (Ghani and O'Connell, 2014). However, there is one major change from past pathways: the skill levels required in both manufacturing and services have been steadily rising with the proliferation of technology (Eichengreen and Gupta, 2011). If rural populations are to benefit from such a jump, there will have to be more systematic efforts to erase the inherent disadvantages they face in access to the services that will prepare them for the economy of tomorrow. These are covered in more detail below.

Dealing with Disruptions and Moving Forward

Education and employability. In recent decades, countries have built their education systems on two major assumptions: (i) a certain number of people will be absorbed into various kinds of employment based on the level of education attained, and (ii) education is a one-time endeavour, after which people will hold a lifetime of employment. Technology will challenge both assumptions. Post-war prosperity involved the mass absorption of workers into jobs requiring different mixes of skills, from process operatives in manufacturing or clerical administration to bankers, accountants and lawyers. Such jobs demanded robust schooling systems inculcating basic literacy and technical skills for low- and lower-middle-skilled workers, and tertiary education for those in middle-skilled and highly skilled jobs. The disproportionate influence of technology will change the profile of lower- and middle-skilled jobs. Education systems will need to be realigned to prepare people for redefined jobs and economies. Inevitably, the proliferation of technology creates a skill bias in labour markets (Acemoglu and Autor, 2010), as covered earlier. This will be a bigger challenge in developing countries because of their weaker financial and institutional capacities and because they start from a lower base. As the McKinsey report A Future that Works states, education systems will need to evolve for a changed workplace. Policymakers will need to work with education providers to

improve basic skills in the fields of science, technology, engineering and mathematics, and to put a new emphasis on creativity, as well as on critical and systems thinking. This will better enable the workforce to move from agriculture to high-value manufacturing and services. It is even more critical for rural populations, which are often marginalized in the educational sphere and elsewhere. In the absence of such a jump, the workforce might become embedded in low-productivity service sectors. Rapid strides in technology will also mean that the workers displaced today will need to be accommodated in other jobs tomorrow. The labour force will need to learn constantly and keep up to date with technological advances. This will require institutions and systems for lifelong learning (UN DESA, 2017).

Access to natural, social, political and economic capital and creation of human capital. More emphasis on the rural non-farm economy entails improving rural people's access to natural, physical and economic capital so they can take advantage of non-agricultural opportunities. Rural markets will also have to be connected to broader markets, which will require sustained investments in rural areas. More systematic investments will have to be made in rural infrastructure, livelihoods, education systems and local governments. If political will can focus public efforts and funding on growth of the rural non-farm economy, it could serve as an alternative to traditional development pathways. Rural economies can no longer be seen as transitional feeders into urban agglomerations but as the basis of an alternative economic pathway. A wider push on access to skill building and education will be required to erase the urban bias in the market for services.

Social security and safety nets. One defining feature in sustaining recent prosperity has been the presence of social welfare "safety nets" in developed and emerging countries. These have ensured that citizens going through hardship still have access to basic services. As developing countries have prospered over the years, they have also started building social safety nets. These nets were, and are, built on the assumption of mass employment in the organized economy and the labour force's own contributions, complemented or substituted to some extent by government tax revenues. However, technology can bring temporary or permanent displacement, and possibly movement, of labour. Technology-driven fragmentation of labour – as witnessed in the "gig economy" – will also throw up enormous challenges for vulnerable sections of the workforce. In addition, if the trend in intra-labour market polarization is any indicator, we could witness increased concentration of labour in lower-skilled and informal occupations, which carry little access to protections and safety nets. Technology also implies that the labour force that can contribute to social safety nets will shrink, while life expectancy remains high in developed countries and keeps increasing in developing countries. In addition to existing social safety nets, some are proposing a universal basic income, which is being piloted in select developed countries, such as Finland. Such a payout implies a strain on the public exchequer that would disproportionately

affect developing countries, given their vulnerability to technology at the outset and their weaker fiscal situation.

Taxation and fiscal systems. Social security nets need mechanisms to fund them. In the past, large-scale, labour-intensive and secure employment created a large pool of reliable taxpayers. Such taxation systems, along with the collectivization of labour brought about by manufacturing, led to the creation of welfare states in many developed countries. However, deindustrialization and the loss of labour in manufacturing and, to some extent, in services may weaken this taxpayer base. Automation allows firms to avoid employee and employer wage taxes levied by federal, state and local taxing authorities. It also permits firms to claim accelerated tax depreciation on capital costs for automated workers, and it creates a variety of indirect incentives for machine-labour substitution. This is the result of a tax system that was originally designed to tax labour rather than capital (Abbot and Bogenschneider, 2018). Future taxation systems will have to recalibrate their assumptions and their sources of revenue. Some (such as Abbot and Bogenschneider, 2018) have suggested that the taxation systems of the future might have to consider taxing the "robots that displace humans". Others have suggested that future taxation systems could provide wage subsidies and increase taxes on the rich. Countries as well as their development partners will have to deliberate on new and more progressive taxation and fiscal policies to meet the challenges that accompany the "second machine age".

Governance, political and social stability. Changes in the status quo of economic growth and its pathways have the potential to create accompanying social changes. Social media and wireless technology have connected people in unforeseen ways, amplifying the potential for expression of public opinion about governments and their institutions. The changes that technology brings, along with the large young populations of developing countries, imply that any changes, positive or otherwise, will disproportionately affect younger people in the future. Political systems will have to accommodate the voices of adversely affected populations and find alternative ways to engage them in the social and economic sphere. This is especially true if jobless growth and inequality become defining features of the fourth industrial revolution, given the accompanying demographic boom that developing countries are experiencing. The first effects of this flux are already visible in the rise of a new strain of political actors in countries around the world.

All the above measures fall under the ambit of secondary and tertiary income distributions, and the solutions to problems of primary income distribution might lie in addressing them. Development partners and governments will have to recalibrate their thinking in the new technology era. They will have to start looking for integrated solutions that respond to the integrated and complex nature of the Sustainable Development Goals (SDGs). Resource distribution and policy efforts will have to be geared to harness the potential benefits of the fourth industrial revolution.


Implications for Development Partners

Architecture of the past for challenges of the future. The international aid architecture, consisting of the UN system, regional and multilateral development banks, and international, regional and local NGOs, has existed in its current form for seven decades. The number of actors has increased dramatically, and the field has undergone changes in its philosophy and its processes. But the core assumption at the heart of the human development endeavour has remained: that if an enabling social, political and economic environment is created, citizens will be able to engage in rewarding economic activities and further their well-being. This is how developing countries were expected to chart their economic development pathway. This thinking has dominated the post-war era, even as ideologies about how to create such an enabling environment have oscillated from one side to the other. While change has always been a feature of the development sector's work, never have the economic fundamentals changed so rapidly, nor has the potential to change the economic paradigm so quickly been so great. The development architecture will have to come to grips with the reality that technology is the driving force behind these changes at a global scale. Discussion within the development community on this topic has been largely muted, however, and has only recently been reflected in a United Nations Department of Economic and Social Affairs publication titled "Impact of Technological Revolution on Labour Markets and Income Distribution". Development professionals need to be aware of the influence of technology and its role in shaping the direction of global policy discourse. In the absence of dialogue and action to equitably harness the benefits of the ongoing technological revolution, development partners risk becoming spectators, with a marginal role in shaping its outcomes in favour of the more vulnerable populations around the world.

Development partners, given their roles in global advocacy and in shaping policy discourse at the national, regional and global levels, will have to weigh in to make sure the technological revolution does indeed benefit the broader population. They will also have to advocate for piloting new economic models and solutions for tomorrow's technology-driven changes. New models for education and the provision of skills, and for the taxation and fiscal systems of tomorrow, will have to be developed and thought through. If these are addressed, technology has the potential to deliver productivity growth that outdoes what was witnessed during the previous industrial revolution. In fact, ICTs might provide the solutions as well, in the form of digital delivery of education and training, e-medicine, e-governance systems, and the like. Artificial intelligence, 3D printing and increasingly advanced industrial robots have the potential to make production more efficient, and in turn to make economic activity more sustainable and deliver prosperity to a broader base.

On the other hand, lessons so far indicate that the changing macroeconomic fundamentals are at risk of leaving the poorest countries

and individuals behind. Those with the least likelihood of reaping the dividends of today's technologies will be left out of the knowledge economy of tomorrow's technology-driven economic paradigm. Technology might amplify the existing structural disadvantages that weaker sections of the population face. Development partners have committed to "leaving no one behind" as part of their commitment to the SDG agenda, and they will have to uphold this principle in the midst of this churn. The churn has only just started and will outlast the time horizon of the SDGs. The question is, can development partners keep up?

Notes

1 According to Gill and Kharas (2007), the middle-income trap occurs when middle-income countries are squeezed between low-wage poor-country competitors that dominate in mature industries and rich-country innovators that dominate in industries undergoing rapid technological change.
2 The definition of skill types is based on the level of education of workers. The World Input-Output Database uses the 1997 International Standard Classification of Education to define low skilled as workers with primary and lower secondary education, middle skilled as those with upper secondary or post-secondary, non-tertiary education, and high skilled as those with first-stage tertiary education or higher.
3 The Luddites were a group of English textile workers and weavers in the 19th century who destroyed weaving machinery as a form of protest. They were protesting the use of machinery in a "fraudulent and deceitful manner" to get around standard labour practices, and feared that the time spent learning the skills of their craft would go to waste as machines replaced their role in the industry.

References

Abbot, R. and B. Bogenschneider (2018), "Should Robots Pay Taxes? Tax Policy in the Age of Automation", Harvard Law and Policy Review, Vol. 12, http://harvardlpr.com/wp-content/uploads/2018/03/AbbottBogenschneider.pdf.
Acemoglu, D. and J. A. Robinson (2008), "Persistence of Power, Elites and Institutions", American Economic Review, Vol. 98/1, American Economic Association, Pittsburgh, www.aeaweb.org/articles?id=10.1257/aer.98.1.267.
Acemoglu, D. and D. Autor (2010), Skills, tasks and technologies: Implications for employment and earnings, National Bureau of Economic Research, Cambridge, Massachusetts, www.nber.org/papers/w16082.
Acemoglu, D. and J. Robinson (2014), Rise and decline of general laws of capitalism, MIT Economics, Cambridge, Massachusetts, https://economics.mit.edu/files/10422.
ADB (2013), Agriculture and structural transformation in developing Asia: Review and outlook, Asian Development Bank, Manila, www.adb.org/publications/agriculture-and-structural-transformation-developing-asia-review-and-outlook (accessed 1 June 2018).
Alviarez, V. (2019), "Multinational Production and Comparative Advantage", Journal of International Economics, Vol. 119, https://doi.org/10.1016/j.jinteco.2019.03.004.
Amirapu, A. and A. Subramanian (2015), Manufacturing or services: An Indian illustration of a development dilemma, Center for Global Development, Washington, DC, www.cgdev.org/publication/manufacturing-or-services-indian-illustration-development-dilemma-working-paper-409.
Barret, C. et al. (2015), The structural transformation of rural Africa: On the current state of African food systems and rural non-farm economies, Cornell University, Ithaca, New York, http://barrett.dyson.cornell.edu/files/papers/Barrett%20Christiaensen%20Sheahan%20Shimeles%20v2.pdf.
Bates, R.H. (1981), Markets and states in Tropical Africa: The political basis of agricultural policies, University of California Press, https://mpra.ub.uni-muenchen.de/86293/1/MPRA_paper_86293.pdf.
Bessen, J. (2015), Learning by doing: The real connection between innovation, wages, and wealth, Yale University Press, New Haven, Connecticut.
Chui, M. et al. (2017), Human + machine: A new era of automation in manufacturing, McKinsey, New York, www.mckinsey.com/business-functions/operations/our-insights/human-plus-machine-a-new-era-of-automation-in-manufacturing.
Dabla-Norris, E. et al. (2015), Causes and consequences of income inequality: A global perspective, International Monetary Fund, Washington, DC, www.imf.org/external/pubs/ft/sdn/2015/sdn1513.pdf.
Dao, M., M. Das, Z. Koczan and W. Lian (2017), "Why Is Labor Receiving a Smaller Share of Global Income? Theory and Empirical Evidence", IMF Working Paper WP/17/169, International Monetary Fund, Washington, DC.
Davis, J. and D. Bezemer (2004), The development of the rural non-farm economy in developing countries and transition economies: Key emerging and conceptual issues, University of Greenwich, London, http://projects.nri.org/rnfe/pub/papers/keyissues.pdf.
Dervis, K. and Z. Qureshi (2016), Income distribution within countries: Rising inequality, Brookings Institution, Washington, DC, www.brookings.edu/wp-content/uploads/2017/12/income-inequality-within-countries_august-2016.pdf.
Despeisse, M. et al. (2017), "Unlocking Value for a Circular Economy through 3D Printing: A Research Agenda", Technological Forecasting and Social Change, Elsevier.
Eichengreen, B. and P. Gupta (2011), The service sector as India's road to economic growth, National Bureau of Economic Research, Cambridge, Massachusetts, www.nber.org/papers/w16757.pdf.
Ghani, E. and S.D. O'Connell (2014), Can service be a growth escalator in low income countries?, World Bank, Washington, DC, http://documents.worldbank.org/curated/en/823731468002999348/pdf/WPS6971.pdf.
Gill, I. and H. Kharas (2007), An East Asian renaissance: Ideas for economic growth, World Bank, Washington, DC, https://openknowledge.worldbank.org/bitstream/handle/10986/6798/399860REPLACEM1601OFFICAL0USE0ONLY1.pdf.
Haggblade, S., P.B.R. Hazell, and T.A. Reardon (2007), Transforming the rural nonfarm economy: Opportunities and threats in the developing world, International Food Policy Research Institute/Johns Hopkins University Press, Washington, DC, www.ifpri.org/cdmref/p15738coll2/id/31461/filename/31462.pdf.
IMF (2017), World economic outlook, April 2017, International Monetary Fund, Washington, DC, www.imf.org/en/Publications/WEO/Issues/2017/04/04/world-economic-outlook-april-2017 (accessed 1 June 2018).
Krusell, P. et al. (2000), "Capital-Skill Complementarity and Inequality: A Macroeconomic Analysis", Econometrica, Vol. 68/5, The Econometric Society, New York, www.jstor.org/stable/2999442.
Lakkad, M. et al. (2013), Manufacturing reinvented: How technology is changing the future of manufacturing, Tata Consultancy Services, Bangalore, http://info.tcs.com/rs/120-PTN-868/images/Manufacturing%20Reinvented%20-%20How%20Technology%20is%20changing%20the%20Future%20of%20Manufacturing.pdf.
Mehta, A., J. Felipe, and C. Rhee (2015), "The Manufacturing Conundrum", World Bank, Washington, DC, http://blogs.worldbank.org/jobs/manufacturing-conundrum (accessed 16 January 2018).
MoF (2018), Is there a "late converger stall" in economic development? Can India escape it?, Economic Survey 2017–2018, Ministry of Finance, India, http://mofapp.nic.in:8080/economicsurvey/pdf/068-081_Chapter_05_ENGLISH_Vol_01_201718.pdf.
Muro, M. et al. (2015), America's advanced industries, Brookings Institution, Washington, DC, www.brookings.edu/wp-content/uploads/2015/02/AdvancedIndustry_FinalFeb2lores-1.pdf.
Oxford Martin School and Citi GPS (2016), Technology at Work v2.0: The Future Is Not What It Used to Be, Oxford Martin School and Citi GPS, Oxford, www.oxfordmartin.ox.ac.uk/downloads/reports/Citi_GPS_Technology_Work_2.pdf (accessed 31 May 2018).
Raworth, K. (2017), Doughnut economics: Seven ways to think like a 21st-century economist, Chelsea Green Publishing, White River Junction, Vermont.
Rodrik, D. (2013), "Unconditional Convergence in Manufacturing", Quarterly Journal of Economics, Vol. 128/1, pp. 165–204, http://j.mp/2o3W1Gy.
Rodrik, D. (2014), "The Past, Present, and Future of Economic Growth", Challenge, Vol. 57/3, pp. 5–39, Taylor & Francis, DOI: 10.2753/0577-5132570301.
Rodrik, D. (2015), Work and human development in a deindustrializing world, United Nations Development Programme, New York.
Rodrik, D. (2016), "Premature Deindustrialization", Journal of Economic Growth, Vol. 21/1, Springer, New York, https://drodrik.scholar.harvard.edu/files/dani-rodrik/files/premature_deindustrialization.pdf (accessed 31 May 2018).
Timmer, P. and S. Akkus (2008), The structural transformation as a pathway out of poverty: Analytics, empirics and politics, Center for Global Development, Washington, DC, www.cgdev.org/sites/default/files/16421_file_structural_transformation.pdf (accessed 31 May 2018).
UN DESA (2017), The impact of the technological revolution on labour markets and income distribution, United Nations Department of Economic and Social Affairs, New York.
UNDP (2013), Humanity divided: Confronting inequality in developing countries, United Nations Development Programme, New York, www.undp.org/content/dam/undp/library/Poverty%20Reduction/Inclusive%20development/Humanity%20Divided/HumanityDivided_Full-Report.pdf (accessed 31 May 2018).
UNRISD (2010), Combating poverty and inequality: Structural change, social policy and politics, United Nations Research Institute for Social Development, Geneva, www.unrisd.org/unrisd/website/document.nsf/(httpPublications)/BBA20D83E347DBAFC125778200440AA7.
Yusuf, S. (2017), Automation, AI, and the emerging economies, Center for Global Development, Washington, DC, www.cgdev.org/publication/automation-ai-and-emerging-economies (accessed 5 November 2018).

Conclusions

Oscar A. García and Prashanth Kotturi

The Story This Far

The discussion has spanned five chapters covering a wide array of topics, so this chapter takes the opportunity to recap it.

What is the challenge at hand? – The first chapter delved extensively into the kinds of challenges and opportunities that the Sustainable Development Goals (SDGs) have thrown at evaluators. Sustainable development is at the heart of the SDGs, and this sustainability agenda spans economic, social and, most importantly, environmental dimensions. Looking at the multidimensionality of sustainability implies taking a cross-sectoral and systems view of the goals. In turn, such a systems view makes complexity a key theme, as well as a key challenge, in measuring progress on the SDGs. To that end, the SDGs have placed ambitious demands on evaluators and development practitioners. Evaluating progress on the SDGs and dissecting their inherent complexity will require a data paradigm that can fulfil the ambitious agenda that has been set. However, the ambition and the challenges go beyond the sheer number of indicators that the SDGs entail and the volume and variety of data that will be required. The world of development evaluation faces the twofold challenge of a lack of data to meet evaluation needs within the complexity paradigm and the speed with which such data has to be utilized. The traditional methods of data collection, analysis and information sharing may be insufficient, calling for newer ways to meet these challenges.

Why focus on information and communication technologies (ICTs)? – ICTs come into the picture with their potential to transform the way in which data collection and analysis take place. The second chapter dealt extensively with the theory and practice of Information and Communication Technologies for Evaluation (ICT4Eval). Tools ranging from ones as simple and ubiquitous as mobile phones to relatively complex ones such as remote sensing and advanced machine learning can be deployed in the field of development, and especially in evaluation. These tools may provide and analyse data on indicators that were hitherto unavailable, and do so with speed. There is scope not just to introduce newer tools such as machine learning

but also to use older tools such as mobile phones in newer ways (refer to Case 7 in Chapter 2). However, ICTs are not merely a means of collecting and analysing data but also a source of data in themselves, as the third chapter on big data elaborates. Beyond filling glaring gaps in data for measuring the SDGs, ICTs can help us break down complexity and collaborate on a global scale to meet the ambitions set in measuring the SDGs. The near ubiquity of the Internet and the advent of cloud storage and cloud-based applications mean that development partners across the world can communicate and collaborate seamlessly, tackling complexity through a multi-institutional effort. The prevalence of application programming interfaces (APIs) enables datasets of various kinds, produced by various actors around the world, to be used to track numerous indicators while triangulating analysis and making it more rigorous (a minimal illustration follows at the end of this passage). In addition, ICTs such as machine learning can help break down troves of data and detect complex interrelationships between variables, as witnessed in the case studies in Chapter 2. Moreover, given that the ICTs available today can help evaluators collect and analyse a wide variety of data, existing evidence can be triangulated with more ease. These are just some of the ways in which complexity can be dealt with. As the case studies have shown, ICTs can be used for a wide variety of projects and indicators, ranging from healthcare, nutrition, food security and the environment to rural development. Thus, as some ICTs become more accessible, affordable and easy to use, there is merit in evaluators seriously considering their potential role in meeting the challenges of evaluating the SDGs. Development practitioners in general, and evaluators in particular, are already exploring the use of various ICT tools, as evidenced by the numerous examples in this book.

Where are some of the constraints with ICT4Eval? – The fact remains that evaluators are not technology experts themselves. The use of ICTs brings with it questions about evaluators' capacity to internalize the tools and use them to their full potential. Evaluators and organizations are thus faced with trade-offs between investing in building internal capacities and hiring external specialists to conduct evaluations using specific tools. Even if external expertise is hired, evaluators and technical experts might lack a common language in which to communicate each other's requirements. ICTs often come with sunk costs that have to be justified over a large number of evaluations. Thus, there are real constraints on how freely evaluators can resort to ICTs for their evaluation exercises. In addition, the extensive literature on applications of big data in development has paid very little attention to how big data analytics can be applied to evaluation. While predictive analytics are well developed, much less progress has been made on causal (attribution) analysis. To address this shortcoming, the relationship between the people generating data and evaluators needs to be scrutinized.
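As a minimal illustration of the point about APIs: the sketch below pulls a published indicator series so it can be set against project-level monitoring data. The endpoint shown is the World Bank's open indicators API as the authors of this sketch understand it; treat the exact URL, parameters and response layout as assumptions to verify, since any provider with an open API would serve the same triangulation purpose.

```python
import requests

def fetch_indicator(country: str, indicator: str) -> list:
    """Return (year, value) pairs for one country/indicator from the API."""
    # Assumed endpoint layout of the World Bank indicators API (v2).
    url = f"https://api.worldbank.org/v2/country/{country}/indicator/{indicator}"
    resp = requests.get(url, params={"format": "json", "per_page": 100})
    resp.raise_for_status()
    metadata, rows = resp.json()  # the API returns [metadata, observations]
    return [(r["date"], r["value"]) for r in rows if r["value"] is not None]

# Employment in agriculture (% of total employment, modelled ILO estimate)
# for Nepal -- one series an evaluation might triangulate against.
print(fetch_indicator("NPL", "SL.AGR.EMPL.ZS")[:5])
```

The same pattern applies to any other open data source an evaluation needs to set against its own evidence.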

Another overarching constraint lies in the ethics, privacy and bias concerns that using data from ICTs, and interpreting data using ICTs, may bring. Development professionals deal with some of the most vulnerable populations, and evaluators may be privy to intricate details of their beliefs, opinions and lives. Evaluators will therefore have to remain conscious of the shifting data paradigm and create systems, policies and processes that mainstream privacy and ethical concerns into evaluations. Technology also holds the possibility of adding and amplifying third parties' biases in the data collection and analysis process, as explored at length in Chapter 4.

How do evaluators get around these constraints? – Evaluators should not be expected to become experts in the use of ICTs. However, a conceptual grasp of how technology tools work will help evaluators better enunciate their requirements and bridge the gap with those who possess specialist ICT skills. Here, the use of programme theories of change will also help create a simple yet commonly understood framework to bridge the language and specialty divide between ICT specialists and evaluators. At an organizational level, two things will help organizations overcome the constraints mentioned earlier. The first is to mainstream the use of ICTs into the monitoring and implementation of projects: for example, geo-referencing the locations of development interventions at the design and implementation phase for more efficient deployment of remote sensing in evaluations (a schematic example follows below), or integrating business intelligence and enterprise resource planning solutions to facilitate consistent and accurate reporting of administrative as well as project-related data. This will help mainstream the ICT4Eval agenda within projects, build an organizational culture of innovation and enhance capacity over time. Second, there is a need to build cross-organizational, shared capacities that bring down the fixed costs of ICTs while providing a reliable source of expertise.

Changing frontiers beyond evaluation. Technology is changing frontiers beyond evaluation as well. Rapid changes in structural transformation processes resulting from the rise of automation and artificial intelligence will have a significant impact on countries' economic sectors and trajectories. Technology could disproportionately benefit certain sections of society while leaving others behind, as demonstrated in Chapter 5 with the illustration of "hollowing out the middle". This runs contrary to the fundamental SDG principle of "leaving no one behind". The development architecture and governments will have to come to grips with the reality that technology is the driving force behind these changes at a global scale, for better or worse.
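The geo-referencing idea mentioned above can be as simple as capturing each intervention site as a GeoJSON feature at design time. A minimal sketch follows; the property names and the identifier are hypothetical, chosen only to show the kind of metadata a later remote-sensing-based evaluation would need.

```python
import json

# One intervention site recorded as a GeoJSON Feature (coordinates are
# longitude, latitude). All property names below are illustrative.
site = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [85.3240, 27.7172]},
    "properties": {
        "project_id": "PROJ-0001",       # hypothetical identifier
        "intervention": "irrigation",    # what was implemented at this site
        "baseline_date": "2019-06-01",   # anchor for before/after imagery
    },
}

# Written once at design time, the file can be overlaid on satellite imagery
# at evaluation time without any retrospective geo-location exercise.
with open("sites.geojson", "w") as f:
    json.dump({"type": "FeatureCollection", "features": [site]}, f, indent=2)
```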

Key Takeaways

Technology's effects will be felt within and well beyond evaluation itself. Technology is already changing the fundamentals of the development pathways that countries will take in the future, and it will similarly change the future of evaluation. In fact, given the challenges posed by the data needs of the SDGs, evaluators will have to make conscious efforts to disrupt the status quo in the field of evaluation. In doing so, evaluators should remember a few overarching messages.

A variety of instruments is available to evaluators, enabling more data to be collected, often remotely, and processed faster. The book presents only a selection of the ICT instruments available for development evaluation, chosen for their overall accessibility. These instruments range from something as simple as a mobile phone, for data collection or even dissemination, to deep learning, an offshoot of machine learning, for the analysis of large datasets. In the end, the data required to answer the evaluation questions will dictate the kind of instruments to be used. Newer tools will become available as we move forward.

ICTs are not a panacea, but only a means to an end. Technology will only be as good as the soundness of the evaluation's methodological approach and the evaluators who use it, and evaluations of development programmes will still need to be grounded in robust theory. What technology enables us to do is to go further in exploring the theory of change of development programmes, and with greater rigour. The use of ICT does not absolve the evaluator from thinking carefully about the design and assumptions needed to rigorously assess the causal effects of the development policies, strategies and programmes evaluated. Inherent risks, such as inbuilt biases and ethical and privacy breaches, make it vital for development practitioners to continue to consider carefully what to study and why, while incorporating robust mechanisms to safeguard the integrity of the evaluation process and the rights of the beneficiaries we work for.

Evaluations will need to remain people-centric, especially towards the target groups, rather than technology-centric. It is easy to get lost in the flurry of ICTs that one might be able to use. However, the end benefit of an evaluation lies in its ability to hear from the target groups on the ground. Evaluations, when truly participatory, will reflect the viewpoints of a wide variety of stakeholders. This means that evaluators should not use technology solely for the sake of innovation.

Looking Ahead

This book has focused on the new and old ICTs that are available for evaluations. In reality, however, technology is just a means to produce or analyse data and to disseminate information and knowledge. The underlying focus has been on data. To put it in a sentence, the purpose of the book has been to get evaluators to look critically at the scope of the data architecture they currently work with. The term architecture goes beyond merely collecting and analysing data and delves into "how data is collected, stored, transformed, distributed and consumed. It includes rules governing structured formats, such as databases and file systems, and the systems for connecting data with the business processes that consume it". Hence, it goes beyond a purely transactional focus on data.

Evaluators will continue to live in a world ever more populated with data, generated from a wide variety of sources. This will create scope for measuring indicators that were previously impossible to measure. Of the tsunami of data that already exists, and the even larger volumes to come, only a trickle might be usable for development evaluation; yet that trickle is incredibly large in absolute terms, and the data immensely complex and muddled at the same time. This raises the larger question of whether evaluators at large are prepared to tackle such data, deal with ICTs and create the data architecture for the present and the future.

Evaluators have spent the better part of the last two decades debating evaluation methods and methodologies. In the course of that debate, evaluation as a profession has come full circle and formalized itself into a dedicated practice within the development sphere. In the coming years, evaluators will need a dialogue on the data architecture that fits the methodologies and methods they have devised; the reverse scenario, in which evaluators retrofit methodologies to newer sources of data, may also occur. Organizations and evaluators themselves will have to align their competencies with the data architecture of the future. There is one overarching question each for development organizations and for individual evaluators. For organizations, the real question is how to invest limited resources so as to maximize the leverage that ICTs offer in creating this architecture; they will need to invest in everything from data infrastructure to data policies, organizational competencies and partnerships. For individual evaluators, the broader question is what preparation is needed to align their competencies with the architecture that takes shape.

The sustainability agenda took about 28 years, from the dissolution of the landmark Brundtland Commission, to be mainstreamed into our broader development goals. The 15-year horizon for achieving the SDGs is much shorter and more ambitious, given the challenges we face today. This is a recognition that complacency is not an option and that sustainable development is non-negotiable. It falls to evaluation to hold policymakers accountable for the goals they have collectively agreed to and to provide consistent feedback on what works in their endeavours to achieve them. Evaluators will have to act resolutely and quickly to meet this demand, and ICTs can certainly play a very important role in doing both.

This is not a dialogue that can be concluded in the course of one book; it will be an ongoing discussion. The dialogue will have to move beyond the realm of methodology and into the realm of data architecture, and it will push evaluators out of their comfort zone. But as has been demonstrated throughout this book, technology will move ahead with or without our cognizance and transform our profession, and societies and economies at large. The SDGs will have to be achieved, with or without evaluators in the loop. Their achievement could be accelerated if policymakers take into account lessons on what has worked, what has not and, more importantly, the analysis of the factors that led to improved performance. Hence there is a point in strengthening the capacity of evaluators to undertake more comprehensive, systemic and dynamic analysis through the use of new ICTs. The status quo is not an option. This book is just another step in that broader debate.

Index

Note: Bold page numbers refer to tables, italic page numbers refer to figures and page numbers followed by "n" denote endnotes.

Agenda 2030 9, 10, 24, 68–9
agricultural development 130
"algorithmic accountability" 119
American Evaluation Association 105
AOI see area of interest (AOI)
APIs see application programming interfaces (APIs)
application programming interfaces (APIs) 19, 29, 147
Areais, A. 77
area of interest (AOI) 58, 59, 61
Athey, S. 50
Bamberger, M. 62, 78
biases and ethics: access, inclusiveness of 116–18; Big Data, application of 118–20; data and technology 115–16; data privacy 123–5; data subjects' rights 121–3; information technology access 112–15
Big Data algorithms 119
Big Data analytics: assumptions of 102; Big Data literature 77–9; vs. conventional evaluation data 81; data analytics cycle 82; definition of 80; demystifying Big Data 79–89; development evaluation 91–6; evaluation activities 97; exercise caution 96–102; exponential growth of 79; overcoming barriers 102–8; programme monitoring and evaluation 82–3; revolution 89–91; velocity/volume/variety 80
Big Data ecosystem 77, 90
Big Data Innovation Challenge 78
biodiversity 31, 32
biometric data 79, 112
"black box" algorithms 124
Bloomberg website 77
Boyd, D. 119
case study: environmental evaluation, geospatial analysis 30–8; evaluator and evaluand 42–7; fine-scale humanitarian maps 55–61; fragile and conflict environments 39–42; impact evaluations 50–5; machine learning, role for 47–9; real-time data during fieldwork 62–5
change theory 44, 45, 69n6, 85, 94
Child Welfare System 85, 87
chronic humanitarian crisis 39
climate change 9, 18
cloud computing 28, 38
CMAM see Community-based Management of Acute Malnutrition (CMAM)
CNNs see convolutional neural networks (CNNs)
commercial predictive analytics 85
commodity-based agriculture 139
Community-based Management of Acute Malnutrition (CMAM) 39
complexity and dynamic interaction 14–18
complexity-responsive evaluations 94, 95
computer algorithms 49
computer-assisted data analysis 29
conventional evaluation data 81
convolutional neural networks (CNNs) 57–9
cost-effectiveness 42

Country Strategy and Programme Evaluation (CSPE) 43, 45
Crawford, K. 119
cross-border data transmission 123
CSPE see Country Strategy and Programme Evaluation (CSPE)
cutting-edge technology 68
data analysis: applications of 91; approaches 99–101; criticism of 104; cycle 82, 86; and digital survey applications 4; machine learning revolution 29; quality control of 95; SenseMaker® Explorer software 44; social media/radio broadcasts 19
data collection 94–5; applications 28; automation and integration of 27; big data techniques for 96; cloud computing 28; and data analytics 91; quality control of 95, 97; remote sensing 27–8; SenseMaker® approach 45; wireless devices and communication 28
"data exhaust" 113, 116, 120
data generation 88–9
data-intensive research approaches 124
data privacy 6, 123–5
data quality and integrity 42
deep learning 7, 38, 58–9, 149
Demographic and Health Surveys Program (DHS) 56, 57
demystifying Big Data: and data analytics 80–6; data continuum 86–8; defining Big Data and NIT 79–80; NIT ecology and linkages 88–9
descriptive and exploratory analysis 82, 83–4
Development Assistance Committee of the Organisation for Economic Co-operation and Development (OECD-DAC) 18
DHS see Demographic and Health Surveys Program (DHS)
"digital development" 111
disaster management committees (DMC) 64
dissecting complexity 18, 19
dissemination and learning 29–30
DMC see disaster management committees (DMC)
"do no harm" principle 40
"double machine learning" (DML) 52, 53, 70n11
"double selection" (DS) 52, 70n11

economic development: development partners, implications for 142–3; disruptions and moving forward 139–41; economic growth, process of 129; GDP and per capita income 133–7; Luddite's nightmare/passing phenomenon 137–8; structural transition and pathways for 128–33; sustainable rural development 138–9
education and employability 53, 139–40
enabling environment, evaluation 21, 142
environmental evaluation, geospatial analysis: international waters focal area 35–8; land degradation interventions 35; protected areas and protected area systems 32–4
European Space Agency 4, 58
evaluation 2.0 68–9
evaluation activities 20, 97, 97–8
evaluation and data diagnostics 85–6
evaluation design 98–9, 102, 104, 118
evaluation environment 22
evaluator and evaluand 42–7
Excel-based quantitative analysis 29
faster feedback loops 18–19
field teams, engagement with 40, 42
food security 9, 39, 56, 57, 61
Food Security and Nutrition Analysis Unit for Somalia 39
Ford and MacArthur Foundations 124
forest change analysis 32
found data 117
GDP see gross domestic product (GDP)
GDPR see General Data Protection Regulation (GDPR)
gender data gap analysis 111
gender-responsive evaluation 12
General Data Protection Regulation (GDPR) 118, 122–4
geographic information system (GIS) 31, 66, 92
geospatial analysis 30–2, 37
geospatial methods 31, 32, 35, 38
Gertler, P.J. 62
"gig economy" 140
Gill, I. 143n1
GIS see geographic information system (GIS)
Global Environment Facility (GEF): evaluation study of 35; international waters focal area 35–8; land degradation interventions 35; protected areas and protected area systems 32–4
Global Environment Facility Independent Evaluation Office (GEF IEO) 32, 36, 38
global evaluation community 10–11, 18
Global Partnership for Sustainable Development Data (GPSDD) 111
global partnerships 2, 19
Global Positioning System (GPS) 41, 57, 89
Google Maps 30, 58
governance, political and social stability 141
GPS see Global Positioning System (GPS)
GPSDD see Global Partnership for Sustainable Development Data (GPSDD)
gross domestic product (GDP) 129, 133, 134, 135
Guijt, I. 62
Guterres, António 18

Heinemann, E. 62
highly productive 132
high-resolution spatial mapping 56–7, 59
Hiran programme 40
HLPF see UN High-Level Political Forum on Sustainable Development (HLPF)
Holland, J. 62
"hollowing out the middle" 136
household survey 55, 62–5
humanitarian systems 121, 123
IAEG-SDGs see Inter-Agency Expert Group on SDG Indicators (IAEG-SDGs)
ICT see information and communication technologies (ICTs)
IFAD see International Fund for Agricultural Development (IFAD)
Imbens, G.W. 50
IMF see International Monetary Fund (IMF)
income distribution 133–4, 141, 142
individual level, capacity development strategy 22
industrialization, beginning of 128–9
Industrial Revolution 138, 141, 142
information and communication technologies (ICTs) 30, 62, 63, 65, 68, 113, 142
institutional framework 22
Inter-Agency Expert Group on SDG Indicators (IAEG-SDGs) 20
International Fund for Agricultural Development (IFAD) 4, 27, 43, 44
International Monetary Fund (IMF) 134
International Organization on Migration (IOM) 121
international waters focal area 35–8
Internet 3, 19, 28, 76, 79, 84, 116, 117, 147
Internet exchange points (IXPs) 67
Internet of things (IoT) 38, 76, 79
intra-labour market polarization 136, 137, 140
IOM see International Organization on Migration (IOM)
IoT see Internet of things (IoT)
IXPs see Internet exchange points (IXPs)
Jackson, S. 77
Kharas, H. 143n1
labour, absorption of 133
labour share, in national income 134–6
land degradation interventions 35
learning across sites/locations 42
"leaving no one behind" 11–12, 14, 143, 148
Le Blanc, D. 14, 18
Letouzé, Emmanuel 77
Living Standards Measurement Study (LSMS) 56–7
low- and middle-skilled workers 136
LSMS see Living Standards Measurement Study (LSMS)
Mabry, L. 62
McGee, R. 62
machine algorithms 19, 49
machine learning: address 51; algorithms 6, 48, 51, 52–4; applications of 54; Big Data, artificial intelligence 118–20; definition of 51; description of 5; evidence, systematic review of 47–9; to improve impact evaluations 50–5; and neural networks 19, 85; revolution 29
machine learning-driven modelling 55
machine learning methods 50–1, 57
Marr, B. 77
massive open online courses (MOOCs) 128

MDGs see Millennium Development Goals (MDGs)
Meier, Patrick 77
MERL see monitoring, evaluation, research and learning (MERL)
M&E systems 92
methodology, strengths of 44
Millennium Development Goals (MDGs) 2, 3, 9, 10, 20, 68
monitoring, evaluation, research and learning (MERL) 118
MOOCs see massive open online courses (MOOCs)
Mullainathan, S. 50
narrative-based interviewing techniques 43, 46
National Aeronautics and Space Administration (NASA) 27, 38
National Evaluation Policy (NEP) 21
national statistical systems 12
NEP see National Evaluation Policy (NEP)
neural networks 7, 19, 52, 58
new information technology (NIT) 76–7, 79, 80, 96, 104, 106, 111
nightlight data 58
NIT see new information technology (NIT)
non-linear estimation models 55
Norwegian Refugee Council 121
OFWs see overseas foreign workers (OFWs)
Olazabal, V. 62, 78
O'Neil, Cathy 119
Open Data Kit ecosystem 28
open data movement 28, 117
OpenStreetMap information 55, 59
OPM see Oxford Policy Management (OPM)
organization's operations 67–8
OTP see Outpatient Treatment Programme (OTP)
Outpatient Treatment Programme (OTP) 39
outsourcing 66–8
overseas foreign workers (OFWs) 93
Oxfam GB 62, 63, 65
Oxford Policy Management (OPM) 50, 52–3, 55, 70
POs see producers' organizations (POs)
poverty among males 13, 13, 14

predictive analysis 77, 84–5, 102
"premature deindustrialization" 131
private-sector statistical models 124
producers' organizations (POs) 43, 44
programme evaluation 50, 54, 78, 96, 99–101
proof-of-concept projects 77, 96
Puntland programme 40
QCA see qualitative comparative analysis (QCA)
qualitative analysis method 29, 88
qualitative comparative analysis (QCA) 29, 99
qualitative data analysis 29
"quantified community" 79, 89–90
"quantified self" 79, 89
quasi-experimental research design 32
Raftree, L. 62, 78
Raworth, Kate 132
real-time data during fieldwork 42, 62
red, green and blue (RGB) bands 58
Red Rose software 121
remote sensing and geographic information system (RS&GIS) 66
remote sensing systems 27–8, 35
Renger, R. 31
"Responsible Data" 121
RGB bands see red, green and blue (RGB) bands
RS&GIS see remote sensing and geographic information system (RS&GIS)
Rugh, J. 62
SAE see small-area estimates (SAE)
satellite data-based image recognition 55
satellite remote sensing, application of 31
Save the Children programmes 39–42
SDGs see Sustainable Development Goals (SDGs)
seamless data 117–18
"second machine age" 138, 141
SenseMaker®: approach 45; Collector application 44; methodology 43–5, 69n4
services-led structural transformation 132
SFV see simulated field visit (SFV)
Siegel, E. 77
simulated field visit (SFV) 40, 41
skill types, definition of 143n2

small-area estimates (SAE) 56
Snowden, Dave 69n4
social media 19, 30, 84, 101, 103
social security and safety nets 140–1
software and participatory analysis 44
Somalia 39–42
Spiess, J. 50
structural transformation process 129, 148
Sunk Costs vs. an Opportunity 65–6
supply-driven app development 102
Sustainable Development Goals (SDGs): achievement of 29; Agenda 2030 9, 10, 24; broad, inclusive process 10; complexity and dynamic interaction 2, 14–18; comprehensive and integration 10; country-level review of 23; evaluating progress 146; fundamental principle of 148; global evaluation community 10–11; new technologies, advantage of 18–24; review process 23; sustainability agenda of 132; systematic follow-up and review 11–14; transformational impact 14–18
systematic follow-up and review 11–14
"systems view" of evaluation 19
taxation and fiscal systems 141, 142
theory-based approach 103
theory of change 44, 45, 69n6, 85, 94
3D printing/additive manufacturing 132, 142
top-down extractive approaches 96
tradability 132, 133
traditional data 117
trained vs. test models 59
transfer learning 57–9
transformational impact 9, 14–18
2016 World Development Report 78
2012 White Paper "Big Data for Development" 77

UNCCD see United Nations Convention to Combat Desertification (UNCCD)
UNCTAD see United Nations Conference on Trade and Development (UNCTAD)
UN Evaluation Group 13
UN Global Pulse 50, 77, 78, 96
UN High-Level Political Forum on Sustainable Development (HLPF) 11
United Nations Conference on Trade and Development (UNCTAD) 123
United Nations Convention to Combat Desertification (UNCCD) 38
United Nations system 30
United States Agency for International Development (USAID) 62, 78
VAM see Vulnerability Analysis and Mapping (VAM)
van den Berg, Rob D. 10
Van Hemelrijck, A. 62
velocity/volume/variety 80
virtual reality, emergence of 30
VNRs see voluntary national reviews (VNRs)
voluntary national reviews (VNRs) 11, 12
voluntary organizations for professional evaluation (VOPES) 20–4
Vulnerability Analysis and Mapping (VAM) 55–9, 61
Weapons of Math Destruction (O'Neil) 119
WFP see World Food Programme (WFP)
wireless devices and communication 28
World Bank 40, 50, 55, 78, 90
World Food Programme (WFP) 55–7
World Inequality Report 133
World Input-Output Database 143n2
"world of leisure" 138