
Evaluation Management in South Africa and Africa

Fanie Cloete • Babette Rabie • Christo de Coning EDITORS

Published by SUN MeDIA Stellenbosch under the SUN PRESS imprint.

All rights reserved.
Copyright © 2014 Contributing authors

The author(s) and publisher have made every effort to obtain permission for and acknowledge the use of copyrighted material. Please refer enquiries to the publisher. No part of this book may be reproduced or transmitted in any form or by any electronic, photographic or mechanical means, including photocopying and recording on record, tape or laser disk, on microfilm, via the Internet, by e-mail, or by any other information storage and retrieval system, without prior written permission by the publisher. Views expressed in this publication do not necessarily reflect those of the publisher.

First edition 2014
ISBN 978-1-920689-50-6
ISBN 978-1-920689-51-3 (e-book)

Design and layout by SUN MeDIA Stellenbosch

SUN PRESS is an imprint of AFRICAN SUN MeDIA. Academic, professional and reference works are published under this imprint in print and electronic format. This publication may be ordered directly from www.sun-e-shop.co.za.

Produced by SUN MeDIA Stellenbosch.
www.africansunmedia.co.za
africansunmedia.snapplify.com (e-books)
www.sun-e-shop.co.za

Contents

Foreword
Preface
Acknowledgements
Biographical and Contact Details of Editors
Chapter Contributors
Case Contributors

PART I: Conceptual Approaches to Evaluation

Chapter 1: The Context of Evaluation Management
  Babette Rabie & Ian Goldman
Chapter 2: Historical Development & Practice of Evaluation
  Charline Mouton, Babette Rabie, Fanie Cloete & Christo de Coning
Chapter 3: Theories of Change and Programme Logic
  Fanie Cloete & Christelle Auriacombe
  Case: Developing impact theory for a social protection programme in Maputo, Mozambique – Lucilla Buonaguro & Johann Louw
Chapter 4: Evaluation Models, Theories and Paradigms
  Babette Rabie
  Case: Do Managers Use Evaluation Reports? A Case Study of a Process Evaluation for a Grant-Making Organisation – Asgar Bhikoo & Joha Louw-Potgieter
Chapter 5: Programme Evaluation Designs and Methods
  Johann Mouton
Chapter 6: Indicators for Evidence-Based Measurement in Evaluation
  Babette Rabie
Chapter 7: Institutional Arrangements for Monitoring and Evaluation
  Christo de Coning & Babette Rabie
Chapter 8: Evaluation Professionalisation and Capacity-Building
  Donna Podems & Fanie Cloete
  Case Study of VOPEs – Jim Rugh

PART II: Public Sector Evaluation in South Africa and Africa

Chapter 9: Evaluation in South Africa
  Ian Goldman, Sean Phillips, Ronette Engela, Ismail Akhalwaya, Nolwazi Gasa, Bernadette Leon, Hassen Mohamed & Tumi Mketi
  Case Study of SAMEA – Raymond Basson & Jennifer Bisgard
Chapter 10: Evaluation in selected African Countries
  Fanie Cloete
  10.1 Benin – Aristide Djidjoho & David Houinsa
  10.2 Ghana – Charles Amoatey
  10.3 Kenya – Samson Machuka, Boscow Okumu, Francis Muteti, Viviene Simwa & David Himbara
  10.4 Senegal – Momar Ndiaye & AW Boubacar
  10.5 Uganda – Albert Byamugisha & Narathius Asingwire
  10.6 AfrEA – Issaka Traore & Nermine Wally
  10.7 Cote D'Ivoire: RISE – Samuel Kouakou
  10.8 Egypt: EREN – Maha El-Said & Nivine El-Kabbag
  10.9 Kenya: ESK – Jennifer Mutua, Julius Nyangaga, James Mwanzia Mathuva & Samuel Norgah
  10.10 Morocco: MEA – Ahmed Bencheikh
  10.11 Niger: ReNSE – Boureima Gado
  10.12 Senegal: SenEval – Maguette Diop, Soukeynatou Somé Faye, Ian Hopwood, Ousseni Kinda, Monica Lomeña-Gelis, Guennolet Boumas Ngabina, Ndeye Fatou Diop Samb & Moctar Sow
Chapter 11: Strategic Lessons for the Future of Integrated Evaluation Management
  Fanie Cloete, Babette Rabie & Christo de Coning

PART III: Case Studies on M&E Issues

Chapter 12: Case Studies on M&E Issues
  12.1 The use of ICT and software in M&E – Fanie Cloete
  12.2 DFID Programme Theory of Change: Roads in East DRC – Isabel Vogel & Zoe Stephenson
  12.3 World Bank Implementation Completion Report (ICR) Review: Dar es Salaam Water Supply and Sanitation Project – IEG
  12.4 UNICEF Review of Ipelegeng Programme in Botswana – UNICEF
  12.5 UNDP Contribution to Development Results in Angola – UNDP
  12.6 Alternatives to the Conventional Counterfactual – Michael Bamberger, Fred Cardin & Jim Rugh
  12.7 Made in Africa Evaluation: Uncovering African Roots in Evaluation Theory and Practice – Bagele Chilisa & Chiko Malunga
  12.8 Institutionalisation Philosophy and Approach Underlying the GWM&ES in South Africa – Ian Goldman & Jabu Mathe
  12.9 Institutionalisation of the Eastern Cape Provincial Government M&E System in South Africa – Candice Morkel
  12.10 Community Evaluation in South African Local Government – Edwin Ijeoma
  12.11 Integrating Monitoring and Evaluation into the Normal Functioning of an NGO: A Grassroots Approach – Ros Hirschowitz and Anne Letsebe
  12.12 Evaluating the Public Sector Infrastructure Delivery Improvement Programme (IDIP) in South Africa – Jan Koster
  12.13 Proposed Impact Evaluation of the Grade R School Programme – Servaas van der Berg & Marisa Coetzee

Index

Figures, Tables & Boxes

FIGURES
1.1  The policy life cycle
1.2  Rakoena's typical programme/project controlling process
1.3  M&E dimensions at different stages of the policy/programme cycle
3.1  CLEAR Theory of Change
3.2  Theory for Change of the Hunger Project in Africa
3.3  Theory of Change for the Impact Investing Initiative
3.4  Theory of Change for Enhancing Results-based Management in Organisations
3.5  Macro Theory of Change
3.6  Conceptual theory and Action Theory
3.7  Introduction to programme logic
3.8  CIDA logic model
3.9  The Human Performance Technology (HPT) Model
3.10 Key performance information concepts
3.11 Results Chain
3.12 Concepts used in theory of change thinking
3.13 Generic theory of change thinking model
3.14 Theory of change and the project cycle
3.15 Types of evaluation
3.16 Outline of the South African Government's Evaluation Programme logic
3.17 Final programme impact theory
4.1  The Evaluation Tree
4.2  The Logic Model
4.3  CMOC Framework
4.4  Stake's responsive clock
4.5  The transparent box paradigm (Love 2004:66)
4.6  Applying mixed methods to develop alternative counterfactual designs
5.1  The logic of evaluation design
5.2  On methods and sources for data collection in evaluation studies
5.3  A typology of evaluation purpose
5.4  A decision-framework for selecting an evaluation design
5.5  Definition of impact assessment
5.6  The true experimental randomised design
5.7  Non-equivalent group design
5.8  Time-series design
5.9  Vehicular Accident Rates
6.1  The programme logic chain
6.2  Examples of useful indicator meta-data
6.3  Selected indicators from the Millennium Development Goal Initiative
6.4  Selected indicators from the World Bank's World Development Indicators
6.5  Selected thematic indicators adopted by the Council for Sustainable Development Indicator Initiative
6.6  Selected indicators for measuring peace and stability
6.7  Selected indicator from the South Africa's Development Indicators Initiative
6.8  Select indicator from the Baltimore Neighbourhood Indicators Alliance Initiative
6.9  Consolidated Matrix for the GWM&ES
6.10 Informal to Formal Data collection methods
7.1  M&E and related functions
7.2  Data-collection methods
7.3  Typical relationship between strategic planning and reporting
8.1  Cumulative number of VOPEs in existence, by year
9.1  Data terrains in the GWM&ES
9.2  Main stakeholders in M&E in South Africa, and their sources of authority
9.3  Main roles of DPME
9.4  System barriers to M&E
9.5  Culture-based barriers to M&E
9.6  The MPAT levels for M&E
9.7  Scores on M&E (% of departments scoring at the levels shown in Figure 9.6)
9.8  Summary of results for 186 facilities monitored during 2013
9.9  Summary Scorecard for police stations
12.1 Duignan's Outcomes System Diagram
12.2 The Gautrain Rapid Rail Link
12.3 The ASPIRE assessment
12.4 World Development Report 2011 framework
12.5 Roads and Development in Eastern DRC Theory of Change
12.6 The Africa Rooted Evaluation Tree
12.7 Culture-based barriers to M&E in South Africa
12.8 Pillars of the EC M&R Framework
12.9 Alignment to the POA i.r.o. Provincial M&R Framework
12.10 The IDIP Evaluation System

TABLES
1.1  The relationship between Monitoring and Evaluation
1.2  Possible evaluation activities during the life of a programme
1.3  Relationships among selected evaluation-related concepts and processes
1.4  Benefits and perverse effects of performance management/M&E
1.5  Conditions under which performance management/M&E is possible and more challenging
1.6  Examples in South Africa of the use of M&E in Mintzberg's different models of government
1.7  Mitigation measures needed to manage possible unintended negative consequences of M&E systems (adapted from DPME, 2014)
2.1  Visits of international M&E experts to South Africa
3.1  Different applications of theory of change thinking
3.2  The logframe monitoring tool
4.1  Aim of evaluation at various stages of programme development
4.2  Evaluation studies commissioned during an intervention life cycle
4.3  Guidelines for completing a logframe matrix
4.4  Preordinate evaluation versus responsive evaluation
4.5  Structure of the evaluation report
4.6  Quantitative vs qualitative evaluation approaches
4.7  Programme Manager's Perceptions of Recommendations Implemented plus Verifications
4.8  Evaluation Usages Perceptions of Programme Manager and 2006 Evaluator as Measured by Evaluation Use Questionnaire
5.1  Evaluation and Research Attributes
5.2  Intervention dimensions and evaluation criteria
5.3  Three primary uses/purposes of evaluation studies
5.4  Intervention dimensions, evaluation criteria and evaluation design types
5.5  Key evaluation activities that form part of Clarificatory Evaluation
5.6  Final analytical framework for assessment of evaluation design types
6.1  Focus of evaluation indicators
6.2  Strengths and weaknesses of outputs and outcomes
6.3  Criteria for rating answers to key evaluation questions
6.4  Selected SASQAF Indicators and Standards for quality Statistics
6.5  Indicator checklist from United Way of America
6.6  New Zealand checklist for assessing developed indicators
6.7  Potential Sustainability Indicators
6.8  Argentina Sustainable Development Indicators
6.9  Suggested indicators for measuring empowerment
6.10 Potential outcome indicators for measuring locality development
6.11 Sources of data
6.12 Types of Evaluation and their Uses
6.13 Foci of input indicators
6.14 Foci of process indicators
6.15 Foci of output indicators
6.16 Foci of outcome indicators
7.1  Advantages and disadvantages of internal and external evaluators
7.2  Choices facing evaluators
7.3  Template for reporting in the public sector
7.4  Strategies for enhancing evaluation capacity
8.1  Years VOPEs were formed, emphasising big and regional VOPEs
8.2  EvalPartners VOPE survey by the numbers
8.3  Survey responses and case studies received
8.4  Membership numbers of largest VOPEs
8.5  Involvement in policy advocacy on the part of VOPEs
10.1 Relationships with other organisations
12.1 Project Data
12.2 Economic Rate of Return (ERR)/Financial Rate of Return
12.3 Ratings
12.4 Alternative strategies for defining a counterfactual when statistical matching of samples is, and is not possible
12.5 Timeline for start-up of the evaluation system in South Africa
12.6 Applicability of the analytical framework for institutionalising M&E
12.7 Logframe analysis
12.8 Theory of change for the implementation of the evaluation design
12.9 List of covariates

BOXES
1.1  The power in measuring results
2.1  The General Accounting Office (GAO) and the Bureau of the Budget (BoB)
2.2  New Public Management (NPM)
2.3  Two examples of reform initiatives
5.1  Why genuine evaluation must be value-based
6.1  Measuring the perception of safety and security in a community
6.2  Focus on Efficiency of Effectiveness
7.1  Case example of readiness training of managers in Gauteng
7.2  Case example: The M&E policy framework of the Gauteng Provincial Government
7.3  Case: M&E capacity-building initiatives: The emerging evaluator scholarships of SAMEA
7.4  Case: Cooperative governance and the role of NGO's in M&E: The challenge of the bottom-uppers
7.5  Case example of cooperative arrangements between Government and civil society: The SAMEA/DPME Standing Committee
7.6  Case study extract: Programme evaluation and the IDIP
7.7  M&E establishment appraisal checklist
9.1  DPME's approach
9.2  Evaluation of government coordination systems
9.3  Good practice example: National Department of Environmental Affairs (DEA) (DPME, 2013d)
10.1 MEA Mission
10.2 Strategic Priorities 1 and 2 of MEA Action Plan: 2011-2013
10.3 Strategic Priorities 3 and 5 of MEA Action Plan 2011-2013
10.4 Strategic Priorities 2, 3 and 5 of MEA Action Plan 2011-2013
10.5 In line with Strategic Priorities 3, 4 and 5 of MEA Action Plan 2011-2013
12.1 An example of systematic positive bias in the evaluation of a food-for-work project
12.2 Identifying comparison groups for evaluating the impacts of a rural roads project in Eritrea
12.3 Lessons from Building Country M&E Systems
12.4 Sources of Power
12.5 Combined analytical framework

Foreword

Africa is flourishing – and so is the evaluation profession on the continent.

Evaluation is a challenging, exciting endeavour. If well positioned for use, it has the power to change many lives for the better – or for worse. Mediocrity is therefore not an option. Evaluation has to build reliable, useful knowledge through credible, insightful practices. It calls for evaluators and evaluation managers with a unique combination of expertise. They need to be technically well versed in research as well as the evaluative methodologies unique to the profession. They need to work across boundaries – whether disciplinary, sectoral, geographic, cultural, social or political. They have to be smart, realistic about real life and constantly searching for what lies beyond the obvious. They need to integrate, consider systems and understand the notion of complexity. Integrity is a non-negotiable imperative. And they have to commit to doing their best to help make the world a better place.

Well-designed and well-executed evaluations are therefore particularly important in countries with fragile institutions and large swathes of vulnerable populations. In such countries a strong tradition of evaluation in the public sector tends to be absent, and in particular evaluation based on local expertise and indigenous values and ways of thinking and working. And without sufficient academic nodes, opportunities to cultivate innovation and deeper engagement with evaluation issues in theory and practice – providing "thought leadership" to advance the profession – remain limited. This is still the case across much of Africa, although the fledgling evaluation profession that emerged on the continent at the turn of the century has been expanding impressively – driven by often unsung pioneers working in demanding social or political circumstances, or doing their best to promote and strengthen the profession in spite of heavy workloads and other challenges.

This book is the result of one such effort. It is to be applauded and valued as one of the very first scholarly books on evaluation to emerge from Africa and solidly placed within the African context. It is of interest to all those who care about evaluation for development in Africa, irrespective of where they are based in the world. It is unique in its focus on and advocacy for an Africa-centred, public sector driven engagement with evaluation as an integral and essential part of management for development. The importance of this approach cannot be overstated on a continent where the real value of evaluation will only emerge if this is achieved.

The authors highlight key practices and issues related to the institutionalisation of evaluation with this focus. They acknowledge that much can still be added, yet succeed in illuminating critical concepts and components related to evaluation management for development, placing them within the historical and current context in Africa. The authors also provide valuable new insights into how evaluation has evolved on the continent in tandem with international trends and events. Among others, they emphasise the normative purpose of evaluation, and the need to move away from simplistic approaches and solutions when dealing with the evaluation of development within diverse societies with culturally distinct practices and value systems, and very different and dynamic contexts. They challenge those who believe in simple quantitative indicator-driven monitoring and evaluation that neglect that which is less tangible. They reinforce the need for institutional arrangements that facilitate participative approaches and recognise the value systems that support evaluation, and call on the state to use evaluation to improve the nature of its governance approaches in order to fulfil its functions effectively. They advocate for arrangements at the apex of government, civil society and business to enable reflection on the merit and value of evaluations, and promote their use. The authors might not agree on all aspects at all times, but where there are differences, they open avenues for consideration and debate.

The occasional emphasis on South Africa is not out of place. It has been a leader on the continent in terms of the dynamism of its evaluation community as well as in the efforts of, among others, its Public Service Commission (PSC) and the Ministry and Department of Performance M&E (DPME) in the Presidency to institutionalise evaluation for the benefit of government performance. Future editions can strive to include and expand similar information from all African regions.

Since 2007, starting with the release of a position statement at the Fourth African Evaluation Association Conference in Niamey, different groups of Africans have called for a greater emphasis not only on "Africa-led", but also on "Africa-rooted" evaluation – challenging African evaluators to consider what evaluation would be like if it had originated in Africa, and to ensure that African knowledge systems and ways of doing inform current monitoring and evaluation initiatives. The authors reinforce this call to action for the development of theories and practical examples that can help anchor evaluation in global good practice informed by African cultures and value systems – and vice versa. This should inspire African evaluators and, perhaps more importantly, those with the power to commission and guide such studies and evaluations. A next edition will be well positioned to display such advances in evaluation on the continent.

It is appropriate that the book has been published at a time when evaluation is becoming increasingly prominent. The UN declared 2015 as the Year of Evaluation, and there are calls for a more prominent role for evaluation in the post 2015 global development agenda. All these initiatives – and the information drawn together in this book – should spark the interest of young as well as experienced, smart, ambitious Africans to pursue evaluation as a worthwhile career. The book is also well positioned to help build capacity among prospective and practicing evaluators, as well as among evaluation funders, commissioners and managers in- and outside Africa who want to learn more about monitoring and evaluation in the African context, and who might want to hone their approaches and skills to ensure better quality development through better quality evaluation results.

Evaluation – in synergy with monitoring, its sister activity – has to contribute to the empowerment of Africa. Evaluators as well as evaluation funders, commissioners and
managers have a responsibility to ensure this. Yet, if evaluation is to fulfil its potential and promise to support development in Africa, much still has to be done. May this book inspire us all to help ensure that evaluation takes its rightful place in management systems for the benefit of national development efforts across the continent.

Zenda Ofir
International Evaluation Specialist
Former AfrEA President
Honorary Professor, School for Public Leadership, Stellenbosch University, South Africa
E-mail: [email protected]
21 April 2014


Preface

The purpose of this book is to systematically record, analyse and assess for the first time in a single volume the implications of the global development and management of professional evaluation for the African continent. The main focus is on South Africa, with comparisons to similar or different developments in other African countries where systematic, professional policy, programme and project evaluation is the most advanced. The book therefore comprises an exploratory study that does not pretend to be the final word on the topic, but is in fact the start of a process to build up institutional memory about the origins and state of evaluation in Africa against the background of international developments in this field. A secondary aim of the book is to provide some motivation and encouragement, as well as core documentation and other tools that might be useful for purposes of evaluation capacity building and institutionalisation, to organisations that decide to utilise the potential of systematic evaluation as a strategy to improve their policy, programme or project performance and results.

The editors and other contributors are all actively involved in the conduct, management and capacity-building of professional evaluation skills at local, national and international levels, and have over time gained extensive experience on different evaluation-related issues. We decided to produce this book after consultation with a number of different stakeholders in the field, who all seemed to be increasingly concerned about the lack of suitable contextualised capacity-building materials to promote the professionalisation, quality and utilisation of evaluation lessons specifically within the South African and African environments. The project soon gained widespread support in professional evaluation practice and academic evaluation circles, and will be the first text of its kind published in Africa.

The book comprises a collective effort by the editors, assisted by a number of evaluation specialists who have contributed various inputs on different themes and issues. These specialist contributions ensured that the views of prominent actors in the field were captured and contextualised in the book. At the same time, however, the editors attempted to provide conceptual coherence and integration of all material based on specific themes that are clustered in the respective chapters. The following clusters of themes are dealt with in the book:
• The Context of Evaluation Management
• Historical Development and Practice of Evaluation
• Theories of Change and Programme Logic
• Evaluation Approaches and Models
• Evaluation Designs and Methods
• Indicators for Evaluation
• Institutional Arrangements for M&E
• Evaluation Professionalisation and Capacity building
• Evaluation in SA and in different African countries
• Lessons for Integrated Evaluation Management


Each of the identified clusters of themes above contains discussions of theoretical issues illustrated with one or more short case studies, while selected longer case studies and other relevant documentation are also taken up in annexures at the end of the book. The book therefore comprises a blind peer-reviewed guide to best M&E practices for purposes of systematic policy, programme and project evaluations. It is suitable both for professional M&E institutionalisation and capacity-building projects and for evaluation information dissemination and education at different levels in the public, private and voluntary sectors in society, especially in a developmental context.

Fanie Cloete, Babette Rabie and Christo de Coning
Editors
21 April 2014


Acknowledgements

The Editors wish to acknowledge the substantive support and contributions received during the conceptualisation, writing and compilation of Evaluation Management in South Africa and Africa. In many ways our evaluation networks that contributed to this book should rather be called a Community of Practice, as the development of this book has emerged almost in a natural way through developing partners and good cooperation between ourselves and researchers as well as practitioners in the field in many countries over the last 10 years.

The emerging evidence-informed paradigm of policy analysis, management and evaluation has been very influential in directing and shaping our approaches to attempt to improve the processes, content and results of policy decisions aimed at sustainable development, generally in developing countries and especially in Africa. The emerging nature of this paradigm implies that it is not always practised everywhere, and that significant awareness-raising and marketing of the need for more rigorous approaches to policy decision-making, implementation, evaluation and learning therefore have to be undertaken to ensure optimal efficiency and effectiveness of good governance in these settings. The openness and willingness of various African governments, business enterprises and NGOs to adopt and prioritise these perspectives and approaches indicated to us that the time is right, if not long overdue, for a book like this.

These developments in public policy processes led to our increasing involvement in the consolidation and promotion of SAMEA and AfrEA in various ways, as well as the publication of three editions of the book Improving Public Policy since 2000, because M&E has become increasingly prominent. This book especially came about through the facilitation of the Executive Capacity-Building Programme on Monitoring and Evaluation in the Public Sector by the School for Public Leadership (SPL) at Stellenbosch University and the graduate Policy Evaluation Degree Programmes at the SPL and the Department of Public Management and Governance at the University of Johannesburg. Through these programmes we have been involved in the training and establishment of M&E capacity and systems in almost all National Government Departments and Provincial Governments in South Africa over the last decade, responsive to the demands of notably the South African public sector, where rapid progress in the field of M&E has been apparent, especially through the initiatives of DPME and SAMEA. We have also been able to follow international trends and good practices, and have incorporated this important research and practice in Evaluation Management in South Africa and Africa.

Our involvement in M&E also empowered us to develop generic frameworks, processes and methods for application in NGO environments. The editors have been involved in the design and establishment of M&E systems in a number of NGOs locally and abroad, and this has allowed us to broaden our Community of Practice to a cooperative governance level, where the relationship between state and civil society regarding M&E design and practice has shown that intergovernmental as well as society-based approaches have become essential requirements for results-based M&E.

Our interaction with evaluation specialists and managers in all fields has been tremendously valuable in shaping the direction and content of this book in our attempt to assist with the creation or improvement of capacity in these circles to empower individuals and organisations to understand and apply
good governance processes and outcomes more successfully through better monitoring and evaluation. In this process we hope that we have contributed with this book to some extent to clarifying the relatively complicated issues around evaluation as a higher order management function and how to synchronise the theoretical foundations of good monitoring and evaluation with concrete practices and results, especially in developing contexts on the African continent. As the reader will note from our list of contributors and their affiliations, a large number of researchers and practitioners contributed to this initiative. We want to express our gratitude to all of them for their ideas, assistance and sacrifices in producing the necessary results to enable us to achieve the publication of this book. Particular thanks go to Zenda Ofir for her willingness to comment and provide critical ideas on our initial drafts and to write the Foreword. Also to Ian Goldman and his colleagues at DPME for their participation, as well as to the various Board members and other members of SAMEA and AfrEA who had contributed in a number of ways to the end result. Our gratitude is also extended to the international organisations involved in evaluation who allowed us to use some of their case studies and other content on evaluation. A special word of appreciation goes to CLEAR-AA, EVALPARTNERS, UNICEF, UNDP, WB-IEG and DFID in this regard. We also want to thank the anonymous reviewers who undertook a blind peer review of the semi-final version of the manuscript, and whose constructive remarks for further improvement of the text have been accommodated as far as possible in the final version of the book. We further wish to explicitly acknowledge the cooperation of the Department of Public Management and Governance at the University of Johannesburg that allowed Prof Cloete to write, compile and edit the final manuscript during his six months sabbatical leave from January to June 2014, as well as the School of Public Leadership at Stellenbosch University for its logistical management support for this project. A final word of thanks goes to the publishers of the manuscript, SUN MeDIA Stellenbosch whose innovative business model made it relatively easy to market and assess the book both in an e-book format and in a paper version printed on demand across the world and especially for African countries. All the above-mentioned contributions have only improved the book, while the editors take full responsibility for any remaining weaknesses. We found it in many instances for many reasons very difficult to decide what to include or exclude from the text, and in the end we have to acknowledge using our discretion in the interests of space, time and costs, with the explicit understanding that we will revisit these decisions for purposes of a revised edition after feedback from our readers. We would for example like to include in future versions more African evaluation contributors, thought, case studies and results as well as dealing in more detail with appropriate African and other generic evaluation methodologies which we unfortunately could not fit into this first version of the book. However, we trust that the current version of the book will be useful to promote more systematic and rigorous capacity-building for and understanding, practices, management and outcomes of evaluation in African countries, and especially in South Africa.

Fanie Cloete, Babette Rabie and Christo de Coning
Editors
22 December 2014


Biographical and Contact Details of Editors Fanie Cloete Fanie Cloete is Emeritus Professor of Policy Management in the Department of Public Management and Governance at the University of Johannesburg. Before this appointment he was an Associate Dean in the Faculty of Economic and Management Sciences at Stellenbosch University, Professor of Public Policy Analysis and Director of the School of Public Management and Planning at that University, where he is still an Extraordinary  Professor. He has had extensive career experience in the South African public sector, inter alia as Chief Director of Legal Administration in the Department of Development Planning, and Director of Constitutional Planning in the Office of the President in Pretoria. He has travelled, studied and published widely on public management issues on every continent. He is an advocate of the SA Supreme Court, a former member of the Presidential Review Commission on the Restructuring of the Public Service in South Africa, the former Chair of the Association of Southern African Schools and Departments of Public Administration and Management (ASSADPAM), of the national SAQA SGB on Public Administration and Management, and of the Western Cape Provincial Demarcation Board. He is also a former Chair of the SA Monitoring and Evaluation Association (SAMEA) as well as of the Board of Trustees of the Centre for Policy Studies (CPS) and a former M&E Curriculum and Training Advisor to PALAMA. He is a general consultant and training specialist in both the public and private sectors on strategic and operational policy monitoring, evaluation and other management issues.

Contact details
Department of Public Management and Governance, University of Johannesburg
E-mail: [email protected]
http://uj.ac.za/publicgov/


Babette Rabie Dr Babette Rabie is a Senior Lecturer in the School of Public Leadership, Stellenbosch University. She specialises in public sector monitoring and evaluation, performance management and policy evaluation. She is involved in numerous formal academic and practical executive training programmes on public sector performance management, including the design of M&E systems, and advanced courses on indicator development and evaluation design and management. Training delegates include parliamentarians, strategic and programme managers and evaluation specialists from all spheres of government, as well as the NGO and private sectors. As part of these training programmes she has reviewed the M&E systems, policies and indicator frameworks from various departments, programmes and units and has suggested changes and improvements to these. She has published several articles on public sector performance management, including an integrated framework for managing performance at local government level, a proposed typology of M&E approaches, an exploration of the emerging M&E policy framework in South Africa, and a framework for evidence-based local economic development policy. The core contribution from her doctoral thesis entitled “Outcome and Output Indicators to measure the success of Local Economic Development Strategies” was presented at the 2010 Conference of the European Evaluation Society and was recognised as one of the three best papers presented at the conference. She served on the Board of Directors of SAMEA (the South African Monitoring and Evaluation Association) for the period 2010-2013 and is the immediate past-Chair of the association (2012-2013). She is the current head of the Masters Programme in Public Administration at the School of Public Leadership at Stellenbosch University.

Contact details
School of Public Leadership, Belpark Campus, Stellenbosch University
PO Box 630, Bellville, 7500, South Africa
E-mail: [email protected]


Christo de Coning Christo de Coning is a Professor Extraordinaire at the School of Public Leadership (SPL), Stellenbosch University. He is a founding Director of the Foundation for Sport, Development and Peace and of the Institute for Sport and Development. He is also a senior researcher and Professor Extraordinaire at the Interdisciplinary Centre for Sport Science and Development (ICESSD) at the University of the Western Cape. He supervises and teaches Masters and PhD students in the fields of public policy, strategy and implementation, M&E, as well as development management at ICESSD and ISD (UWC), at CPUT and at the SPL. He is a former elected Board Member of the South African Monitoring and Evaluation Association (SAMEA) and serves on the Standing Committee with SAMEA & DPME in the SA Presidency. He was previously Professor and the Chair of Policy and Governance at the School of Government (SOG), University of the Western Cape (June 2004 to May 2009). Prior to that, he was a Professor at the Graduate School of Public and Development Management (P&DM, University of the Witwatersrand) from June 1997 to May 2004 and more recently, the CEO of the Elgin Learning Foundation, a Private FET College and a developmental NGO in the Grabouw/Overberg area from June 2009 to April 2011. Prior to Wits, he served in various management and specialist capacities at the Development Bank of Southern Africa (DBSA). As an academic and practitioner in evaluation, he has designed and offered Masters classes at Wits, UCT and UWC and has conducted a number of evaluations. In the more recent past these included the Quinquinial Reviews for the Council for Higher Education (CHE) of the Graduate School of Public and Development Management (P&DM) and the Stellenbosch Graduate School of Public Management and Planning (SOPMP). He has successfully completed a number of Medium Term Reviews including the GTZ PSRP in Mpumalanga, the evaluation of the African Governance Facility (Australian Aid) and acted as the Principal Evaluator for the National Treasury in the assessment of the Infrastructure Development Improvement Programme (IDIP).

Contact details
Foundation for Sport, Development and Peace
School of Public Leadership, Stellenbosch University
PO Box 2440, Durbanville, 7551
E-mail: [email protected]


Chapter Contributors Akhalwaya, Ismail: Head: Management Performance Assessments in DPME. He started in the field of public works, working on a major programme in South Africa using manual labour to support public works. He joined DPME as the Outcomes Facilitator for the public service outcome, and developed the system for Management Performance Assessments. ([email protected]) Auriacombe, Christelle: Research methodology specialist and former Head of the Department of Public Management and Governance at the University of Johannesburg. She has published extensively in and edited numerous editions of academic journals on a variety of public administration issues. She is the former Editor of Administratio Publica, the journal of the Association of Southern African Schools and Departments of Public Administration and Management (ASSADPAM). ([email protected]) Engela, Ronette: DDG in DPME, currently seconded to the National Treasury where she is working on developing a system of expenditure reviews. She has previously worked at the National Treasury and joined the Presidency in 2006, taking responsibility for M&E. She was co-responsible for initiating the outcomes approach and DPME, as well as the data system that lies behind the outcomes. ([email protected]) Gasa, Nolwazi: DDG responsible for the Outcomes M&E Branch. She started off in research in the health field, moving to the Development Bank of Southern Africa as a health specialist, and into DPME as the Outcomes Facilitator for Health, and later the DDG covering all the outcomes. ([email protected]) Goldman, Ian: Head: Evaluation and Research in DPME responsible for developing the national evaluation system. He has worked with the different levels of government and in the NGO sector, specialising in rural development, decentralisation, local economic development, community-driven development. He has focused on action learning approaches, and from that has taken on evaluation as a key tool for learning in government. ([email protected]) Leon, Bernadette: Head: Front-Line Service Delivery (FSD) Monitoring in DPME, including the Presidential Hotline and Citizen-Based Monitoring. She worked in the National Department of Provincial and Local Government and the National Treasury in the area of urban renewal and township renewal. She joined the DPME in 2011 where she was responsible for establishing the frontline service delivery monitoring systems. ([email protected]) Mketi, Tumi: DDG of the Institutional PM&E branch in DPME. She has previously worked in the Department of Traditional Affairs and before that in the Department of Provincial and Local Government. ([email protected])


Mohammed, Hassen: Outcome facilitator in DPME responsible for local government. He worked in the NGO sector as a planner, joining the Presidency as a planning specialist where he was responsible for the long-term spatial planning approach. As well as being responsible for the local government outcome, he is currently developing the system of local government monitoring. ([email protected]) Mouton, Charline: Centre for Research on Evaluation, Science and Technology (CREST), Stellenbosch University. As a former Senior Researcher at Impact Consulting she conducted numerous monitoring and evaluation studies, covering a variety of disciplines. Her MPhil thesis in Social Science Methods comprised a review of the History of Programme Evaluation in South Africa. ([email protected]) Mouton, Johann: Professor in and Director of the Centre for Research on Evaluation, Science and Technology at Stellenbosch University and the DST-NRF Centre of Excellence in Scientometrics and STI Policy. He is a prolific scholar and research manager in the social sciences, including in evaluation. His main research interests are the philosophy and methodology of the social sciences, higher education knowledge production, sociology of science, scientometrics and science policy studies. ([email protected]) Phillips, Sean: Director General of DPME from its inception in April 2010. He started as an engineer, and was head of department in the Limpopo Department of Public Works and COO in the national Department of Public Works. He has also worked as a consultant on improving government performance. ([email protected]). Podems, Donna: Research Fellow at the Centre for Research on Evaluation, Science and Technology at Stellenbosch University, where she teaches in their evaluation programme, and the Director of OtherWISE: Research and Evaluation. She implements evaluations in Africa and Asia for governments, donors, and non-profit organisations. She is a former member of SAMEA and a current member of the board of the AEA. ([email protected])


Case Contributors Amoatey, Charles: Management Consultant and Lecturer, Ghana Institute of Management and Public Administration (GIMPA), Accra, Ghana. ([email protected]) Asingwire, Narathius: Director, Socio-economic Data Centre and Senior Lecturer, Department of Social Work and Social Administration, Makerere University (MUK), Kampala, Uganda. ([email protected]) Bamberger, Michael: Independent evaluation consultant and former World Bank evaluation specialist. ([email protected]) Basson, Raymond: Independent evaluation consultant and former Professor of Education, University of the Witwatersrand, Johannesburg, South Africa. ([email protected]) Bencheikh, Ahmed: President: Moroccan Evaluation Association, Morocco. ([email protected])

Bhikoo, Asgar: Participant in the MPhil in Programme Evaluation, Section of Organisational Psychology, University of Cape Town. ([email protected]) Bisgard, Jennifer: Director: Business Development and New Initiatives, Khulisa Management Services. ([email protected]) Boubacar, AW: Director General of CESAG and Coordinator of Project CLEAR, Senegal. ([email protected]) Buonaguro, Lucilla: Department of Psychology, University of Cape Town (UCT), South Africa. ([email protected]) Byamugisha, Albert: Commissioner and Head of Department – Monitoring and Evaluation in the Uganda Office of the Prime Minister, Kampala, Uganda. (abyamugisha@ gmail.com) Cardin, Fred: Senior Research Governance Specialist at RTI International, international Development Group and former Director of Evaluation, International Development Research Centre. ([email protected]) Chilisa, Bagele: Educational Foundations, University of Botswana, Gaborone. ([email protected]) Cloete, Fanie: Professor of Public Management and Governance, University of Johannesburg, South Africa. ([email protected])


Coetzee, Marisa: Lecturer: Department of Economics, Stellenbosch University, South Africa. ([email protected]) Diop Samb, Ndeye Fato: Project management consultant, specialising in M&E and women's entrepreneurship, Senegal. Diop, Maguette: Economist and Planner, Coordinator, PM&E of policies and programmes, National Commission for Population Human Development and the Commission for M&E of Projects and Programmes in the Planning Department. Dakar, Senegal. Djidjoho, Aristide: Ministry of Development, Policy Evaluation and Coordination of Government Action, Benin. El-Kabbag, Nivine: M&E Specialist, UNICEF, Cairo, Egypt. ([email protected]) El-Said, Maha: EREN Deputy Chairperson, Cairo, Egypt. ([email protected]) Gado, Boureima: ReNSE Coordinator and Director of Economic Affairs of NNJC (Nigeria – Niger Joint Commission for Cooperation), Niger. ([email protected]) Goldman, Ian: Deputy Director General & Head: Evaluation and Research in the Department of Performance Monitoring and Evaluation (DPME), Office of the President, South Africa. ([email protected]) Himbara, David: Independent Development Strategist, Johannesburg.

Hirschowitz, Ros: Former Deputy Director General: Statistics South Africa, Pretoria, South Africa. ([email protected]) Hopwood, Ian: Development consultant and Teacher, University of Dakar, Senegal. ([email protected]) Houinsa, David: Ministry of Development, Policy Evaluation and Coordination of Government Action, Benin. ([email protected]) Ijeoma, Edwin: Chair and Head of Department of Public Administration, University of Fort Hare, South Africa. ([email protected]) Kinda, Ousseni: Economist; intern, ENDA, Senegal. Koster, Jan: Advisor to National Treasury & Former Programme Manager: Infrastructure Delivery Improvement Programme (IDIP), Development Bank of Southern Africa, Johannesburg. ([email protected])


Kouakou, Samuel: Monitoring and Evaluation Expert at the Directorate of Monitoring and Evaluation, Cote D’Ivoire & Chairman Thematic Group on Agriculture, Food Security, Environment and Sustainable Development, Côte d’Ivoire. (jourdainsion-oint@ yahoo.com) Letsebe, Anne: Former Deputy Director General: Office of the President, Pretoria, South Africa. ([email protected]) Lomeña-Gelis, Monica: M&E Officer at the Regional Office of the UN Capital Development Fund, Senegal. Louw, Johann: Professor in Psychology at the University of Cape Town, South Africa. ([email protected]) Louw-Potgieter, Joha: Director of the Institute for Monitoring and Evaluation and Head of Organisational Psychology at the University of Cape Town (UCT), Cape Town, South Africa. ([email protected]) Machuka, Samson: Director for Monitoring and Evaluation, Ministry of Devolution and Planning, Nairobi, Kenya. ([email protected]) Malunga, Chiku: Director: Capacity Development Consultants (CADECO), Malawi. ([email protected]) Mathe, Jabu: Director for Evaluation and Research, DPME, Office of the President, South Africa. ([email protected]) Mathuva, James Mwanzia: Chief Economist, Ministry of Local Government, Nairobi, Kenya. ([email protected]) Morkel, Candice: Chief Director: M&E in the Office of the Premier, Eastern Cape Provincial Government, South Africa. ([email protected]) Muteti, Francis: Principal Economist, Monitoring and Evaluation Department, Ministry of Devolution and Planning, Nairobi, Kenya. Mutua, Jennifer: ESK Chairperson, Nairobi, Kenya. ([email protected]) Ndiaye, Momar: Senegal. ([email protected]) Ngabina, Guennolet Boumas: M&E Officer at Heifer International, Senegal. Norgah, Samuel: Regional Head of Strategy, Plan International, Eastern & Southern Africa, Nairobi, Kenya.


Nyangaga, Julius: Regional Monitoring, Evaluation and Learning Manager, International Institute of Rural Reconstruction (IIRR), Nairobi, Kenya. Okumu, Boscow: Economist, Monitoring and Evaluation Department, Ministry of Devolution and Planning, Nairobi, Kenya. ([email protected]) Rugh, Jim: Coordinator: Evalpartners. ([email protected]) Simwa, Viviene: Senior Communications Officer, Monitoring and Evaluation Department, Ministry of Devolution and Planning, Nairobi, Kenya. ([email protected]) Somé Faye, Soukeynatou: Project Evaluation Manager, Chief of Administration and Finance, the Senegalese Institute of Agricultural Research, Senegal. Sow, Moctar: President of the Association Sénégalaise d’Evaluation (SenEval), Dakar, Senegal. ([email protected]) Stephenson, Zoe: Evaluation Advisor, Department for International Development (DfID),  UK. Traore, Issaka: Senior M&E Specialist, National Democratic Institute for International Affairs (NDI), Burkina Faso. [email protected]. UNDP: United Nations Development Programme. (http://erc.undp.org/ evaluationadmin/manageevaluation/viewevaluationdetail.html?evalid=7086) UNICEF: United Nations Children’s Emergency Fund. (http://www.unicef.org/ evaldatabase/files/Final_Ipelegeng.pdf) Van der Berg, Servaas: Professor of Economics, Stellenbosch University, South Africa. ([email protected]) Vogel, Isabel: Independent Consultant, UK. ([email protected]) Wally, Nermine: Monitoring and Evaluation Officer at World Food Programme, Cairo, Egypt and Past AfrEA President. ([email protected]) World Bank – IEG: World Bank: Independent Evaluation Group, Washington, DC. ([email protected])




PART I: Conceptual Approaches to Evaluation



Chapter 1
The Context of Evaluation Management [1]
Babette Rabie and Ian Goldman

1.1 Introduction

This book deals with the management of evaluations of public policy, programmes and projects – i.e. interventions carried out by public sector agencies to achieve particular desired objectives – although the book deals with many generic issues and principles that are also largely applicable to evaluations undertaken in the NGO and business sectors. Evaluations involve systematically reflecting on and learning lessons from the nature, processes and consequences of decisions and actions by an organisation in order to improve the performance or results of a particular intervention. Whether the intervention is a policy, programme or project, an evaluation always involves the systematic assessment of the envisaged or implemented response of a decision maker to improve a perceived "problem" or to take advantage of an opportunity.

The evaluation could focus on a policy area – what has been decided to do in order to improve a situation that the decision maker wishes to change. An example would be what needs to be done by the government to reduce the negative consequences of smoking, by an NGO to assist refugees to adapt better to their new environment, or by a business enterprise to contribute more effectively to social transformation issues.

On the other hand, evaluation can focus on how the intervention is implemented and the processes involved. This may be through a project, e.g. a governmental project to educate the public about the negative consequences of smoking, through financial assistance by the NGO to refugees, or through an intensive marketing initiative by the business enterprise of the product it wants to sell. The evaluation can also focus on a coordinated series of projects or activities in the form of a policy programme: by the government to supplement education projects with higher prices for tobacco products and restrictions on places where smoking is allowed, by supplementing the NGO's financial assistance to refugees with information and training in different skills and languages, and by product improvements and more retail outlets by the business concerned. Programme and project evaluation often focus on how effectively and efficiently resources (inputs) are converted through various processes (activities) into concrete deliverables/results (outputs) and ultimately outcomes.

[1] The M&E part of the chapter draws heavily on Rabie (2011) and Rabie & Cloete in Cloete & De Coning (2011). The section on the public sector draws heavily on Goldman (2001) and on the emergent work of South Africa's Department of Performance Monitoring and Evaluation.


A third possibility is that evaluation focuses on the consequences of the intervention (the intermediate sectoral outcomes or cross-sectoral impacts). Evaluations may focus on more than one of these elements or on all these stages of the public policy process.

The evaluation of development interventions focuses on the complex characteristics and dimensions of development, such as poverty alleviation, globalisation, global warming, inequality, empowerment, equity and dealing with the remnants of war. Development evaluation asks the critical so-what and why questions of government programmes and policies, often in the context of weak or corrupted information systems, and strives to improve general performance and results (Morra Imas & Rist, 2009:xv). Policy, programme and project evaluations are therefore strategies to reflect on an intervention, how it was implemented and what the results of specific decisions and actions are, to explain why these results happened, and thus to recommend how such interventions can be strengthened. Evaluation is therefore not an isolated activity but an integral part of the effective management of organisations, whether in the public, non-governmental or private sectors.

This chapter starts by defining and differentiating monitoring and evaluation (M&E). It then shows how M&E fits into the public policy cycle, as well as the management process. The different types of public sector organisations are described, as well as some approaches to strengthening these. Performance management, which involves M&E, is a key element of such approaches. Some of the challenges of applying M&E in the public sector are discussed, and examples are provided of how evaluation, and M&E more generally, can play a key role in strengthening public policy and its implementation.
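The results chain introduced above (resources converted through activities into outputs and, ultimately, outcomes) can be made concrete with a small illustrative sketch. The Python below is not from the book: the class, field names and the anti-smoking example values are hypothetical, and it simply shows how an evaluator might record the elements of an intervention's results chain before deciding which link an evaluation should focus on.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ResultsChain:
    """Minimal, illustrative model of an intervention's results chain."""
    intervention: str
    inputs: List[str] = field(default_factory=list)      # resources used
    activities: List[str] = field(default_factory=list)  # processes undertaken
    outputs: List[str] = field(default_factory=list)     # concrete deliverables
    outcomes: List[str] = field(default_factory=list)    # intended changes

    def summary(self) -> str:
        # Show how many elements sit at each link of the chain.
        return " -> ".join(
            f"{stage}({len(getattr(self, stage))})"
            for stage in ("inputs", "activities", "outputs", "outcomes")
        )

# Hypothetical example, loosely based on the chapter's anti-smoking illustration.
chain = ResultsChain(
    intervention="Public education on the negative consequences of smoking",
    inputs=["budget", "health educators"],
    activities=["develop materials", "run school and media campaigns"],
    outputs=["campaigns delivered", "learners and adults reached"],
    outcomes=["reduced uptake of smoking"],
)
print(chain.summary())  # inputs(2) -> activities(2) -> outputs(2) -> outcomes(1)
```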

1.2 What is Monitoring and Evaluation?

Monitoring and evaluation are management activities that are necessary to ensure the achievement of policy goals in the form of concrete results. Systematic planning, design and implementation aimed at improving the quality of policy outputs and outcomes will be to no avail if one is unable to assess whether one has hit the intended target or whether one has missed it, by what margin, and why. At the core of M&E is the view that there is tremendous power in the accurate measurement of results (see Box 1.1).

Box 1.1: The power in measuring results
If you do not measure results, you cannot tell success from failure.
If you cannot see success, you cannot reward it.
If you cannot reward success, you are probably rewarding failure.
If you cannot see success, you cannot learn from it.
If you cannot recognize failure, you cannot correct it.
If you can demonstrate results, you can win public support.
(Osborne & Gaebler in Kusek & Rist, 2004:11)


Chapter 1 Monitoring and evaluation may be needed to decide: • which issue among a number of competing issues should be prioritised for attention, or which approach(es) or strategies should be adopted to deal with a particular issue (diagnostic or formative evaluation); • whether the implementation and management of an ongoing intervention is on track, should continue or change direction (ongoing or process monitoring, or process/implementation evaluation); or • what the results or consequences of one or more interventions are, and why (outcome or impact evaluation). The result of the evaluation might be to adopt a different strategy or intervention, to continue with an existing one, or to curtail, expand, change or terminate it. Monitoring is defined by various authors as: • “… a continuous function that uses the systematic collection of data on specified indicators to provide management and the main stakeholders of an ongoing development intervention with indications of the extent of progress and the achievement of objectives and progress in the use of allocated funds” (Organisation for Economic Cooperation and Development (OECD) in Kusek & Rist, 2004:12). • “… the systematic and continuous collecting and analysing of information about the progress of a piece of work over time. It is a tool for identifying strengths and weaknesses in a piece of work and for providing the people responsible for the work with sufficient information to make the right decisions at the right time to improve its quality” (Save the Children, 1995:12). • Monitoring “implies two essential components, (namely) a documented plan, and action consistent with the information contained in the plan” (Owen & Rogers, 1999:24). Monitoring in essence thus tracks progress against an adopted plan to ensure compliance to aspects contained in the plan. • Output monitoring gathers data on service delivery and policy implementation, whereas outcome monitoring gathers and presents data on the value or worth of the intervention (Ho, 2003:68-70). Evaluation in its Latin root valére means “to work out the value” (of something) (Shaw, Greene & Mark, 2006:6). Whereas informal evaluation informs daily decisions on how good or bad, desirable or undesirable something is, formal evaluation reaches its conclusion in a systematic and rigorous way, with appropriate controls to ensure the validity and reliability of the conclusions. Evaluation is defined by various authors as: • “the systematic and objective assessment of an ongoing or completed project, program, or policy, including its design, implementation, and results. The aim is to determine the relevance and fulfilment of objectives, development efficiency, effectiveness, impact, and sustainability” (OECD in Kusek & Rist, 2004:12). • “the systematic assessment of the operation and/or outcomes of a program or policy, compared to a set of explicit or implicit standards, as a means of contributing to the improvement of the program or policy” Weiss (1998:4).


• Assessing “whether the objectives of the piece of work have been achieved, and whether it has made an impact” (Save the Children, 1995:99).
• The investigative discipline “which encompasses consideration of the costs, comparisons, needs, and ethics; political, psychological, legal and presentational dimensions; the design of studies, and a focus on the techniques for supporting and integrating value judgements” (Scriven, 1991:141).
• Evaluation “extends beyond the tracking and reporting of programme outcomes into examination of the extent to which and the ways in which outcomes are caused by the programme” (Ho, 2003:70).

Evaluation often involves the following stages:2
• Identifying what is/was the intention – the objectives and indicators which were the measure of the policy intent (whether explicit or implicit) and potentially the theory of change by which those were supposed to be achieved;
• Gathering data to describe the performance of the intervention being evaluated as well as the outputs, activities undertaken and resources used;
• Synthesising the data and integrating it in an analysis of what was achieved and how this compared to the intended objectives and implementation process and then applying a judgement of merit or worth to this;
• Analysing why and how the results happened or did not happen, often relating this to problems in the theory of change;
• Lastly, developing recommendations about how the intervention could be strengthened (or possibly terminated).

2 This differs according to whether the evaluation is being used for a diagnostic prior to an intervention or during implementation, or to look at achievement after the intervention – see DPME 2011. Also see Scriven 1991, Rossi, Freeman & Lipsey 2004:16, 17, 70, 174, Fournier in Owen 2006:9 for different, early, alternative conceptualisations of the main stages of evaluation.

If utilisation of evaluation is a key focus, there are also extensive stages pre- and post-evaluation to ensure ownership of the evaluation process, and to ensure that recommendations are followed up.

Because of the complementary nature of monitoring and evaluation, they are often portrayed as one concept (“M&E”). However, in practice, the two functions have different objectives and use substantively different methods, and evaluation has more in common with research than with monitoring. The relationship is summarised in Table 1.1. At project or programme level, monitoring is concerned with tracking activities and outputs, while at policy level it is more concerned with establishing progress in realising the intended outcomes of the policy (outcome monitoring). Evaluation is the process by which defensible, evidence-based judgements are presented for real-life questions through value clarification using applied social research. Evaluation may focus on the situation prior to designing an intervention, ongoing or completed projects, programmes, or policies, and tries to provide answers to specific questions in a systematic and objective way. The evaluation may focus on the design (the original plan), the implementation processes, the results (outputs and outcomes) of the intervention, or aspects such as value for money.

Table 1.1: The relationship between Monitoring and Evaluation

Monitoring: Routinely collects data on indicators, targets and actual results.
Evaluation: Interprets the collected data and draws conclusions about the findings regarding linkages between targets and actual results.

Monitoring: Systematises, classifies, validates and stores data.
Evaluation: Processes, mines and refines the stored data to extract the most accurate information needed to identify relevant issues or to fill gaps in the data.

Monitoring: Reports comparisons, differences and similarities between comparable earlier and later datasets in formats that can indicate changes over time.
Evaluation: Interprets the data, assesses and makes value judgments about the extent of progress or lack thereof, and to what extent the results are good or bad. Analyses how the results were achieved, how this compares to the intended objectives and process, the causal linkages, and makes recommendations for the next steps (termination, adaptation, etc.) and how the intervention can be strengthened.

(Table adapted from Kusek and Rist, 2004:14)

Therefore monitoring is generally an ongoing activity that tracks progress on a particular intervention, while evaluation analyses and interprets what has been achieved and why, and makes recommendations for changes (see also Guijt, 2008 and Kessler & Tanburn, 2014 for separate in-depth analyses of the nature and use of good monitoring practices for evaluation). Systematic monitoring can take place without evaluation, but the reverse is more difficult. In some cases evaluation is also used to analyse an existing situation – what is happening and why – and to suggest policy options to intervene (sometimes called ex-ante or diagnostic evaluation). In this case it precedes intervention design or redesign. Over time, phrases like policy evaluation and impact assessment have developed as standardised phrases to denote closely related phenomena. The basic terms assessment and evaluation will be used throughout this book as synonyms.
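To make the division of labour summarised in Table 1.1 concrete, the short sketch below (in Python, purely for illustration) separates a monitoring step, which routinely records indicator values against a target, from an evaluation step, which interprets the stored data and attaches a judgement of merit. The indicator, target and figures are invented and do not come from any real programme.

```python
from dataclasses import dataclass, field

@dataclass
class Indicator:
    """A monitored indicator with a target and routinely collected observations."""
    name: str
    target: float
    observations: list = field(default_factory=list)  # (period, value) tuples

    def monitor(self, period: str, value: float) -> None:
        # Monitoring: routine, systematic collection and storage of data.
        self.observations.append((period, value))

    def evaluate(self) -> str:
        # Evaluation: interpreting the stored data and making a value judgement.
        if not self.observations:
            return "No data collected yet - evaluation not possible."
        latest_period, latest_value = self.observations[-1]
        gap = self.target - latest_value
        judgement = "on track" if gap <= 0 else f"short of target by {gap:g}"
        return (f"{self.name}: latest value {latest_value:g} in {latest_period}, "
                f"target {self.target:g} -> {judgement}. "
                "A full evaluation would also ask why this result occurred.")

# Invented example: coverage of an anti-smoking education campaign.
coverage = Indicator(name="Schools reached by campaign", target=120)
coverage.monitor("2013 Q1", 40)
coverage.monitor("2013 Q2", 95)
print(coverage.evaluate())
```

In practice the evaluative step would of course go much further, asking why the observed result occurred and what should change, as the stages listed above indicate.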

1.3 Monitoring and Evaluation as Integral Parts of the Public Policy and Programme Life Cycle

De Coning and Wissink in Cloete and De Coning (2011:7) define public policy as “a government’s program of action to give effect to selected normative and empirical goals in order to address perceived problems and needs in society in a specific way, and therefore achieve desired changes in that society”. A simpler definition is that policy is “a statement of intent” from a decision maker to do something or not to do anything (De Coning & Wissink in Cloete & De Coning, 2011:4). Within this context, public policy refers to both formally adopted legislation, white papers and other policy statements, as well as their resulting strategies, programmes and action plans.


The Context of Evaluation Management Smith in Owen and Rogers (1999:24) defines a programme as “… a set of planned activities directed toward bringing about specified change(s) in an identified and identifiable target audience”. De Coning and Gunther in Cloete & De Coning (2011:173) define programme management as the “management of the relationship between … a portfolio of related projects … in order to achieve programme objectives and outcomes”. Projects may be defined as “a temporary endeavour in which human (or machine) material and financial resources are organised in a novel way, to undertake an unique scope of work, of given specification, within the constraints of cost and time so as to deliver beneficial change defined by quantitative and qualitative objectives” (Burke & Turner in Cloete & De Coning, 2011:178). Policies, programmes and projects have a generic life cycle comprising of the following largely linear and consecutive, logical steps: • Identification of a problem or opportunity that needs government attention; • An assessment of the problem/opportunity to determine its nature and root causes and the level of prioritisation that it deserves in comparison with other issues; • Breaking down the problem or opportunity into its constituent components; • Identifying and assessing alternative, competing or supplementary policy responses to reduce, regulate, improve or resolve the current situation; • Adopting the most feasible policy strategy that maximises strategic benefits and minimises costs in order to achieve the goals of the intervention concerned; • Operational planning for implementation of the adopted strategy in the most costeffective manner; • Implementation, including tracking physical progress and resource use; • Learning deeper lessons through systematic evaluation of the current situation with the intervention and its environment; • Based on the lessons, revising the policy response, whether of the intervention or its  environment. This policy life cycle is depicted graphically in Figure 1.1. While the first wave of evaluation theory focused only on the final results from an intervention, evaluation theory quickly expanded to gather and interpret data throughout the life cycle of the intervention (Rist in Boyle & Lemaire, 1999:4). During the formulation phase, the problem and previous solutions are analysed and information is evaluated to inform the design of the policy, programme or project (Boyle & Lemaire, 1999:117-119). In the implementation phase, to optimise results from the intervention, an evaluation can track how effectively and efficiently activities and outputs are implemented and resources used (Boyle & Lemaire, 1999:120). Finally, the evaluation may assess the anticipated and unanticipated outcomes of the intervention (Boyle & Lemaire, 1999:122-123).3

3 In South Africa the National Evaluation Policy refers to diagnostic, design, implementation and impact evaluation (DPME, 2011).
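The evaluation types mentioned in this section can be loosely tied to the stages of the generic life cycle described above. The short sketch below records one such mapping as a simple lookup; the stage names and pairings are illustrative assumptions drawn loosely from the terminology used in this chapter, not an official classification.

```python
# Illustrative mapping of policy/programme life-cycle stages to the kind of
# evaluation that typically accompanies them (labels are illustrative only).
POLICY_CYCLE_EVALUATION = {
    "problem identification and structuring": "diagnostic evaluation",
    "option generation and feasibility assessment": "formative / ex-ante evaluation",
    "implementation": "process / implementation evaluation (with ongoing monitoring)",
    "completion and review": "outcome / impact (summative) evaluation",
}

def suggest_evaluation(stage: str) -> str:
    """Return the evaluation focus usually associated with a life-cycle stage."""
    try:
        return POLICY_CYCLE_EVALUATION[stage]
    except KeyError:
        raise ValueError(f"Unknown stage: {stage!r}") from None

if __name__ == "__main__":
    for stage, focus in POLICY_CYCLE_EVALUATION.items():
        print(f"{stage:45s} -> {focus}")
```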


Figure 1.1: The policy life cycle. The figure depicts the cycle running from the problem, through problem structuring and causal linkages, policy options generation, feasibility assessment (formative/diagnostic evaluation) and the policy decision, to implementation (ongoing/process evaluation), summative evaluation and review for improvement, which feeds back into the cycle. (Source: Adapted from Parsons, 1995:77)

Figure 1.2 depicts a typical programme or project life cycle.

Figure 1.2: Rakoena’s typical programme/project controlling process. The figure depicts a control loop in which tasks are planned and assigned, the tasks are executed, implementation progress is monitored against the plan, variances in performance are identified, corrective action(s) are decided upon and fresh instructions are issued, until the tasks are successfully achieved (“Then we are home and dry!”). (Source: Rakoena, 2007)
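The controlling process in Figure 1.2 is essentially a feedback loop: progress is compared against the plan, variances are identified and corrective action is decided upon. A minimal, hypothetical rendering of that loop, with invented task data and an arbitrary 10% tolerance, might look as follows.

```python
def control_cycle(planned: dict, actual: dict, tolerance: float = 0.1) -> list:
    """Compare actual progress against the plan and flag tasks needing corrective action.

    planned/actual map task names to completion fractions (0.0 - 1.0).
    """
    actions = []
    for task, plan_value in planned.items():
        variance = plan_value - actual.get(task, 0.0)
        if variance > tolerance:
            # Variance beyond tolerance: decide on corrective action / fresh instructions.
            actions.append(f"{task}: behind plan by {variance:.0%} - issue fresh instructions")
    return actions

# Invented example data for a small programme.
planned_progress = {"train facilitators": 1.0, "print materials": 0.8, "run workshops": 0.5}
actual_progress = {"train facilitators": 1.0, "print materials": 0.4, "run workshops": 0.45}

for action in control_cycle(planned_progress, actual_progress):
    print(action)
```

Running the sketch flags only the task whose slippage exceeds the tolerance, which is the point at which fresh instructions would be issued.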

An adapted version of O’Sullivan’s (2004:3) summary of the different foci of evaluation is presented in Table 1.2.

Table 1.2: Possible evaluation activities during the life of a programme

Programme Conceptualisation:
▪ Reviewing relevant literature about the situation
▪ Assessing needs
▪ Identifying root causes
▪ Identifying options
▪ Analysing cost effectiveness of options

Programme Planning:
▪ Developing theory of change
▪ Developing objectives, indicators and targets
▪ Developing an operational plan including personnel, timelines, costs, etc.
▪ Identifying delivery and procurement alternatives

Programme Implementation:
▪ Monitoring programme activities
▪ Developing databases
▪ Assessing programme functioning and identifying areas for improvement
▪ Determining short-term outcomes

Programme Completion:
▪ Assessing long-term impact
▪ Determining programme strengths and weaknesses
▪ Identifying areas for subsequent improvement
▪ Assessing cost effectiveness

(Source: Adapted from O’Sullivan, 2004:3)

The next question is how monitoring and evaluation fit into the policy management  process.


1.4 Monitoring and Evaluation as Integral Functions of Management

Management is normally defined as the process of getting things done, effectively (attaining the goals through the right actions) and efficiently (the cost-effectiveness of the goal attainment), through and with other people (Robbins & Decenzo, 2001:5). In this definition, management processes comprise of the following activities (Robbins & Decenzo, 2001:6-7): • Planning: formulating organisational goals and designing a strategy to achieve the  goals; • Organising: determining the tasks to be done and by whom; • Leading: including the motivation, directing and instructing of personnel actions through communication and conflict resolution; • Controlling: monitoring performance, comparing it with goals and correcting deviations. Thompson and Strickland (1998:3) regard strategic management as the process of (1) forming a strategic vision; (2) setting objectives; (3) designing a strategy to achieve the desired outcomes; (4) implementing the chosen strategy; and (5) evaluating performance and taking necessary corrective actions. The process of setting objectives entails converting the strategic vision into specific performance targets against which the performance of the organisation can be measured. A strategy consists of specific actions and targets that will constitute progress towards the objectives. During implementation, internal progress and external changes are monitored constantly and changes are made to the strategy in terms of the gathered information to ensure ultimate performance and goal attainment (Thompson & Strickland, 1998:4-16). From the above management tasks the public manager’s role may be summarised as instilling and establishing strategies that will enable the organisation to improve its performance and thereby achieve its stated goals, perform its mission efficiently and make progress towards its vision. For this purpose, the manager requires a constant stream of upto-date, reliable performance information to enable informed decisions on how to maintain or improve performance. Monitoring of government performance through a system of key performance indicators, targets, tools and techniques is a management function that enables and assists the public manager to perform the more basic management tasks. Evaluation allows the manager to move to a deeper level of understanding and is a more advanced management function, requiring more sophisticated skills. The utilisation of evaluation findings enables the manager to plan, lead, organise and control better and thereby improve the organisation’s performance in terms of its adopted vision, mission, stated goals and objectives. One may thus conclude that monitoring and evaluation can be described as a management function that enables managers to perform their core management functions better. While monitoring is part of basic management of implementation, evaluation is a reflective function, distinguished from the planning and implementation functions of  management. Table 1.3 systematically links the main strategic planning and management concepts of a vision, mission, programmes, projects, activities and resources to deal with problems, to the policy management concepts of policy, strategy, business plan, process plan and resource plan, and to the programme logic concepts of impacts, outcomes, outputs, resource conversion activities and inputs. The table also indicates how typical evaluation


foci, indicators and time frames can systematically be linked to the above processes in order to clarify frequent confusion among these concepts and processes and explain the logical relationships among them.

Table 1.3: Relationships among selected evaluation-related concepts and processes
(Columns: Strategic planning level | Policy planning focus | Programme level | Evaluation focus | Evaluation indicator area | Time frame)

Vision | Policy (What to do?) | Intangible impacts/outcomes | Final multisectoral/integrated goals/consequences | Empowerment, growth, equity, redistribution, democracy, stability, sustainability, etc. | Long term

Mission | Strategies (How to do it?) | Intangible outcomes & concrete progress | Intermediate sector-specific results/goal achievement | Improved education, health, economy, community, politics, culture, environment, etc. | Medium term

Programmes/Projects | Operational plan (What deliverables?) | Concrete outputs: deliverables/products/milestones | Quantity, diversity, quality | More & better ranges of houses, pass rates, clinics, roads, technology, harvests, jobs, electricity | Short to medium term

Activities | Process plan (What, who, when, how?) | Resource conversion processes | Relevant, appropriate levels of work | Efficiency, effectiveness, productivity, scheduling, participation, timeliness, costs, benefits | Short to medium term

Problems & Resources | Required resources for problem (With what?) | Inputs: concrete ingredients to address problem | Optimal use of resources | Availability, feasibility, risk, adequacy of funds, people, supplies, time, priorities, information | Short to medium term

(Source: Rabie & Cloete in Cloete & De Coning, 2011:208)

Crucial to a correct understanding of the different types of evaluation and how they are integrally linked to normal policy management activities, is the realisation that an organisational vision is in essence a transformative long-term, integrated multi-sectoral impact that is envisaged in the organisation’s environment through specific strategic and tactical policy interventions – a theory of change of some sort whereby the organisation will achieve its objectives. Similarly mission statements can be equated to sector-specific implementation strategies that are intended to result in intermediate sectoral outcomes, changing that specific sector’s direction towards the longer term transformation envisaged in the vision statement. Programmes and projects are operational mechanisms to provide concrete short to medium term deliverables as outputs, to achieve the outcomes envisaged by the organisation. They are the milestones, outputs or objectives achieved in project management terminology towards realising the programme or project goals. Activities comprise the resource conversion processes to achieve the outputs, while the resources available to the programme or project are the inputs. The assumptions are critical in suggesting which external factors may or may not affect the operation of the intervention. All of this comprises a theory of change of how the intervention is planned to work, and


to convert resources and action into results. Indicators provide measures for each of these stages or levels of conceptualisation and link these processes and actions to the final results. Some of the challenges in the public sector are that theories of change are frequently not well articulated, whether for high-level outcomes at a government-wide level, for the outcomes of specific departments, for their higher- or lower-level outputs, for the equivalence of programmes to these outputs, or for projects as components of these outputs. These problems emerge when trying to do evaluations, where the interventions and their linkages may be poorly described (DPME, 2013).
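Writing a theory of change down explicitly, even in skeletal form, exposes the poorly described linkages referred to above. The sketch below is one hypothetical way of recording a results chain for the anti-smoking example used earlier in this chapter; every entry is invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class ResultsChain:
    """A simple results chain linking inputs through to the intended impact."""
    inputs: list        # resources (budget, staff, materials)
    activities: list    # resource-conversion processes
    outputs: list       # concrete deliverables
    outcomes: list      # intermediate, sector-specific changes
    impact: str         # long-term, cross-sectoral change envisaged
    assumptions: list   # external factors the logic depends on

    def describe(self) -> str:
        return " -> ".join([
            f"inputs: {', '.join(self.inputs)}",
            f"activities: {', '.join(self.activities)}",
            f"outputs: {', '.join(self.outputs)}",
            f"outcomes: {', '.join(self.outcomes)}",
            f"impact: {self.impact}",
        ])

anti_smoking = ResultsChain(
    inputs=["campaign budget", "health educators"],
    activities=["develop materials", "run school sessions"],
    outputs=["120 schools reached", "materials distributed"],
    outcomes=["reduced uptake of smoking among learners"],
    impact="improved population health",
    assumptions=["tobacco pricing policy remains in force"],
)
print(anti_smoking.describe())
```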

1.5 Managing the Public Sector

Managing in a public sector context presents additional challenges given competing views as to how government can be made effective and efficient. It is important to understand that the public sector is not monolithic, that there are many different types of organisations within it, with different organisation cultures, and therefore the application of M&E differs. This brings challenges in trying to develop national M&E systems which have to apply across these different types of organisations. This section outlines some of these different types of organisations and Table 1.6 indicates how M&E would be applied in these different cases (Goldman, 2001). Formal bureaucracy developed in ancient times in countries like Sumer, ancient Egypt and China. The archetypal bureaucracy is often referred to as the rational model (after Weber) which developed in Europe from the 18th century to overcome the corruption, nepotism, unequal access and lack of accountability of emerging European states, and to organise the administration to mass produce services (Wooldridge and Cranko, 1997). Most state bureaucracies have been modelled on this model, based on a belief in stability, permanence and consensus. However, this type of bureaucracy is unable to respond quickly and effectively to rapid changes in the environment, such as those which have occurred in developing countries. Mintzberg (1996) uses the following categorisation of government: • Government as a machine: The keywords in this type of system are rules, regulations and standards. • Government as a network: The keywords are flexibility, connections, communication and collaboration. There is a loose, interactive, complex network of temporary relationships, as happens with projects. • Virtual government model: The keywords are privatise, contract and negotiate. At the extreme the assumption in this model is that the best government is no government. • Normative control model: The keywords are values, service and dedication. • Performance control model: The keywords are define, measure and control. The performance control view tends to be the most conventional Western approach to


Chapter 1 reform and is widespread in the public services in many former colonies, including South Africa and other African states. The first model of government as a machine has been discredited. Government is not a closed clockwork system like a machine, but a dynamic open-ended system continuously under the influence of its ever-changing environment, and having to adapt perpetually to these changes in order to try to find a new congruence of equilibrium between the system and its dynamic environment. Government as a network can be seen in economic development consortia or research and development organisations. It can also happen where political connections dominate decision-making and so the network becomes more important than the formal, rational  process. In the virtual government model, line departments are dismantled, delayered, management given more control over staffing and the purchaser/provider model introduced to separate the enabling role from direct provision of services, with direct provision outsourced wherever possible. The difficulty of this approach is that it assumes a range of skills and capacities in government which are not necessarily in place in developing countries (McLennan, 1997). The normative model emphasises the concept of service muting the effects of bureaucracy, as in the caring professions, but which has increasingly been lost as the performance control and virtual models have intruded. According to Mintzberg some characteristics of the normative model are that people are chosen by values and attitudes and not just credentials; guidance is by accepted principles and values rather than imposed plans or targets; all members share responsibility; members feel trusted and supported; inspiration replaces so-called empowerment and performance is judged by experienced people who sit on representative boards. The performance control model aims to make government a functional business structure, with the vision communicated and implemented lower down the structure, so that the strategic apex (superstructure) plans and controls while the operating divisions (microstructures) execute. This approach often reinforces the machine model and tightening of control (performance management), which often comes at the expense of creativity. There is increased professionalisation, and the view that discretion is best safeguarded by professional “impartial” experts, who are neutral and capable of value-free judgements. It also entails a sharp divide in the perceived role of administration and management. It gives preference to management techniques like “management by objectives” and “value for money” considerations. The danger with this model is that it can provide a barrier to democratisation and the manager can be distanced from his/her clients and the outside world. It also raises the issue whether managerialism is decreasing accountability, as officials are empowered to make decisions rather than politicians or clients (Isaac-Henry, 1993). Such approaches tend to emphasise administrative rather than political roles, and have led to the “unbalanced growth of bureaucratic power at the expense of extra-bureaucratic capacity to monitor and control public bureaucracy” (Riggs, 1990).


The Context of Evaluation Management The conclusion of Mintzberg (1996) as well as of Peters and Savoie (1994) is that the rational performance/machine models are fine for routine services such as the issuing of passports, but less successful in those parts which are complex with many goals and activities and high policy content in their work. The network model is useful in high technology research, but can also be restrictive, for example as in France, where elites in different spheres form a controlling “old boy” network. A major shift to the normative model is needed within client-oriented professional services such as health and education. In practice, one may predict a more gradual shift characterised by a mix of performance control and normative commitments.

1.6 Approaches to Public Sector Reform

1.6.1 New Public Management

In pursuance of the aim of better government performance, various approaches have advocated how this may be achieved in practice. Financial, accountability and government reform pressures in the 1980s gave rise to the “New Public Management” paradigm. “The new public management actively emphasizes the significance of performance measurement as a management tool in government” (OECD in Bouckaert & Van Dooren, 2003:127). “NPM, compared to other public management theories, is more oriented towards outcomes and efficiency through better management of public budget. It is considered to be achieved by applying competition, as it is known in the private sector, to organisations in the public sector, emphasising economic and leadership principles. New Public Management regards the beneficiaries of public services much like customers, with citizens becoming the shareholders” (Wikipedia, 2009: “New Public Management”).

NPM has been criticised severely in subsequent years as insufficiently addressing the complexity of “wicked problems”, delimiting the citizens’ role as too thin and consumerist, and unable to align strategies and policies between agencies and between sectors and not just to internal objectives (Bovaird & Loffler, 2003:315). However, the aim of New Public Management to reform or modernise public sector practices to be more efficient, cost-effective or “market orientated” has become widespread. To enable the implementation of management instruments aimed at increasing organisational performance as implied by the NPM approach, accurate performance information is required (Hatry in Bouckaert & Van Dooren, 2003:127-128). Chapter 9 below looks at the evolution of M&E in South Africa and in other African states. The South African experience highlights the directions in which the public sector paradigms developed, ranging from the Treasury’s approaches focusing on efficiency and effectiveness, to the White Paper on Transforming Public Service Delivery. These “tools” are, broadly, the tools of the “new public service management” (SA-DPSA, 1997). In essence these are:
• Assignment to individual managers of responsibility for delivering specific results for a specified level of resources and for obtaining value for money in the use of those resources;
• Individual responsibility for results matched with managerial authority for decisions about how resources should be used;
• Delegation of managerial responsibility and authority to the lowest possible level; and transparency about the results achieved and resources consumed.

1.6.2 Evidence-based Policy-making and Management

In the 1990s evidence-based policy and practice entered the lexicon, arising largely from experience in the health sector. Evidence-based policy making (EBPM) can be defined as “a process that assists policy makers to make better decisions and achieve better outcomes. It is concerned with using existing evidence more effectively, commissioning new research and evaluation to fill gaps in the evidence base, and assisting the integration of sound evidence with decision makers’ knowledge, skills, experience, expertise and judgement” (PSPPD, 2011). Research and evaluation play a key part in this. Hard, factual evidence is used for improvement purposes by identifying “what works” and which interventions and strategies should be used to meet specified (policy) goals, and identifying client needs (Davies in Boaz & Nutley, 2003:226). Evidence is used to improve the design, implementation and impact of an (policy) intervention, and to identify new strategic directions for the organisation (Boaz & Nutley, 2003:226). As such, Nutley refers to the evidence-based problem solver who uses evidence to solve day-to-day problems and the reflective practitioner who uses monitoring data to provide strategic direction for the future (Nutley in Boaz & Nutley, 2003:231).

Segone and Davies identify an emerging shift from “opinion-based” policy practice that relied either on selective evidence or on untested views of individuals or groups, through “evidence-influenced” to “evidence-based” policy practices that place the best available evidence from research at the heart of policy development and implementation (Segone, 2009:17; Davies, Newcomer & Soydan, 2006:175; Davies, 2008:3). Segone attributes this shift to a movement towards more transparent governance and better technical capacity to produce quality, trustworthy evidence (Segone, 2009:18). This paradigm shift has also been driven by international initiatives like the Millennium Development Goals, World Bank initiatives, and donor funding that emphasise the need to evaluate the success of public policies and programmes in order to protect donors’ investments in these programmes (Kusek & Rist, 2004:3-11; Valadez & Bamberger, 1994:5-7). Internal fiscal constraints, pressures for public accountability and the failure of past programmes to produce results also led to pressures for systematic evaluation to ensure that resources are allocated to the most pressing policy problems and the most effective and efficient programmes and projects to address those problems (Rossi & Freeman, 2004:15; Boyle & Lemaire, 1999:3 & 181). The logic is that the continuous generation of quality, trustworthy and timeous evidence that can inform policy and management decisions will prevent decision makers from using unreliable information because credible information is not available (Segone, 2009:19). In practice many sources of evidence are used by policy makers, ranging from evaluation and research in the natural, social and economic sciences, evidence from thinktanks, professional associations, lobbyists, pressure groups, ideologues, belief-based organisations, scientific media, general media and the internet (PSPPD, 2011).


The Context of Evaluation Management From a bureaucratic point of view, evidence “inclusion” or “exclusion” is often based on the content expertise of the policy maker, and the active policy area interests of senior managers. In many instances, when research is undertaken within a government, the information remains in silos and is not coordinated within or across departments. Even where evidence is produced, the evidence is sometimes ignored as it does not suggest a clear implementation path, the ideological position of the researcher is not acceptable, or the evidence does not fit with preconceived positions (sometimes called “policy-based evidence making”) (PSPPD, 2011). This has historically been the case with evaluations, with rare recent examples such as Mexico, Colombia and South Africa where repositories of evaluations have been established to provide access to this evidence. In practice policy makers use a range of sources in making a decision, ranging from “rigorous” evidence such as research or evaluation, values, their own experience and judgement, as well as resource constraints. However, evaluation has a key potential role to play in systematising evidence generation and use in decision-making (e.g. see DPME, 2011).

1.6.3 Performance Management

Monitoring and evaluation may be conceptualised and applied as part of the broader term “performance management”. Venter (1998:45) regards the underlying philosophy of performance management as striving toward maximised (policy or programme) performance through continuous measurement against clearly defined and agreed upon standards. As such, policies and their supportive programmes and projects become the basis for the development of performance management strategies which outline “... the interrelated processes which ensure that all the activities and people in a (local) government contribute as effectively as possible to its objectives, (and which systematically reviews the activities and objectives) in a way which enables a (local) authority to learn and thereby improve its services [and results] to the community” (Rogers, 1999:9). Performance management “... may be narrowly viewed as a set of tools and techniques which can be used by managers and politicians to manage the performance within their own organisations, or it can be viewed more widely as a pattern of thinking that results from a wide-ranging set of changing political, economic, social and ethical pressures that have impacted on (local) authorities in ways that are more extensive than simply the deployment of specific techniques” (Rogers, 1999:2).

Performance management has a simple basis – that an “organisation formulates its envisaged performance and indicates how this performance may be measured by defining performance indicators. Once the organisation has performed its tasks, it may be shown whether the envisaged performance was achieved and at what cost” (De Bruijn, 2007:7). De Bruijn (2007) presents a rich analysis of performance management and how to apply it in a nuanced manner in different environments, from which this section draws. Table 1.4 summarises the benefits suggested for performance management/M&E and the potential perverse effects. Although the focus in this chapter is mainly on the public sector, performance management is a process that applies across sectoral boundaries in any organisation. The principles of and approaches to performance management are generic across sectors and only differ among sectors according to the different regulatory environments and contexts within which these practices occur (see inter alia Blanchard, Robinson & Robinson, 2002; Chevalier, 2007; Hale, 1998; International Society for Performance Improvement, 2010; Robinson & Robinson, 2008; Mager & Pipe, 1970).
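De Bruijn’s “simple basis” can be illustrated with a small, hypothetical calculation: envisaged performance is expressed as indicator targets, and once the work has been done the achieved values and their cost are compared against those targets. The indicator names and figures below are invented for illustration only.

```python
def performance_report(indicators: list) -> None:
    """Print achievement against target and unit cost for each indicator.

    Each indicator is a dict with keys: name, target, achieved, cost (an invented structure).
    """
    for ind in indicators:
        achievement = ind["achieved"] / ind["target"] if ind["target"] else 0.0
        unit_cost = ind["cost"] / ind["achieved"] if ind["achieved"] else float("inf")
        print(f"{ind['name']}: {achievement:.0%} of target achieved "
              f"at a unit cost of {unit_cost:,.0f}")

performance_report([
    {"name": "Houses built", "target": 1000, "achieved": 850, "cost": 42_500_000},
    {"name": "Clinics upgraded", "target": 20, "achieved": 20, "cost": 6_000_000},
])
```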


Table 1.4: Benefits and perverse effects of performance management/M&E

Benefit: Leads to accountability and transparency on results which can be an incentive for performance and productivity.
Perverse effect: Can be an incentive for focusing on targets against professional judgement (gaming). Can lead to punishment of performance, e.g. good performance leads to higher targets, or efficiency leads to being given a lower budget.

Benefit: Rewards performance and prevents bureaucracy by focusing on outputs and outcomes, not inputs and activities.
Perverse effect: Drives out the professional attitude leading to lower quality, focuses on elements rather than overall responsibility for the system, more bureaucracy.

Benefit: Promotes learning between and within organisations by giving feedback on performance.
Perverse effect: Blocks ambition – can mean managing inputs to improve results (e.g. persuading marginal grade 11 students to drop out rather than take matric exams). Can block innovation by focusing on outputs for which there are rewards and promoting copying, not learning.

Benefit: Enhances knowledge and in turn intelligence.
Perverse effect: Veils actual performance, and the nuances in the performance, especially where information is aggregated and sight is lost of the causal links on performance.

(De Bruijn, 2007:9-28)
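The “blocks ambition” example in Table 1.4 can be shown with simple arithmetic: if marginal learners are persuaded not to write the examination, the reported pass rate rises even though fewer learners actually pass. The figures below are invented purely to demonstrate the mechanism.

```python
def pass_rate(passed: int, wrote: int) -> float:
    return passed / wrote

# Invented cohort: 100 learners, of whom 70 would pass; 30 are "marginal", 10 of whom would pass.
honest = pass_rate(passed=70, wrote=100)           # all learners write the examination
gamed = pass_rate(passed=70 - 10, wrote=100 - 30)  # 30 marginal learners pushed out, 10 of whom would have passed
print(f"Honest pass rate: {honest:.0%} (70 passes)")  # 70%
print(f"Gamed pass rate:  {gamed:.0%} (60 passes)")   # about 86% looks better, but fewer learners pass
```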

NPM models of public sector management have drawn from private sector management and tend to assume that you can isolate activities from direct authority, that all services can be measured, and that the best guardian is the professional manager rather than the empowered citizen. The challenge is that performance of public sector organisations has to take multiple values into account, is usually achieved through co-production with others and so attributing contribution can be difficult, and often the period between an intervention and its final effect may be lengthy. There is therefore an ongoing challenge for the public sector of it being easy to measure and report on activities and outputs which government controls, but much more difficult to measure outcomes which may have many influences, but are ultimately much more important in terms of the experience of the clients of public services (De Bruijn, 2007). For example, it is easy for the government to monitor the number of patients seen, but much more difficult to monitor and report on the quality of health care provided, or the impacts on health outcomes. This results in a tendency for government to focus on activities and outputs, rather than the impacts on society which citizens and politicians expect. For this reason there has also been a move in some countries such as the UK, South Africa, Malaysia and Indonesia to move the government to focus on high-level priority outcomes rather than lower-level activities, implying a significant culture change as is explained in  Chapter  9.


Table 1.5 summarises the conditions which make performance management/M&E easier or more difficult. Clearly performance management is easier in simpler systems such as producing identity documents, or installing reticulated water where a machine or performance control model may work, rather than complex or dynamic sectors such as health care, where a normative model based on values is ideal.

Table 1.5: Conditions under which performance management/M&E is possible and more challenging (performance management easy vs. performance management more difficult)

• Easy: Products are single value. More difficult: Products are multiple values (they have to take differing values into account).
• Easy: An organisation is product-oriented (e.g. ID books). More difficult: An organisation is process-oriented (e.g. research).
• Easy: Production is autonomous. More difficult: Co-production – products are generated with others – so attribution is difficult.
• Easy: Products are isolated. More difficult: Products are interwoven (but performance management can reinforce silos).
• Easy: Causalities are simple and known. More difficult: Causalities are complex/unknown.
• Easy: Quality is definable in performance indicators. More difficult: Quality is not definable in performance indicators.
• Easy: Products are uniform. More difficult: Products are varied.
• Easy: Environment is stable. More difficult: Environment is dynamic.

(Adapted from De Bruijn, 2007:14)

The discussion so far has highlighted the fact that there are a wide variety of sometimes overlapping approaches to managing the public sector, and that it is frequently not possible to use a single system when there is such a wide variety in the complexity of tasks, degrees of professional expertise, attitudes of staff, etc. Table 1.6 shows how core M&E concepts could be applicable in the different governmental models referred to above, and the differences in application among these different contexts. Underlying these models lie different beliefs about what motivates people, and how they work best in different situations. Malcolm Knowles (1980) undertook research in adult learning theory and individual development stages, where he asserted that learning and growth are based on changes in self-concept, experience, readiness to learn, and orientation to learning. There is also a feedback loop – the model of government and leadership applied creates a particular attitude by staff, e.g. one of compliance, or one where M&E is internalised to improve performance. The situational leadership approach suggests that effective leadership and management is contingent and task-relevant, and the most successful leaders are those that adapt their leadership style (Hersey, Blanchard & Johnson, 2007): • To the maturity of the individual or group they are attempting to lead or influence (their willingness to take responsibility for the task, and relevant education and/or experience of an individual or a group for the task); • To the task, job or function that needs to be accomplished, and the degree of repetitiveness, complexity and degree of context specificity of the task.


Table 1.6: Examples in South Africa of the use of M&E in Mintzberg’s different models of government

Key phrase
• Machine mode: People must clock in.
• Performance control mode: People must know targets, manage by objectives and monitor against the targets.
• Virtual government: Government must outsource and control service quality, using a purchaser/provider model.
• Network: Solutions emerge and M&E must stimulate interaction and rapid learning.
• Normative mode: To provide the best service to citizens we need to understand how we are doing and learn how to serve them more effectively.

Example in SA
• Machine mode: ID/passport office.
• Performance control mode: Annual Performance Plan and performance management system. Norm in government.
• Virtual government: Relationship between Dept. of Transport and South African National Roads Agency (SANRAL).
• Network: Operation of teams working on R&D or IT, units within DPME.
• Normative mode: Default system for doctors, and in good schools for teachers.

Decisions
• Machine mode: Made by those on top.
• Performance control mode: Politicians decide on outcomes. Implementation is best safeguarded by professional “impartial” experts, who are neutral and capable of value-free judgements.
• Virtual government: The team is involved in action-learning, makes frequent decisions, while operating within a wider context where high level decisions may be taken by others. A lot of discretion is left to the team.
• Network: A lot of discretion is left to the team or the professional’s judgement. In the optimal versions there is ongoing action-learning and development.
• Normative mode: A lot of discretion is left to the professional’s judgement. In the optimal versions there is ongoing action-learning and development. In a closed shop version, professionals are exclusive and refuse to be challenged.

Approach to monitoring
• Machine mode: People must be constantly monitored otherwise they will not do what they are supposed to do.
• Performance control mode: We must monitor regularly to see that we are going in the right direction, and that people are performing as they have been asked to do.
• Virtual government: We must monitor regularly to see that our service providers are performing as per contract.
• Network: We need to monitor on a very short cycle and adapt immediately based on emerging findings.
• Normative mode: We need to monitor our clients regularly and act immediately based on the findings.

The role of evaluations
• Machine mode: To understand if management has structured systems appropriately and whether the system is value for money.
• Performance control mode: To understand if we are achieving the impacts we want (measure performance), if the logic chain is working as expected and we are implementing appropriately, or if we are being cost-effective.
• Virtual government: To understand if we are achieving impacts, if the logic chain is working as expected, if service providers are implementing as expected, and cost-effectiveness.
• Network: More focus on learning. To understand if we asked the right questions and approached the problem appropriately. May need to be frequent and rapid.
• Normative mode: More focus on learning. Don’t evaluate my performance as what I do is not standard. Help me to understand what is working and not working and why.

(Goldman, adapted from Mintzberg, 1996)

The way one uses M&E needs to respond to the maturity of the staff and the task at hand. A key question is whether staff can be responsible for their behaviour, in which case a normative model can be applied. If there is no responsibility, then a more controlling mode may be needed, at least until maturity has developed. How can one tap into the inner motivation of the teacher not turning up for school? What tools does the head teacher need to motivate and hold their teachers to account? What about managing junior and mature doctors? How can the M&E system show this flexibility, so staff is held accountable, but not be dogged with bureaucratic reporting? Even in the case of installing water systems, if the operatives are driven by a sense of professional pride, the quality of the work is likely to be much better than purely striving to achieve a target. These requirements point to the tensions in applying a single system across such a complex organism such as the public sector, from doctors to issuing passports, and the need for flexibility in management styles. As Jack (n.d.:1) concludes “... first generation performance measurement frameworks (don’t) address the full range of criteria important for performance measurement success. Users of these frameworks often become model bound and lose sight of the measures that are most important for the organisation and the associated stakeholders”.

1.7 Evaluation for Effective Management Improvement

Cloete concludes that “the overall goal with assessment of public management performance is to determine whether both the end products and the processes through which they came about comply with the required or preferred standards set for them” (Rabie & Cloete in Cloete & De Coning, 2011:197). It is also important how the information can be used to improve performance, accountability, decision-making and widen the knowledge base (DPME, 2011). Figure 1.3 outlines how M&E tools can be used at different stages of the policy cycle, using a model from training for Directors-General in South Africa. However it is important not just to see M&E as having instrumental use – i.e. specific tools and analysis generating specific recommendations which are implemented or not. They may also have symbolic use (e.g. demonstrating the interest of the Cabinet in the topic) or legitimate (justification)


use that demonstrates to “taxpayer and business community whether the programme is a cost-effective use of public funds” (Owen, 2006:106; OECD, 2007:12).

Figure 1.3: M&E dimensions at different stages of the policy/programme cycle. The figure depicts the cycle as four linked stages: diagnosing (analysis of the problem and options, what is known about the problem, understanding the root causes, options for addressing the problem, and the theory of change); planning (policy/programme planning and budgeting, design, and the operational plan and resourcing); output (implementation and monitoring: implementing the plan, and monitoring the plan, environment and budget); and outcome and impact (what is the change, desired and undesired; are planned outcomes being achieved; value for money; document, evaluate, reflect and learn; review, refine and continue). (DPME, 2014)

So the challenge in applying performance management/M&E is how to apply it systemically across the public sector in such a way that the benefits are realised and perverse effects are minimised. De Bruijn (2007:34) suggests that four laws apply (see Table 1.7). This table also suggests how these impacts can be reduced, including lowering the stakes from M&E results to reduce gaming, the importance of training, keeping systems simple and under review, not having too many targets, and above all, getting people to see the benefits for them, so they are internally motivated. De Bruijn (2007:54-105) stresses the importance of process-oriented performance management, rather than one focused exclusively on the products. The former leads to better dialogue, a real time picture and strong accent on learning and trust. Ideally, evaluation should be systematically planned for and incorporated into the initial policy or programme design, with the possible elements shown in Figure 1.3. The different


types of evaluation that are envisaged should be identified and designed appropriately at an early stage of the policy, project or programme, and implemented accordingly (Shafritz, 1998:820). This will ensure systematic and effective data generation and the presence of suitable monitoring resources, systems and procedures, which will inform eventual evaluation exercises. This will also ensure the availability of sufficient financial resources to conduct the evaluations, because these will form part of the approved budget for the policy, project or programme. It is important to realise that evaluations are not always appropriate, because they can be complex tasks consuming substantial resources, and sometimes are not worthwhile. The results of the policy, project or programme may be clear without additional research, or the costs of the evaluation may be disproportionate to its projected results. This is why the merits of the evaluation must be determined before the decision is taken to proceed with it (Shafritz, 1998:820).

Table 1.7: Mitigation measures needed to manage possible unintended negative consequences of M&E systems (adapted from DPME, 2014). Each “law” proposed by De Bruijn (2007) is paired with proposed mitigation measures.

Law of Decreasing Effectiveness – if the impact of monitoring information is very high (e.g. results in large bonus/humiliation) then there may be strong incentives for gaming rather than learning.
Mitigation: Ensure that the direct impacts from performance information are not too large, and that a nuanced approach is taken. Ensure that organisations are not judged on M&E results, but on whether they address the findings of M&E. This will minimise gaming. Set targets in consultation with, and preferably by reaching consensus with, those whose job it is to meet the targets. Encourage professionalism wherever possible. Encourage self-monitoring or internal monitoring for the purpose of performance improvement as much as possible.

Law of Mushrooming – M&E systems become bloated and lose their simplicity in the process.
Mitigation: Continuously review M&E systems to ensure that they remain as simple as possible and continue to add value. Do not attempt to set measurable targets for everything. Set targets for a few key indicators with realistic targets based on available funding. Where appropriate, involving the beneficiaries of the services (citizenry) during the design of M&E systems could serve as an incentive to ensure relevance and simplicity.

Law of Collective Blindness – the targets do not give a full picture.
Mitigation: Train managers and officials on development of good plans, indicators and targets. Choose targets carefully. Present results against targets in the context of the bigger picture. Warn of the limitations of focusing only on the targets and do not encourage departments to only focus on issues for which targets have been set.

Law of Preservation of Bloated Systems – perverse systems are often resistant to change, either because they become a ritual or because a system of external stakeholders grows to maintain the system. Law of Decreasing Political Attention – there is often little political benefit in abolishing systems, and so political interest can wane, meaning that systems continue by default.
Mitigation (for both): Continuously review M&E systems to ensure that they continue to add value. Promoting the use of evidence by citizens would assist in ensuring that such systems respond to the needs of the users.

However, there are some challenges for evaluations specifically which can inhibit an effective evaluation and use of the evaluation findings. This has implications for the overall culture of organisations, as well as the specific mechanisms of the evaluation system. These challenges include:

A lack of ownership
• If evaluations are imposed on organisations they may resist the evaluation and are unlikely to implement the findings. Therefore considerable efforts are needed in the process to ensure that ownership and a willingness to face the findings are retained by the main protagonists.

Problems arising from poor planning
• Policy and programme goals and objectives are often absent, unclear or deliberately hidden, and may change during the project or programme life cycle (Howlett & Ramesh, 1995:169). This makes gap identification and measurement difficult or even impossible (Vedung, 1997:43).
• Criteria or indicators for measuring change are sometimes insufficient (Hogwood & Gunn, 1984:224).
• There may well be no theory of change to use for assessing the evaluation, reflecting both a lack of clarity but also making it more difficult to analyse how the intervention is working.
• Unintended consequences, spill over or side effects may complicate the evaluation process (Vedung, 1997:45; Hogwood & Gunn, 1984:224).

Realism of evaluation budgets and time
• Sometimes there are far too many evaluation questions, which means a lack of clarity from the commissioner.
• Insufficient resources for an evaluation sometimes prevent it from being done well or at all.
• Tight time-frames that do not allow sufficient time for a thorough evaluation also result in it not being done.

Data problems
• Lack of data on which to draw for looking at impacts. This often means that the first evaluation to be conducted may well need to focus on implementation, in the process drawing out what is needed (particularly data on the intervention and potentially counterfactual) for impact evaluation.
• Insufficient planning for and monitoring of the compilation of baseline or culmination data results in incomplete databases with incomplete information, which may lead to inaccurate findings and conclusions (Hogwood & Gunn, 1984:226).

Challenges in drawing useful conclusions
• The cumulative impacts of different, specially integrated projects or programmes that cannot be separated from each other make sensible conclusions about cause-effect relationships very difficult and sometimes even impossible (Hogwood & Gunn, 1984:226).
• Evaluators may not be well-placed to draw up useful recommendations. This may need close working between the evaluators and commissioners to make these recommendations useful and actionable.
• Linked to this is the fact that evaluation results are sometimes unacceptable because they are too academic and not practical enough; too technical and therefore incomprehensible to decision makers; too ambiguous and therefore not very useful; too late for specific purposes and therefore regarded as useless; and too critical of decision makers or managers and therefore not wanted.

Challenges in using results which conflict with policy makers’ preferred view
• Evaluation results may sometimes be politically or otherwise sensitive, and evaluations are therefore either not done, done superficially or partially, or done in a biased way (Hogwood & Gunn, 1984:227) and may also not be made public. Howlett & Ramesh (1995:178) aptly remark that “policy evaluation, like other stages of the policy cycle, is an inherently political exercise and must be recognised explicitly as such”.

1.8 Conclusions
Policy management consists of a series of activities, starting with problem identification and an analysis of causal linkages and options for improvement, followed by policy objective setting, implementation with linked monitoring, and eventually evaluation to apply the lessons of systematic reflection about what needs to change in future. This introductory chapter has shown monitoring and evaluation to be integral to a rational policy management process, which should lead to performance improvement. Figure 1.3 provides an example of such a cycle and the associated M&E tools. Monitoring is a basic management function and part of the implementation process, while evaluation is an advanced management function involving more sophisticated methods to understand what has been achieved and why. These tools enable the manager to manage more efficiently because they ensure a constant stream of up-to-date, reliable performance information, as well as deeper causal analysis that enables the manager to make informed choices towards improving performance and realising strategic goals and objectives.

The basic principles of evaluation remain the same whether evaluating policies, programmes or projects. The chapter has summarised the nature of monitoring and evaluation and the relationship of these activities to other management activities and functions, as well as different foci for evaluation and the benefits of and constraints on systematic evaluation. In order to maximise the impact of a policy, project or programme, one has to determine the results of the project or programme in question as accurately and credibly as possible, and understand why these results have been achieved. At the same time the quality of the process determines the degree of ownership of the evaluation, the likelihood of use and therefore the likely impact on performance.

The current emphasis on "monitoring and evaluation" is part of a management focus aimed at improving organisational performance.
In this it relates to management theories such as organisational development, operational research, management-by-objectives, New Public Management, evidence-based policy making and evidence-based management. Each approach has introduced its own tools and techniques for management. The chapter shows how public sector organisations differ, but also how M&E can be used in these different types of organisations. In all cases, performance management and M&E attempt to influence government performance through tracking performance, assessing achievement of results, explaining the achievement or under-achievement of these results, and seeking to complete the feedback loop to planning and implementation.

There are a number of requirements for ensuring that M&E and in particular evaluation results are used, and that the system is not manipulated, for example by deliberately selecting targets that are low or that are easy to achieve but not important. The first is for managers to own the evaluation system so that they see the benefit for their performance, rather than just doing evaluations for upward accountability. Other factors to maximise the likelihood of use include not having too many targets, lowering the stakes of the consequences of M&E results, building capacity to use evaluation systems, and keeping systems simple and under continual review. If evaluation can be delivered with quality, and used effectively in public sector institutions, then managers can be empowered to improve their interventions, policy makers to make better decisions, and Parliaments to understand better what the government is doing and hold it to account more effectively. Chapter 2 deals with the historical development of evaluation.

References

Blanchard, K., Robinson, D. & Robinson, J. 2002. Zap the Gaps: Target Higher Performance and Achieve it. New York, NY: Harper Collins.
Boaz, A. & Nutley, S. 2003. Evidence-based Policy and Practice. In Bovaird, T. & Löffler, E. (eds.). 2003. Public Management and Governance. London: Routledge Taylor and Francis Group.
Bouckaert, G. & Van Dooren, W. 2003. Performance Measurement and Management in Public Sector Organizations. In Bovaird, T. & Löffler, E. (eds.). 2003. Public Management and Governance. London: Routledge Taylor and Francis Group.
Bovaird, T. & Löffler, E. 2002. Moving from Excellence Models of Local Service Delivery to Benchmarking 'Good Local Governance'. International Review of Administrative Sciences, 68(1): 9-24.
Boyle, R. & Lemaire, D. (eds.). 1999. Building Effective Evaluation Capacity: Lessons from Practice. New Brunswick, USA: Transaction Publishers.
Chevalier, R. 2007. A Managers' Guide to Improving Workplace Performance. Washington, DC: American Management Association.
Cloete, F. & De Coning, C. 2011. Improving Public Policy: Theory, Practice and Results. Pretoria: Van Schaik.
Davies, P., Newcomer, K. & Soydan, H. 2006. Government as Structural Context for Evaluation. In Shaw, I.F., Greene, J.C. & Mark, M.M. (eds.). The Sage Handbook of Evaluation. London: Sage.
Davies, P. 2008. Making policy evidence-based: The UK Experience. World Bank Middle East and North Africa Region Regional Impact Evaluation Workshop, Cairo, Egypt, 13-17 January 2008. http://siteresources.worldbank.org/INTISPMA/Resources/383704-184250322738/3986044-1209668224716/English_EvidenceBasedPolicy_Davies_Cairo.pdf. Retrieved 30 April 2009.
De Bruijn, H. 2007. Managing Performance in the Public Sector. Abingdon: Routledge.
DPME. 2011. National Evaluation Policy Framework. Pretoria: Department of Performance M&E.
DPME. 2013. Guideline for Planning Implementation Programmes, Guideline 2.2.3. Pretoria: Department of Performance M&E.
DPME. 2014. Evidence Framework. Developed for Course on Evidence-Based Policy Making and Implementation. Pretoria: Department of Planning, Monitoring and Evaluation.
Goldman, I. & Mathe, J. 2013. Institutionalising Monitoring and Evaluation in South Africa. Unpublished manuscript. Pretoria: DPME.
Goldman, I. 2001. Managing Rural Change in the Free State, South Africa – Transforming the Public Sector to Serve the Rural Poor. PhD thesis, Graduate School of Public and Development Management. Johannesburg: University of the Witwatersrand.
Guijt, I.M. 2008. Seeking Surprise: Rethinking Monitoring for Collective Learning in Rural Resource Management. Published PhD Thesis, Wageningen University, Wageningen, The Netherlands. http://edepot.wur.nl/139860. Retrieved 28 August 2014.
Hale, J. 1998. The Performance Consultant's Fieldbook: Tools and Techniques for Improving Organizations and People. San Francisco, CA: Jossey-Bass Pfeiffer.
Hersey, P., Blanchard, K.H. & Johnson, E. 2007. Management of Organizational Behavior: Leading Human Resources. 9th Edition. New York: Prentice Hall.
Ho, S.Y. 2003. Evaluating Urban Policy: Ideology, Conflict and Compromise. Surrey: Ashgate Publishing.
Hogwood, B.W. & Gunn, A. 1984. Policy Analysis for the Real World. New York: Oxford University Press.
Howlett, M. & Ramesh, M. 1995. Studying Public Policy. New York: Oxford University Press.
International Society for Performance Improvement. 2010. Handbook of Improving Performance in the Workplace. Hoboken, NJ: Pfeiffer.
Isaac-Henry, K. 1993. Development and Change in the Public Sector. In Isaac-Henry, K., Painter, C. & Barnes, C. (eds.). Management in the Public Sector: Challenge and Change. London: Thompson Business Press: 1-20.
Jack, A. (n.d.). Value Mapping – A Second Generation Performance Measurement and Performance Management Solution. http://www.valuebasedmanagement.net/articles_jack_value_mapping_second_generation_performance_management.pdf. Retrieved 15 April 2014.
Kessler, A. & Tanburn, J. 2014. Why Evaluations Fail: The Importance of Good Monitoring. Donor Committee for Enterprise Development (DCED). http://www.enterprise-development.org/page/download?id=2484. Retrieved 29 August 2014.
Knowles, M. 1980. The Modern Practice of Adult Education: From Pedagogy to Andragogy. Wilton, Connecticut: Association Press.
Kusek, J.Z. & Rist, R.C. 2004. Ten Steps to a Results-based Monitoring and Evaluation System. Washington, DC: The World Bank.
Mager, R.F. & Pipe, P. 1970. Analyzing Performance Problems or 'You Really Oughta Wanna!' Belmont, CA: Fearon.
McLennan, A. 1997. Into the Future: Restructuring the Public Service. In Fitzgerald, P., McLennan, A. & Munslow, B. (eds.). Managing Sustainable Development in South Africa. 2nd Edition. Cape Town: Oxford University Press: 98-133.
Mintzberg, H. 1996. Managing Government – Governing Management. Harvard Business Review, May-June: 75-83.
Morra-Imas, L.G. & Rist, R.C. 2009. The Road to Results: Designing and Conducting Effective Development Evaluations. Washington, DC: World Bank.
O'Sullivan, R.G. 2004. Practicing Evaluation: A Collaborative Approach. Thousand Oaks: Sage.
Owen, J.M. & Rogers, P.J. 1999. Program Evaluation Forms and Approaches. International edition. London: Sage.
Owen, J.M. 2006. Program Evaluation: Forms and Approaches. 3rd edition. New York: Guilford Press.
Parsons, W. 1995. Public Policy: An Introduction to the Theory and Practice of Policy Analysis. Vermont, USA: Edward Elgar.
Peters, B.G. & Savoie, D.J. 1994. Civil Service Reform: Misdiagnosing the Patient. Public Administration Review, 54(5): 418-425.
PSPPD. 2011. Evidence-Based Policy-Making (EBPM): Enhancing the Use of Evidence and Knowledge in Policy Making. Policy Brief no 1. Pretoria: Programme to Support Pro-Poor Policy Development. www.psppd.org.za. Retrieved 14 April 2014.
Rabie, B. 2011. Improving the Systematic Evaluation of Local Economic Development Results in South African Local Government. Dissertation presented for the Degree of Doctor of Public and Development Management at Stellenbosch University, Stellenbosch.
Rakoena, T. 2007. Project Implementation Planning, Monitoring and Evaluation. Presentation distributed via the SAMEA Discussion Forum. www.samea.org.za. Retrieved 14 August 2007.
Riggs, F.W. 1991. Bureaucratic Links Between Administration and Politics. In Farazmand, A. (ed.). Handbook of Public and Development Management. New York: Marcel Dekker.
Robbins, S.P. & Decenzo, D.A. 2001. Fundamentals of Management. 3rd edition. New Jersey: Prentice Hall.
Robinson, D.G. & Robinson, J.C. 2008. Performance Consulting. 2nd edition. San Francisco, CA: Berrett-Koehler.
Rogers, S. 1999. Performance Management in Local Government. 2nd edition. London: Financial Times Management.
Rossi, P.H., Lipsey, M.W. & Freeman, H.E. 2004. Evaluation: A Systematic Approach. 7th edition. Thousand Oaks: Sage Publications Inc.
SA-DPSA. 1997. White Paper on Transforming Public Service Delivery. Pretoria: Department of Public Service and Administration.
Save the Children. 1995. Toolkits: A Practical Guide to Assessment, Monitoring, Review and Evaluation. London: Save the Children.
Scriven, M. 1991. Evaluation Thesaurus. Newbury Park, CA: Sage.
Segone, M. 2009. Enhancing Evidence-based Policy Making Through Country-led Monitoring and Evaluation Systems. In Segone, M. (ed.). Country-led Monitoring and Evaluation System: Better Evidence, Better Policies, Better Development Results. UNICEF Evaluation Working Papers. Geneva: UNICEF.
Shafritz, J.M. (ed.). 1998. International Encyclopaedia of Public Policy and Administration. Boulder, CO: Westview.
Shaw, I.F., Greene, J.C. & Mark, M.M. (eds.). 2006. The Sage Handbook of Evaluation. London: Sage.
Thompson, A.A. & Strickland, A.J. 1998. Strategic Management: Concepts and Cases. 10th edition. Boston: Irwin McGraw-Hill.
Valadez, J. & Bamberger, M. 1994. Monitoring and Evaluating Social Programs in Developing Countries: A Handbook for Policymakers, Managers and Researchers. EDI Development Studies. Washington, DC: World Bank.
Vedung, E. 1997. Public Policy and Program Evaluation. New Brunswick: Transaction Books.
Venter, A. 1998. Making Performance Management Work: Charting a Course for Success. People Dynamics, 16(8): 42-46.
Wikipedia. Website of Wikipedia. www.wikipedia.com.
Wooldridge, D. & Cranko, P. 1997. Transforming Public Sector Institutions. In Fitzgerald, P., McLennan, A. & Munslow, B. (eds.). Managing Sustainable Development in South Africa. 2nd edition. Cape Town: Oxford University Press: 322-343.


Chapter 2
Historical Development and Practice of Evaluation
Charline Mouton, Babette Rabie, Christo de Coning and Fanie Cloete

2.1 Introduction

This chapter summarises the historical development, primarily in the USA, UK, Africa and South Africa, of what has become known as professional policy, programme and project evaluation. It deals with the development of the professional evaluation perspective that was conceptualised and explained in Chapter 1 and which over time increasingly took root in the public, private and voluntary sectors of society, resulting in the emergence of a distinct evaluation profession cutting across disciplinary boundaries, governmental levels, business and societal sectors.

2.2 Origins of Evaluation as an Applied Transdiscipline

The evaluation discipline emerged as a cross-cutting, applied discipline and eventually as a distinct transdisciplinary profession, out of various other disciplines during the first half of the 20th century (Scriven, 1991; Mathison, 2005:422). Transdisciplinarity, according to Jahn et al. (2012, in Florin, Guillermin & Dedeurwaerdere, 2014), is a "critical and self-reflexive research approach that relates societal with scientific problems; it produces new knowledge by integrating different scientific and extra-scientific insights; its aim is to contribute to both societal and scientific progress" (see also Max-Neef, 2005). It implies a totally new discipline that has developed its own ontologies, epistemologies and methodologies, distinct from other disciplines (see also Pohl, 2011; Cilliers & Nicolescu, 2012).

There is disagreement among scholars about when and where the first systematic, rigorous evaluation was undertaken that kick-started the prevailing trend that we experience today. The main reason for this lack of clarity is probably the hard boundaries and isolated silos within which scholarly research was undertaken in different disciplines during the early 20th century. The main sectoral contenders for the distinction of being the catalyst of the current evidence-based evaluation tsunami that is emerging as a distinct new applied scholarly paradigm across the globe are the fields of agriculture, education, psychology, medicine (especially epidemiology) and social development.

The history of evaluation is briefly summarised in this chapter, commencing with the various views on when programme evaluation started. The section is dominated by the American experience, where programme evaluation first emerged as a systematic activity in various sectors, although important developments in the UK will also be summarised here.
This primary focus on the USA and UK origins of systematic programme evaluation does not deny similar and parallel developments in this field in other countries, but the dominant influence of these two country case studies on the current field across the world, and a number of additional strategic considerations, led us to restrict our current focus in this way. Canada has also played a very prominent role in the field of evaluation in the past and is still at the forefront of new developments in this area. Greene (n.d.) provides an excellent succinct summary of the historical development of programme evaluation in Canada, while the website of the Canadian Evaluation Society (CES, at www.evaluationcanada.ca) constitutes probably one of the best global examples of how a professional evaluation society should operate.

2.3 The early start: Ancient history to the Second World War

Shadish and Luellen (2005:183) state that "(e)valuation is probably as old as the human race, dating from the time humans first make a judgment about whether building campfires and using weapons helped them to survive. Indeed, evaluation is an essential human activity that is intrinsic to problem solving, as humans (a) identify a problem, (b) generate and implement alternatives to reduce its symptoms, (c) evaluate these alternatives, and then (d) adopt those that results suggest will reduce the problem satisfactorily". Morra-Imas & Rist (2009:19) refer to the Egyptians, who already monitored and assessed their agricultural crops systematically in 5 000 BC, while Shadish and Luellen (2005:183) conclude that the notion of "planful social evaluation" can be dated back to as early as 2 200 BC with government staff selection in China. Activities similar to what is now called programme evaluation had therefore been evident for centuries before "modern" programme evaluation emerged in the 1960s.

Hogan (2007:4) refers to the first time that a quantitative mark was used, in 1792 in Britain, to assess the performance of a student assignment. Rossi, Lipsey and Freeman (2004) state that programme evaluation-like activities were already evident in the 18th century in the fields of education and public health. During the 19th century, studies were undertaken by government-appointed commissions to measure initiatives in the educational, law and health sectors. Their US counterparts – presidential commissions – examined evidence in an effort to judge various kinds of programmes. A system of regulations was also developed in 1815 in the US by the Army Ordnance Department and is recorded as one of the first formal evaluation activities (Rossi, Lipsey & Freeman, 2004). The formal assessment of school performance occurred for the first time in 1845 in Boston, followed by the first formal educational programme evaluation study by Joseph Rice between 1887 and 1898 on the value of drills in spelling instruction (Madaus & Stufflebeam, 2000). In the early 1900s, the American government initiated an agricultural research project to determine, in a more rigorous way than had been the case up to that point, which agricultural practices yielded the largest crops. This is regarded by many scholars as the first government-driven evaluation study (Chelimsky, 2006:34).

During the 1930s a significant change occurred in the public administration sphere.
Rossi, Lipsey and Freeman (2004:11) refer to this as a time when the "... responsibility for the nation's social and environmental conditions and the quality of life of its citizens" transferred from individuals and voluntary organisations to government bodies. Because the federal government remained quite small up to the 1930s, very little need existed for social information – investment in social science research at that stage was estimated to be between $40 and $50 million (Rossi, Lipsey & Freeman, 2004:11).

Potter and Kruger (2001) as well as Alkin and Christie (2004:17-18) recall the work of Ralph Tyler as being the catalyst in establishing evaluation as a "distinct" field. In 1941 Tyler presented the results of an eight-year study on educational objectives and the measurement of their subsequent outcomes. These ranged from the ability to recall facts to the ability to apply general principles to new situations and to express ideas effectively (Tyler as cited in Madaus, 2004:71).

The period between 1947 and 1957 was a time of industrialisation and euphoria in terms of resource expenditure. In fact, most scholars of programme evaluation's history identify the Second World War as the main take-off point, when the US federal government's vast expenditure in different societal sectors required more systematic and rigorous reviews of government spending priorities and practices. The advancement of programme evaluation was therefore directly tied to the fiscal, political and economic policies of the times. It is interesting to note that up to this stage programme evaluation activities were primarily undertaken at local agency level. Although the federal government was increasingly taking on responsibility for human services, it was not yet engaging with programme evaluation. Two very distinct phases can be identified when studying the rise of programme evaluation in the United States: the boom period which occurred between the 1960s and 1980s, and a period of fiscal contraction post-1980s that had a detrimental effect on programme evaluation.

2.4 The Development of Evaluation in the USA

2.4.1 The Evaluation Boom in the US: After World War II until the 1980s
Following the Great Depression of the 1930s, the United States government adopted greater responsibility for the general welfare of its citizens and dramatically expanded social programmes in health, education and housing. Government support and spending increased even further after World War II, with funding available due to rapid economic growth (Shadish & Luellen in Mathison, 2005:184). Morra-Imas & Rist (2009:21) relate how the first Hoover Commission recommended a new budgeting approach in the US government in 1949 that later became known as "performance" budgeting, while the Planning, Programming, and Budgeting System (PPBS) was introduced in 1962 by the then Secretary of Defence in the US government, Robert McNamara. This new approach expanded the performance budgeting movement and transformed it into the "management by objectives" approach, which later turned into the "monitoring for results" approach to good management.

The evaluation field exploded in the 1960s and 1970s in the USA with the expansion of social policies and programmes aimed at affecting various normative and empirical goals to promote socio-economic development as the main weapon in the new "war on poverty".
This social war marked a drastic escalation in social programme funding to combat the negative effects of poverty. Consequently, funds for social welfare problems almost doubled during this time and, concomitantly, the need emerged to have these programmes assessed (and documented) in a more systematic manner. The second trigger, which perhaps played a more supportive role, was the strong base of applied social scientists that existed in the US. The history of the social sciences in the US has strong ties with Germany. In fact the first cadre of social scientists (1820-1920) was trained in Germany. This led to the adoption of the German graduate school as a model by many US universities as well as a strong reliance on German theories of social change (House, 1993). The first formal entity to be established in the social science discipline was the American Social Science Association, which came about in 1865 (House, 1993).

During this period, numerous evaluations were performed in response to federal, state and local programme managers' mandates. Cost constraints and concern about the success of social programmes with regard to achieving outcomes fuelled the evaluation profession (Shadish & Luellen in Mathison, 2005:185; Shadish, Cook & Leviton, 1991:22). Chelimsky confirms that the main aim of evaluation efforts was to rationalise resource allocation and the management of programmes (Chelimsky, 2006:34). The 1970s were marked by an increasing resistance to the expansion of social development programmes, partly as a result of the increased funding needed to sustain these programmes and the apparent ineffectiveness of many initiatives (Freeman & Solomon in Rossi, Lipsey & Freeman, 2004:14). Legislative efforts that contributed to the persistence of programme evaluation included the so-called Sunset Legislation which was introduced in 1976 (Adams & Sherman, as cited in Derlien, 1990). The Sunset Legislation stipulated that regulatory agencies be reviewed every six years to determine which agencies would be spared from automatic termination. The legislation included a set of criteria against which organisations/agencies would be judged. This led to agencies affording more importance to evaluating the attainment of their own goals in terms of legislation (Hitt, Middlemist & Greer, 1977).

Two government agencies in particular took the lead in conducting evaluations at federal government level during this time. The General Accounting Office (GAO) and the Bureau of the Budget (BoB) were both established by means of the Budget and Accounting Act of 1921. A brief overview of how these agencies' mandates shifted progressively towards programme evaluation as leadership changed is given in the enclosed text box (Box 2.1). The early years of modern evaluation were therefore characterised by strong policy support and the institutionalisation of supporting bodies to ensure that evaluation became an embedded and continuous effort. The support included financial assistance, with approximately $243 million appropriated towards the evaluation of social programmes in the 1977 fiscal year (Wholey, 1979). By 1984 it was estimated that evaluation units employed more or less 1179 people, with one quarter of the 1689 studies being conducted externally (Derlien, 1990). The "high water mark" of this era (according to Rist & Paliokas, 2002) occurred in 1979 with the release of the Office of Management and Budget (OMB) Circular No. A-117, titled "Management Improvement and the use of Evaluation in the Executive Branch".
This circular typifies the formalisation of programme evaluation in the US public sector, with the executive branch making the assessment of all government programmes compulsory in order to better serve the public.

Box 2.1: The General Accounting Office (GAO) and the Bureau of the Budget (BoB)

General Accounting Office (GAO)
The bulk of the GAO's work consisted of checking and reviewing the accounts of federal disbursing officers and all the supporting documents attached to these accounts. For the first few decades of the GAO's existence, federal departments conducted their own studies into the effectiveness of their programmes.

A strong focus on accountancy persisted for more than a decade under the two comptroller generals of the mid-1940s to the mid-1960s, Lindsay Warren and Joseph Campbell, as they employed mainly accountancy college graduates and experienced accountants from the private sector (Mosher, 1984). Towards the end of the 1960s a drastic increase in programme evaluation activities came about. Congress, not wanting to rely solely on the executive branch's results, required, through the Economic Opportunity Act of 1967, that the GAO extend its reach to also assess programmes (Derlien, 1990). The Impoundment Control Act in 1968 further afforded the comptroller added responsibilities such as developing evaluation methods, setting up standardised budgetary and fiscal information systems and creating standard terminology (Mosher, 1984). Elmer Staats, who was formerly employed by the Bureau of the Budget, emphasised these new types of activities at the GAO and made some key appointments to strengthen the focus on programme evaluation. The growth in this field led to the establishment of the Institute for Programme Evaluation (later renamed the Programme Evaluation and Methodology Division).

Bureau of the Budget (BoB)
The Bureau of the Budget's evolution into the conduct of programme evaluation came about in a different manner than that of its twin (GAO). Under the directorship of General Dawes in the 1920s, the agency focused all its energy on economy and efficiency. The activities of the BoB at the time of its establishment in the 1920s were first and foremost geared towards a reduction of government expenses (Mosher, 1984).

In 1939, the BoB was moved to the Executive Office of the President and later (1970) reorganised into the Office of Management and Budget (OMB) during Nixon's term as president. The different philosophies during that time included programme-budgeting systems in the 1960s, management by objectives (MBO) in the 1970s and zero-based budgeting (ZBB) in the late 1970s (Mosher, 1984). It is the Planning, Programming and Budgeting Systems (PPBS) philosophy that sparked the interest in programme evaluation. The director of the OMB during Nixon's rule, George Shultz, established the Quality of Life Review Programme which took this one step further and required the Environmental Protection Agency to submit regulations in draft to the OMB before public dissemination (Mosher, 1984). This led to this monitoring activity being added to the OMB's task list. Under the presidency of Carter, internal review procedures for regulatory agencies were established and all of this was overseen by the OMB.

With this rise in programme evaluation studies, a strong demand for professional programme evaluator expertise emerged. Due to a lack of trained evaluators and the reigning economic management paradigm of that time – the Planning, Programming and Budgeting Systems – accountants, economists and management consultants remained in key "earmarked evaluator" positions for some time. For many evaluators, programme evaluation was a secondary discipline. For example, in 1989 only 6% of those listed in the American Evaluation Association's membership directory considered themselves to be evaluators (House, 1993). This is very much an indication of the newness of the field at that stage.


Box 2.2: New Public Management (NPM)

The New Public Management (NPM) originated from a fusion of public choice economic theory, the pertinent role that management consultants started to play during the 1980s and 1990s, the weak state of the macro economy that had persisted since the 1970s, and the general discontent among the citizenry with the way in which the State managed public funds (Foster & Plowden, 1996). The last point in particular sparked the emergence of NPM and the accompanying emphasis placed on performance measurement. NPM is defined by Barberis (1998) as follows: NPM is used to describe a management culture that emphasises the centrality of the citizen or customer as well as accountability for results. It also suggests structural or organisational choices that promote decentralised control through a wide variety of alternative service delivery mechanisms (including quasi-markets with public and private service providers competing for resources from policymakers and donors).

The general loss of confidence in the government's ability to spend tax money properly remains the most commonly stated reason for the emergence of the new public management approach. Dissatisfaction with vague justifications for poor performance paved the way for a much-needed revival of concepts such as effectiveness and efficiency. NPM was seen as a solution to address poor performance and reinstate the citizens' trust in the public sector's management abilities. It is viewed as a post-bureaucratic movement whereby competition for resources was effectively increased and accountability measures to determine the effectiveness of outputs were employed. It also suggests structural or organisational choices that promote decentralised control through a wide variety of alternative service delivery mechanisms, including quasi-markets with public and private service providers competing for resources from policymakers and donors (Barberis, 1998). Hoggett (1996) posits that restructuring attempts have led to an expansion in the development of performance management and monitoring-like initiatives and activities. The new public management movement has caused the government to put in place systems that can answer not only questions around efficiency but also effectiveness. The spotlight on outcomes and impact led to a shift in favour of programme evaluation as opposed to monitoring.

The lack of formal programme evaluation training was initially addressed by the US Congress in 1965, with funding being appropriated towards graduate training programmes in educational research and evaluation (Rist, 1987). In the executive branch some of the policy analysts were familiar with evaluation methodology and therefore conducted some of the research in-house. During the 1980s the GAO recruited from universities and research agencies in order to gain staff with solid programme evaluation experience (Rist, 1987). However, due to the magnitude of these studies, all too often the evaluation function was outsourced to external researchers, which encompassed government-controlled institutions, independent academic centres, private companies or quasi-public agencies such as the National Academy of Sciences. The extent of effort needed to conceptualise those first programme evaluation courses was immense. The developers of these courses consulted various resources including research literature, theoretical models, job descriptions of evaluators, evaluation textbooks, surveys on what evaluators actually do, as well as decision makers' ideas around effective practice (Davis, 1986).


2.4.2 The Contraction Phase in the US: The Mid-1980s to 2000
The 1980s saw a decline in evaluation activities under the budget cuts of the Reagan administration (Cronbach in Shadish, Cook & Leviton, 1991:27). By the 1990s, fiscal and social conservatism started to thwart further expansion of government programmes, leading also to a decline in funding available for evaluation studies (Shadish & Luellen in Mathison, 2005:186). During the Nixon administration, initial indications of support soon turned as Nixon's interest shifted towards New Federalism. The Reagan years were not any less challenging. The Reagan administration reaffirmed the very real link between the fiscal situation and evaluation funding and activities. The golden years of evaluation came to an abrupt halt with the severe budget cuts in the field of education and other social programmes enacted by the Reagan administration. Reagan's appropriation of block grants to states led to a decreased need for evaluation activities, as justification for funding became redundant (Worthen, 1994). The effect on programme evaluation was by no means small and federal evaluation offices suffered. Where the Department of Education's Office of Planning, Budget and Evaluation conducted 114 evaluations in 1980, only 11 were carried out in 1984 (Shadish, Cook & Leviton, 1991). A GAO survey in 1984 confirmed these findings, with 47 programme evaluation units ceasing their activities. Financial resources also took a knock, with spending declining by 37% and staff resources declining by 22% (Havens, 1992).

The decline in programme evaluation activities had a significant ripple effect on the quality of national-level data. Health, education and labour statistics were significantly cut, with virtually no longitudinal data being collected during the Reagan years (Havens, 1992). The enhanced focus on audit-like financial data has forever left a void at federal level which cannot be regained. Throughout this period of declining programme activities, desperate pleas came from the GAO to rectify the situation. Charles Bowsher's letter (1992:11-12) to the federal government reflects his concern with the situation at the time: "Officials in both executive and legislative branches need quality evaluation to help them reach sound judgments. Without this capability, executive branch policymakers are in a weak position to pursue their policy objectives with the Congress, to justify continuation of their programmes and to eliminate wasteful unnecessary initiatives, because they lack supporting data". According to Joseph Wholey (1979), the need for public sector improvement in the USA became particularly crucial during this time as the constant tension between "better services" and "lower taxes" escalated.

Hope for the future of evaluation was revived with the rise of New Public Management (NPM) in the mid-1980s (see the short description in the enclosed text box above). The popularity of NPM cannot be attributed to one single source, but rather to a fusion of ideas as well as the dire fiscal situation at the time. The fiscal crisis was fuelled by a collision of rising public expenditure and strong public opposition to higher taxes. In the US, right-wing attempts to reduce public expenditure were met with resistance. In order to accommodate the US citizenry, substantial tax cuts were made, but with very little accompanying reduction in public expenditure (Foster & Plowden, 1996).


President Clinton is regarded as the pioneer in introducing NPM-like principles and practices into federal government in the US. The ten principles developed by Osborne and Gaebler in their 1992 publication titled Reinventing Government caught Clinton's attention. He believed that accountability and performance-based management would transform government into a more cost-effective entity. This culminated in the vice-president, Al Gore, appointing Osborne as adviser in implementing their ideas in federal government. The 1992 publication is viewed as the main influence for the National Performance Review that followed in 1993. In practice, all civil servants were encouraged to deliver services more economically and effectively. This could be done relatively painlessly if the entrepreneurial government model was applied. In essence, this entailed reaching a level where the same outputs could be produced with fewer inputs or the same inputs could produce more outputs (Foster & Plowden, 1996).

The National Performance Review demonstrated that public management reform was a presidential priority. This led to a change in the federal procurement system, the implementation of the Government Performance and Results Act (GPRA) of 1993 and a reorganisation of the Office of Management and Budget (Barzelay, 2001). The 1993 Government Performance and Results Act formalised the implementation of performance measurement and reporting in the United States Federal Government. In order to reduce budget deficits, government departments were streamlined and very clear, measurable goals were set. Although the GPRA advocated for the incorporation of programme evaluation results into strategic and annual performance plans, findings from Wargo (1995) suggest that this did not happen. He set out to determine whether the GPRA had any influence on programme evaluation activities and found that the GPRA legislation did not stimulate these activities. Rather, he discovered that participation in and implementation of the GPRA was extremely limited in 14 of the most active evaluation offices in the executive branch. He also found an alarming reduction in non-supervisory and supervisory evaluation staff. Suffice it to say, both the GPRA and the National Performance Review revealed an interest by programme managers and policy makers in financial, short-term data as opposed to longer-term, in-depth programme evaluation studies. Melkers and Roessner (1997) came to the same conclusion, stating that the US focuses on periodic monitoring rather than programme evaluation. The long-term effect of this decision-making process is that potentially good programmes will be discontinued due to a misrepresentation of their effectiveness (Bowsher, 1992).

A 1998 GAO study showed a continuing downward trend in programme evaluation activities. The document titled "Number of Evaluation Offices in Non-Defence Departments and Independent Agencies" considered federal evaluation activities in 23 government agencies. A total of 81 offices reported spending financial and human resources on programme evaluation. The findings (based on the 1995 financial year) were disheartening (Rist & Paliokas, 2002:230):
• Evaluation activities were small, totalling 669 staff and only amounting to $194 million.
• 45 of the 81 offices conducted five or fewer evaluations, whilst 16 offices accounted for two thirds of the 928 total studies. Six of the 23 agencies did not conduct any evaluation activities in the 1995 financial year.
• The total number of evaluations conducted, 928, is 55% of those conducted in 1984 (1689).
• The primary users of results were found to be programme managers, as opposed to the results being used for direct programme improvement.
• A decline was also noted in terms of the number of evaluations conducted in-house, the cost of evaluation studies and the duration of the studies compared to 1984.

In 1996 the Programme Evaluation and Methodology Division of the GAO closed its doors for the last time. The reason for its closure relates to budget constraints and the decline in staff numbers over the preceding years (Grasso, 1996). The end of the millennium showed two contradictory forces at work in programme evaluation in the United States: on the one hand, there was the Clinton administration policy (NPR), where downsizing and budget constraints took precedence; on the other hand, Congress through the GPRA required executive agencies to focus on results in order to inform budget decisions. Despite the decline in evaluation activities, the United States still has the most advanced evaluation system for the following reasons (Rist, 1990):
• Programme evaluation is a field clearly distinguishable from, for example, auditing and accounting.
• The evaluation system is firmly institutionalised within the bureaucracy and the legislative branch.
• The US has taken the lead in influencing other countries to introduce programme evaluation.

The US has not only taken the lead in establishing evaluation as a discipline, but has also promoted the professionalisation of the field through the following:
• The establishment of the Evaluation Research Society and the Evaluation Network during the 1970s, which serves as a further confirmation of the professionalisation of the evaluator workforce. These two organisations later amalgamated to become the American Evaluation Association.
• The development of standards for evaluation practice by the Joint Committee on Standards for Educational Evaluation in 1994 and other subsequent bodies.
• Specific graduate and postgraduate training programmes for evaluators.
• The increase in books covering programme evaluation and the first journal of evaluation (Evaluation Review) being launched in 1976. Today, there are more than a dozen journals on evaluation (Rossi, Lipsey & Freeman, 2004).
• Significant contributions to the development of evaluation theories and methodologies.

We will now turn our focus to the UK to juxtapose its experience with the development of programme evaluation in the USA. This comparison will assist in identifying some similarities, but also differences, in the manner in which programme evaluation evolved in these countries.


2.5 Evaluation in the UK
(This section comprises a revision and expansion of a section in Mouton, 2010.)

Gray and Jenkins (2002:131) suggest that two forces pushed evaluation in the UK: "First an administrative determination to install effective mechanisms to control and prioritize departmental spending decisions and secondly a political desire to raise the profile of public management and to assist more rational and collective decision making". This was especially evident in the Conservative Heath era (1970-1974). The way in which programme evaluation evolved in the UK was therefore directly aligned to the economic and fiscal situation at the time, as also happened in the USA. The British experience with programme evaluation can best be assessed in five phases generally reflecting successive political knowledge and ideological paradigms.

2.5.1 Phase 1: Applied Social Research: Before 1963
Prior to the 1960s, systematic evaluation activities were typically categorised as applied social research. Examples of early applied social research include the work done by The Royal Commission on the Poor Laws in 1832 as well as a survey on the London poor led by Charles Booth in 1890 (House, 1993). Studies such as those undertaken by the Clapham Committee focused only on the development of social science within universities. One of the main reasons for this state of affairs is that successive British governments did not like the idea of assessing the value of their policies and programmes. Making programme information available for scrutiny was initially restricted by various legislative regimes: "This general reluctance to open policies to public scrutiny is not confined to the British government, but the operating styles of successive British administrations, aided and abetted by the structures of the Official Secrets and Public Records Acts, make public evaluations of the policy programmes of central departments problematic" (Jenkins & Gray, as cited in House, 1993:43). In the mid-1960s the British government established a committee under the chairmanship of Lord Heyworth to investigate the promotion of science. The Heyworth Committee for the first time, in 1965, pointed to a desired link between the social sciences and policy making (Blume, 1987). This committee also played an instrumental role in the establishment of the Social Science Research Council in 1965, which afforded greater prominence to the broader social science discipline (House, 1993). The first officially recognised programme evaluations in Britain were the Nuffield Foundation-funded science curriculum projects in the field of curriculum development in the early 1960s, where initiatives attached to curriculum innovation were made subject to evaluations (House, 1993).

2.5.1 Phase 1: Applied Social Research: Before 1963 Prior to the 1960s, systematic evaluation activities were typically categorised as applied social research. Examples of early applied social research include the work done by The Royal Commission on the Poor Laws in 1832 as well as a survey on the London poor led by Charles Booth in 1890 (House, 1993). Studies such as those undertaken by the Clapham Committee focused only on the development of social science within universities. One of the main reasons for this state of affairs is that successive British governments did not like the idea of assessing the value of its policies and programmes. Resistance to make programme information available for scrutiny was initially restricted by various legislative regimes: “This general reluctance to open policies to public scrutiny is not confined to the British government, but the operating styles of successive British administrations, aided and abetted by the structures of the Official Secrets and Public Records Acts, make public evaluations of the policy programmes of central departments problematic” (Jenkins & Gray, as cited in House, 1993:43). In the mid-1960s the British government at the time established a committee under the chairmanship of Lord Heyworth to investigate the promotion of science. The Heyworth Committee for the first time in 1965 pointed to a desired link between social sciences and policy making (Blume, 1987). This committee also played an instrumental role in the establishment of the Social Science Research Council in 1965 which afforded greater prominence to the broader social science discipline (House, 1993). The first officially recognised programme evaluations in Britain were the Nuffield Foundation funded science curriculum projects in the field of curriculum development in the early 1960s, where initiatives attached to curriculum innovation were made subject to evaluations (House, 1993).

2.5.2 Phase 2: 1963-1979 The successive political administrations of this period – Conservative in 1963, Labour in 1966 and again Conservative in 1970 – all linked public expenditure plans to economic growth targets (Hogwood, 1987). The Public Expenditure Survey (PES) System, with its roots in the Plowden report of 1961, supported a focus on evaluation. The Plowden report 1 This section comprises a revision and expansion of a section in Mouton (2010).

37

The Plowden report called for a more holistic, long-term focus on public expenditure in relation to resources, and entailed a detailed analysis per department as well as expenditure presented by policy area. In this manner, cross-cutting policy areas' expenditure could more easily be collated and examined. In order to regulate the PES, the Public Expenditure Survey Committee (PESC) was formed. For clarity, this committee's impetus was not to determine resource allocation but to consider the outcomes of present policies three to four years down the line (Hogwood, 1987). Although reported not to have strengthened the collective responsibility of cabinet, the PES placed emphasis on the longer-term implications of total public expenditure.

The PES, although rigorous in scrutinising public expenditure, lacked an analytical instrument against which to assess policy impact and effectiveness. It also did not bring ministers together to collectively consider the various expenditure proposals (Hogwood, 1987). The Policy Analysis Review (PAR) system was introduced by the Conservative Heath administration to fill this gap. This output-orientated approach was borrowed from the American PPB strategy (planning, programming and budgeting). Gray and Jenkins (1982:429) conclude that this was the only attempt ever by Whitehall to institutionalise rational policy analysis. In essence it was expected that these reviews would instil a culture of regular assessment of departmental and interdepartmental programmes. Initially, the PAR reviews were avidly supported by the Prime Minister and, as anticipated, informed policy, with substantial contributions to the fields of higher education and school expenditure. It can be gathered therefore that PAR was more inclined towards policy appraisal than programme assessment and never quite put "its stamp on departmental review activities" (Derlien, 1990). However, the advent of the second Wilson Labour administration in 1974 and a shift to financial control turned the tide for the PAR review system.

Another Heath administration reform measure in 1971 (which lasted until 1983 when Thatcher abolished it) was the Central Policy Review Staff (CPRS). The CPRS, commonly referred to as the "Think Tank", supported ministers in critically assessing whether policy and programme decisions would reach the predetermined long-term goal to "... enable (Ministers) to take better policy decisions by assisting them to work out the implications of their basic strategy in terms of policies in specific areas, to establish relative priorities to be given to different sectors of their programme as a whole, to identify those areas of policy in which new choices can be exercised and to ensure that the underlying implications of alternative courses of action are fully analysed and considered" (Reorganisation of Central Government, CMND 4506, as cited in Pollitt, 1994). The members of this elite group were drawn from the civil service, business, academia and international organisations, with a disciplinary bias towards economic and business experience. The contribution made by the unit remains questionable, which is probably due to a number of factors (Pollitt, 1994), including the following:
• Bodies under scrutiny did not want to reveal too much to an agency with such close affiliation with the Prime Minister.
• The impossibility of executing the required scope of activities with the limited number of staff members. Detailed analysis and expertise in a number of fields would necessitate an expansion in staff numbers, which in turn would compromise the non-bureaucratic nature of the CPRS.
• Very limited perceived successes, due to the fact that only two influential reports were produced between 1970 and 1974.

The Labour government came into office in 1974, first under the leadership of Wilson for a second time, followed by Callaghan in 1976. This period was characterised by dire fiscal problems: public expenditure sky-rocketed, which influenced the inflation rate and the calculation of public sector inputs (Hogwood, 1987). It was clear that public expenditure needed to be curbed drastically.

2.5.3 Phase 3: The Thatcher-Major Conservative Government Era: 1979-1997
In the 1979 elections the Thatcher Conservative government was elected into office. Thatcher's leadership marked the shift towards greater management of resource consumption and performance measurement, as opposed to policy analysis, in the public sector management sphere. The words "reduction", "control" and "limit" were increasingly heard in the halls of Whitehall – there was a new fad in town and it was called resource management. One of her strategies was an immediate recruitment freeze and the setting of subsequent targets to reduce the size of the civil service (Foster & Plowden, 1996). Mrs Thatcher was intrigued by private sector practice and how a more systematic and comprehensive management framework could improve public service delivery. Bureaucracies were broken down and staff reduced in order to diffuse institutional power. A number of initiatives were launched during this time to tighten the control on resource expenditure. Her support for "value for money" programmes and the managerialist policy approach led to the revival of evaluation, in that new policies were subjected to routine subsequent evaluations (Derlien, 1990).

The need for state auditing and enhanced regulation in the UK led to the establishment of a number of evaluative agencies (House, 1993), which included the following:
• Inspectorates: A distinction can be made between enforcement and efficiency inspectorates. The former is concerned with operations in the private sector, bearing in mind public protection, while the latter promotes efficiency and standards in the public sector.
• Peer Review Mechanisms: The assistance of reputable institutions and professionals was often sought to contribute to decisions around resource allocation.
• Audits: Audits were undertaken to determine whether the value for money principles were adhered to and whether good financial practices (such as efficiency) were being enforced.

Four national agencies are tasked to execute these varying regulatory activities: the National Audit Office, the Audit Commission, an audit body for Northern Ireland and one for Scotland (Bowerman, Humphrey and Owen, 2003).
Even though peer review is mentioned as a regulatory activity, the development of performance indicators became the more dominant evaluation methodology (House, 1993). The National Audit Office (NAO), headed by the Comptroller and Auditor General, was established in 1983 by means of a private member's bill. This agency reports to Parliament and is usually reviewed by the Public Accounts Committee (PAC). Its main concern is economy, efficiency and effectiveness, which are mainly assessed through Value for Money (VFM) studies. These studies typically consist of a planning, investigation and reporting phase. Although these studies reflect evaluation-like activities in that the achievement of objectives is also being assessed, the work of this agency is still predominantly financial.

This is not the case with the Audit Commission (AC), whose activities often extend beyond the financial issues to consider implementation challenges and programme staff's opinions. This body is indicative of government's support for evaluative practices in the 1980s. The need for an independent body to oversee poorly performing local government, although initially encountering some resistance from the Conservative government in 1979, eventually led to the AC's establishment under the Local Government Finance Act in 1982. The AC oversees adherence to national policies and is directly linked to the rise of the New Public Management (NPM) movement. The emphasis placed on accountability, greater citizen satisfaction and value for money led to a rise in the performance measurement movement (Kelly, 2003). The three E's (Economy, Efficiency and Effectiveness) became the mantra of the UK public sector. This agency enjoys considerably more freedom than the NAO and, although independent of central and local government, close liaison takes place with Ministers and government departments.

The Social Service Inspectorate, established in 1985, originated out of the Social Work Service of the then Department of Health and Social Security. Prior to this change in name (and invariably focus) the Social Work Service delivered professional and advisory services to the Department of Health and Social Security (Henkel, 1991). With the expanding concern around public expenditure during the 1970s, inspections became a much needed activity in ensuring that the principles of efficiency and value for money were being adhered to by local authorities.

In 1982 a study on the efficiency and effectiveness of the civil service, commissioned by the House of Commons and headed by the Treasury and Civil Service Committee, was conducted. The committee's concerns centred on the breadth of goals, and it was suggested that more concrete objectives be set to measure progress along the way. In order to address effectiveness, quantifiable measures needed to be developed, viz. performance criteria. Efficiency was tackled through careful consideration of resource expenditure. It was advocated that each programme should as far as possible aim to produce the same quantity of targeted outputs without compromising on quality. The committee concluded that other countries' success (i.e. efficiency and effectiveness) could directly be attributed to the competence of government officials in executing their "day to day management" (House of Commons Third Report, 1989).


Box 2.3: Two examples of reform initiatives

"Raynerism", named after Lord Rayner, signified a greater drive in carrying out existing activities as effectively as possible. The Efficiency Strategy became the method of the moment (Gray & Jenkins, 1982). The review of policies did not feature at this stage, as all effort was geared towards supporting a culture of resource management (Hogwood, 1987). In strengthening the resource management culture, the Financial Management Initiative (FMI) was developed "as an instrument of management change" (Gray, Jenkins, Flynn & Rutherford, 1991). The Fulton committee reportedly had a significant historic influence on the management of the civil service and subsequently the thinking behind the FMI. The FMI was launched in 1982 and had three aims (Gray, Jenkins, Flynn & Rutherford, 1991:47):
• To develop clearly defined objectives for each department;
• To clarify scope in terms of resources and operations;
• To provide the necessary support needed to execute set responsibilities.

Foster and Plowden (1996) are of the opinion that the focus on economy and efficiency surpassed the importance placed on effectiveness, as is evident from the numerous reform initiatives of that time. Two initiatives – the Efficiency Strategy and the Financial Management Initiative – were in direct contrast to their predecessors (described in the text box above). The top-down focus on policy had been replaced with a bottom-up managerial approach in order to manage resources more effectively. Programme evaluation in the latter years of the 1980s was utilised to enhance resource management and reduce expenditure. Very little contribution was made in these years towards policy evaluation.

The UK government's transition from welfare state to regulatory state provides numerous examples of the increasing "watchdog" function it has taken on. Not only did the focus shift towards auditing and accounting practices as instruments for making and executing decisions, but regulatory activities in general experienced considerable growth. When considering regulatory activities within the UK government, it emerges that more funding is allocated to regulation activities (staff and public spending) than to privatised utilities:
• The number of regulatory bodies in the core public sector and mixed public/private sector increased by 22% between 1976 and 1995.
• In 1998, regulatory staff figures stood at 20 000 and running costs totalled more or less £1 billion at the top end (Hood, James, Jones, Scott & Travers, 1998).

Towards the end of this somewhat bleak era in programme evaluation, a sliver of hope for the advancement of programme evaluation came through the formation of the Joint Management Unit (JMU) in 1985. This unit replaced the Financial Management Unit and had as its aspiration the "development of more systematic evaluation within British government" (Hogwood, 1987). The unit maintained the conviction that the British government lacked knowledge in conducting evaluations. It therefore proposed that it be made compulsory for new policy to be accompanied by evaluation. These mandatory requirements were agreed upon by ministers and introduced in 1985.
Historical Development and Practice of Evaluation evaluation practices within government in an effort to develop good practice guidelines for Whitehall departments. The tight fiscal conditions continued under the Conservative leadership of John Major. A significant reform of this phase was the introduction of a series of fundamental expenditure reviews initiated by Treasury. These reviews occurred at departmental level and required an assessment of all programmes in determining “whether activities needed to be done at all, provided in another way, or continued at different levels of resources” (Gray & Jenkins, 2002:134). Another significant report of that time titled “Improving the Management of Government, the Next Steps” led to the emergence of quasi business principles in government. Already operating successfully in the health service sector at that stage, this approach separated the purchasing of service from the provision of services. Concurrently, government opened up public utilities for sale to stakeholders. This was regulated and sustained by way of the Citizen Charter. The six principles of the Charter are taken from the Pollitt report (1994:9): a. Setting, monitoring and publication of explicit standards b. Information for and openness to the service user c. Choice wherever practicable, plus regular and systematic consultation with users d. Courtesy and helpfulness e. Well-publicised and easy-to-use complaint procedures f. Value for money The Citizen Charter is viewed by some as an attempt by the then Prime Minister, John Major, to put his stamp on the advancement of the public sector. A preliminary analysis of the success of the charters by Pollitt present ambiguous findings: in certain service sectors the Charter was believed to have made a contribution, but in many respects the Charter was viewed as being conceptually too complex and lacking clarity on punitive measures. The emphasis throughout this decade remained on targets, measurement and accountability leading to active monitoring only being done by regulators such as National Audit Office and the Audit Committee. This time period however did not extend to support the field of programme evaluation any more than the previous era (Gray & Jenkins, 2002).

2.5.4 Phase 4: The Blair-Brown Labour Government Era: 1997-2010 A range of initiatives and the abolition of prior reform measures by the New Labour Party after 1997 indicated a movement towards a stronger evaluation focus. These included the  following: • Gordon Brown, the Labour Party’s Chancellor of the Exchequer, “broke free” from the Public Expenditure Survey and replaced it with three year allocation settlements, more commonly referred to as the Comprehensive Spending Review (CSR). Through the CSR, finances are awarded according to assessment of existing departmental activities indicating a clear attempt to introduce “evaluation-led management of resources” (Gray & Jenkins, 2002:137). • The Public Service Agreements (PSAs) introduced by Brown in 1998 instructed departments to link objectives to outputs achieved.

42

Chapter 2 • The new party also inherited a range of public utilities that have been privatised under the old regime. This enhanced the need for evaluation, as the success of these publicprivate partnerships had to be measured. • The Modernisation programme, introduced by the newly elected government in 1997, among other things aimed to “improve performance in meeting needs and providing services” (Stewart, 2003). The previous system of compulsory competitive tendering (CCT), which in essence enhanced the focus on resources and economic bias instead of service delivery, was replaced by a system where effectiveness, quality and best value for communities took the forefront. • The Local Government Act of 1999 provided guidance on how to strive towards continuous improvement. This guidance included i) a framework against which local authorities could practice best value principles; ii) requirements when conducting regular reviews; and, iii) content of annual performance plans (Stewart, 2003).

2.5.5 Phase 5: The Conservative-Liberal Democrat Era: 2010 to date The increasing global emphasis on a systematic evaluation of governmental policies, programmes and projects through evidence-based policy making and implementation processes has advanced strongly in the UK, which was instrumental in the development of programme evaluation as summarised above. Evaluation is now institutionalised at virtually all levels in the UK government, and driven by the respective auditing agencies.

2.6 Assessment of the Western origins of programme  evaluation2 2.6.1 Comparing the USA and UK experiences There are both similarities and differences between the United States of America and the United Kingdom when considering the emergence of programme evaluation in those systems. Similarities are found in: a. The drivers of programme evaluation: In both the UK and the USA, central government has been the primary driver of programme evaluation. This is particularly true in the case of the USA where federal government, over the various periods under discussion, took the lead in promulgating programme evaluation. In the UK, this is not so evident because for many years government operated in secrecy and was reluctant to become more transparent. b. The fiscal situation caused the scale to tip either in favour of programme evaluation or against it: For example, the booming 1960s and unlimited budgets in the USA led to an upsurge in programme evaluation studies. The purpose of evaluation at that time was mainly to gather information – whether on existing programmes or with the aim of informing future programmes. The mid-1970s in both countries was characterised by dismal fiscal situations, and this was reflected in the purpose of the evaluation studies conducted. Programme evaluation now had to inform resource allocation and 2 This section comprises an expansion and integration of material originally published in Mouton (2010), Rabie and Cloete (2009) and Rabie (2011).

43

Historical Development and Practice of Evaluation was used to justify certain policies and programmes. Overall, not many evaluation activities were undertaken at that stage in either country. c. The way in which the various political administrations’ agendas influenced the advancement of programme evaluation: In the USA, the Kennedy and Johnson Democratic administrations supported programme evaluation whereas Reagan and Nixon, with their more conservative Republican notions, focused on performance measurement in order to cut back on public expenditure. In the UK, the Thatcher regime was convinced that better resource management would improve service delivery and lead to a more efficient public sector. Under the leadership of Reagan and Nixon, and Thatcher and Major, programme evaluation stagnated. An upsurge was again experienced in the late 1990s. Gordon Brown’s modernisation programme and the implementation of the Government Performance and Results Act are the two main initiatives that sparked interest in programme evaluation towards the end of the 20th century. d. In both countries under discussion, auditing institutions were initially tasked with the evaluation function. The GAO in the American legislative branch and The National Audit Commission in the UK for example were mainly staffed by accountants. The GAO was specifically established to balance the power of the executive branch that up to then was the sole undertaker of evaluation studies. e. In both the USA and the UK, the onset of the New Public Administration movement in the late 1980s and 1990s has affected the importance attached to programme evaluation. During this phase, programme evaluation evolved into performance monitoring and evaluation which potentially has a much wider focus and scope. The main differences between the USA and the UK in terms of programme evaluation pertain to: a. The extent of uptake: The strong social science tradition that had existed in the USA for many years strongly supported programme evaluation’s advancement. Another enabler was the strong backing received from federal government through, firstly, the investment made into programme evaluation during the 1960s and, secondly, the issuing of certain legislation to formalise programme evaluation’s place in the policymaking process. It can be concluded that this initial strong support in terms of existing social science expertise and avid support at federal government level provided a strong base for the evaluation discipline. b. The contribution made by the USA in professionalising the field: In terms of professionalising the field, the USA unequivocally took the lead. The list of evidence is extensive: the USA introduced the development of formal programme evaluation training programmes; the country established many evaluation journals; the majority of evaluation theorists and the main paradigms originated from this country; and the American Evaluation Association and its predecessors were the first of its kind and, over the decades, continues to support, educate and stimulate its members. Programme Evaluation was introduced in the United States during the Great Society era in the 1960s, when major investments were made in social reform to combat the negative

44

Chapter 2 effects of World War II. This, and the commitment of successive American governments to programme evaluation, created a stimulating and enabling environment in which the field could flourish and establish itself. These increasing trends in evaluation practices in the different policy sectors observed in the USA and Britain during the middle and later parts of the 20th century, had their origins in paradigm changes in two related scholarly fields, namely the development of the policy sciences focus, mainly in Political Science, that proceeded parallel to new behavioural thinking, approaches and methods in social science research, especially in Psychology and Sociology. In the policy field, the emphasis changed from ideologically determined, opinion-driven policy choices to what is now called more rational “evidence-based” policy-making approaches. Parallel to this development, advances in behavioural social research theories and methods and the application thereof to societal development problems in the mid-1900s brought greater rigour and sophistication to the field of evaluation research. A brief synopsis of these two paradigm changes that led to the emergence of systematic evaluation and the development of the evaluation profession, is provided below.

2.6.2 Evidence-based policy making The concept “policy sciences” is traced back to Harold D. Lasswell (1943): “Over several decades, Lasswell and his collaborators … reviewed the intellectual tools needed to support problemoriented, contextual, and multi-method inquiry in the service of human dignity for all. In response to the requirements of practice, and with the waning of positivism in the natural and social sciences, other parts of the policy movement are gradually converging on the problem-oriented, contextual, and multi-method outlook of the policy sciences. Thus the policy sciences set the standard for the rest of the policy movement, and will continue to do so for some time” (Brunner, 1997:191). Public policy research in the 20th century was characterised by a move away from “sterile academic parlour games” to become problem and solution oriented, focused on the “real world”. As such, “it makes clear its commitment to particular values, thus avoiding the value neutrality stance that social science ought to be totally objective” (Ascher, 1986:365) and emphasising that the search for solutions to problems should not be lost in “scientific analysis” (Ascher, 1986:370). Policy analysis thus became more “than simply addressing big theoretical questions” and encompassed the difficult task of “clarifying goals, trends, conditions, projections and alternatives” within the social environment (Ascher, 1986:371). The devastating negative impact of the Second World War on the rest of the world was the main driver of this paradigm change to a more scientific approach to policy making and governance. In the late 1940s and early 1950s, Lasswell and a number of his colleagues in Political Science in the USA started to act on their concerns about the failure of their discipline to predict the potential outbreak of the war and to minimise the tragic suffering and loss of life and property that characterised both the First and Second World Wars. They were strongly influenced by the establishment of the new United Nations organisation which was aimed at bringing order and more rationality into international relations. They decided to develop a new theoretical approach to policy making based on more explicit, rational, scientific foundations in order to try and prevent such tragedies from occurring again. The start

45

Historical Development and Practice of Evaluation of a shift to multi-, inter- and eventually transdisciplinary approaches to policy issues is already evident in these early developments. The behavioural research revolution that led inter alia to the findings of how ineffective prevailing government programmes proved to be, based on the isolated sectoral programme evaluations in the USA referred to above, also reinforced the resolves of Laswell and his colleagues to develop new approaches and methods to improve the outcomes of domestic government policies at the same time. The miserable failure of the prohibition policies in the USA between the two World Wars, and the rise of the American Mafia which was linked to this aborted policy experiment, contributed in no insignificant way to the new thinking about more rational, scientific approaches to policy making and implementation. An interesting example of this new approach is the popular definition by Easton in 1951 describing politics as “the authoritative allocation of values for a society, and [stating] that politics essentially is making moral decisions about what is good and what is bad” (in Vestman & Conner, 2006:226). This definition ties politics and its resulting policies close to evaluation, with evaluation regarded as a process of systematic, rigorous data gathering to make more informed value judgements. Segone has recently reinforced this early link between policy making and evaluation further with his description of “evidence-based policy practices”. He distinguishes evidence-based policy practices from what he calls traditional opinion-based policy practice, “which relies heavily on either the selective use of evidence (e.g. on single surveys irrespective of quality) or on the untested views of individuals or groups, often inspired by ideological standpoints, prejudices, or speculative conjecture” (Segone, 2009:17; Davies, Newcomer & Soydan, 2006:175). Half a century later, the opinion-based policy-making approach is now eventually and increasingly being replaced by “a more rigorous approach that gathers, critically appraises, and uses high-quality research evidence to inform policy-making and professional practice” (Gray in Davies, Newcomer & Soydan, 2006:175), while “(e)vidence-based policy helps people make well informed decisions about policies, programmes and projects by putting the best available evidence from research at the heart of policy development and implementation” (Davies, 2008:3). The quest for evidence-based policy making should not be pure technical analysis, but should allow for divergence and various detailed policy options. “This means that policy making is not just a matter of ‘what works’, but what works at what cost and with what outcomes” (Segone, 2008a:34-35). Evidence-based policy making contributes to policy making in achieving recognition of a policy issue, informing the design and choice of the policy, forecasting the future, monitoring the policy implementation and evaluating impact (Segone, 2008b:7). Chelimsky concurs that evaluation information and evidence inform policy formation, policy execution and accountability in public decision making (Chelimsky in Vestman & Conner, 2006:229). “Evidence-based government means integrating experience, expertise and judgement etc. with the best available external evidence from systematic research” (Davies, 2008:6). 
Evidence-based policy decision making may, at times, compete with the narrower and normatively-based interests of lobby groups, pressure groups, think tanks, opinion leaders and the media, and

46

Chapter 2 also with pragmatic matters such as parliamentary terms, time tables and procedures with, at times, limited capacities and unanticipated contingencies to influence policies (Segone, 2008a:34-35; Davies, 2008:20). In this regard, the distinction may be drawn between “policy makers’ evidence”, which constitutes any information that seems reasonable and is communicated clearly and in good time, and researchers’ scientific, neutral, proven, theoretical information (Davies, 2008:19). Segone (2009:19) thus advocates that good evidence is technically sound, of good quality and trustworthy, as well as relevant and timely, as policy makers may be forced to use unreliable information if that is all that is available. To encourage policy makers to take ownership of policy evidence and to use the information, statisticians, evaluators and researchers who produce evidence need to respond to demands from policy makers, package the information in a usable format, effectively disseminate results, and provide pull-and-push incentives to encourage the utilisation of evidence in policy making (Segone, 2009:21-22). Segone attributes the emergent shift from opinion-based to evidence-influenced approaches to a movement towards more transparent governance and better technical capacity to produce quality, trustworthy evidence (Segone, 2009:18). Other forces that are strengthening, supporting and driving the monitoring and evaluation of public sector policies and programmes are international initiatives like the Millennium Development Goals, European or African Union Membership, Transparency International and donor funding that emphasise the need for M&E to improve internal fiscal efficiency and accountability. It has proven to be an effective tool to reduce past failure rates and to concentrate resources on the most pressing problems and programmes that demonstrate effectiveness and efficiency (see Kusek & Rist, 2004:3-11; Valadez & Bamberger, 1994:5-7; Rossi & Freeman, 2004:15; Boyle & Lemaire, 1999:3&181). These internal pressures for programme improvement are complemented by external pressures for decentralisation, deregulation, commercialisation and privatisation, and by changes in government size and resources that have proven successful in the past to improve results (see Kusek & Rist, 2004:3-11). Another driving force that explains the 50 year lag time between the recognition of the views of the behaviourists in the early 1950s and the delayed take-off of the evidence-based paradigm in the early 21st century, is the explosion of the Information and Communication Technology (ICT) revolution only in the early 1980s when IBM, Apple and Microsoft made the processing of large datasets of information a practical, cost-effective and affordable reality. This means that the hardware technology that did what the behaviourists wanted to do in the 1950s only became available and feasible in the early 1990s, while appropriate software Decision Support Systems (DSS) programmes only started to emerge at the turn of the century and is now only consolidating and being integrated into general management information systems.

2.6.3 Advances in social research designs and methods Growth and refinement of improved social science theories and models in the first half of the 20th century, and the more successful application thereof to solve

47

Historical Development and Practice of Evaluation problems in education, political science and psychology, have also contributed to the modern era of evaluation. The apparent success of early education, social health and psychology researchers to solve social problems in their natural setting, gave hope that social science research could mimic the success of physical science research in solving technological problems in the social arena (Shadish & Luellen in Mathison, 2005:184). Although programme evaluation studies within the education and health fields had been undertaken since the mid-18th century, programme evaluation only became an accepted social research enterprise in the 1960s, with the attainment of the necessary levels of sophistication in social science methodology, especially in terms of measurement, sampling and statistics facilitated by the new ICTs towards the start of the new 21st century (Mouton, 2007:492). The Collins Paperback English Dictionary defines research as a “systematic investigation to establish facts or collect information on a subject”, while Vaishnavi and Kuechler define research as “an activity that contributes to the understanding of a phenomenon” (in Manson, 2006:156). Wikipedia states that “basic research (also called fundamental or pure research) has as its primary objective the advancement of knowledge and the theoretical understanding of the relations among variables”, while “applied research is done to solve specific, practical questions; its primary aim is not to gain knowledge for its own sake” (in Manson, 2006:156). A strong argument is presented for defining evaluation as applied social research that draws on the methodology of social sciences to provide answers to real-life evaluation questions. For example, Bickman defines evaluation research as the assessment of the strengths and weaknesses of an intervention, identifying ways to improve them, and determining whether desired outcomes are achieved. It may be descriptive, formative, process-, impact-, summative- or outcome- oriented (Bickman in Mathison, 2005:141). Rossi and Freeman define evaluation research as “the systematic application of social research procedures for assessing the conceptualisation, design, implementation and utility of social intervention programmes” (in Mouton, 2007:491). Weiss describes evaluation studies as finding out about the success of interventions in the world of practice where people are affected with the aim to improve the social, economic and cultural conditions of society (Weiss, 2004:154). Scriven, however, states that evaluation research is much more than just applied social research. While evaluators need a repertoire of empirical research skills, they also require additional evaluative skills that enable them to search for side effects (that may dramatically change the final evaluation conclusion) and determine relevant technical, legal and scientific values as well as synthesis skills to integrate evaluative and factual information (Scriven, 2003:7). Evaluation research differs from other social research in that evaluation studies imply a concrete judgement regarding the phenomena in question to fulfil its purpose, whereas social research may have inconclusive findings and refrain from a final judgement. This view of Scriven eventually resulted in his identification of evaluation as a distinct transdisciplinary profession (1991). 
The aim of social research is “limited exclusively to producing knowledge but not to producing value judgments or evaluative conclusions” (Caro in Vestman & Connor, 2008:47). Social

48

Chapter 2 research bases conclusions on factual, proven and observed results only and strives to be value free. Evaluation research, however, is value laden and establishes standards and values that, together with factual results, produce evaluative conclusions (Scriven, 2003:7). Evaluation research requires more than the “accumulation and summarising of relevant data; [it] requires a conclusion about the merit or net benefits through the verification of values and standards” (Rossi & Freeman, 2004:17). Lastly, while research emphasises the production of knowledge, leaving the application of the knowledge to natural processes of dissemination, evaluation starts out with the intended use in mind (Weiss in Shadish, Cook & Leviton,  1991:182). Evaluative conclusions are thus blends of fact and value claims, and while the employment of rigorous methods to come to these conclusions is part of the process, it also entails discovering the right criteria and standards for comparison (House, 2004:219). Evaluation has moved from Campbell’s original methodological focus to embrace concepts such as utilisation, values, context, change, learning, strategy, politics and organisational dynamics. In proposing a system and indicators for measuring the results of local economic development in local government, this wider perspective considering not only methodology, but also the values and context that lead to alternative perspectives of what is regarded as success, is critical. Development objectives do not present a single definition of success and therefore evaluation of these policies and programmes demand the consideration of alternative perspectives and questions, and the application of different and sometimes new (transdisciplinary) methodologies to answer questions about performance and success. The transdisciplinary nature of modern systematic evaluation is contained in the sometimes unique and novel evaluation designs and methodologies that supersede traditional disciplinary boundaries and are developed specifically to answer specific evaluation questions. This is inherent in the fast emerging dominance of the socalled “mixed methods” approach to evaluation research which implies not only a mix of various qualitative and quantitative methods and techniques, but it can also imply an integration of qualitative and quantitative evaluation designs, and at an even higher level of abstraction also a new merging of social and natural sciences paradigms (Greene, 2010). This issue will be addressed again in Chapters 4 and 5.

2.7 Historical development of evaluation in other regions 2.7.1 Europe Picciotto, a former President of the European Evaluation Society (2010:2), finds the roots of systematic evaluation practices in the scientific method ideas of Descartes, Newton and the European Age of Reason. He agrees with mainstream evaluation historians that the development of new empirical behaviourist academic approaches and methodologies and the development of the policy sciences influenced the systematic assessment of development and reconstruction programmes after the Second World War. The continental European scene was also strongly influenced by the American and British dominance in the evaluation field (Morra-Imas & Rist, 2009:22). Picciotto (2010) relates how the development of the information society, that started in the military work of Rand

49

Historical Development and Practice of Evaluation Xerox and IBM, created the capacity to process large data sets and to draw faster and more accurate conclusions about the costs and benefits of social programmes than had been the case in the past. Although systematic evaluations have slowed down in the New Public Management practices in the Reagan and Thatcher eras, where the focus was on different implementation strategies rather than on the measurement of outcomes, this has now turned around, and the contested consequences of NPM practices are now under intense evaluation scrutiny. Stame (2013:355) expands on this theme and relates how modern qualitative social science evaluation foci like illuminative evaluation developed in the early 1970s in Europe in response to the dominant American quantitative experimental approaches in this regard. She regards realist evaluation as the European version of theorybased evaluation (2013:361, 367).

2.7.2 Australia and New Zealand Formal evaluation started in Australia in 1918-1921 as a result of a Commission of Inquiry into public spending (Rogers & Davidson, 2013:372). Modern programme evaluation started haphazardly in Australia and New Zealand during the middle 1960s and coalesced into systematic systems in the early 1970s (Sharp, 2003:14). The evaluation organisations and practitioners in the USA were the main driving forces behind this development (Sharp, 2003:26; Rogers and Davidson, 2013:371), as a result of the close historical and political relationships among these countries. Sharp (2003:8) refers to the following characteristics of evaluation in Australia and New Zealand as identified by Rogers: • “its emphasis on the development of capacity within organizations to conduct ongoing evaluation of ongoing programs rather than on professional discrete external evaluations; • the widespread and long standing development and use of performance measurement systems for programs; • its generally eclectic approach to evaluation methodology; • particular attention paid to the evaluation of programs for and by indigenous peoples.” Rogers and Davidson (2013:372) also provide more in-depth assessments of the specific routes that evaluation took in these countries championed by specific individuals, as has also been the case elsewhere. The development of systematic programme evaluation and cultural values in evaluation were especially stimulated in New Zealand as spin-offs of the application of respectively NPM policies and the cultural diversity in the form of western relationships with the indigenous Maori population in that country. The indigenous Australian Aborigines have also influenced the development of the field in Australia.

2.7.3 The East, Asia, Latin America and Africa It is significant that the latest edition of Alkin’s seminal work on evaluation history and development which appeared in a second edition in 2013, does not have any data on evaluation in Eastern and Asian countries, Latin America or Africa, with the exception of a brief summary of the development of evaluation in Australia and New Zealand summarised above (Rogers & Davidson, 2013:372). This confirms the dominant role of modern Western practices in this field which is aptly summarised in the Bellagio Centre

50

Chapter 2 Report (2012:12): “Evaluation theory and practice have largely evolved from Western worldviews, perspectives, values and experiences”. However, this information gap is filled to some extent by Rugh and Segone (2013) which contains summaries of the history and development of systematic evaluation in these regions (see also Furubo, Rist & Sandahl, 2002). In all cases evaluation in these regions has been stimulated by the Western experiences and practices as well as the work of international organisations which are strongly influenced by the northern hemisphere western countries. The development of evaluation in Africa is addressed below in more detail below in section 2.8.

2.7.4 International Multi-lateral and Bi-lateral Organisations At the international level, the establishment of the United Nations (UN), the World Bank and IMF after the Second World War to promote and support development initiatives in developing countries had ripple effects across the world. Over time the UN spawned a range of highly specialised autonomous international agencies like the UNDP, UNICEF, UNESCO, UNCHS, etc. who have in all cases established specialised evaluation units to assess their performance, as well as those of their clients/beneficiaries/recipients. MorraImas and Rist (2009) expand on this by also recognising the evaluation activities undertaken by the regional development banks and international regional development support agencies, for example the European Bank for Reconstruction and Development (EBRD, The African, Asian and Latin American Development Banks, as well as numerous other national development agencies like the UK’s Department for International Development (DfID), the United States Agency for International Development (USAID), the Canadian International Development Research Centre (IDRC) and the Canadian International Development Agency (CIDA), the Swedish International Development Agency (SIDA), the Swiss Agency for Development (SDC) and a range of other similar institutions. All the major international organisations have over time developed institutionalised evaluation capabilities. The important roles played by the USA and European countries in the World Bank, have influenced the activities of these specialised evaluation units directly. Scholars that have contributed to this specific area of evaluation activities include Grasso, Wasty and Weaving (2003) who provide a detailed summary, analysis and assessment of the activities of the IEG in the World Bank since its inception.

2.7.5 International Development Support Agencies The increasing emphasis by western governments on regular, systematic evaluation of their activities also had a spill-over effect on NGOs operating within those contexts. Various international and national NGOs and philanthropic foundations such as the US Rockefeller, Ford, Kellogg, MacArthur foundations, the German Eberhard, Adenhauer and Neumann were early adopters of different evaluation paradigms and practices. These foundations introduced evaluation practices to protect their sometimes major financial investments in diverse development programmes and projects across the world. Recently, new foundations like the Bill and Melinda Gates Foundation have swelled the ranks of the philanthropic involvement in global development support in the developing world (see also Picciotto, 2010). Sasaki (2006) provides a broader perspective through his assessment of the evaluation history and practice of international aid programmes.

51

Historical Development and Practice of Evaluation

2.7.6 International Evaluation Networks Four highly influential voluntary international evaluation networks that cut across different organisations and sectors are UNEG, the ECG, OECD-DAC, as well as NONIE. These bodies and networks expanded the practice of development evaluation worldwide and in many cases developed their own customised approaches and communities of evaluation practice. However, they all still largely operate within the national and international evaluation paradigms kick-started in the USA and the UK.

2.8

Historical Development of Evaluation in Africa

The historical development of evaluation practices in Africa is summarised and assessed in this section which starts with Africa in general and then proceeds to summarise the history of evaluation in South Africa.

2.8.1 Introduction: Evaluation, colonial rule and structural adjustment The history of evaluation in Africa is incomplete if one does not recognise the role of African researchers, policy analysts and evaluators in resisting colonial rule and policies. They played a critical role in providing alternative views and evaluative opinions about the impact of Western powers on African developmental efforts, especially concerning the history of the evaluation of “structural adjustment” policies and experiences. Unfortunately, limited research exists about this topic and the history of evaluation in this respect is not well recorded or generally available. Oosthuizen (1996:66) recorded the role of African policy support organisations at the time, and observes that some particular historical influences impacted on the initiation of policy evaluation work done by policy type organisations. She recalls that during the period from 1957 through to the early 1980s, the African continent was characterised by a large number of African countries gaining their independence from colonial rule, and that this was regarded as a first triumph for Pan Africanism. This development resulted in two further important spill-over consequences. The first consequence was that the future consolidation of the newly independent African countries was complicated by weak economies, financial dependence on colonial countries and severe resource constraints in terms of capital and skills, so that the “independence” for these countries was in name only. The next phase of Pan Africanism was a shift in emphasis from “independence” to “unity”. Oosthuizen (1996:66) notes that: “... (t)here were broadly two schools of thought on the question of African Unity. The first was initially less occupied with unity than with cooperation. The second group promoted African Unity above all. The year 1963 saw the establishment of the Organisation of African Unity (OAU) and although the manifest aim was a compromise between the two schools of thought, in actual fact it was a victory for the more moderate group. For the new African regimes their independence was more important than the idea of a wider union. These countries put their own interests first, to the detriment of the interests of the continent as a whole. Consequently, where the OAU, the Economic Commission for Africa (ECA) and the United Nations (UN) structures

52

Chapter 2 have a more intergovernmental focus and accommodated the more moderate Pan Africanist States, the radical group drifted towards alternative structures to express their viewpoints”. The second major consequence of independence concerns the transfer of power and the fact that the existing state structures were used to develop new government administrations. Since all citizens were expected to be loyal to the new political regime, opponents had to articulate their demands differently from those used during colonial rule. Those who wished to express their views beyond token gains soon had to learn to cope with new political systems. In this process many intellectuals drifted towards alternative structures in order to find a basis from which independent evaluative views could be expressed. Another important development for evaluation in Africa was the adoption of the Lagos Plan of Action at the first Extraordinary Economic Summit in Lagos, Nigeria in April 1980. The Lagos Plan of Action was a reaction to structural adjustment programmes imposed on African countries from the early eighties onward. The main argument was that Africa and the different regions in Africa should develop their own policy capacities parallel to the African Capacity Building Initiative (ACBI) of the World Bank and the UNDP (see African Capacity Building Foundation, 1992 and World Bank, 1991). This development emphasised regional economic independence and highlighted the need for an improvement in regional policy capacity. It resulted in the creation of the Preferential Trade Area for Eastern and Southern Africa (PTA) and the Southern African Development Co-ordination Conference (SADCC). The Council for the Development of Social Sciences research in Africa (CODESRIA), another alternative policy capacity development agency established at the time, stated that: “... (i)ndigenisation is not the notion of African leaders to create their own idiosyncratic ‘indigenous’ ideologies and then to insist that research efforts be harnessed to give respectability and coherence” (CODESRIA, 1993:19). These developments included an increasing focus on the need for more resources for local researchers to do independent policy evaluation and research. Also, the injunction, “know thyself” which gives primacy to knowledge of Africa, was regarded as an important consideration. This could indeed be regarded as the early roots of self-assessment and peer review that later translated into the African Peer Review Mechanism (APRM) as a form of African-driven evaluation. Other organisations that developed so-called independent policy capacities that engaged in policy evaluation during the 1980s in response to structural adjustment included the Organisation for African Unity and the Economic Commission for Africa (ECA). Regional organisations such as SADCC and the PTA, as well as specific research organisations also conducted independent policy analysis and evaluation. These included CODESRIA, established in 1972 in West Africa, the Southern Africa Political Economy Series Trust (SAPES) in Harare, the African Economic Research Consortium (AERC) that was established in Nairobi in 1988, the Organisation for Social Science Research in Eastern and Southern Africa (OSSREA) in 1980 and the African Institute for Economic Development and Planning (IDEP) (Oosthuizen, 1996:69-75; also see the Pan African Development Information system 1991; Sawyer & Hyden, 1993; Thomas, 1995; and Turok, 1991).

53

Historical Development and Practice of Evaluation

2.8.2 The start of systematic evaluation in Africa3 The development of systematic evaluation in Africa started late, largely as a result of the same factors identified above. A network of evaluation practitioners was created by UNICEF in Nairobi, Kenya in 1977 to enhance capacity-building for UNICEF and other evaluations in East Africa. This initiative therefore attempted to create indigenous African evaluation capacity. The first of these developments took place in countries like the Comoros, Eritrea, Ethiopia, Madagascar, Niger, Nigeria, Rwanda and Zimbabwe. In almost every case, the first meeting was initiated by UNICEF. In March 1987, a Development Assistance Committee (DAC) and Organisation for Economic Cooperation and Development (OECD) seminar brought together donors and beneficiaries of development programmes to discuss objectives, means and experiences in evaluation. The outcome was an awareness of the need to strengthen evaluation capacities of developing countries. The OECD published the summary of the discussions in 1988 in its report titled Evaluation in Developing Countries: A Step towards Dialogue. This initiative called for a series of seminars to be held at regional level (i.e. Africa, Asia, Latin America), to intensify dialogue, discuss problems unique to each region, and recommend concrete and specific actions with a view to strengthening the evaluation capacities of developing countries. Other prominent facilitators for evaluation capacity-building on the African continent during these early years, has been the African Development Bank and World Bank Operations Evaluation Departments. Two initial conferences hosted respectively by these two multilaterals in 1998 and 2000 raised further awareness around evaluation capacity development in Africa. The first seminar on evaluation in Africa, which was presented jointly by the African Development Bank (ADB) and Development Assistance Committee (DAC), was held in Abidjan, Cote d’Ivoire, 2-4 May 1990. Its objectives included the clarification of evaluation needs as perceived by African countries themselves and the exploration of ways and means of strengthening self-evaluation capacities. A follow-up seminar was carried out in 1998 in Abidjan, with the following objectives in mind: • To provide an overview of the status of evaluation capacity in Africa in the context of public sector reform and public expenditure management; • To share lessons of experience about evaluation capacity development concepts, constraints, and approaches in Africa; • To identify strategies and resources for building M&E supply and demand in African countries; and • To create country networks for follow-on work. The discussions of the 1998 seminar underlined important directions in African administration and aid agencies. Firstly, it identified the global trend towards a more accountable, responsive and efficient government. The evaluation paradigm was therefore shifting to be the responsibility of beneficiaries of funds and programmes. Secondly, the role of evaluation within individual development assistance agencies was gaining in clarity and effectiveness. The process of evaluation had been improving as more became known about 3 This section largely comprises an edited extract from Spring and Patel (n.d.).

54

Chapter 2 evaluation. Thirdly, the outlook for development partnerships across the development community had become more hopeful, given the need for mobilisation of resources. The product of evaluation – improved programmes – will eventually be emulated in the public sector, as evaluators contribute to the formulation of public sector reforms and help in the development of more efficient and transparent public expenditure (budget management) systems. The Abidjan seminars addressed demand for evaluation, as the participants were high-ranking government officials who would be able to directly influence evaluation policy and major donors interested in evaluation or accountability issues. In September 1999, the African Evaluation Association (AfrEA) was established at a groundbreaking inaugural Pan-African conference of evaluators held in Nairobi, Kenya, with 300 participants from 26 African countries. It was largely the result of efforts by Mahesh Patel, of UNICEF ESARO, who was elected as the first President of the new organisation. The Kenyan and other African country evaluation societies also played very prominent roles in this regard, financially supported by UNICEF. The theme of this conference was Building Evaluation Capacity in Africa. The main aims were to: • share information and build evaluation capacity, • promote the formation of national evaluation associations, • promote the knowledge and use of an African adaptation of the Programme Evaluation Standards, • form an Africa-wide association, promoting evaluation both as a discipline and profession, and • create and disseminate a database of evaluators. The organisation constituted an important capacity building and networking opportunities for everyone interested in systematic M&E practices on the African continent. It was the first attempt with an open invitation across sectors and institutions, including all countries and numerous policy makers. Six AfrEA conferences have been convened to date. They have been held purposefully in different regions of the continent in order to stimulate interest in those regions. The current status of AfrEA and of other national M&E associations on the continent, including that of the South African Monitoring and Evaluation Association (SAMEA), is addressed in Chapter 9. The two approaches of sensitising policy makers (the Abidjan approach, based on the World Bank framework), and of spreading general awareness and building evaluation capacity (the Nairobi/UNICEF approach) were synergistic in stimulating a home-grown African demand for evaluation. The current state of formal governmental evaluation systems in Africa is also assessed in Chapter 10.

2.8.3 The establishment of evaluation as a profession in Africa During the 1980s and the early 1990s, a diverse group of emerging evaluators in the government, NGOs and the private sector (mainly from consulting firms), showed a steady increase in number in response to increasing evaluations by donors and multi-lateral institutions. Since the early 2000s, the momentum built up since the AfrEA conference and the increasing formation of evaluation networks increased the profile of evaluation as

55

Historical Development and Practice of Evaluation a profession on the continent. According to Dr Zenda Ofir, a former President of AfrEA and also the main driving force behind the establishment of South African Monitoring and Evaluation Association SAMEA, the inaugural AfrEA conference in 1999 represented a watershed moment for evaluation in Africa when more than 300 representatives from 35 countries converged in Nairobi to establish a continental association as platform for interaction between all Africans interested in evaluation. This conference was the genesis of AfrEA, whose activities over the past 15 years are seen by many as having been pivotal in the emergence of evaluation as a profession in Africa (Ofir, 2014; see also Segone & Ocampo, 2006; Traore & Wally, n.d.). Until the 1980s, evaluation on the continent was largely driven by international actors – aid agencies, large NGOs and evaluators. It also manifested in the work of African community activists, political analysts, and later through policy analysis, research and policy evaluation. Since there was little indigenous evaluation capacity at the time as it had little prominence as a field of work or profession, subject specialists fulfilled this role (Ofir, 2014). Over the past 15 years the practice and profession of evaluation developed and expanded exponentially on the continent and elsewhere in the developing world. In 1999 there were only six national African evaluation associations. By 2002 this number had grown to 14, stimulated and supported by the new continent-wide community of evaluators. At the end of 2013 there were 143 verified evaluation associations and networks worldwide (now called VOPEs, see http://www.ioce.net/en/index.php), of which 26 were in Africa. This period also saw a significant increase in the interest of national governments in establishing M&E systems. The establishment of the African Evaluation Association (AfrEA) was the consequence of the vision and energy of Mahesh Patel, its first President. It quickly flourished during these first few challenging years of its existence. “We owe a debt of gratitude to Mahesh in particular, and the pioneers who supported him during that period. It is also important to note that AfrEA brought in many international experts for training, supported national evaluation association growth, established an Africa-wide community both virtually through a listserv and conferences. AfrEA also gave African evaluators a formal voice on international platforms where evaluation was being shaped among others in organisations such as the International Organisation for Cooperation in Evaluation (IOCE), the High Level Meetings on Development Effectiveness, in arguing for a broader set of designs for impact evaluation during the formative stages of NONIE and 3ie, etc.”  (Ofir, 2014). A South African evaluation community also emerged during this time, directly as a result of the establishment of AfrEA. Although there were some evaluation specialists in South Africa, few regarded it as a full-time occupation except in a few specialised units such as at the Development Bank of Southern Africa (DBSA) where a dedicated M&E unit was established in 1996, while the first M&E unit in the South African government was established by Dr Indran Naidoo, then working in the Department of Land Affairs. After attending the first AfrEA conference, Ofir established Evalnet, one of the first consulting companies in South Africa that explicitly specialised in evaluation. 
In 2002 she invited Michael Quinn Patton to South Africa. At one of his courses a decision was taken to establish the South African Evaluation Network (SAENet), also led by Ofir as an informal network with members connected via Listserv.

56

Chapter 2 At the second AfrEA conference in Nairobi in 2002 she was nominated as the second AfrEA President, and as a result co-coordinated in 2004 with Dr Indran Naidoo, then Chief Director responsible for evaluation in the South African Public Service Commission, the Third AfrEA Conference in Cape Town. This was the first time that AfrEA partnered with a national government. More than 550 participants attended, with 87% coming from 36 African and the rest from 20 other countries, including high level government officials from several African countries. It helped give evaluation in SA a high profile, and further sensitised the SA government as to the potential and possibilities of evaluation. At this conference a process was also initiated to formalise SAENet as the South African M&E Association, SAMEA. The historical development of evaluation in South Africa is covered in more detail below, while more details of SAMEA and selected other national evaluation societies in Africa are summarised and assessed in Chapter 10. Ofir (2014) describes the evolution of evaluation into a profession in Africa as “a wave that gathered momentum during the 1990s and the 2000s”, but that still needs concerted, strategic efforts to develop it further if it is to fulfil its promise as a profession that can and should help accelerate the development of the continent. Its past and future growth should be considered against the background of the “colonialisation of evaluation”, where for decades foreign teams flew into the continent to evaluate African performance against measures and through processes often not understood or owned in Africa. Although much has been, and is still being learnt from international agencies and from the many committed international evaluators who have had African interests at heart, the practice and profession in Africa are now increasingly being shaped by local evaluators, and African evaluators are present in increasing numbers at international conferences. However, their work still tends to be less visible than desired, in part because only limited research on evaluation is being done at academic centres and by evaluators across the continent. Ofir (2014) emphasises that “… more innovative, and especially, visible scholarship and thought leadership in theory and practice from Africa is needed to push the frontiers of evaluation in support of our own as well as global development”. This is now, inter alia, addressed by the publication of AfrEA’s new mouthpiece, the African Evaluation Journal (http://www.aejonline.org/index.php/aej). The focus of the Bellagio Report on the African Thought Leaders Forum on Evaluation and Development: Expanding Leadership in Africa by the Bellagio Centre (2012) provides a very well-articulated view of the concept of evaluation that is driven by and rooted in Africa. This meeting was the result of a special initiative taken at the 2007 AfrEA conference in Niamey, Niger, where a day-long special session with support from NORAD led to a formal statement encouraging Africa to “Make Evaluation our Own”, later transformed by AfrEA into a “Made in Africa” strategy for evaluation. The stream was designed to bring African and other international experiences in evaluation and in development evaluation to help stimulate the debate on M&E. 
The following key issues were identified that provides important insight into the intent of African-driven evaluation: • “Currently much of the evaluation practice in Africa is based on external values and contexts, is donor driven and the accountability mechanisms tend to be directed towards recipients of aid rather than both recipients and the providers of aim;

57

Historical Development and Practice of Evaluation • For evaluation to have a greater contribution to development in Africa it needs to address challenges including those related to country ownership; the macro-micro disconnect; attribution; ethics and values; and power-relations; • A variety of methods and approaches are available and valuable to contributing to frame our questions and methods of collecting evidence. However, we first need to re-examine our own preconceived assumptions; underpinning values, paradigms (e.g. transformative v/s pragmatic); what is acknowledged as being evidence; and by whom, before we can select any particular methodology/approach” (AfrEA Special Stream Statement, 2007). The Bellagio Report encourages evaluators to explore what “Africa driven and Africa rooted evaluation” mean to them. It has been suggested that the African evaluation community should in the next phase of its evolution focus on developing substantiated theories and practices that illuminate the question “if evaluation had originated in Africa, what would it be like?” (Ofir, 2014). It is clear that a need exists to further explore what “indigenous evaluation” means, and its origins, as is already being done elsewhere in the world. In the late 1990s, increasing concern started to consolidate among African participants in these processes about the nature and impacts of the structural adjustment programmes of the World Bank and the IMF, as well as about the Western-dominated evaluation paradigms underlying the evaluations undertaken by non-African agencies and individuals in Africa. In a bibliographic review of the evaluations undertaken in Africa, Spring and Patel (n.d.), found that the majority were found to have been requested by donors and international agencies. The majority of the first authors are not African. Of the original 133 articles that were reviewed, for example, three quarters had a first author with a Western name, 15% were clearly African, and it was not clear in 12% of the cases. African author participation was acknowledged as second or third author in 12% of the total. There is some room for confusion as many of the authors and reviewers are African, but with names of European or Asian origin. While the authors are mostly non-African, the reviewers, however, are nearly all African, by conscious design of the authors (Spring & Patel, n.d.).

2.8.4 Current Status of Evaluation in Africa
Evaluations in Africa are still largely commissioned by non-African stakeholders, mostly international donor or development agencies that run or fund development programmes on the continent. This remains a sensitive issue for many African evaluators: because the profession in Africa is relatively new and there is much room for improvement, perceptions have emerged both in Africa and outside the continent that African evaluators still have to improve their international competitiveness compared to their northern hemisphere counterparts. The visibility of African evaluators’ profiles is a further problem, since practitioners have too little time to write about their work for public consumption and too few resources to travel in large numbers to international conferences and other international events, and there are simply not enough evaluation scholars. However, this situation is changing fast and African evaluators are becoming increasingly competitive at an international level.


Chapter 2 The Bellagio Report (2012) noted that the lack or low profile of “thought leadership”in evaluation in Africa has to be addressed: “Considering development contexts, frameworks and trends, and their implications for the evaluation profession provides a starting point for such thought leadership. Influential evaluation findings lead to new development approaches. As development strategies evolve, so do evaluation approaches. The African evaluation profession therefore occasionally needs to take stock of how the development context is influencing – or should influence – the direction of their theory and practice … Participants discussed the development-evaluation interface and its implications for evaluation in Africa over the next decade, engaging with: • The unfolding context for development and evaluation; • The core belief in the value of Africa rooted evaluation for development; • First steps towards a framework for Africa rooted evaluation; • The notion of Africa driven evaluation for development; and • Potential strategies for action, change and influence”. The view expressed by the group who met in Bellagio is that this debate is not yet prominent or visible enough in the intellectual sphere in Africa. Chilisa and Malungu (2012) undertook a seminal exercise in preparation for the 2012 Bellagio meeting to identify, explain and contextualise different indigenous evaluation attributes that could be considered in future for African-rooted evaluations. These ideas are contained below in the extract from the Bellagio Report of this forum which considered, inter alia, the historical significance of evaluation, i.e. its historical roots in Africa, the spiritual identity of Africans and how it relates to evaluation, the importance of empowerment approaches to evaluation, group participation and thinking in evaluation, as well as African decision-making processes and methodologies, such as decision making by consensus (see Malunga 2009a, Malunga 2009b, Malunga & Banda 2011, Chilisa, 2012, Chilisa, Mertens & Cram, 2013, for recent expansions of the ideas contained in Chilisa & Malunga 2012). As remarked earlier, it is for these reasons that participative evaluation, also through self-assessment and peer review, has become such an important approach in acknowledging the inherent value that lies in African evaluation.

2.8.5 Specific Strategic Trends in Evaluation in Africa
The foreword to the seminal Bellagio Report (2012) summarised what needs to be done to establish a more explicit recognition of the African context within which evaluations in Africa take place and a more explicit Africanisation of evaluation designs and methodologies, as follows: “... in addition to the
1. dire need for sufficient capacities and the application of evaluation standards in order to conduct good quality evaluations across the board,
2. thought leadership in evaluation theory and practice, by many disciplines and sectors, and the application of the resulting synthesized new knowledge are urgently needed in order to position Africa as a continent from which innovative frameworks, models and practices in evaluation emerge that are suitable for the challenges faced by the continent,
3. while in parallel, strategies are needed to enhance the influence and power of the profession and the work of its thought leaders in development, thought leadership in theory and practice is urgently needed in priorities that include:
i. understanding the role of changing and complex contexts in evaluation, and using systems thinking for holistic solutions,
ii. the role of norms and values in development and in evaluation,
iii. the need for Africa rooted and Africa led evaluation,
iv. policy coherence from national to global levels, to be analysed in tandem with the micro-macro disconnect,
v. mutual accountability in development financing programs and in development interventions,
vi. evaluation beyond an obsession with “impact”, to include a stronger focus on “managing for impact” (which includes ongoing monitoring for impact, learning and adaptive management); concepts such as vulnerability, sustainability and resilience; and a nuanced interpretation of “value for money”,
vii. engaging with sensitive issues such as macro political trends, the often mindless rhetoric around concepts such as democracy and human rights, and the ongoing obscuring of truth in ‘evidence’, and their role in the effectiveness of development strategies,
viii. searching for unintended consequences and unexpected impacts,
ix. synthesis that produces useful knowledge,
x. evaluation in priority content areas, such as
• climate change, food and water security
• human security
• power and empowerment
• relationships, especially in partnerships, coalitions, networks, platforms
• creativity, innovation and entrepreneurship
• institutional systems for good governance, including the elimination of corruption
• impact investing, social bonds and other influences of the private sector”.

Ofir (2013:583-584) comments on the space that has developed recently in non-western countries to shift the practice of evaluation in those contexts into a more appropriate direction, taking into account contextualised indigenous values and practices. She encourages the allocation of more resources across different disciplines for this purpose and identifies the need for powerful thought leadership in developing countries to initiate this change. She contextualises the general nature of the problems experienced in developing countries as follows: “The severe vulnerabilities and power asymmetries inherent in most developing country systems and societies make the task of evaluation specialists in these contexts both highly challenging and highly responsible. It calls for specialists from diverse fields, in particular those in developing countries, to be equipped and active, and visible where evaluation is done and shaped … The agenda would include studying the paradigms and values underlying development interventions; working with complex adaptive systems; interrogating new private sector linked development financing modalities; and opening up to other scientific disciplines’


notions of what constitutes ‘rigor’ and ‘credible evidence’. It would also promote a shift of focus from a feverish enthrallment with ‘measuring impact’ to how to better manage for sustained impact”.
It is clear that considerable effort has already gone into attempting to identify ways to improve the integration of current evaluation knowledge and practices more effectively into different contexts across the globe. For the purposes of this book the focus remains on Africa, where the latest attempt by the Bellagio meeting in 2012 resolved to pursue the following strategies (extracted from the Bellagio Centre Report, 2012:14):
Developing capacities for innovation in African evaluation, while respecting the principles of capacity development as an endogenous process. Such strategies can be based, among others, on government goals for evaluation that go beyond responsiveness to challenges, to determining accountability for value for money, with key goals that include ...
• governance and accountability to citizens and to those who provide support,
• the development of learning nations and groups for informed reflection, innovation and change,
• stimulation of African thought leadership in evaluation, in particular through analytically oriented institutions (research and evaluation centers; universities) to enhance their role as independent evaluation institutions, centers of expertise and think tanks on evaluation,
• knowledge development and contributions to global knowledge.
Expanding the pool of evaluation knowledge generated from within Africa, which could include the following specific actions:
• Generate, compile and classify a transparent repository of knowledge on African evaluation.
• Map capacity building initiatives in evaluation in Africa.
• Move the compiled repositories and maps to the wider African public.
• Gauge demand from specialist universities, think tanks and evaluation projects to partner in order to generate original knowledge, by drawing lessons learnt and best practices on the theory, perception and application of Africa rooted evaluation.
• Document and disseminate results of strategies to improve the status of evaluation, and capacities on the continent.
• Document and disseminate the approaches and results of research into evaluation theory and practices done on the continent.
Catalyzing a strong movement towards ‘thought leadership’ that can enhance the evaluation profession in Africa, and support development policy and strategy: African evaluators and other stakeholders need to commit to advancing monitoring and evaluation theory and practice. More specifically, they need to engage better with:
• key frameworks, policies and strategies at national and regional levels;
• international aid and other global policy and regimes that influence African development;
• the diversity of new actors and development funding modalities;
• the belief- and value-laden nature of both development and evaluation;
• evaluation theory and practice rooted in Africa.


Historical Development and Practice of Evaluation “Civil society could play a leading role in canvassing ideas and fostering thought leadership in development evaluation by acting as a ‘broker of evaluative knowledge’ among different sectors. Such movements require not-for-profit actors that are credible, with a measure of independence. Dynamic, continuous dialogues could take place guided by evaluation thought leaders within a liberal thinking space in order to inform policies and enable institutionalized, sustainable, effective systems in government, including in evaluation.” At the 2014 Conference of AfrEA in Yaounde in Cameroon, the African Evaluation Journal (AEJ) which has been in the pipeline for a number of years, was launched, while a formal mapping of key evaluation individuals, organisations, networks/coalitions and initiatives in Africa is in progress with the purpose to engage them in different ways to develop a broad-based consensus about the next steps to implement the Bellagio resolutions. These steps include follow-up forums to work out the details of implementation, to establish “a network/community of practice of African evaluation ‘thought leaders’ (on theory and practice) who are prepared to advance work on key concepts related to Africa-rooted and Africa-led evaluation”, as well as a resource repository for this purpose (The Bellagio Centre Report, 2012:14). A number of case studies on the current state of institutionalisation of M&E systems in selected African countries that can be regarded as the most advanced on this continent, and examples of the most established African VOPES to promote and support evaluation capacity-building on the continent, are summarised in Chapter 10. The next section summarises the main elements in the historical development of evaluation in South Africa.

2.9 Historical Development of Programme Evaluation in South Africa
(This section largely comprises an edited extract from Mouton, 2010.)

This section provides a time line of the development of programme evaluation in South Africa. The rise, growth and establishment of this sub-discipline stretch across various sectors. Three time periods can be distinguished in the emergence of evaluation in South Africa:
• Up to 1994, systematic programme evaluation in South Africa was most advanced in the NGO sector. Programme evaluation gained entry due to international donor agencies that channelled firstly solidarity funding and later Official Development Assistance to South Africa.
• After 1994, programme evaluation’s prominence in the NGO sector grew as donor agencies increasingly introduced this as an accountability mechanism. The evolution and sophistication of programme evaluation are therefore directly linked to these developments in the NGO sector.
• In the mid-2000s the public sector came on board in support of systematic programme evaluation – stimulated by the third AfrEA Conference in Cape Town in 2004. This period onwards marks the most significant growth and establishment of programme evaluation in South Africa as the public sector institutionalised programme evaluation as a key accountability mechanism. Another contributing factor to this development during this period has been the increased offering of dedicated M&E training courses by both higher education institutions and other training providers.

2.9.1 Emergence of Programme Evaluation in the NGO sector pre-1994 Donor funding as a mechanism to achieve better accountability was an important stimulus for programme evaluation in South Africa. There is a direct link between international donor funding (international aid) and the eventual introduction of programme evaluation in South Africa. The international donor community has been the main catalyst for the introduction of accountability mechanisms (of which programme evaluation is one) in development programmes and projects in South Africa through the funding they invested there. International donors were exposed to the international driving forces behind the development of programme evaluation in the USA, Britain and other Western European countries, and introduced the principles of systematic evaluation into their programmes and projects in South Africa as well as in other African countries. Voluntary organisations in the North originated from a very different context than their Southern counterparts in the developing world. Long-time organisations such as CARITAS and Save the Children were established to alleviate the aftermaths of war (OECD as cited in Edwards & Hulme, 1996), whereas local NGOs originated from the independence struggle and the societal ills caused by the apartheid regime. Other factors that played a role in raising the profile of accountability in the NGO sector in South Africa include: • The growth experienced in the NGO sector during the late 1980s and early 1990s invariably led to a need for greater transparency. • The vast amount of funding that NGOs received during the 1980s to assist with service delivery where the government was perceived to be lacking, necessitated greater  accountability. • The general black public’s lack of trust in government and private sector also affected the NGOs. • The intrinsic nature of democracy that requires greater accountability to the citizen (Lee, 2004:305). The size and focus of support received from the above sources differ significantly when comparing the pre-1994 years with the situation post 1994. The accountability mechanisms attached to donor funding gradually became more sophisticated. Bearing in mind that many international donors’ funding originates from government, it is not surprising that accountability adopts a chain-like structure: i.e. the recipient organisation being accountable to the donor and the donor being accountable to government. Accountability measures can take various forms, for example reports, performance assessments, public participation, self-regulation by setting standards and social audits. Over time these accountability mechanisms became more sophisticated. Numerous non-governmental organisations (NGOs as they were referred to in short during this time period) were established before 1994 to address the consequences of apartheid. With the enhanced focus on civil society and their role in advancing democracy, significant


Historical Development and Practice of Evaluation growth was experienced in the number of NGOs concerned with democratisation. Given their close proximity to the citizenry, international donors at that stage used these NGOs to promulgate and promote democracy. Pertinent democracy-oriented organisations during that time include the Institute for Democracy in Africa (IDASA), the Institute for MultiParty democracy, the Khululekani Institute for Democracy and the Electoral Institute of South Africa (Hearn, n.d.). NGOs operating in South Africa in the 1980s and early 1990s aimed to provide a voice to the marginalised and oppressed and were viewed to be in direct opposition to the reigning apartheid government. Given their decision to side with the oppressed, it was not possible for civil society to exert much influence at that stage. The entry of donor funding into South Africa, however, changed civil society in various ways. For instance, the international donor community played a catalytic role in introducing international movements and trends to South Africa that changed the way in which civil society operated locally. A strong and active civil society balances power relationships in that it prevents the emergence of an authoritarian state where limited citizenry consultation takes place (Hearn, n.d.). Pre 1994, South Africa did not have a strong civil society. However, with the onset of democracy, NGOs increasingly started taking up a mediator role to enhance communication between opposing poor black communities and the white minority government (Fowler, 1993).

2.9.2 Evaluation in the post-apartheid period after 1994 In 1990 the political landscape started to change with the announcement that Nelson Mandela was to be released. This event marks the start of a four-year transitional period before the first democratic election took place in 1994. The NGO landscape was severely affected on almost all fronts. Following the first democratic elections, the NGO sector underwent a significant convergence. NGOs and Community Based Organisations (CBOs) came to be known as Non-profit Organisations (NPOs) which reflected their depoliticised nature and commitment to work with the new government. Another way in which the NGO sector changed shape was through the establishment of a dedicated NPO umbrella body and various networks. The South African National NPO Coalition was established in 1994 and various networks such as the Urban Sector Network and Rural Sector Network were established to organise the sector and bring about greater synergy (Gordhan, 2010). One of the most influential developments in the NPO sector post 1994 was the reorganisation of donor funding. Whereas funding pre-1994 was channelled directly to NPOs or through religious organisations, post-1994 marked a greater control and management of donor funding by the new post-apartheid government. Official Development Assistance (ODA), as the funding in this era came to be known, were now channelled via the government’s RDP fund. ODA can take three forms: • Grants • Technical cooperation • Financial cooperation The centralisation of all donor funding (through the RDP fund) allowed for greater government control over the disbursement of donor funding. However, some donor


countries such as the USA, Switzerland and Norway continued to channel funding directly to NPOs and the private sector. Accountability mechanisms gradually became more sophisticated as donors started to attach more stringent criteria to their funding. The free ride for NPOs, as far as accountability was concerned, was over (Hofmeyr, 2010). This however did not occur immediately, as a 1999 study by the International Development Corporation (IDC) and the United Nations Development Programme (UNDP) shows. A review of donor reports at that stage still reflected very little impact assessment, a focus on activities and outputs as opposed to outcomes, and difficulty in determining actual financial contributions. Another characteristic of evaluation studies done during this time is the lack of beneficiary and participant involvement (Simeka Consulting, 1999). It is noteworthy that the forerunners in the introduction of more stringent accountability requirements for recipients of donor funds came from international government-funded donor organisations such as the Danish International Development Agency (DANIDA) and the European Union (EU), as opposed to international NPO agencies such as OXFAM and the Humanist Institute for Cooperation (HIVOS) (Gordhan, 2010). The latter organisations only introduced programme evaluation as an accountability mechanism later on. Criticism was raised about the manner in which some of the donor organisations approached evaluation studies, as some would undertake these studies without any local partner engagement or involvement. This did not apply across the board: organisations such as INTERFUND, the Netherlands Organisation for International Development (NOVIB) and OXFAM engaged with local programme staff and sought their involvement through the evaluation process. Many international NGOs introduced programme evaluation during the apartheid years prior to 1994. USAID frequently conducted monitoring and evaluation activities, but mainly used international experts to undertake the work. The Department for International Development (DFID), the European Union and the Netherlands also introduced programme evaluation as an accountability measure early on. The Kellogg Foundation, on the other hand, established a local office and employed mainly local people. The tool most commonly used to conduct programme evaluations at that stage was the logframe or logical framework approach (Crawford, 2003). Donor organisations continued to follow the logical framework approach, but it did not remain static. The New Public Management approach that gained popularity in the 1990s in the public sector led to the development of divergent models of this approach to performance assessment. The enhanced focus on performance measurement and greater efficiency brought about a stronger results-based focus (Kilby, 2004). The Canadian International Development Agency (CIDA), for instance, referred to their approach as “results-based management” while USAID referred to theirs as a “managing for results” approach. In both instances these frameworks called for a greater hierarchical alignment between activities and different levels of outcomes. Performance indicators and the attached data collection tools became an integral part of the framework.


Historical Development and Practice of Evaluation Another development promoted the gradual inclusion of local expertise in the evaluation process. The Paris Declaration on Aid Effectiveness has ensured greater involvement from recipients of donor funding in programme activities (which includes evaluation studies). The declaration supports partner country involvement and calls for collaboration in terms of five principles: • Ownership: it is the partner country that should take control of development policies and activities. The role of the donor country is to support and build capacity where  needed. • Alignment: support should take cognisance of partner countries’ strategies and not dictate new systems and procedures. Instead, capacity should be built where needed. • Harmonisation: donor and partner country should work together in a transparent, harmonious manner. • Managing for results: evidence-based decision making takes place, which means resources are allocated according to results. • Mutual accountability: donors and partners are mutually responsible for results (Paris Declaration on Aid Effectiveness, published by the OECD, 2005). This greater focus on reporting and accountability has not occurred without resistance. Some NPOs have been forced to realign activities to satisfy donor requirements, sometimes to the detriment of the organisation (Moyo, 2001). It is, however, evident that accountability and the mechanisms employed to track effectiveness are here to stay. With the growth experienced in the field of programme evaluation, NPO organisations have various options when undertaking monitoring and evaluation activities. Some of the larger NPOs have appointed dedicated M&E officers to put in place M&E frameworks, while the smaller NPOs opt to outsource this task to consultants.

2.9.3 Contributions of the first generation evaluators in South Africa The first cohort of scholars and researchers involved in evaluation studies (which we will term the “first generation evaluators”) had to rely on their own resources to establish themselves in the field. A lack of formal training in M&E meant that they had to rely on their own abilities and initiatives and had to adopt their methodologies through application and practice. They include individuals like Johann Mouton (Human Sciences Research Council and currently based at the University of Stellenbosch), Ricky Mauer (Human Sciences Research Council), Jane Hofmeyr, Charles Potter and Joe Muller (University of the Witwatersrand) and Johann Louw (University of Cape Town). These individuals started to undertake systematic evaluations and supplemented their activities with various skills development initiatives, including: • Doing extensive reading and research in the evaluation field. • Participating in international conferences. • Utilising learning aids from development organisations such as the World Bank. • Establishing networks with international experts, locally and abroad, while a number of international evaluation experts also visited South Africa since 1988 for various purposes (Hofmeyr, 2010). This stimulated the further development of the field, as Table 2.1 illustrates (Mouton, 2010:135-145).


Table 2.1: Visits of international M&E experts to South Africa

Year | Initiative Detail | Initiator
1988 | A Research Utilisation Seminar was presented at the University of the Witwatersrand (WITS). Carol Weiss delivered a paper at this seminar (Hofmeyr, 2010). | Jane Hofmeyr and Joe Muller
1990 | Mark Lipsey came to South Africa to facilitate a programme evaluation workshop. | Johann Mouton (then employed at The Human Sciences Research Council)
1993 | Carol Weiss was hosted in South Africa by the Education Foundation (Hofmeyr, 2010). | Jane Hofmeyr
1993 | Partially funded by the Human Sciences Research Council, David Fetterman was invited to deliver a keynote address at a 1993 two-day symposium on programme evaluation in Cape Town. | Johann Louw and Johann Mouton
1994 | Mark Lipsey returned to South Africa for another series of seminars. | Johann Louw and Johann Mouton
1996 | Prof Peter Weingart from the University of Bielefeld was invited to deliver a conference paper at the JET conference titled Quality and Validity in 1996. | Nick Taylor
2002 | Michael Quinn Patton visited South Africa for the first time on invitation. | Zenda Ofir

Charline Mouton explained that “(t)he 1993 initiative by Johann Louw and Johann Mouton, where David Fetterman delivered the keynote address, had two purposes. Firstly, it marked the first attempt to establish an evaluation network in South Africa and secondly to bring together the first generation evaluators in order to establish the level of M&E at that stage (Louw, 2010). Approximately 25-30 people were invited to attend this event at the then Lady Hamilton Hotel in Oranjezicht in Cape Town. Following this event three individuals were appointed to drive the establishment of the evaluation network. Johann Louw chaired this small committee. He reports that the main reason why this network did not get off the ground was that for many of the attendees evaluation was still a ‘side issue’ and not their main area of focus. The time was not ripe for programme evaluation yet” (Mouton, 2010:144). Mouton (2010:144) concludes that “… (a) decade passed before the idea of an Evaluation Network was pursued again. In 2002, Zenda Ofir organised an event which Michael Quinn Patton attended. This event is a landmark occasion in that it brought together the biggest group of people around M&E in South Africa to date. It should also be recognised that this event marked the first step to what later became the South African Monitoring and Evaluation Association (SAMEA)”. The history of SAMEA is summarised in Chapter 9. The enhanced focus on accountability and new funding channels required greater strategic alignment when disbursing funding. Many NPOs were not geared for the additional accountability mechanisms enforced on them. This led to a huge upsurge in consultancies that aimed to equip these organisations with new skills such as planning, goal setting, indicator development and developing M&E systems. It is quite difficult to trace the origin of the first M&E consultancies, as for many years such organisations presented themselves as strategic planning, management or research consultancies. Khulisa Management Services, for example, was established in 1993 and very soon got involved in programme evaluation. Strategy & Tactics (headed by David


Everatt in 1998) from the outset identified and communicated their M&E focus (Everatt, 2010), while Evalnet was established by Zenda Ofir in 1999, focusing only on evaluations. It was only in later years that other consultancies started advertising monitoring and evaluation products and services.

2.9.4 Institutionalisation of Programme Evaluation in the SA Public Sector: mid2000 to 2005 The closest that the South African government had come to systematic programme evaluation before 2005, was the practice of drawing up regular, internal annual departmental reports to be tabled in Parliament. These reports were mostly monitoring reports of programme and project inputs, activities and outputs, and in some cases providing an opinion-based qualitative assessment of a department’s achievements during that financial year. These regular administrative reports were also annually supplemented by regular external audit reports from the Auditor General, detailing departmental compliance with regulatory frameworks. These evaluations, however, did not normally focus on outcomes and impacts of governmental policies, programmes and projects. These internal and external annual reports were further periodically supplemented by comprehensive reports of official commissions or committees of inquiry or working groups tasked with narrow terms of reference or mandates to investigate specific issues. These reports were then based on once-off systematic data collection, analysis and assessments of such issues or problems referred to the investigation concerned for evidence-based findings and recommendations. In addition to these regular and ad hoc evaluation exercises, the South African Public Service Commission (PSC) has taken it upon itself to regularly monitor the degree to which the main values or principles, that the PSC has identified in the 1996 Constitution, has been achieved by government departments and agencies at national and provincial levels. The Constitutional principles identified by the PSC to monitor, evaluate and report on, are the following: 1. Professional ethics 2. Efficiency and effectiveness 3. Participatory development orientation 4. Impartiality and fairness 5. Transparency and accountability 6. Human resource management and development 7. Representativeness (SA-PSC, 2001, Annexure 5)

The PSC sees its task as monitoring the compliance of government departments with these values. The PSC’s reporting on the compliance of departments with the Batho Pele White Paper on Transforming Public Service Delivery has produced important results that could be used in further policy assessments by the SA government (SA-PSC, 2000). Similarly, the


PSC’s Citizen Satisfaction Survey could again be used to assess the satisfaction levels of citizens with selected public services (SA-PSC, 2003). One of the ways in which the growing need for greater accountability and transparency was addressed was through the introduction of the Government-wide Monitoring and Evaluation System (GWM&ES). This system marks a concerted effort by government to institutionalise, standardise and synchronise government monitoring and evaluation activities. The system provides a framework for how government performance measurement should be undertaken and draws on various implementing bodies (including government departments) to give effect to this system.

2.9.5 The Emerging Government-wide Monitoring and Evaluation System (GWM&ES) in SA since 2005
(This section comprises an edited extract from Cloete, 2009:297-299.)

As suggested above, until 2005 only individual staff performance evaluations were institutionalised and regularly and systematically carried out in the South African government. Policy programme monitoring and evaluation, however, were not undertaken, managed and coordinated systematically in the South African Public Service. These activities were undertaken sporadically by line function departments for purposes of their annual departmental reports. Some departments were more rigorous than others in this process, while the Public Service Commission undertook to monitor and evaluate the SA government’s adherence to a restricted number of principles of good governance that the PSC deduced from the Constitution. The following considerations probably contributed, inter alia, to a cabinet decision in 2005 to develop the GWM&ES:
• A need for regular national government report-backs to the International UN Millennium Goals Initiative on the progress with halving poverty in South Africa by 2014;
• the fact that SA was the host of the World Summit on Sustainable Development in 2002 and at that time did not have any national M&E system to assess sustainable development, as required by the Rio Convention of 1992 and reiterated at the Johannesburg Summit where SA was the host country;
• the undertaking by the SA president to regularly inform citizens about progress with the Government’s National Programme of Action (POA) (http://www.info.gov.za/aboutgovt/poa/index.html);
• the fact that donors are increasingly requiring systematic M&E of projects and programmes that they fund, in order to protect their investments; and
• the fact that institutionalising national M&E systems has, for the reasons summarised above, proved to be an international good governance practice.
The above (as well as a number of other) considerations stimulated awareness in government that the state of monitoring and evaluation of governmental activities in SA could and should be managed better. In July 2005, the SA Cabinet therefore adopted a strategy to establish


a Government-wide Monitoring and Evaluation System (GWM&ES) over a period of two years (SA-Presidency, 2005). The GWM&ES is intended to coordinate a systematic programme of policy monitoring and evaluation throughout the public sector in South Africa. This programme is aimed at improving general public management in the country, and will be the vehicle for reporting in 2014 on the implementation of the UN Millennium goals and targets to halve poverty according to a set of common indicators. In 2007 the initial GWM&ES proposal was revised and updated, mainly because the time frames specified in the original proposal were too optimistic, and because more clarity about how the system should be implemented had by then started to emerge (SA-Presidency, 2007a & 2007b). This revised M&E system will not only monitor internal governmental performance processes, but is also aimed at determining the nature of external governmental outcomes and impacts on South African society. It is therefore also aimed at determining the eventual longer term results of policy and service delivery interventions or a lack thereof. An important departure point of the GWM&ES is that existing monitoring and evaluation capacities and programmes in line function departments should as far as possible be retained, linked and synchronised within the framework of the GWM&ES (SA-Presidency, 2007a:19). The GWM&ES is managed from the DPME in the Presidency. It is a secondary data assessment system that will not undertake primary research or data collection itself. It will rather draw on information gained from the above and other agencies, and interpret this data in the context of the national government’s strategic Programme of Action, in order to assess progress towards those strategic goals. The updated GWM&ES implementation plan still contains no detailed implementation strategy and, this time around, no time frames to fully establish the system in South Africa. It does, however, spell out the roles and responsibilities of the various stakeholders and agencies involved in this programme. These institutional stakeholders include:
• The Department of Performance Monitoring and Evaluation (DPME) who is responsible for the control, coordination, implementation and evaluation of the GWM&ES.
• The National Treasury responsible for measuring the “value for money” aspects of governmental policy programmes (National Treasury, 2007).
• The DPSA who is responsible for staff performance evaluations.
• The Statistical Service of South Africa (StatsSA) who is responsible for data collection, storage and quality control (StatsSA, 2007).
• The Public Service Commission (PSC) who is responsible for interdepartmental evaluations of those few constitutional process principles that the PSC has decided to measure, and who has published a set of guidelines about M&E in an attempt to synchronise the different perspectives of all the main governmental agencies involved in this venture (PSC, 2008).
• The Department of Cooperative Government (DCG, formerly the Department of Provincial and Local Government – DPLG) who is responsible for assessing the policy programme performances of provinces and local authorities.


• The Department of Environmental Affairs (DEA, formerly the Department of Environmental Affairs and Tourism – DEAT) who is responsible for assessing the state of the environment and sustainable development.
• Other line function departments who are responsible for assessing their own line function activities and reporting back through the channels established by the GWM&ES.
• Businesses, NGOs and NPOs who receive funding from the state and who have to report on their activities enabled by such public funding, and
• The Public Administration Leadership and Management Academy (PALAMA, soon to be transformed into a new National School of Government – NSG), which will probably be responsible for monitoring and evaluation capacity-building through training to improve M&E skills among thousands of officials who will be responsible for the implementation of the system (SAMDI, 2007).
Chapter 9 is devoted to a detailed summary and assessment of the development of the GWM&ES in South Africa since 2005, while Chapter 10 summarises a number of African country case studies in this regard.

2.10 Conclusions The evaluation discipline developed into an independent, transdisciplinary profession from various other disciplines during the first half of the 20th century. Two related scientific fields, namely policy analysis and social research, contributed tremendously to strengthening evaluation practices observed during the final half of the 20th century. In the policy field, the paradigm shift from opinion-driven policy choices to evidence-based policy making, which puts the best available evidence from research at the heart of policy development and implementation, fuels the need for accurate evaluation of the results of public sector development programmes. The quest for evidence-based policy making should not be a pure technical analysis, but should allow for divergence and various detailed policy options. “This means that policy making is not just a matter of ‘what works’, but what works at what cost and with what outcomes” (Segone, 2008a:34-35). Similarly, advances in social research theories and methodologies and the application thereof to societal development problems in the mid-1900s brought greater possibility and sophistication to the field of evaluation research. Advances in social science methodology provide the basis for social experimentation and signified the birth of evaluation research as applied social research. Today it is acknowledged that evaluation research is much more than just applied social research: it implies a concrete judgement on the phenomena in question to fulfil its purpose and is therefore value laden, concerned with standards and values that, together with factual results, produce evaluative conclusions and the intended use of the evaluation results. It was also concluded above that the driving forces behind systematic evaluations normally are economic or fiscal and are driven by governments, with the exception of developing countries in Africa. The evaluation field is also historically and currently largely dominated by western countries, but increasingly strong voices are heard in favour of changing this evaluation imperialism towards more customised, indigenous evaluation theories and practices in non-western countries. This


is important for the future of evaluation in Africa and in South Africa. Initiatives to create an Africa-driven approach to evaluation have been launched and could have a significant impact on the future of evaluation on the African continent. The concrete integration of these ideas into prevailing evaluation approaches, designs and methodologies, and their implementation in evaluation exercises, will have to be fleshed out in more detail over the next few years if African evaluators are really serious about contextualising systematic evaluation for their continent.
The chapter has also illustrated that programme evaluation, like many other international trends, entered South Africa via the donor community. However, this was not without problems, such as a general lack of local stakeholder consultation and an emphasis on accountability at the cost of learning and local ownership. The 1994 election provides a natural partition when considering the characteristics of donor funding. A clear difference is evident pre- and post-1994 when considering the size, scope and origin of donor funding. There is also a vast difference in the importance attached to accountability measures pre- and post-1994. The “free ride” for NGOs came to an end through the introduction of frameworks such as the logframe approach that is used to strengthen accountability for funds received. This enabled donors to track whether their funding is being used optimally. It is also now common practice to involve local programme staff members in evaluation studies and not to rely purely on international experts. Although Monitoring and Evaluation was introduced in the NPO sector in South Africa and experienced growth in the era post 2000, the establishment and institutionalisation of Programme Evaluation occurred in the public sector around the mid-2000s. When considering programme evaluation in isolation, the time line is even shorter. It is only in the past five years, with the introduction of the outcomes-based approach and the release of the evaluation policy framework, that programme evaluation has been approached and executed more systematically. Going forward, the clear mandates and policy frameworks of the various implementing government organisations are likely to deepen the institutionalisation of programme evaluation within the public sector. The historical development of evaluation elsewhere on the African continent has followed a parallel route to the South African experience, and has also been detrimentally affected by the late development of the field in Africa. Although South Africa has experienced exponential growth in the field of programme evaluation during the past two decades, it is not nearly as advanced as in the USA and UK case studies summarised and assessed above. The main reasons for this state of affairs include the following:
• Firstly, the sophisticated, technical and relatively complicated nature of evaluation as a reflective management function (as summarised in Chapter 1) necessitates highly specialised knowledge, insights, skills and practical experience, as well as the resources and technologies to undertake this function effectively and efficiently. These requirements for successful evaluation do not always exist in this country, or in most other developing countries, where even basic service delivery constitutes huge challenges for those societies.
• Secondly, the fact that systematic evaluation is a relatively recent phenomenon even in more developed societies, where it only started to consolidate in practice in the 1980s and 1990s (compared to the European start-ups 30 years earlier, in the 1950s already), implies that the evaluation paradigm is not yet widely known and accepted in many developing countries, including South Africa.
• Thirdly, the developing nature of South African society, which does not always have the advanced knowledge, skills, management culture and practices found in better resourced nations like the US, Britain and others, has resulted in an inevitable evaluation development backlog in this country. This is also the case in other developing countries in Africa and on other continents.
Evaluation in South Africa has therefore developed along a completely different historical route than has been the case in the USA and in Britain, where the government turned out to be the main champion and initiator of programme evaluations for mainly economic and financial reasons. In South Africa, systematic evaluation first took root in the NPO sector, prompted by the involvement of international donors in that sector.

References African Capacity Building Foundation. 1992. Strategy and Indicative Work Programme: 1992-1995. Harare: ACBF. AfrEA Special Stream Statement. 2007. AfrEA. 2014. Website of the African Evaluation Association. http://www.afrea.org/?q=conference. Retrieved: 3 March 2013. AfrEA. 2014. Website of the African Evaluation Association. http://www.afrea.org/?q=conference. Retrieved: 3 March 2013. Alkin, M.C. (ed.). 2004. Evaluation Roots: Tracing Theorists’ Views and Influences. Thousand Oaks, CA:  Sage. Alkin, M.C. (ed.). 2013. Evaluation Roots: A Wider Perspective of Theorists’ Views and Influences, 2nd edition. London: Sage. Alkin, M.C. & Christie, A.C. 2004. An Evaluation Theory Tree. In Alkin, M.C. (ed.). Evaluation Roots: Tracing Theorists’ Views and Influences. Thousand Oaks, CA: Sage. Ascher, W. 1986. The evolution of the policy sciences: Understanding the rise and avoiding the fall. Journal of Policy Analysis and Management, 5(2): 365-373. Barberis, P. 1998. The New Public Management and a New Accountability. Public Administration, 76: 451-470. Barzelay, M. 2001. The New Public Management: Improving Research and Policy Dialogue. Berkeley: University of California Press. Bellagio Centre. 2012. The Bellagio Report. African Thought Leaders Forum on Evaluation and Development: Expanding Leadership in Africa. The Bellagio Centre. Parktown, Johannesburg: CLEAR, P&DM Wits University. Blume, S.S. 1987. Social Science in Whitehall: two analytic perspectives, in Bulmer, M. (ed.). Social Science Research and Government: Comparative Essays on Britain and United States. London: Cambridge University Press: 77-93. Bowerman, M., Humphrey, C. & Owen, D. 2003. Struggling for Supremacy: The case of UK Public Audit Institutions. Critical Perspectives on Accounting, 14: 1-22. Bowsher, C.A. 1992. Transmittal Letter: Program Evaluation Issues. New Directions for Program Evaluation, 55: 11-12. Boyle, R. & Lemaire, D. (eds.). 1999. Building effective evaluation capacity. Lessons from practice. New Brunswick, USA: Transaction Publishers. Brunner, R.D. 1997. Introduction to the policy sciences. Policy Sciences, 30(4): 191-215. Chelimsky, E. 2006. The Purpose of Evaluation in a Democratic Society, in Shaw, I.F., Greene, J.C., Mark, M.M. Handbook of Evaluation. London: Sage: 33-55.


Historical Development and Practice of Evaluation Chilisa, B. 2012. Indigenous Research Methodologies. Berkeley: Sage. Chilisa, B. & Malunga, C. 2012. Made in Africa Evaluation: Uncovering African Roots in Evaluation Theory and Practice, in Bellagio Centre. Report of the African Thought Leaders Forum on Evaluation for Development: Expanding Thought Leadership in Africa. 14-17 November 2012, 32-38. Bellagio. http://www.clear-aa.co.za/wp-content/uploads/2013/09/Bellagio-Report.png. Retrieved: 2 March 2014. Chilisa, B., Mertens, D. & Cram, F. 2013. Indigenous Pathways into Social Research. Walnut Creek, CA: Left Coast Press. Cilliers, P. & Nicolescu, B. 2012. Complexity and transdisciplinarity – Discontinuity, levels of Reality and the Hidden Third. Futures, 44: 711-718. Cloete, F. 2009. Evidence-based policy analysis in South Africa: Critical assessment of the emerging government-wide monitoring and evaluation system. South African Journal of Public Administration, 44(2): 293-311. CODESRIA. 1993. Report of the Executive Secretary on the Twentieth Anniversary of the Council for the Development of Social Sciences Research in Africa (CODESRIA, 29 November-1 December 1993. Dakar: CODESRIA. Crawford, G. 2003. Promoting Democracy from without – Learning from within (Part I). Democratization, 10(1): 77-98. Davies, P. 2008. Making policy evidence-based: The UK experience. World Bank Middle East and North Africa Region Regional Impact Evaluation Workshop Cairo, Egypt, 13-17 January 2008. http://siteresources.worldbank.org/INTISPMA/ Resources/383704-184250322738/3986044-1209668224716/English_EvidenceBasedPolicy_ Davies_Cairo.pdf. Retrieved: 30 April 2009. Davies, P., Newcomer, K. & Soydan, H. 2006. Government as Structural Context for Evaluation, in Shaw, I.F., Greene, J.C. & Mark, M.M. (eds.). The Sage Handbook of Evaluation. London: Sage. Davis, B.C. 1986. Overview of the Teaching of Evaluation across the disciplines in Lipsey, M.W. (ed.). New Directions for Programme Evaluation. San Francisco: Jossey-Bass: 5-14. Derlien, H. 1990. Genesis and Structure of Evaluation Efforts in Comparative Perspectives, in Rist, R.C. (ed.). Program Evaluation and the Management of Government, Patterns and Prospects across eight Nations. New Brunswick: Transaction Publishers: 147-176. Everatt, D. 2010. Telephonic Interview 10 March, Cape Town. Florin, P.M., Guillermin, M., & Dedeurwaerdere, T. 2014. A pragmatist approach to transdisciplinarity in sustainability research: From complex systems theory to reflexive science. Futures. http:// dx.doi.org/10.1016/j.futures. Retrieved: 2 February 2014. Foster, C.D. & Plowden, F.J. 1996. The State under Stress: Can the Hollow State be Good Government? Philadelphia: Open University Press. Fowler, A. 1993. Non-governmental organizations as agents of democratization: An African perspective. Journal of International Development, 5(3): 325-339. Furubo, J., Rist, R.C. & Sandahl, R. (eds.). 2002. International Atlas of Evaluation. New Brunswick, New Jersey: Transaction Publishers. Gordhan, S. 2010. Personal Interview 24 February, Johannesburg. Grasso, P.G. 1996. End of an Era: Closing the U.S. General Accounting Office’s Program Evaluation and Methodology Division. Evaluation Practice, 17(2): 115-117. Grasso, P.G., Wasty, S.S. & Weaving, R.V. 2003. World Bank Operations Evaluation Department: The First 30 Years. Washington, DC: World Bank. Gray, A. & Jenkins, B. 1982. Policy Analysis in British Central Government: The experience of PAR. Public Administration, 60: 429-450. Gray, A. 
& Jenkins, B. 2002. Policy and Program Evaluation in the United Kingdom: A Reflective State?, in Furubo, J., Rist, R.C. & Sandahl, R. (eds.). International Atlas of Evaluation. New Brunswick, New Jersey: Transaction Publishers: 129-156. Gray, A., Jenkins, B., Flynn, A. & Rutherford, B. 1991. The Management of Change in Whitehall: The Experience of the FMI. Public Administration, 69: 41-59.


Chapter 2 Greene, Ian. n.d. Lessons Learned from Two Decades of Program Evaluation in Canada. http:// www.yorku.ca/igreene/progeval.html. Retrieved 29 August 2014. Greene, J. 2010. Keynote Address at the 2010 Conference of the European Evaluation Society in Prague. Havens, H.S. 1992. The Erosion of Federal Program Evaluation, in Wye, C.G. & Sonnichsen, R.C. (eds.). New Directions for Program Evaluation: Evaluation in the Federal Government: Changes, Trends, and Opportunities: 21-27. Hearn, J. n.d. Foreign Aid, Democratisation and Civil Society in Africa: A study of South Africa, Ghana and Uganda. Discussion Paper 368. London: Institute of Development Studies. Henkel, M. 1991. Government, Evaluation and Change. London: Jessica Kingsley. Hitt, M.A., Middlemist, R.D. & Greer, C.R. 1977. Sunset Legislation and the Measurement of Effectiveness. Public Personnel Management: 188-193. Hofmeyr, J. 2010. Personal Interview 24 February, Johannesburg. Hogan, R.L. 2007. The Historical Development of Program Evaluation: Exploring the Past and Present. Online Journal of Workforce Education and Development, 2(4). Hoggett, P. 1996. New modes of control in the Public Service. Public Administration, 74: 9-32. Hogwood, B.W. 1987. From Crisis to Complacency? Shaping Public Policy in Britain. New York: Oxford University Press. Hood, C., James, O., Jones, G., Scott, C. & Travers, T. 1998. Regulation inside Government: Where New Public Management meets the audit explosion. Public Money & Management, April-June: 61-68. House of Commons. 1982. Third Report from the Treasury and Civil Service Committee Session 1981‑1982: Efficiency and Effectiveness in the Civil Service. London: Her Majesty’s Stationary Office. House, E.R. 1993. Professional evaluation: Social impact and political consequences. Newbury Park: Sage Publications. House, E.R. 2004. Intellectual History in Evaluation, in Alkin, C.M. (ed.). Evaluation roots: Tracing theorists’ views and influences. Thousand Oaks CA: Sage. Jahn, T., Bergmann, M. & Keil, F. 2012. Transdisciplinarity: Between mainstreaming and marginalization. Ecological Economics, 79: 1-10. Kelly, J. 2003. The Audit Commission: guiding, steering and regulating local government, Public Administration, 81(3): 459-476. Kilby, P. 2004. Is empowerment possible under a new public management environment? Some lessons from India. International Public Management, 7(2): 207-225. Kusek, J.Z. & Rist, R.C. 2004. Ten steps to a results-based monitoring and evaluation system. Washington DC: The World Bank. Lee, J. 2004. NPO Accountability: Rights and responsibilities. CASIN Louw, J. 2010. Personal Interview. 1 July 2010, Cape Town. Madaus, G.F. 2004. Ralph W Tyler’s Contribution to Programme Evaluation, in Alkin, C.M. (ed.). Evaluation roots: Tracing theorists’ views and influences. Thousand Oaks CA: Sage: 69-79. Madaus, G.F. & Stufflebeam, D.L. 2000. Program Evaluation: A Historical Overview, in Stufflebeam, D.L., Madaus, G.F. & Kellaghan, T. Evaluation Models: Viewpoints on Educational and Human Services Evaluation. Second Edition. Boston: Kluver Academic: 3-18. Malunga, C. 2009a. Making Strategic Plans Work: Insights from Indigenous African Wisdom. London: Adonis & Abbey Publishers. Malunga, C. 2009b. Understanding Organisational Leadership through Ubuntu. London: Adonis & Abbey Publishers. Malunga, C. & Banda, C. 2011. Understanding Organisational Sustainability through African Proverbs: Insights for Leaders and Facilitators. Rugby, UK: Practical Action Publishers. Manson, N.J. 2006. 
Is operations research really research? Orion, 22(2): 155-180. Mathison, S. 2005. Encyclopedia of Evaluation. London: Sage. Max-Neef, M.A. 2005. Foundations of transdisciplinarity. Ecological Economics, 53: 5-16.


Melkers, J. & Roessner, D. 1997. Politics and the political setting as an influence on evaluation activities: National research and technology policy programs in the United States and Canada. Evaluation and Program Planning, 20(1): 57-75.
Morra-Imas, L.G. & Rist, R.C. 2009. The road to results: designing and conducting effective development evaluations. Washington, DC: World Bank.
Mosher, F.C. 1984. A Tale of Two Agencies: A Comparative Analysis of the General Accounting Office and the Office of Management and Budget. Baton Rouge and London: Louisiana State University Press.
Mouton, C. 2010. The history of programme evaluation in South Africa. Thesis submitted in partial fulfilment of a Master's Degree in Social Science Methods at Stellenbosch University, Stellenbosch.
Mouton, J. 2007. Approaches to programme evaluation research. Journal of Public Administration, 42(6): 490-511.
Moyo, B. 2001. International Foundations, Agenda setting and the Non-Profit sector in South Africa. African Journal of International Affairs, 4(1&2): 93-118.
Naidoo, I. 2010. Personal Interview. 20 April, Pretoria.
National Treasury. 2007. Framework for Managing Programme Performance Information. Pretoria: National Treasury. www.treasury.gov.za/publications/guidelines/FMPI.pdf. Retrieved: May 2007.
OECD. 2005. Paris Declaration on Aid Effectiveness: Ownership, Harmonisation, Alignment, Results and Mutual Accountability. Paris. http://www.oecd.org/dac/effectiveness/34428351.pdf. Retrieved: 14 April 2014.
Ofir, Z. 2013. Strengthening Evaluation for Development. American Journal of Evaluation, 34: 582-586.
Ofir, Z. 2014. Personal Interview, 28 February and 1 March.
Oosthuizen, E. 1996. A reflection on African policy support type organisations. Africanus, 26(2): 60-77.
Osborne, D. & Gaebler, T. 1992. Reinventing Government. Reading, MA: Addison-Wesley.
Pan African Development Information System. 1991. Directory of Development Institutions in Africa. Ethiopia: United Nations Economic Commission for Africa.
Picciotto, R. 2010. Evaluating development philanthropy in a changing world. The Bellagio Initiative on The Future of Philanthropy and Development in the Pursuit of Human Wellbeing. http://www.bellagioinitiative.org/wpcontent/uploads/2011/10/Bellagio_Picciotto.pdf. Retrieved: 2 March 2014.
Pohl, C. 2011. What is progress in transdisciplinary research? Futures, 43: 618-626.
Pollitt, C. 1994. The Citizen's Charter: A Preliminary Analysis. Public Money & Management, April-June: 9-14.
Potter, C. 2010. Telephonic Interview. 15 March, Cape Town.
Potter, C. & Kruger, J. 2001. Chapter 10: Social programme evaluation, in Seedat, M. (ed.), Duncan, M. & Lazarus, S. (cons. eds.). Community Psychology Theory, Method and Practice: South African and other perspectives. Cape Town: Oxford University Press: 189-211.
PSC. 2008. Basic concepts in monitoring and evaluation. Pretoria: Public Service Commission, February 2008. http://www.psc.gov.za/docs/guidelines/PSC%206%20in%20one.pdf. Retrieved: 27 August 2014.
Rabie, B. 2011. Improving the systematic evaluation of local economic development results in South African local government. Unpublished dissertation presented for the degree of Doctor of Public and Development Management at Stellenbosch University, Stellenbosch.
Rabie, B. & Cloete, F. 2009. New Typology of Monitoring and Evaluation Approaches. Administratio Publica, 17(3): 76-97.
Rist, R.C. & Paliokas, K.L. 2002. The Rise and Fall (and Rise Again?) of the Evaluation Function in the US Government, in Furubo, J., Rist, R.C. & Sandahl, R. (eds.). International Atlas of Evaluation. New Brunswick, New Jersey: Transaction Publishers: 129-156.


Rist, R.C. 1987. Social Science analysis and congressional uses: the case of the United States General Accounting Office, in Bulmer, M. (ed.). Social Science Research and Government: Comparative Essays on Britain and United States. London: Cambridge University Press: 77-93.
Rist, R.C. 1990. The Organization and Function of Evaluation in the United States: A Federal Overview, in Rist, R.C. (ed.). Program Evaluation and the Management of Government, Patterns and Prospects across Eight Nations. New Brunswick: Transaction Publishers: 71-94.
Rogers, P.J. & Davidson, J.E. 2013. Australian and New Zealand Evaluation Theorists, in Alkin, M.C. 2013. Evaluation Roots: A Wider Perspective of Theorists' Views and Influence. Second Edition. London: Sage: 371-385.
Rossi, P.H., Lipsey, M.W. & Freeman, H.E. 2004. Evaluation: A Systematic Approach. Seventh Edition. Thousand Oaks: Sage Publications Inc.
Rugh, J. & Segone, M. 2013. Voluntary Organisations for Professional Evaluation (VOPEs): Learning from Africa, Americas, Asia, Australasia, Europe and the Middle East. IOCE-EvalPartners-UNICEF.
SAMDI. 2007. Capacity-building for Monitoring and Evaluation in the South African Government: A Curriculum Framework. Pretoria: Government Printer.
SA-Presidency. 2005. Proposal and Implementation Plan for a Government-wide Monitoring and Evaluation System. Coordination and Advisory Services. Pretoria: The Presidency.
SA-Presidency. 2007a. Policy Framework for the Government-wide Monitoring and Evaluation System. Pretoria: The Presidency. http://www.thepresidency.gov.za/main.asp?include=learning/reference/policy/index.html. Retrieved: 14 April 2014.
SA-Presidency. 2007b. From policy vision to operational reality. Annual implementation update in support of the GWME policy framework. Pretoria: The Presidency.
SA-PSC. 2001. Monitoring and Evaluation System Scoping Project. Pretoria: Public Service Commission.
SA-PSC. 2002. State of the Public Service Report 2002. Pretoria: Public Service Commission. http://www.gov.za/reports/2003/psc03report.pdf. Retrieved: 25 April 2003.
SA-PSC. 2003. Overview report on citizen satisfaction survey. Pretoria: Public Service Commission.
SA-PSC. 2008. Basic concepts in monitoring and evaluation. Pretoria: Public Service Commission. http://www.psc.gov.za/docs/guidelines/PSC%206%20in%20one.pdf. Retrieved: 24 March 2014.
Sasaki, R. 2006. A Review of the History and the Current Practice of Aid Evaluation. Journal of Multidisciplinary Evaluation, 5.
Sawyer, A. & Hyden, G. 1993. Sapes Trust: The First 5 Years. Stockholm, Sweden: SAREC.
Scriven, M. 1991. Evaluation Thesaurus. Newbury Park, CA: Sage.
Scriven, M. 2003. Michael Scriven on the Difference Between Evaluation and Social Science Research. The Evaluation Exchange, IX(4): 7.
Segone, M. 2008a. Evidence-based policy making and the role of monitoring and evaluation within the new aid environment, in Segone, M. Bridging the gap. The role of monitoring and evaluation in evidence-based policy making. UNICEF Evaluation Working Papers Issue 12. Geneva: UNICEF.
Segone, M. (ed.). 2008b. Bridging the gap. The role of monitoring and evaluation in evidence-based policy making. UNICEF Evaluation Working Papers Issue 12. Geneva: UNICEF.
Segone, M. 2009. Enhancing evidence-based policy making through country-led monitoring and evaluation systems, in Segone, M. Country-led monitoring and evaluation systems. Better evidence, better policies, better development results. UNICEF Evaluation Working Papers. Geneva: UNICEF.
Segone, M. & Ocampo, A. 2006. Creating and Developing Evaluation Organisations. Lessons learned from Africa, Americas, Asia, Australasia and Europe. Lima, Peru: International Organisation for Cooperation in Evaluation.
Shadish, W.R., Cook, T.D. & Leviton, L.C. 1991. Foundations of Program Evaluation: Theories of Practice. California: Sage Publications Inc.
Shadish, W.R. & Luellen, J.K. 2005. History of Evaluation, in Mathison, S. Encyclopaedia of Evaluation. London: Sage: 183-186.


Sharp, C.A. 2003. A History of Program Evaluation in Australia and New Zealand and the Australasian Evaluation Society. Paper prepared for the International Evaluation Conference, Auckland, September 2003. http://www.personalresearchandevaluation.com/documents/evaluation_theory/aes_history/AEShistNew.pdf. Retrieved: 26 Feb 2014.
Simeka Management Consulting. 1999. Donor assistance to South Africa 1994-1999. Unpublished Research Report.
Spring, K.A. & Patel, M. n.d. Introduction to the Bibliography. www.afrea.org/resources. Retrieved: 12 February 2014.
Stame, N. 2013. A European Evaluation Theory Tree, in Alkin, M.C. 2013. Evaluation Roots: A Wider Perspective of Theorists' Views and Influences. Second Edition. London: Sage: 355-370.
Stats SA. 2007. The South African Statistical Quality Assessment Framework (SASQAF). Pretoria: Government Printers. http://www.thepresidency.gov.za/main.asp?include=learning/reference/sasqaf/index.html. Retrieved: 23 March 2014.
Stewart, J. 2003. Modernising British Local Government: An Assessment of Labour's Reform Programme. New York: Palgrave Macmillan.
Thomas, R. 1995. Restructuring Economic Co-operation in Southern Africa. The Courier, 154.
Traore, I.H. & Wally, N. n.d. African Regional VOPE: African Evaluation Association (AfrEA). Institutionalisation of Evaluation in Africa: The Role of AfrEA, in Rugh, J. & Segone, M. 2013. Voluntary Organisations for Professional Evaluation (VOPEs): Learning from Africa, Americas, Asia, Australasia, Europe and the Middle East. IOCE-EvalPartners-UNICEF.
Turok, B. 1991. Africa. What can be done? London: Billing and Sons.
Valadez, J. & Bamberger, M. 1994. Monitoring and evaluating social programs in developing countries. A handbook for policymakers, managers and researchers. EDI Development Studies. Washington: World Bank.
Vestman, O.K. & Conner, R.F. 2006. The relationship between evaluations and politics, in Shaw, I.F., Greene, J.C. & Mark, M.M. (eds.). The Sage Handbook of Evaluation. London: Sage.
Wargo, M.J. 1995. The Impact of Federal Government Reinvention on Federal Evaluation Activity. Evaluation Practice, 16(3): 227-237.
Weiss, C.H. 2004. Rooting for evaluation: A cliff notes version of my work, in Alkin, M.C. (ed.) 2004. Evaluation roots: Tracing theorists' views and influences. Thousand Oaks, CA: Sage.
Wholey, J.S. 1979. Evaluation: Promise and Performance. Washington: The Urban Institute.
World Bank. 1991. The African Capacity Building Initiative: Toward Improved Policy Analysis and Development Management. Washington DC: World Bank.
Worthen, B.R. 1994. Is evaluation a mature profession that warrants the preparation of evaluation professionals? New Directions for Program Evaluation, 62: 3-15.


Chapter 3

Theories of Change and Programme Logic

Fanie Cloete and Christelle Auriacombe

3.1 Introduction

Academic researchers normally seek to identify and understand the nature and scope of societal problems and the details of governmental strategies to improve or counter them (i.e. for new knowledge generation). Evaluation researchers routinely take these results further and assess whether prevailing insights into and actions to address problems have led to changes in the nature of those problems, and how the outputs and outcomes of governmental and other agencies can be improved to address the perceived problems more effectively and efficiently. In other words, evaluators tend to assess the internal activities and processes of different agencies and how these agencies attempt to convert resource and other inputs into concrete output products, short to medium-term sectoral outcomes and long-term, integrated impact consequences that bring the status quo ante more in line with the initial goals of these interventions. Evaluation therefore adds an explicit problem-solving focus to the initial phase of problem identification that is not always present in academic research. Such systematic policy, programme and project evaluations therefore comprise more applied research with a clear focus on the improvement or transformation of the status quo into a new and better status quo.

Although not in all cases, policy evaluation therefore frequently tests to what extent the normative and empirical assumptions of policymakers have been implemented appropriately and to what extent the programme and project interventions based on these assumptions have resolved or improved the problems that they were supposed to address. Used in this way, policy, programme and project evaluations test the causal linkages between the identified problem, the policy decisions and actions adopted to resolve, mitigate, regulate, alleviate or manage that problem, and the consequences of such interventions. The main question to be answered in such exercises is to what extent, if any, the interventions under scrutiny have been effective in their results and what can or should be done to obtain more efficient and effective results.

This causal logic constitutes what has become known in evaluation circles as the "theory of change" underlying a policy, programme or project intervention. It refers to the reasons behind government decisions to intervene in society in a specific way aimed at problem resolution or improvement. In order to ensure that government interventions add value to the status quo and improve it, the identification of policy issues and problems must therefore be done accurately, while appropriate and feasible policy transformation goals must be drafted and policy change programmes, based on an appropriate theory of change
as well as implementation strategies based on an appropriate programme logic, must be developed to achieve the transformation of the unsatisfactory status quo into more desirable future conditions.

This chapter assesses the nature of theories of change, how they relate to programme logic models, how they should be employed and how they can contribute to successful evaluations. Different approaches to and models of evaluation, which are also sometimes referred to as "theories of evaluation" (e.g. Alkin, Vo & Hansen, 2013), are addressed in Chapter 4 because they deal with the evaluation itself and not with the object of the evaluation, which is the policy, programme or project intervention concerned.

3.2 Nature of Theories of Change

A theory of change is not simple to explain. Theories of change are alternative or competing sets of logical sequences of linked theoretical and practical assumptions, and explanations of the reasons why a specific intervention in society is necessary and why such a policy, programme or project should be successful (should achieve its outcomes) (see also in general Taplin & Clark, 2013:1; Collins & Clark, 2013; Keystone, 2014:18; Stachowiak, 2013:2; Jackson, 2013:100; Vogel, 2012:12; INTRAC, 2012; Stein & Valters, 2012:3; Coryn, Noakes, Westine & Schroter, 2011; Ortiz & Macedo, 2010; ORE, 2004). James (2011:27) defines a theory of change as "an ongoing process of reflection to explore change and how it happens – and what that means for the part we play in a particular context, sector and/or group of people …":
• "It locates a programme/project within a wider analysis of how change comes about.
• It draws on external learning about development and how change happens at different levels.
• It articulates our understanding of change – but also challenges and opens it up.
• It acknowledges the complexity of change: the wider systems and actors that influence.
• It is often presented in diagrammatic form with an accompanying narrative summary."
Guijt and Retolaza (2012:3) also conclude that a theory of change can on the one hand be seen as a complex phenomenon that can sometimes be an output, and on the other hand sometimes be seen as a programme logic process. For the purposes of this book it can be seen as a mental model underlying the intervention. As is already clear from the above analysis, this means that a theory of change contains normative, structural and operational attributes. It should be seen as a framework of both thinking and action (James, 2011:6). Theories of change are the backbones of systematic policy, programme and project analysis and assessment. They provide intellectual and conceptual tools to researchers, analysts and evaluators to better understand the phenomena and events that they investigate (Reeler, 2007).


Vogel identifies the following uses for theories of change:

Table 3.1: Different applications of theory of change thinking

Programmes, implementation organisations and grant-making programmes:
▪ Clarifying impact pathways in multiple operational contexts and sites
▪ Linking activities to changes at different levels: community, sub-national, national, international
▪ Results-management, evaluation and impact assessment
▪ Linking multiple projects to a higher-level theory of change
▪ Foundation for monitoring and evaluation planning
▪ Identifying synergies between strategies
▪ Identifying trade-offs and negative or unintended consequences
▪ Programme scoping and design, strategic planning

Donors and foundations:
▪ Theory-based evaluation of large-scale programmatic areas
▪ Approaches to programme design and commissioning – country, sector and thematic
▪ Clarifying strategies and impact pathways

Civil society organisations and international NGOs:
▪ Clarifying links between organisational values, vision, mission, strategy and programmes
▪ Conceptualising impact, mapping thematic theories of change
▪ Country programme impact pathways
▪ Mapping collaborative relationships and influencing strategies
▪ Monitoring, evaluation and learning frameworks
▪ Linking multiple projects to a higher-level outcomes framework
▪ Testing links in theories of change in complex programme areas
▪ Supporting empowerment by linking individual change to wider change

M&E trainers, consultants and organisational development practitioners:
▪ Theory-based impact evaluation for large-scale complex programmes
▪ Theory of change as a foundation for programme design, monitoring and evaluation and learning
▪ Theory of change-based strategic planning
▪ Exploring theory of change-based methodologies for small-scale evaluations

(Source: Vogel, 2012:13)

Theories of change comprise different combinations of "subjective" normative beliefs, values and principles, as well as more "objective" facts and figures that constitute empirical evidence to substantiate the assumptions and proposals for specific types of interventions. To develop a theory of change, a basic, fundamental theoretical and practical understanding of the social problem to be addressed is needed. A theory of change therefore needs to be grounded in, or at least be informed by, both prior basic research evidence and knowledge of good practice. Basic research should act as the foundation for applied science (Auspos & Kubisch, 2004). In this way policies could be built on a solid knowledge foundation based on what the stakeholders perceive to be working in both theory and practice (Vogel, 2012:40). Theoretical ideas as well as practical knowledge or practice wisdom could be linked together to explain underlying assumptions. Assumptions are the conclusions reached on the basis of the state of knowledge and experience that the stakeholders have about the problem or situation at a given moment before the new policy intervention is started.
Differently stated, this is the "theory" or set of coherent underlying beliefs about how and why a programme or policy will work if it is designed and implemented in a specific way. These assumptions are validated by means of knowledge gained from basic research and practice wisdom (University of Wisconsin, n.d.). Below are four examples of theories of change, which are normally drafted in a visual, logical flow model approach to indicate the linear or non-linear causal relationships postulated between causes and effects:

Figure 3.1: CLEAR Theory of Change

(Source: CLEAR, 2013)


Figure 3.2: Theory of Change of the Hunger Project in Africa

[Diagram: The Hunger Project (THP) Africa theory of change maps a progression from Phase I strategies (mobilising communities, empowering women and creating agency, and partnering with local government through the epicentre strategy) to shifts in community capacity and conditions within two and then six years, sustainable community impact within eight years across nine themes (community mobilisation; women's empowerment; food security; literacy and education; health and nutrition; water, environment and sanitation; income generation; good governance; and public awareness, advocacy and alliances), and a long-term impact of sustainable, vibrant, healthy rural communities free from hunger and poverty, supported by country-led strategies, strengthened partnerships with government and civil society, and replication of the THP approach.]

(Source: THP, 2013:162)


Figure 3.3: Theory of Change for the Impact Investing Initiative

[Diagram: The theory of change links catalysing activities (grants for collective action platforms, industry standards, scaling intermediaries and research and advocacy; PRIs for scaling intermediaries; communications outreach; brokerage and networking) and policy reforms to a developed industry infrastructure and scaled intermediaries; to an increased number and size of for-profit impact investments by engaged investors (pension funds, foundations and endowments, private equity funds, family offices, non-profit social/green funds, government agencies and retail investors); to the engagement of private corporations, small, medium and micro enterprises, social enterprises and local impact investment funds; to improved availability of affordable products and services, expanded income-generating activities and an improved physical environment; and ultimately to improved lives of the poor and vulnerable (housing, health care, clean water, sanitation, energy).]

(Source: Jackson & Harji, 2012)


Figure 3.4: Theory of Change for Enhancing Results-based Management in Organisations

[Diagram: A results chain running from outputs (information on RBM, training and workshops, facilitation of organisational change) through immediate outcomes (enhanced RBM capacity of management, staff and professionals; enhanced RBM planning processes, monitoring and evaluation systems and reporting) and intermediate outcomes (organisational change; regular use of results information in management decision making; institutionalisation of integrated RBM systems and strategic management) to the ultimate outcome of more effective, efficient and relevant programmes and more credible reporting. Each causal link is qualified by assumptions and risks, and external influences include organisational history with RBM, funding agency needs, the influence of peers and the resources available to the organisation. Legend: [C] control, [DI] direct influence, [I] influence, [O] outside of influence.]

(Source: Canada, 2012:17)


The scope and nature of change processes can either be small, restricted to a narrow focus, straightforward and relatively simple, or they can be wide-ranging, comprehensive and highly complex (Rogers, 2008). Rogers summarises the differences between complicated and complex programmes as follows: the distinction "draws on Glouberman and Zimmerman's conceptualization of the differences between what is complicated (multiple components) and what is complex (emergent). Complicated programme theory may be used to represent interventions with multiple components, multiple agencies, multiple simultaneous causal strands and/or multiple alternative causal strands. Complex programme theory may be used to represent recursive causality (with reinforcing loops), disproportionate relationships (where at critical levels, a small change can make a big difference — a 'tipping point') and emergent outcomes" (Rogers, 2008).

At the individual or group level, alternative or competing theories of change provide different explanations of, and specific policy, programme and project strategies and implications for, individual or collective attitudinal development and change (Retolaza, 2011). These theories of change may try to explain the basic or fundamental needs of individuals and how need satisfaction should be undertaken (Maslow, 1943; Max-Neef, 1991). They might also try to explain how individuals experience change processes cognitively, ideologically and emotionally, and how these change processes in society should or can occur to achieve the best and most durable results (Coffman, 2007; Cloete, 2011:2013; Howlett & Ramesh, 2009; Stachowiak, 2013).

Intermediate theories of change attempt to describe, explain or predict activities and changes at community, regional or sectoral levels. They may include alternative or competing explanations for, and proposals for, very specific policy, programme or project strategies to efficiently and effectively –
• develop stable, vibrant and sustainable communities,
• improve community participation in municipal services delivery,
• alleviate poverty in a community,
• reduce crime in the community,
• improve regional economic development,
• improve regional public health conditions,
• maximise agricultural harvests,
• improve educational literacy levels,
• stimulate optimal economic growth, and
• protect and manage the environment.
For example, these intermediate theories of change attempt to explain why specific types of job creation (e.g. small entrepreneurs) should work better than other types (big corporate jobs), why visible police patrols might be more effective in reducing crime than heavy penalties for certain crimes, why outcome-based education is supposed to produce better-educated entrants for the marketplace, etc. (see also Vogel, 2012:17). They will inevitably be based on some of the normative assumptions contained in the grand theory ideologies summarised below, or on more empirical knowledge, experiences, evidence and good practices that might have emerged over time in these specialised policy sectors and professions. The normative values underlying these intermediate theories of change are, however, in many cases very indirect, while strategic, tactical and other
empirical considerations like the context within which the intervention takes place, the latest knowledge, experience, technologies and other variables like the best strategies for policy change to achieve eventual social change might be more prominent (see for example Stachowiak, 2013; Vogel, 2012:18; Cloete, 2011:230).

Some theories of change are more normative and prescriptive than others, focusing on specific recommended courses of action at a macro policy level to achieve desired goals. Ideologies like capitalism, socialism, liberalism, conservatism, nationalism, democracy, theocracy, authoritarianism, feminism and environmentalism are excellent examples of prescriptive theories of change to a more desired state of affairs that may underlie policy, programme and project interventions in order to improve specified conditions in society according to the ideology concerned. One can, for example, identify socialist-type redistributive policies, programmes and projects directly controlled by government to promote more social and economic equality in order to kick-start more equitable economic growth and development in society. This is frequently contrasted with the opposite focus of other policies, programmes and projects by government just to create an enabling environment for more capitalist, free-market competition-type job creation and economic growth initiatives in the business sector, in order to generate more surpluses for a more equitable redistribution of resources in society.

Other examples of prescriptive ideological theories of change include predominantly conservative policy, programme and project interventions that aim at consolidating and improving morals, ethics and "good", stable family values and social institutions in society (the so-called interventionist, dominant, nanny state). This approach to social change might in other cases be contrasted with interventions predominantly aimed at maximising the independence, autonomy, liberty and freedom of choice of individuals through minimal state intervention and prescription in order to promote personal development, creativity, innovation and self-fulfilment (the so-called minimal or absent state).

The above examples of ideologies focused more on the contents of interventions, i.e. what to do in order to achieve the desired results. Other ideologies prescribe specific processes through which social interactions and change should take place, for example more peaceful, evolutionary, gradual, negotiated reform and change in democratic processes over longer periods of time, versus more violent, authoritarian, dramatic revolutionary change processes within shorter time frames (see in general also House & Howe (1999) on values in evaluation).

There are therefore many alternative or competing theories of change at different levels of organisation (James, 2011:6). In contrast to the theories of change at individual, community and regional levels, which can be regarded as grassroots or intermediate-level theories, the ideologies summarised above can be applied at the "grand theory" or macro level, as they try to explain why and how change takes place in society in general or in specific sectors of society (Guijt & Retolaza, 2012:3). As explained above, some theories emphasise economic reasons for societal change (e.g. capitalist or socialist explanations for change). Others emphasise social and cultural reasons for change (e.g. development, underdevelopment, nationalism), while others focus on political power relationships that change (e.g. democracy, elite relationships that change, etc.). These theoretical explanations
of change generally also follow specific processes and time frames: either fast, dramatic changes to societal structures and interactions accompanied by upheavals, conflict and frequently violence (e.g. revolutionary changes), or more gradual, evolutionary, incremental changes over a longer period that are more peaceful (e.g. negotiated, constitutional, democratic reforms in society).

There are generally causal linkages among theories of change at different societal levels, because the strategies employed to change individual or community attitudes are supposed to be logically derived from intermediate and macro-level models and theories. The purpose for which a cash transfer programme or a job creation or wealth distribution programme is conceived, and the strategies by which it is implemented, are always underpinned by one or another competing macro-level ideological assumption, belief or viewpoint. Liberal macro-economic policy programmes and projects to stimulate economic growth are totally different from socialist-type wealth redistribution policies, programmes and projects. The first, typically capitalist, strategy attempts a linear redistribution of wealth through economic growth, while the second, typically socialist, strategy attempts to achieve growth through the redistribution of wealth. The obvious, middle-of-the-road compromise policy position would be to synchronise growth and redistribution in a more complex, non-linear, parallel manner and not in an oversimplified, linear cause-effect relationship. This implies projects that explicitly attempt to fast-track growth, existing simultaneously with projects deliberately aimed at maximising the redistribution of scarce resources (e.g. the creation of a balance between job creation on the one hand and a social security network that does not disincentivise recipients to take up a job, but which does provide a minimum income to poor individuals who do not have a stable income). This is obviously easier said than done, but it is the challenge of good governance facing every government.

The development of theories of change at individual grassroots or intermediate community levels therefore always has to be conceptually linked to the philosophical, religious or ideological paradigm underlying such a policy, programme or project. This underlying value framework can be explicit or implicit. Retolaza (2011) explains in a practical way how a theory of change can be constructed. James (2011:26) provides a useful visual summary of the application of the principles of a theory of change at different levels of organisation: "The diagram below shows the different levels at which people explore or describe their theory of change, from macro to project level; and where some other tools or processes can fit in within the cycle of learning (in blue): from analysis to planning to implementing to review which then feeds back into a revised theory. While a broad approach involves analysis at organisational level (in green) and can link with processes like outcome mapping; logframes and associated tools are more useful at project level (in orange) to describe what you plan to do and to help review progress at that level".


Figure 3.5: Macro Theory of Change

[Diagram: Nested levels of theory of change, from a macro theory of change (exploring broad social change theories), through sector/people-group and organisational theories of change, down to project-level theories of change. Each level is embedded in a learning cycle of exploring and analysing how change happens for a target group and context (the actors, systems and processes that influence change), planning (organisational strategic planning, operational plans and M&E frameworks, project plans described in logframes, problem and objective trees, outcome mapping), doing (baseline, implementation, monitoring, working with strategic partners), and reviewing (what changes were achieved at what levels, what unexpected outcomes and lessons emerged, and how the theory should be revised).]

(Source: James, 2011:26)

If the theory of change is valid and accurate, the intervention should be successful if it has been designed correctly, is feasible and is implemented as planned. If the intervention proves to be unsuccessful, the reasons for failure must lie in one or more of the following:
• The theory of change is not correct (e.g. it is based on incorrect, inaccurate or incomplete assumptions, data or logic, i.e. a bad theory of change), and/or
• The design of the policy, programme or project intervention has flaws (e.g. it is too narrowly focused and does not take multi-dimensional contributions of other policies, programmes and projects sufficiently into account, or it did not take into account unintended consequences, i.e. it was bad planning, but not a bad theory of change), and/or
• The implementation of the policy, programme or project was defective (e.g. the timeframes were wrong, resources were insufficient, contractors defaulted, i.e. it was bad management and not a bad theory of change), and/or
• External factors beyond the policy, programme or project designer's control intervened (e.g. a natural disaster, a regime change, a war, a financial disaster, etc., amounting to just plain bad luck but not to a bad theory of change).
Lempert (2010) provides a useful assessment of good governance practices that can improve the chances of success of evaluations themselves.



3.3 Origins of the concept "theory of change"

There seems to be relatively widespread agreement among scholars that the theory of change approach was suggested for the first time by Carol Weiss in a 1995 publication, "New Approaches to Evaluating Comprehensive Community Initiatives", by the Aspen Institute Roundtable on Community Change (Weiss, 1995). Weiss, a member of the Roundtable's steering committee on evaluation, argued that a key reason why complex policies and programmes are so difficult to evaluate is that the assumptions that inspire them are poorly articulated or in some cases not identified at all. This leads to confusion about how the process of development and implementation will unfold, and therefore to little attention being paid to the early and mid-term indicators (formative evaluation) that need to be tracked in order for a longer-term goal to be reached. Weiss suggested an alternative type of evaluation, namely "theory-based evaluation", that investigates the "theories of change" that underlie the intervention.

Chen (2005) refers to the theory of change as conceptual theory and distinguishes it from what he calls action theory (also called programme logic), which comprises a series of consecutive steps that need to be taken in order to achieve the intended results. The theory of change and the programme logic, which is the strategy for how it is implemented, are inextricably linked. The programme logic in essence unpacks the mental model that comprises the theory of change in a concrete, practical way that is supposed to be measurable.

Figure 3.6: Conceptual theory and action theory

[Diagram: Intervention → determinants → outcomes, where the link from intervention to determinants represents action theory success and the link from determinants to outcomes represents conceptual theory success. Theory of change (conceptual theory): the process by which change comes about (for an individual, organisation or community). Theory of action (action theory): how the intervention is constructed to activate the theory of change. Note that outcomes are observed characteristics of the target population or social conditions, not of the programme (Rossi et al., 2004:205); an observed outcome does not in itself mean that the programme has actually led to the change.]

(Chen, 2005:248; Rogers & Funnell, 2011)

James (2011:2-3) provides an excellent summary of the development of the theory of change approach – how it is currently perceived and applied by different stakeholders.

3.4 Developing a Theory of Action's Programme Logic

A theory of change leads policy designers and implementers to develop a programme logic model to explain the activities and processes of the intervention that will have to be undertaken in order to achieve the transformation goals set. These are the desired effects or outcomes and impacts of the envisaged policy. The programme logic of the theory of change is
therefore its action or business plan, or its implementation strategy. An evaluation focuses on what activities have been decided upon, how they have been implemented and what their consequences were. This sheds light on which aspects of the policy or programme are working, which are not, and why. These evaluation insights can inform future decisions regarding the maintenance, termination or adaptation of the policy in question, in order to improve its efficiency and effectiveness in future (Bichelmeyer & Horvitz, 2006).

It is therefore important to frame the problem(s) or issue(s) to be addressed within a sound theoretical framework of change. If this basic research homework is not done, a defective understanding of the nature of the problem is possible, leading to the development of a policy, programme or project that is meant to resolve or improve the problem but that is based on incorrect assumptions and therefore doomed to fail. Unfortunately this is a general weakness in many policy, programme and project interventions, where the explicit development of the theory of change underlying a specific intervention is frequently absent (Vogel, 2012:21). This complicates the theory-driven evaluation of such an intervention, because the originally envisaged logic underlying the decision to undertake that intervention is absent and then has to be constructed or reconstructed in order to be able to do the evaluation effectively (Taplin et al., 2013:4).

The programme logic approach is generally accepted as the most useful way to unpack the practical implementation of the theory of change. The origins of the programme logic model are traditionally traced to Suchman (1967), Weiss (1995), Bennett's (1976) hierarchy of evidence and Wholey's (1979) evaluation techniques. The programme logic model is an analytical tool used to plan, monitor and evaluate projects. It derives its name from the logical linkages set out by the planner(s) to connect a project's means with its ends (University of Wisconsin, n.d.). The programme logic model identifies the following elements of a policy intervention:
• the issues being addressed and the context within which the policy takes place;
• the inputs, i.e. the resources (money, time, people, skills) being invested;
• the activities which need to be undertaken to achieve the policy objectives;
• the initial outputs of the policy;
• the outcomes (i.e. short and medium-term results);
• the anticipated impacts (i.e. long-term results); and
• the assumptions made about how these elements link together, which will enable the programme to successfully progress from one stage to the next.
The programme logic model assumes sequential, linear cause-and-effect relationships: A leads to B, which leads to C. For example, "capacity building" could reflect the following dynamics: training (A) leads to increased knowledge (B), which leads to employment (C). This approach offers a format for connecting levels of impact with evidence, and begins by requiring specification of the overall goal and purposes of the project. Short-term outputs are linked logically to those purposes. Activities are identified that are expected to produce outputs (Patton, 1997). The language of the programme logic model can be confusing because the term goal is also referred to by some proponents of the logical framework
as mission, purposes are called objectives or outcomes, and outputs are short-term, end-of-project deliverables. For every goal, purpose, output and activity, the framework requires specification of objectively verifiable indicators, means of verification (types of data), and important assumptions about the linkages between activities and outputs, outputs and purposes, and purposes and goals (Patton, 1997).

Figure 3.7: Introduction to programme logic

• A policy programme/project is defined as a sequence of goals/objectives.
• Policy programme planners depart from a desired integrated long-term social vision or impact and work backwards from that impact goal to determine what should be done to achieve it.
• To achieve the desired impact, different sector-specific intermediate goals (outcomes) must be reached.
• To achieve the desired outcomes, the programme must produce specific concrete deliverables or outputs.
• To produce the necessary outputs, specific activities are required.
• To undertake the activities, appropriate resources or inputs must be available.
Each link in the chain is an "If … then …" statement that rests on basic assumptions.

(Adapted from http://www.audiencedialogue.net/proglog.html)
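The if-then chain described above and summarised in Figure 3.7 can also be sketched as a simple data structure. The following minimal Python example is purely illustrative: the class, field and example names are hypothetical and are not drawn from the chapter or the sources it cites. It merely shows how each level of the chain, and the assumptions behind the links, could be recorded and checked for gaps.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified representation of the generic programme logic chain
# (inputs -> activities -> outputs -> outcomes -> impact) described above.
LEVELS = ["inputs", "activities", "outputs", "outcomes", "impact"]

@dataclass
class ProgrammeLogic:
    """One entry per level; each 'if ... then ...' link rests on stated assumptions."""
    inputs: List[str] = field(default_factory=list)
    activities: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    outcomes: List[str] = field(default_factory=list)
    impact: List[str] = field(default_factory=list)
    assumptions: List[str] = field(default_factory=list)

    def missing_links(self) -> List[str]:
        """Return levels that are empty although a later level has been specified."""
        gaps = []
        for i, level in enumerate(LEVELS):
            later_filled = any(getattr(self, lv) for lv in LEVELS[i + 1:])
            if not getattr(self, level) and later_filled:
                gaps.append(level)
        return gaps

# Example: the capacity-building chain "training -> knowledge -> employment"
capacity = ProgrammeLogic(
    inputs=["trainers", "budget"],
    activities=["run accredited training courses"],
    outputs=["200 unemployed youths trained"],
    outcomes=["increased knowledge and skills"],
    impact=["trainees find employment"],
    assumptions=["the labour market can absorb newly skilled entrants"],
)
print(capacity.missing_links())  # [] -> every link in the chain is specified
```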

The Canadian International Development Agency bases its activities on the following logic model:

Figure 3.8: CIDA logic model

CIDA Logic Model - Terms and Definitions

Ultimate outcome: The highest-level change that can be reasonably attributed to an organisation, policy, program or initiative in a causal manner, and is the consequence of one or more intermediate outcomes. The ultimate outcome usually represents the raison d'être of an organisation, policy, program or initiative and takes the form of a sustainable change of state among beneficiaries.

Intermediate outcome: A change that is expected to logically occur once one or more immediate outcomes have been achieved. In terms of time frame and level, these are medium-term outcomes, which are usually achieved by the end of a project/program and are usually at the change of behaviour/practice level among beneficiaries.

Immediate outcome: A change that is directly attributable to the outputs of an organisation, policy, program or initiative. In terms of time frame and level, these are short-term outcomes and are usually at the level of an increase in awareness/skills of … or access to … among beneficiaries.

Outputs: Direct products or services stemming from the activities of an organisation, program, policy or initiative.

Activities: Actions taken or work performed through which inputs are mobilised to produce outputs.

Inputs: The financial, human, material and information resources used to produce outputs through activities and accomplish outcomes.

Result: A describable effect or measurable change in state that is derived from a cause-and-effect relationship. Results are the same as outcomes and are further qualified as immediate, intermediate and ultimate.

Development results: Reflect the actual changes in the state of human development that are attributable, at least in part, to a CIDA investment.

(Source: CIDA, 2008)

Since 1962, the International Society for Performance Improvement (ISPI, 2010) has researched, developed, implemented and evaluated organisational and individual performance improvement programmes. Its Human Performance Technology (HPT) theory of change and programme logic model is in frequent use by Certified Performance
Technologists in over 50 countries, including annual conferences in the Europe, Middle East and Africa (EMEA) region. The ISPI programme logic locates the various forms of evaluation centrally in the Human Performance Technology model (Figure 3.9).

Figure 3.9: The Human Performance Technology (HPT) Model

[Diagram: The HPT model links performance analysis (organisational analysis of vision, mission, values, goals and strategies; environmental analysis of the organisational and work environment, the work and the worker; and the gap between the desired and actual state of workforce performance), cause analysis (lack of environmental support: data, information and feedback; resources and tools; consequences, incentives or rewards; and lack of repertory of behaviour: skills and knowledge, individual capacity, motivation and expectations), intervention selection and design (performance support, job analysis/work design, personal and human resource development, organisational communication, organisational design and development, financial systems), and intervention implementation and change (change management, process consulting, employee development, communication, networking and alliance building), with evaluation at the centre: formative (performance analysis, cause analysis, selection/design of interventions), summative (immediate reaction and immediate competence), confirmative (continuing competence/job transfer and continuing effectiveness/organisational impact) and meta evaluation/validation of these processes, products and lessons learned.]

Copyright 2004, The International Society for Performance Improvement. The HPT model is from page 3 of Fundamentals of Performance Technology, Second Edition, by D.M. Van Tiem, J.L. Moseley and J.C. Dessinger, published by ISPI in 2004. All rights reserved.


The most authoritative programme logic model in South Africa is that of the National Treasury, in the following form:

Figure 3.10: Key performance information concepts

IMPACTS – What we aim to change: the developmental results of achieving specific outcomes.
OUTCOMES – What we wish to achieve: the medium-term results for specific beneficiaries that are the consequence of achieving specific outputs.
OUTPUTS – What we produce or deliver: the final products, or goods and services produced for delivery.
ACTIVITIES – What we do: the processes or actions that use a range of inputs to produce the desired outputs and ultimately outcomes.
INPUTS – What we use to do the work: the resources that contribute to the production and delivery of outputs.

(Impacts and outcomes are the results that are managed towards; outputs, activities and inputs are what is planned, budgeted, implemented and monitored.)

(Source: National Treasury, 2010:3)

The programme logic model is also referred to in performance measurement circles as the results chain:

Figure 3.11: Results Chain

INPUTS: human resources, money, technology and physical assets.
ACTIVITIES: the activities, actions, tasks and processes undertaken to "produce" outputs.
OUTPUTS: the results of the activity, such as services delivered, events organised or clients contracted.
OUTCOMES: the direct change in the condition or behaviour of a target group or institution that results from, and is attributable to, the outputs (cause-effect).
IMPACT: the long-term economic worth of activities and the long-term effects on people and the environment as a result of the intervention.
Externalities such as incentives, finance and policies influence the chain, and both outputs and outcomes are measured through indicators and targets linked to the budget and other resources, the activity and task work plan, and the lists of expected outputs and outcomes.

(Source: Canada, 2012:5)
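The "how measured?" element of the results chain above can be illustrated with a short, hedged sketch. The following Python fragment is an assumption-laden example of the general idea of attaching indicators and targets to outputs and outcomes; the field names and figures are invented for illustration and do not come from the source.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Indicator:
    name: str        # what is being measured
    baseline: float  # value before the intervention
    target: float    # value the programme commits to
    actual: float    # latest measured value

    def progress(self) -> float:
        """Share of the planned improvement achieved so far (0.0-1.0+)."""
        planned = self.target - self.baseline
        return (self.actual - self.baseline) / planned if planned else 0.0

# Outputs and outcomes each carry their own indicators and targets, as in Figure 3.11.
results_chain: Dict[str, List[Indicator]] = {
    "outputs": [Indicator("boreholes drilled", baseline=0, target=50, actual=35)],
    "outcomes": [Indicator("households using safe water (%)", 40, 80, 55)],
}

for level, indicators in results_chain.items():
    for ind in indicators:
        print(f"{level}: {ind.name}: {ind.progress():.0%} of target achieved")
```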


When a government's evaluation system is mainly based on a linear design of cause and effect where the main links are between inputs, activities, outputs and outcomes, a reductionist approach is followed. This assumption of linear relationships between causes and effects has restricted utility, because many public sector programmes are multi-dimensional and span many sectors, which makes it difficult to isolate sectoral programme effects from one another. Hovland (2007:4), for example, argues that focusing purely on the separate elements of the programme logic model (resource inputs and activities, outputs, outcomes and impacts) is not sufficient for policy development. Nevertheless, any complex system consists of a rich interaction between simple variables that one can evaluate at smaller scales. Important new methodologies to evaluate bigger complex systems have also started to emerge that supplement the over-simplistic programme logic model summarised above (Patton, 1997; Rogers, 2008).

The process of developing the programme logic of the theory of change normally starts by first establishing the overall transformative vision or impact to be achieved in the long term, and then working backwards by breaking that vision down into measurable medium-term outcomes to be achieved in a specific shorter timeframe. These outcomes are then broken down further into a range of required measurable concrete deliverables or outputs that constitute critical milestones and provide evidence of progress towards the envisaged outcomes and eventually the final transformative impact (Taplin et al., 2013; Keystone, 2014). In this way a logically consistent and coherent set of measurable causal linkages can be established between the baseline state of the current problem to be addressed and the desired transformative impact to be achieved. This then comprises the theory of change and its inherent programme logic. Taplin et al. (2013:11) refer to a useful video on YouTube (http://www.youtube.com/watch?v=YJSMa7AA3cU) summarising the theory of change of the University of Arizona's intervention to increase the number of its minority students. Duignan (2009) also provides another useful explanation on YouTube of theories of change and programme logic models. Numerous other video clips on these topics are available on YouTube. In his above video clip, Duignan explains the different terms used to refer or relate to the idea of a theory of change, including "impact pathways" and "outcome mapping" (Earl, Carden & Smutylo, 2001; James, 2011:10). Vogel (2012:17) has also compiled a useful table, copied below, listing different concepts used in theory of change models.

Various concrete implementation monitoring templates in the form of logical framework templates (known as logframe templates) have also been developed as measuring tools by evaluators seeking a better way to measure causality, evaluate the impact of a programme and assess the accuracy of theories of change (Weiss, 1997). The use of this logic matrix as a planning and monitoring tool allows for precise communication about the purposes of a project, the components of the project, the sequence of activities and the expected accomplishments to be achieved (McCawley, n.d.). The logframe template is therefore normally designed in the form of a matrix combining inputs, outputs and outcomes with time frames, budgets and responsible agencies or persons (Table 3.2).
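The backward-mapping process described above can be made concrete with a minimal sketch. The Python example below is a hedged illustration only: the function name and all example statements are hypothetical and are not taken from the chapter or its sources. It simply walks backwards from the desired impact, asking at each step what must be in place for the current result to be achieved, until inputs are reached.

```python
def backward_map(impact: str, precondition_of: dict) -> list:
    """Return the programme logic read forward (inputs -> ... -> impact)."""
    chain = [impact]
    while chain[-1] in precondition_of:        # keep asking "what is needed first?"
        chain.append(precondition_of[chain[-1]])
    return list(reversed(chain))

# Hypothetical example of breaking a long-term impact down into outcomes,
# outputs, activities and inputs.
preconditions = {
    "reduced poverty in the district (impact)":
        "more households earn a stable income (outcome)",
    "more households earn a stable income (outcome)":
        "500 small enterprises receive start-up support (output)",
    "500 small enterprises receive start-up support (output)":
        "train applicants and disburse micro-grants (activities)",
    "train applicants and disburse micro-grants (activities)":
        "grant fund, trainers and an administration system (inputs)",
}

for step in backward_map("reduced poverty in the district (impact)", preconditions):
    print(step)
```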


Figure 3.12: Concepts used in theory of change thinking

[Table: Concepts are arranged along a spectrum from more evaluation-informed, through complexity-informed, to more social change-informed. The terms listed include programme theory/logic, logic model, outcomes chain, intervention theory, causal pathway, impact pathway, causal model, single programme logic, macro/sector theory of change, assumptions about change, pathways mapping, "what would it take?" approaches, how history happens, change hypotheses, learning/evaluation, explore/explain, open enquiry and dialogue, linear/complex, reflective theory of change practice, rich picture, multiple outcome pathways, future timeline, 'tipping points', feedback loops, emergent, non-linear and recursive change, road-map, systems thinking about change, models of change, dimensions of change and beliefs about change.]

(Source: Vogel, 2012:17, as adapted from James, 2011)

Most donors in the international arena use some or other version of a logframe template for this purpose. The version developed by the United States Agency for International Development (USAID) could be regarded as the most popular one (Taylor-Powell & Henert, 2008). Vogel (2012:18-21, 46) assesses the main points of contemporary criticism (especially in the UK) of the overly dogmatic use of logframe templates in practice, which tends to conflate these monitoring tools with a theory of change, which they clearly cannot be (also see James, 2011:10; Muspratt-Williams, 2009; Örtengren, 2004; NORAD, 1999 and http://www.knowledgebrokersforum.org/wiki/item/theory-of-change-vs-logframes for other useful assessments of the use and the utility of the logframe template approach). As indicated earlier, the programme logic model itself, or its implementation plan in the form of the logframe matrix, should only serve as a broad systematic framework for evaluation. They guide the evaluator to collect data that can help to determine what has to happen and what actually happened during the intervention (Bichelmeyer & Horvitz, 2006). Clark (2004) provides a useful overview of how to determine the scope of a theory of change.


Table 3.2: The logframe monitoring tool

The logframe represents a means of check-listing key implementation planning data in four columns: Hierarchical Objectives; Key Performance Indicators; Means of Verification; and External Factors/Dependencies.

GOAL (OUTCOMES/IMPACT): the higher order objective to which the project contributes
• Key Performance Indicators: measures to verify accomplishment of the GOAL
• Means of Verification: sources of data needed to verify the status of the GOAL level indicators
• External Factors/Dependencies: important external factors necessary for sustaining the objectives in the long run

PURPOSE: the effect or impact of the project
• Key Performance Indicators: measures to verify accomplishment of the PURPOSE
• Means of Verification: sources of data needed to verify the status of the PURPOSE level indicators
• External Factors/Dependencies: important external factors needed to attain the GOAL

OUTPUTS: the deliverables or Terms of Reference of the project
• Key Performance Indicators: measures to verify accomplishment of the OUTPUTS
• Means of Verification: sources of data needed to verify the status of the OUTPUTS level indicators
• External Factors/Dependencies: important external factors needed to attain the PURPOSE

ACTIVITIES: the main activities that must be undertaken to accomplish the OUTPUTS
• Key Performance Indicators: a summary of the project budget
• Means of Verification: sources of data needed to verify the status of the ACTIVITIES level indicators
• External Factors/Dependencies: important external factors that must prevail to accomplish the OUTPUTS

(Source: McCawley, n.d.)
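As a purely illustrative sketch, the logframe rows and columns above can also be held as structured data, for example to generate reporting templates or to check that no cell of the matrix has been left blank. The row and column labels follow McCawley's matrix; the example project content is hypothetical and does not come from the chapter.

```python
# Sketch of the logframe matrix in Table 3.2 as data. Cell text for the example
# project is hypothetical.

LOGFRAME_COLUMNS = [
    "hierarchical_objective",
    "key_performance_indicators",
    "means_of_verification",
    "external_factors",
]

logframe = {
    "goal": {
        "hierarchical_objective": "Improved household food security in the region",
        "key_performance_indicators": "Share of households reporting adequate food supply",
        "means_of_verification": "National household survey",
        "external_factors": "Stable rainfall patterns sustain gains in the long run",
    },
    "purpose": {
        "hierarchical_objective": "Smallholder farmers adopt improved practices",
        "key_performance_indicators": "Adoption rate among participating farmers",
        "means_of_verification": "Extension officer field records",
        "external_factors": "Input prices remain affordable (needed to attain the goal)",
    },
    "outputs": {
        "hierarchical_objective": "2 000 farmers trained; demonstration plots established",
        "key_performance_indicators": "Number trained; number of plots",
        "means_of_verification": "Training registers; site inspection reports",
        "external_factors": "Trained farmers stay in the area (needed to attain the purpose)",
    },
    "activities": {
        "hierarchical_objective": "Recruit farmers, run training, monitor plots",
        "key_performance_indicators": "Summary of the project budget",
        "means_of_verification": "Financial and activity progress reports",
        "external_factors": "Extension staff are available (needed to deliver the outputs)",
    },
}

# Simple consistency check: every row carries every column of the matrix.
for row, cells in logframe.items():
    missing = [c for c in LOGFRAME_COLUMNS if c not in cells]
    assert not missing, f"{row} is missing columns: {missing}"
```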

Vogel (2012:14) summarises the core content of a theory of change as follows: "As a minimum, theory of change is considered to encompass a discussion of the following elements:
• Context for the initiative, including social, political and environmental conditions and other actors able to influence change.
• Long-term change that the initiative seeks to support and for whose ultimate benefit.
• Process/sequence of change that is anticipated in order to create the conditions for the desired long-term outcome.
• Assumptions about how these changes might happen, as a check on whether the activities and outputs are appropriate for influencing change in the desired direction in this context.
• Diagram and narrative summary that captures the outcomes of the discussion."

The task of evaluation is to explore the causal linkages foreseen in the theory of change along the programme logic results chain, which unpacks the theory of change in measurable units from inputs through activities and outputs to impacts.

Keystone (2014), a registered charity in the UK, South Africa and the US, aims to improve the effectiveness of social purpose organisations. It works with such organisations to develop better ways of planning, measuring and reporting social change. It, inter alia, specialises in the development and promotion of theories of change for evaluation purposes, and it has developed a number of tools for this purpose. One such tool is a practical guide and template to develop a theory of change (http://www.keystoneaccountability.org/resources/guides). The International Network on Strategic Philanthropy (INSP, 2005) has also developed a manual to assist stakeholders in development programmes and projects to develop theories of change for their activities. Similar guidance is inter alia available from Spreckley (2009), James (2011), Taplin and Clark (2012) and Guijt and Retolaza (2012). Keystone (2014) suggests the following practical benefits of developing an explicit theory of change at the start of the intervention:

3.4.1 Benefits and uses of a theory of change
"When you have a good theory of change you have:
• A clear and testable hypothesis about how change will occur that encourages learning and innovation and enables you to demonstrate accountability for your results.
• A visual representation of the change you want to see in your community and how you expect it to come about.
• A clear framework for developing your strategies and a blueprint for monitoring your performance with your constituents, because measurable indicators of success have been identified.
• Agreement among stakeholders about what defines success and what it takes to get there.
• A justification for developing your organisational capabilities.
• A powerful communication tool to capture the complexity of your initiative.
You can use your theory:
• As a framework to check milestones and stay on course.
• To document lessons learned about what really happens.
• To contribute to social learning about what works in development.
• To keep the process of implementation and evaluation transparent, so everyone knows what is happening and why.
• To persuade donors to invest in longer term outcomes rather than short projects.
• As a basis for reports to stakeholders, donors, policymakers, boards." (www.keystoneaccountability.org)

Vogel (2012) has undertaken the most comprehensive assessment so far of the nature and different uses of the concept "theory of change". Her generic model of theory of change thinking is the most accurate reflection yet of the issue concerned, and is shown in Figure 3.13. The ideal way to integrate the development of a theory of change into the general policy management cycle is summarised well in Figure 3.14.
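Building on Vogel's minimum elements quoted earlier, the sketch below shows one hypothetical way a team could record a theory of change in a structured form so that its assumptions can be revisited at each review. The field names and all values are assumptions made purely for illustration; they are not prescribed by Vogel, Keystone or any of the guides cited above.

```python
# Hedged sketch: a possible structured template for capturing the minimum elements
# of a theory of change. All field values below are hypothetical placeholders.

theory_of_change = {
    "context": "Peri-urban settlement; weak municipal services; active NGO network",
    "long_term_change": "Households have reliable access to safe drinking water",
    "beneficiaries": "Residents of the settlement, particularly child-headed households",
    "sequence_of_change": [
        "Community water committees formed",
        "Committees negotiate service agreements with the municipality",
        "Maintenance routines reduce downtime of water points",
    ],
    "assumptions": [
        "The municipality is willing to enter into service agreements",
        "Committees retain legitimacy in the community",
    ],
    "diagram_and_narrative": "outcomes_map_v1.png plus a two-page narrative summary",
}


def list_testable_assumptions(toc):
    """Return the assumptions an evaluation should examine explicitly."""
    return toc["assumptions"]


print(list_testable_assumptions(theory_of_change))
```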


Figure 3.13: Generic theory of change thinking model
[Figure: within the programme's sphere of control, activities, stakeholder engagement and outputs are produced; use of these outputs by directly influenced partners, collaborators and immediate target groups leads to short-term behaviour changes by key actors; through multiple impact pathways and indirectly influenced actors (policy shapers, knowledge networks, planners, practitioners and stakeholder groups) these contribute to medium-term changes; and, together with other factors and actors in the wider context (drivers, capacities, institutions, structures, systems and communities), multiple and aggregated development processes contribute to long-term lasting change and positive outcomes for people's lives. The underlying assumptions concern the models and pathways of change, cause-effect relationships, and the appropriateness of strategies to support changes in this context. Adapted from Montague 2007.]

(Source: Vogel, 2012:22)

Figure 3.14: Theory of change and the project cycle
[Figure: theory of change thinking starts with context analysis as the first step; thinking about possible options drives the business case; the theory of change identifies the key hypotheses to test through evaluation and informs design; implementers develop a more detailed theory of change during implementation; and the theory of change should be reviewed at least annually as part of project cycle management and revised if needed.]

(Source: Vogel, 2012:71)



3.4

Theories of Change in the African Context

Two theories of change that have been developed for application in the African context have been included above: the envisaged theory of change underlying the activities of CLEAR-AA in Africa (CLEAR, 2013), and the theory of change underlying the Hunger Project (2013:162). The case study by Buonaguro and Louw (2013), included at the end of this chapter, contains another example of a theory of change application for a project in Mozambique.

The theory of change approach has also been fully embraced and institutionalised in the South African public sector. The National Evaluation Policy Framework (NEPF), driven by the Department of Performance Monitoring and Evaluation (DPME) in the Presidency, which guides and regulates public sector evaluations in South Africa, defines a theory of change as a "... tool that describes a process of planned change, from the assumptions that guide its design, the planned outputs and outcomes to the long-term impacts it seeks to achieve" (NEPF, 2011:20). It states explicitly that "(e)valuation can be applied to new programmes, as well as existing ones. In new ones the key components to be used first include diagnostic evaluations to understand the situation and develop a theory of change, and design evaluations to check the design and theory of change after the planning has taken place" (NEPF, 2011:10). It also states that a "good quality plan should include a diagnostic analysis of the current situation and the forces at play, and which are likely to be the main strategic drivers of change. It should also explain the logic model or theory of change of the plan, in other words, the causal mechanisms between the activities, outputs, outcomes and impacts. It should explain the underlying hypothesis that if we do what we suggest, we will achieve certain objectives and targets. It should also be explicit about the assumptions being made about the external environment."

"One of the purposes of evaluation is to test this logic model by asking questions such as:
• Were the planned outcomes and impact achieved, and was this due to the intervention in question? (Changes in outcome and impact indicators may have been due to other factors)
• Why were the outcomes and impacts achieved, or not achieved?
• Were the activities and outputs in the plan appropriate?
• Did the causal mechanism in the logic model work? Did the assumptions in the logic model hold?" (NEPF, 2011:4).

The NEPF explicitly identifies the prominent role of a theory of change in the design of government interventions in Figure 3.15. The theory of change underlying the system of public sector evaluation in the country is summarised in Figure 3.16.


Figure 3.15: Types of evaluation
[Figure: evaluation types mapped against the results chain. At the design level, diagnostic evaluation asks what the underlying situation and root causes of the problem are, and design evaluation assesses the theory of change. Across inputs, activities and outputs, implementation evaluation asks what is happening and why. At outcome and impact level, impact evaluation asks whether the intervention has had impact, and why. Economic evaluation asks what the cost-benefits are.]

(Source: NEPF, 2011:8)
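The mapping in Figure 3.15 lends itself to a simple lookup structure. The sketch below restates the figure's evaluation types, results-chain levels and guiding questions as data; the helper function and its use are illustrative additions, not part of the NEPF itself.

```python
# Sketch of the Figure 3.15 mapping as a lookup table. Question wording is
# paraphrased from the figure; the helper function is an illustrative addition.

NEPF_EVALUATION_TYPES = {
    "diagnostic": {
        "level": "design",
        "question": "What is the underlying situation and what are the root causes of the problem?",
    },
    "design": {
        "level": "design",
        "question": "Is the theory of change sound?",
    },
    "implementation": {
        "level": "activities/outputs",
        "question": "What is happening and why?",
    },
    "impact": {
        "level": "outcomes/impacts",
        "question": "Has the intervention had impact at outcome and impact level, and why?",
    },
    "economic": {
        "level": "impacts",
        "question": "What are the cost-benefits?",
    },
}


def evaluation_types_for(level: str):
    """Return the evaluation types that speak to a given results-chain level."""
    return [name for name, spec in NEPF_EVALUATION_TYPES.items() if level in spec["level"]]


print(evaluation_types_for("outputs"))  # -> ['implementation']
```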


Figure 3.16: Outline of the South African Government's evaluation programme logic
[Figure: an issue is identified as a public concern and a policy on it is developed; a programme to implement the policy is designed, with a programme logic that clearly shows how undertaking specific activities with calculated outcomes will lead to the achievement of the intended policy impacts; indicators are chosen as ways of checking whether those activities, outcomes and impacts are happening; the legislature provides funding and public officials carry out the programme activities; as implementation rolls out, work gets done and records are kept, captured, verified and analysed into reports, which are compared to plans and benchmarks such as international best practice; the logic process flows and the performance indicators send managers and officials clear signals about what they should do ("doing the right things") and what is important ("doing things right"); public scrutiny and robust systems result in good management; success is identified and replicated, accountability is improved, challenges are highlighted and addressed, and evidence-based decision making around resources is facilitated; affected stakeholders are involved extensively and consistently throughout; ultimately public services become more effective and poverty is eradicated. The figure's legend distinguishes censuses, surveys and administrative data sets; performance information; evaluation; and follow-up actions.]

(Source: The Presidency, 2007:6)


3.5

Case: Developing impact theory for a social protection programme in Maputo, Mozambique Lucilla Buonaguro and Johann Louw

3.5.1 Introduction
In the last ten years or so the literature on social protection has expanded significantly, and there is consensus on its importance as a poverty alleviation strategy (Barrientos & Hulme, 2008; Gentilini & Omamo, 2011). Despite this consensus, there is a wide range of programmes and policies that are counted as part of social protection, which complicates conceptual clarity and understanding of what is delivered in practice. In addition, policy approaches that offer social protection to vulnerable members of society vary from region to region. Norton, Conway and Foster (2002:543) have pointed to the danger of this, that different researchers and development agencies will have different meanings in mind when they write about it. In this paper we follow their definition: "Social protection refers to the public actions taken in response to levels of vulnerability, risk and deprivation which are deemed socially unacceptable within a given polity or society".

Mozambique is a country where, despite significant socio-economic progress in the last 15 years, many people are still exposed to the vulnerabilities, risks and deprivations referred to in the definition given above. In 2011, Mozambique was classified as a "low human development" country and ranked at 184 out of 187 countries on the human development index (United Nations Development Programme, 2011). Life expectancy at birth was estimated at 50.2 years, and 54.7% of the population was living below the national poverty line. Many survive on the street, and the informal economy for them constitutes the only opportunity for survival and social security.

As for social protection legislation, Mozambique, compared with other developing countries, has a broad and comprehensive body of laws and regulations (Mausse & Cunha, 2011). However, the institutional social protection system is still insufficient and inefficient (Maleane & Suaiden, 2010). According to the last available data, formal social protection in 2007 in Mozambique was available for 200 000 to 250 000 public workers and 236 760 private companies' employees, out of a population of approximately 22.3 million citizens. In this context, individuals remain generally dependent on informal safety nets, and families represent the principal lifeline during crises (Maleane & Suaiden, 2010).

3.5.2 The programme
In 2011 the government of Mozambique launched the third Poverty Reduction Action Plan (PARP III) for the period 2011-2014 (Republic of Mozambique, 2011). PARP III is a five-year programme to combat poverty and promote inclusivity, aligned with the Millennium Development Goals. The PARP III documents recognise the challenge in expanding the number of vulnerable people covered by social protection programmes given the government's capacity. In April 2010, the Mozambican Council of Ministers approved the National Strategy for Basic Social Security (NSBSS) for the period 2010-2014, in which the broadening of the social security provisions for vulnerable people was seen as a factor in poverty reduction. The NSBSS focuses on the collaboration between non-governmental actors, arguing that a partnership with civil society organisations will be a potentially successful tool to extend coverage (Mausse & Cunha, 2011).

In June 2011 one such collaborative programme was launched as part of the NSBSS: the Social Protection and Informal Work Promotion Among Street People Project (SPIWP). It is this programme that is the focus of the present study, as well as what can be learned from its planning and early evaluation efforts. The SPIWP was planned as a three-year intervention, implemented in Maputo, the capital of Mozambique. It is funded by the European Commission, as part of its strategy to reduce vulnerability, social exclusion, and poverty through social protection mechanisms. The programme is administered by a group of eight organisations, drawing from both civil society and Mozambican government institutions, led by the Italian non-governmental organisation (NGO) Centro Informazione Educazione allo Sviluppo (CIES – Development Information and Education Centre), which is also the agency responsible for the grant contract with the European Commission.

Two sets of service recipients are envisaged: the vulnerable and poor, as well as organisations working with them. Included in the first group are children, scavengers (the term scavengers may have a negative connotation, but it is the closest meaningful translation of the Portuguese word catador, as used in Mozambique), and mentally ill people living on the streets of the city centre of Maputo. Vulnerable families of six Maputo neighbourhoods are also included in this group, people who experience difficulties in accessing social protection services, with many at risk of becoming street dwellers, especially children and mentally ill people. The second level of beneficiaries of the SPIWP includes government institutions, civil society, and local communities. Several of its activities are aimed at creating and enhancing the capacities and abilities of these organisations to provide services to the primary beneficiaries, and that is why they are included here.

Collaborative programmes like these are difficult to design, implement, and work with, and the SPIWP is no exception. It is implemented by organisations that evolved from different backgrounds, with different organisational values and tasks, and that have different expectations about what they regard as desirable outcomes. The programme activities furthermore consist of three interrelated sets of interventions that are implemented at three different levels of society: institutional, civil society, and community level. Researchers recognise that the creation of linkages between statutory and community-based social protection systems can have positive influences on poverty reduction and vulnerability in developing countries (Coheur, Jacquier, Schmitt-Diabatè & Schremmer, 2007; Lund, 2009; Mupedziswa & Ntseane, 2013). These partnerships are typically aimed at facilitating access to social protection schemes of those vulnerable groups that are usually not covered by the statutory social security mechanisms, such as informal workers and street people (Jacobi, 2006).
This collaborative approach can be considered as a response to the inefficiency and non-sustainability of social protection policy frameworks that, in many developing countries, remain attractive but non-affordable social welfare and antipoverty actions (Devereux, 2002; Low, Garret & Ginja, 1998).



3.5.3 Evaluation
When the SPIWP was launched in 2011, three factors combined to make the involvement of programme evaluation expertise attractive, and indeed advisable. First, as we indicated in the beginning, the range of programmes and policies that can be regarded as fitting social protection goals and objectives strongly suggests that conceptual clarity would be important, and that an external evaluation agent could be useful in this. A second factor was the complexity contained in the intervention itself: eight quite different organisations must collaborate to address the needs of a vulnerable and difficult-to-reach population. In the case of complex interventions, programme partners, or even members of the same partner organisation, bring to the programme their own values and agendas that may be reflected in different programme assumptions (Chen, 2005; Rossi, Lipsey & Freeman, 2004). In the present circumstances, it was quite possible that they did not share views on programme processes and outcomes (Funnell & Rogers, 2011). Finally, when the approach was made for evaluation expertise, the programme had not yet been implemented. This provided a unique opportunity. Chen (2005) and Owen and Rogers (1999) remarked how the development of an evaluation plan can benefit a programme that is still in its early stage of development: it can serve as a foundation for organising programme implementation and identify sources of problems while the programme is still highly fluid and open to modification.

When the present authors became involved in an evaluation of the SPIWP in 2011, these possible contributions were more implicit than explicit. The question was what form evaluation efforts could take in this early phase of the programme's development, as evaluation is typically considered as something that comes towards the end of a programme's life cycle. We were able to convince the programme partners that one particular form of evaluation, programme theory evaluation, would be useful before the programme is even launched. The rest of our paper describes this approach, how it was implemented in the present evaluation, and what its effects were.

Programmes have a certain logic to them, which Rossi et al. (2004) captured as a hierarchy of four steps: it starts with a valid conceptualisation of the problem to be addressed (generally known as needs assessment); followed by designing the appropriate means identified to remedy it (programme theory); then the implementation of that design (the implementation phase); and if appropriately implemented, concluding with achieving the intended improvements (impacts). Programmes can be strengthened at every step of the way, and in the present paper, we are particularly concerned with the role of programme theory evaluation (step 2 in their programme logic hierarchy) in improving the programme before it was launched.

Bickman (1987:5) defined programme theory as "the construction of a plausible and sensible model of how the programme is supposed to work". For Rossi et al. (2004), this plausible model contains two sets of assumptions: about the outcomes a programme is expected to achieve (called impact theory); and about its activities and organisation (process theory). The programme impact theory describes the cause-and-effect sequence of events generated by programme activities that provoke the change in the situation that the intervention intends to modify. This logical chain describes the steps toward the change, from the most proximal (immediate) outcome to the most distal (long term). For social protection programmes like the SPIWP, proximal outcomes would refer to the more or less immediate benefits for target groups, and distal outcomes would include the reduction of deprivation and vulnerability.

In their scoping study of social protection research, Ellis, White, Lloyd-Sherlock, Chhotray and Seeley (2008) identified a lack of knowledge about impacts as one of the major gaps in this field. They argue that monitoring often is limited to the rollout of the project itself (what we call here "process"), and that the monitoring function comes to an end when the programme winds up. Thus we learn little about "the downstream impacts on the welfare and well-being of beneficiaries" (p. 12), or the macro level impacts, on poverty for example. This of course is what we called aspects of "impact theory" above.

The present paper will address only the impact part of programme theory development, and not process theory as such. We wanted to elicit the assumptions about impact as they existed in the minds of the eight programme partners, and not necessarily as they exist in the literature. Extracting programme impact theory in this way has the potential to contribute significantly to new, start-up programmes like the SPIWP. First, and most obviously, it makes these expectations of impact explicit, if they are tacit at the start. Second, and as a result of having more explicit impacts, programme staff are alerted early to potential inconsistencies between them in what impacts they wish to achieve and for whom (Chen, 2005; Owen & Rogers, 1999). Third, drawing out stakeholders' programme theory actively involves them in the process, which can result in an improvement of communication, contribute to awareness-raising about programme rationale, and contribute to the formation of consensus on the programme goals and outcomes (Funnell & Rogers, 2011). Fourth, it gives stakeholders the opportunity to examine and compare the logic of their ideas (Donaldson, 2007). Finally, having an agreed-upon impact theory is fundamental in constructing a framework for outcome monitoring and developing the appropriate measures to capture the expected change (Weiss, 1998). We thought this final aspect was particularly important in the light of the observations made concerning programme impacts in the social protection literature. Thus although an approach and a study like the present one cannot say anything about actual results achieved, we believe it is a demonstration of how important conceptual clarity is in the early stages of programme development, especially about what impacts are expected.

Methodologically, an inductive, participatory mode of working is recommended to elicit stakeholders' programme theory, and evaluators often rely on working group sessions and individual interviews (Chen, 2005) to achieve this. The working group mode is considered particularly appropriate with programmes that have been designed by multiple stakeholders and that present complicated aspects (Chen, 2005; Funnell & Rogers, 2011; Weiss, 1998).



3.5.4 Impact
A brief review of programme documentation, in particular the SPIWP Grant Application Form (CIES, 2010), revealed the overall goals of the programme:
1. An improved social protection system for vulnerable groups, which envisages strengthening the overall social protection system through the creation of better coordination between government institutions and civil society. The implicit assumption here is that such coordination can improve the services offered to the most vulnerable. The programme activities related to this expected result are targeted mainly at secondary beneficiaries: programme partners, community organisations, civil society, and government departments.
2. Increased opportunities of socio-educational and occupational reintegration for street children, scavengers, and mentally ill people living on the street. It is envisaged that activities such as awareness campaigns about street dwellers' and vulnerable peoples' rights, education, and vocational training, will contribute to the reintegration of these groups into society.
3. An improved capacity of the public social protection system to identify and respond to existing social needs. The activities implemented to achieve this result include networking and advocacy, and social research.

After this brief review of programme documents, a first set of interviews was conducted with the programme designers, discussing the design of the programme as described above. As a result of an iterative series of interviews, a first draft of the programme impact theory was developed, representing the change process as originally understood by the programme designers. This was followed up by interviews with two representatives from two programme partners, two local experts on street children and mentally ill people, and the European Commission's officer in charge of programme supervision, to discuss the draft. This procedure is in line with Chen's (2005) proposal for how best to develop programme theory through participatory modes of working.

When this first draft of the SPIWP's impact theory was presented to the representatives of all the organisations, it became clear that this was not everyone's understanding of what the programme was trying to achieve. This was an important finding, as discrepancies like these can lead to disagreements and confusion. Following Funnell and Rogers' (2011) suggestion, a working group was formed to address these discrepancies, where all programme partners were represented. The group met twice in Maputo and the sessions were conducted in Portuguese. Using large sticky notes on a wall, a visual representation of the programme's impact theory was built up over the two sessions, starting with immediate outcomes, and ending with long-term impacts. The result was an organised and consensual model involving 30 different outcomes, re-worked by the authors into a coherent set of outcomes, accepted by all partners as an accurate reflection of their thinking. Figure 3.17 presents this final model, with separate outcomes indicated for secondary and primary beneficiaries. We stress again that this is the "mental map" of the SPIWP's impacts, as it exists in the minds of its significant stakeholders. It is not an indication of what exists in the social protection literature on outcomes and impacts, and remains very much a work in progress.


Figure 3.17: Final programme impact theory
[Figure: the agreed impact theory maps the outcomes identified in the working group sessions for primary beneficiaries (for example, vulnerable people more informed about available social protection services; young street children going back to school; street teenagers receiving vocational training and later employment; mentally ill street people enrolled in services such as occupational therapy and reintegrated into their families and jobs; scavengers beginning to work together and formally establishing their cooperative) and for secondary beneficiaries (for example, a network of actors involved in formal and traditional social protection delivery across government, civil society and communities; improved capacity to identify and respond to the needs of primary beneficiaries; improved coordination among civil society, government and traditional social protection actors; greater government interest in improving the legal framework in favour of the most vulnerable), arranged across short-term, medium-term and long-term changes. These pathways converge on two intended impacts: improved living conditions of vulnerable people and extended social protection coverage to vulnerable people.]

The fine-tuned programme theory reveals the assumptions that programme stakeholders hold in terms of how the programme impacts should come about. For the scavengers, a cooperative must be established to improve access to the formal social protection system. It is expected to increase the visible benefits of their work, which in turn will lead to greater inclusion within their communities. The stability thus provided is expected to result in a decrease in acts of vandalism, such as rubbish bin fires, and increased access to the social protection system. Younger street children are expected to re-enter school, which in turn ought to create opportunities to renew the bond with their families. For teenaged street children, enrolment in vocational training is regarded as a first step towards job opportunities and getting them off the street. Services like occupational therapy are to be provided to mentally ill street people to improve their job prospects, which in turn ought to increase the chances that they will be welcomed back into their families.

All these activities at primary beneficiary level are to be supported and enhanced by conditions introduced at secondary beneficiary level. This starts with better coordination of service delivery provided by all the actors involved in social protection, leading to improved capacity to identify and respond to the needs of the primary beneficiaries. Better coordination is also expected to increase the interest of governmental institutions to include vulnerable people's social protection needs in the legislative framework.
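Purely as an illustration of how such a "mental map" could later be paired with an outcome monitoring framework, the sketch below records a few of the outcomes described above as data, each with a beneficiary level, a time horizon and causal links. The outcome wording is abbreviated from the case; the linking structure, field names and helper function are hypothetical additions, not part of the SPIWP's own tools.

```python
# Illustrative sketch only: tracing one causal pathway through a set of outcomes.
# Outcome texts are abbreviated from the case; the encoding is an assumption.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Outcome:
    text: str
    beneficiary: str                                   # "primary" or "secondary"
    horizon: str                                       # "short", "medium", "long" or "impact"
    leads_to: List[str] = field(default_factory=list)  # texts of downstream outcomes


outcomes = [
    Outcome("Scavengers begin working together", "primary", "short",
            leads_to=["Scavengers formally establish their cooperative"]),
    Outcome("Scavengers formally establish their cooperative", "primary", "medium",
            leads_to=["Improved access to formal and traditional social protection services"]),
    Outcome("Improved access to formal and traditional social protection services",
            "primary", "long",
            leads_to=["Improved living conditions of vulnerable people"]),
    Outcome("Improved living conditions of vulnerable people", "primary", "impact"),
]


def trace(start_text: str, items: List[Outcome]) -> List[str]:
    """Follow leads_to links from one outcome to the end of its pathway."""
    by_text = {o.text: o for o in items}
    path, current = [start_text], by_text.get(start_text)
    while current and current.leads_to:
        nxt = current.leads_to[0]
        path.append(nxt)
        current = by_text.get(nxt)
    return path


print(trace("Scavengers begin working together", outcomes))
```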

3.5.5 Case conclusion
In countries dominated by the informal economy, such as Mozambique, achieving universal social protection coverage through formal (institutional) mechanisms is a great challenge. We have tried to show in this discussion how evaluators can work with stakeholders at the front end of a programme that aims to extend available social protection coverage by creating connections between different levels of service provision. It demonstrates how evaluators could work with a programme in its early phases, even before implementation, to strengthen it and to increase the chances that it will make an impact. It was clear to us that participating in this process yielded a number of positive results for the programme stakeholders:
• The programme logic was much clearer to all the stakeholders, and could be reviewed over time as it evolved.
• The different roles that different partners must play in the implementation of programme activities were made explicit via this process.
• Participants experienced a sense of team building and organisational development, and a sense of ownership of the programme.
• It drew attention to outcomes and impacts, and not just to activities and outputs.
• The theory evaluation in the start-up phase of the programme had an important formative function, through the clarification of programme outcomes, and the appropriateness of activities given the programme logic.

The definition of the programme outcomes was helpful in elaborating an appropriate outcome monitoring framework. One could of course ask how feasible it is that such a framework will lead to the expected outcomes. This work was done as part of our investigation, but is not reported here. Suffice it to say that the key assumptions presented above seem to be supported by evidence from the literature. For example: working in a recognised cooperative will increase income and social protection opportunities for scavengers; the combination of institutional and community-based social protection can extend the coverage of social protection services for vulnerable people (Coheur et al., 2007); coordination among civil society, government institutions, and communities can improve responses to the demand for social protection of vulnerable groups (Cook & Kabeer, 2009); and the provision of health and educational services can reduce exposure to risk (Lund, 2009).

The next set of evaluation questions of course would refer to implementation, if we follow Rossi et al.'s (2004) step-wise hierarchy: over the next three years, how will this programme be implemented? Given its complexity, and developmental approach, we will not be surprised if implementation provides serious challenges to programme staff, and it is quite likely that actual service delivery will differ from the "programme as designed". Ultimately the important question is one of impact: did the programme work as intended? For this, we of course require longitudinal studies, to find out what the longer term effects have been on the well-being of the vulnerable and deprived groups who are considered as the target population of this set of social protection activities.

3.6 Conclusions
Programme evaluation overlaps with, but is also slightly different from, more basic academic research in terms of the purpose of data collection and the standards for judging quality. Basic scientific research is generally undertaken to discover new knowledge, test existing theories, establish truth and make generalisations across time and space, whereas programme evaluation is explicitly undertaken to inform decisions, clarify options, identify improvements and provide information about policies and programmes within the contextual boundaries of time, place, values and politics (Patton, 1997).

Over the past decades there have been major changes in evaluation research and the way it is implemented. The most important changes include the emergence of new evaluation methodologies, the heightened role of theory and a stronger emphasis on the use of systematic programme evaluation logic as an integral part of policy programme development. The result is a growing appreciation of the value of using applied research findings or evidence to improve policy content and processes (Hovland, 2007:5).

Using a theory of change as part of the process for the development of a policy, as well as the programme logic to unpack the different stages of the policy process, makes it easier to develop and implement policy, since each step (from the ideas behind it, to the outcomes it hopes to achieve, to the resources needed) is clearly defined within the theory and displayed within the logic model (Auspos & Kubisch, 2004). Theories of change structured in programme logic format are, for a number of reasons, vital to the success of an evaluation. Grounding a policy or programme explicitly in the transformation theory that it is supposed to implement maximises the achievement of the envisaged desired outcomes. The absence of the theory of change underlying a policy, programme or project further severely complicates any systematic reflection about, and evaluation of, the value that the intervention has added to the state of the problem that it was supposed to address.

A programme logic framework could be used very effectively to evaluate policy development and implementation. The logframe matrix approach provides a practical tool for integrating planning, implementation, evaluation and reporting of policy development and implementation (Hovland, 2007:4). However, the logical framework itself should only be regarded as the monitoring and implementation instrument to maximise the success and use of the evaluation. "It creates an illustration of all the various moving parts that must operate in concert to bring about a desired outcome" (Anderson, 2004:3). The full cycle of policy, programme or project intervention development, implementation, evaluation, review and improvement, however, comprises much more than just completing a logframe template. The underlying theory of change is crucial in understanding why an intervention is made, what it is supposed to achieve and whether that has happened.

Theories of change are playing increasingly important roles in determining if government programmes and policies are achieving their objectives, and whether they are effective or not (AEA Evaluation Policy Task Force, 2009). This is also true for systematic evaluation practices in Africa and South Africa. Numerous guides, manuals, templates, approaches and procedures are freely available from stakeholders across the globe involved in attempts to improve the quality of development interventions through the more explicit use of theories of change in the design, implementation and evaluation of the interventions.

References AEA Evaluation Policy Task Force. 2009. An evaluation framework for more effective government. http:// www.eval.org/aea09.eptf.eval.roadmapF.pdf. Retrieved: 13 March 2014. Alkin, M.C., Vo, A.T. & Hansen, M. 2013. Using logic models to facilitate comparisons of evaluation theory. Evaluation and Program Planning: 38:33. Anderson, A.A. 2004. The Community Builder’s Approach to Theory of Change: A Practical guide to theory development. Aspen Institute. http://www.aspeninstitute.org/sites/default/files/content/ docs/roundtable%20on%20community%20change/rcccommbuildersapproach.pdf. Retrieved: 04 August 2011. Auriacombe, C. 2011. The Role of Theories of Change and Programme Logic Models in Policy Evaluation. African Journal of Public Affairs, 4(2): 36-53. Auspos, P. & Kubisch, A.C. 2004. Building knowledge about community change: Moving Beyond Evaluations. Washington: Roundtable on Community Change. Austin, J. & Bartunek, J. 2004. Theories and Practice of Organization Development. Handbook of Psychology, 12: 309-332. Barrientos, A. & Hulme, D. 2008. Social protection for the poor and poorest in developing countries: Reflections on a quiet revolution. (Working Paper No. 30). Manchester: Brooks World Poverty Institute, University of Manchester. http://www.bwpi.manchester.ac.uk/resources/WorkingPapers/bwpi-wp-3008.pdf. Retrieved: 14 March 2014. Bennett, C. 1976. Analyzing impacts of extension programs, ESC-575. Washington, DC: Extension Service-U.S. Department of Agriculture. Bichelmeyer, B.A. & Horvitz, B.S. 2006. Comprehensive performance evaluation: Using logic models to develop theory based approach for evaluation of human performance technology interventions, in Pershing, J.A. (ed.). 2006. Handbook of Human Performance Technology. San Francisco: Pfeiffer.


Theories of Change and Programme Logic Bickman, L. 1987. The functions of program theory. New Directions for Program Evaluation, 33: 5-18. Buonaguro, L. & Louw, J.N. 2013. Developing impact theory for a social protection programme in Maputo, Mozambique. Case Study. Department of Organisational Psychology, UCT. Canada. 2012. Theory-Based Approaches to Evaluation: Concepts and Practices. Centre of Excellence for Evaluation. Treasury Board of Canada Secretariat. http://www.tbs-sct.gc.ca/cee/tbae-aeat/tbaeaeat-eng.pdf. Retrieved: 14 February 2014. Central Research Department. 2006. Monitoring and evaluation a guide for DFID-contracted research programmes. http://webarchive.nationalarchives.gov.uk/+/http://www.dfid.gov.uk/research/ me-guide-contracted-research.pdf. Retrieved: 12 March 2014. Chen, H.T. 2005. Practical program evaluation: Assessing and improving planning, implementation, and effectiveness. Thousand Oaks, CA: Sage. Chen, H.T. & Rossi, P.H. 1980. The multi-goal, theory-driven approach to evaluation: A model linking basic and applied social science. In Social Forces, 59: 106-122. CIDA. 2008. Logic Model. Canadian International Development Agency Results Based Policy Statement. 15 June 2008. http://www.acdi-cida.gc.ca/INET/IMAGES.NSF/vLUImages/ResultsbasedManagement/$file/RBM-LOGIC_MODEL-Def.pdf. Retrieved 27 August 2014. CIES. 2010. Grant application form: Parts A and B. Promoção da protecção social e trabalho informal no seio da população de rua. Author: Maputo. Clark, H. 2004. Deciding the Scope of a Theory of Change. New York: ActKnowledge Monograph. CLEAR. 2013. CLEAR Theory of Change. http://www.theclearinitiative.org/CLEAR_theory_of_ change.pdf. Retrieved: 15 February 2015. Cloete, F. Public policy in more and lesser-developed states, in Cloete, F. & De Coning, C. (eds.). Improving public policy: Theory, Practice and Results. 3rd edition. Pretoria: JL van Schaik. Cloete, F., De Coning, C. & Rabie, B. Practical Policy Improvement Tools, in Cloete, F. & De Coning, C. (eds.). Improving public policy: Theory, Practice and Results. 3rd edition. Pretoria: JL van Schaik. Coffman, Julia. 2007. Evaluation Based on Theories of the Policy Process. Evaluation Exchange XIII (1 & 2). Coheur, A., Jacquier, C., Schmitt-Diabatè, V., & Schremmer, J. 2007. Linkages between statutory social security schemes and community based social protection mechanisms: A promising new approach (Report No. 9). Geneva: International Social Security Association. http://www.issa.int/ Resources Retrieved: 13 March 2014. Collins, E. & Clark, H. 2013. Supporting Young People to Make Change Happen: A Review of Theories of Change. ActKnowledge and Oxfam Australia. Connell, J., Kubisch, A., Schorr, L., & Weiss, C. (eds.). 1997. Voices from the field: New approaches to evaluating community initiative. Washington, DC: Aspen Institute. Cook, S. & Kabeer, N. 2009. Socio-economic security over the life course: A global review of social protection. (Final Report). Brighton: Institute of Development Studies, University of Sussex. http:// www.ids.ac.uk/files/dmfile/AGlobalReviewofSocialProtection.pdf. Retrieved: 15 April  2014. Coryn, C., Noakes, L., Westine, C., & Schroter, D. 2011. A Systematic Review of Theory-Driven Evaluation Practice from 1990 to 2009. American Journal of Evaluation, 32(2): 199-226. Devereux, S. 2002. Can social safety nets reduce chronic poverty? Development Policy Review, 20(5): 657-675. Donaldson, S.I. 2007. Program theory-driven evaluation science. New York, NY: Lawrence Erlbaum. Duignan, P. 2009. 
Brief introduction to program logic models (outcomes models). http://www. youtube.com/watch?v=bZkwDSr__Us. Retrieved: 14 February 2014. Earl, S., Carden, F. & Smutylo, T. 2001. Outcome mapping: Building learning and reflection into development programs. Ottawa: International Development Research Centre. http://www.idrc. ca/EN/Resources/Publications/Pages/IDRCBookDetails.aspx?PublicationID=121. Retrieved: 14 February 2014. Ellis, F., White, P., Lloyd-Sherlock, P., Chhotray, V. & Seeley, J. 2008. Social protection research scoping study. Norwich: Overseas Development Group, University of East Anglia. Retrieved: http://www.gsdrc.org/docs/open/HD542.pdf.


Chapter 3 Frectling, J.A. 2007. Logic modelling methods in program evaluation. San Francisco: Jossey Bass. Funnell, S. & Rogers, P. 2011. Purposeful Program Theory: Effective Use of Theories of Change and Logic Models. San Francisco, CA: Jossey Bass. Gentilini, U. & Omamo, S. 2011. Social protection 2.0: Exploring issues, evidence and debates in a globalizing world. Food Policy, 36: 329-340. Grantcraft. 2006. Mapping Change: Using a Theory of Change To Guide Planning and Evaluation. http://portals.wi.wur.nl/files/docs/ppme/Grantcraftguidemappingchanges_1.pdf. Retrieved: 18 April 2014. Greene, J.C. 2006. Evaluation, Democracy and Social Change, in Shaw, I.F., Greene, J.C. & Mark, M.M. (eds.). 2006. The Sage Handbook of Evaluation. London: Sage Publications. Guijt, I. & Retolaza, I. 2012. Defining ‘Theory of Change. Discussion paper. Hivos. http:// www.hivos.nl/content/download/76473/665916/version/1/file/E-Dialogue+1++What+is+ToC+Thinking.pdf. Retrieved: 18 April 2014. House, E.R. & Howe, K. R. 1999. Values in Evaluation and Social Research. Thousand Oaks, CA: SAGE. Hovland, I. 2007. Making a difference: M&E of policy research. ODI Working Paper, 281. http:// www.odi.org.uk/resources/download/1751.pdf. Retrieved: 4 August 2011. Howlett, M., Ramesh, M., Perl, A. 2009. Studying Public Policy: Policy Cycles and Policy Subsystems. Oxford: Oxford University Press. International Network on Strategic Philanthropy. 2005. Theory of Change Tool Manual. http:// www.dochas.ie/Shared/Files/4/Theory_of_Change_Tool_Manual.pdf. Retrieved: 18 April 2014. INTRAC. 2012. Theory of Change: What’s it all about? ONTRAC 51. International NGO Training and Research Centre. http://www.intrac.org/resources.php?action=resource&id=741. Retrieved: 18 April 2014. ISPI. 2010. Handbook of Improving Performance in the Workplace. International Society for Performance Improvement. Hoboken, NJ: Pfeiffer. Jackson, E. 2013. Interrogating the theory of change: evaluating impact investing where it matters most. Journal of Sustainable Finance & Investment, 3(2): 95-110. Jackson, E.T. & Harji, K. 2012. Unlocking Capital, Activating a Movement: Final Report of the Strategic Assessment of the Rockefeller Foundation’s Impact Investing Initiative. New York: The Rockefeller Foundation. Jacobi, P. 2006. Public and private response to social exclusion among youth in Sao Paulo. The Annals of the American Academy of Political Science, 606: 216-230. James, C. 2011. Theory of Change Review: A report commissioned by Comic Relief. London. http:// mande.co.uk/blog/wp-content/uploads/2012/03/2012-Comic-Relief-Theory-of-ChangeReview-FINAL.pdf. Retrieved: 13 April 2014. Kellogg Foundation. 2004. Logic Model Development Guide. WK Kellogg Foundation. Battle Creek, USA: W.K. Kellogg Foundation. http://www.wkkf.org/resource-directory/resource/2006/02/ wk-kellogg-foundation-logic-model-development-guide. Retrieved: 17 April 2014. Keystone. 2014. Developing a Theory of Change: A Framework for Accountability and Learning for Social Change. www.keystoneaccountability.org. Retrieved: 13 February 2014. Lempert, D.H. 2010. Why Government and Non-Governmental Policies and Projects Fail Despite ‘Evaluations’: An Indicator to Measure whether Evaluation Systems Incorporate the Rules of Good Governance Journal of Multi-Disciplinary Evaluation, Volume 6(13). Low, J., Garret, J. http://www.uwex.edu/ces/lmcourse/interface/coop_M1_Overview.htm & Ginja, V. 1998. Formal safety nets in an urban setting: Lesson from a cash transfer program in Mozambique. 
In Ministry of Planning and Finance, Understanding poverty and well-being in Mozambique: The first national assessment (1996-97). Maputo: Government of Mozambique. Lund, F. 2009. Social protection and the informal economy: Linkages and good practices for poverty reduction and empowerment. Paris: Organisation for Economic Co-operation and Development. http://www.oecd.org/dataoecd/26/21/43280700.pdf. Retrieved: 13 February 2014.


Theories of Change and Programme Logic Maleane, S.O.T. & Suaiden, E.J. 2010. Inclusão, exclusão social e pobreza em Moçambique em pleno século XXI. Inclusão Social, 4(1):67-75. http://revista.ibict.br/inclusao/index.php/inclusao/ article/view/147 Retrieved: 18 April 2014. Maslow, A.H. 1943. A theory of human motivation. Psychological Review, 50(4): 370-96. Mausse, M. & Cunha, N. 2011. Mozambique: Setting up a social protection floor. In Successful social protection floor experiences (315-332). New York: United Nations Development Programme. www.ilo.org/gimi/gess/RessFileDownload.do?ressourceId=20840. Retrieved: 18 April 2014. Max-Neef, Manfred A. 1991. Human Scale Development: Conception, Application and further Reflections. New York: Apex Publishers. McCawley, P.F. n.d. The logic model for program planning and evaluation. http://www.uiweb. uidaho.edu/extension/LogicModel.pdf). Retrieved: 22 July 2011. Mupedziswa, R. & Ntseane, D. 2013. The contribution of non-formal social protection to social development in Botswana. Development Southern Africa, 30: 84-97. Muspratt-Williams, A. 2009. Strategic thinking by non-government organizations for sustainability: A review of the logical framework approach. Stellenbosch: Stellenbosch University. National Treasury. 2010. Framework for Strategic Plans and Annual Performance, National Treasury. August 2010. Pretoria: Government Printer. Neef, M., Manfred, A., Elizalde, A. & Hopenhayn, M. 1991. Human Scale Development. The Apex Press. London. NEPF. 2011. National Evaluation Policy Framework. Pretoria: DPME. NORAD. 1999. The logical framework approach. Norwegian Agency for Development Cooperation. http://www.ccop.or.th/ppm/document/home/LFA%20by%20NORAD%20Handbook.pdf. Retrieved: 24 July 2011. Norton, A., Conway, T., & Foster, M. 2002. Social protection: Defining the field of action and policy. Development Policy Review, 20(5): 541-567. ORE. 2004. Theory of Change: A Practical Tool For Action, Results and Learning. Organizational Research Services. http://www.aecf.org/upload/publicationfiles/cc2977k440.pdf. Retrieved: 15 April 2014. Örtengren, K. 2004. The Logical Framework Approach, A summary of the theory behind the LFA method. SIDA. http://unpan1.un.org/intradoc/groups/public/documents/un/unpan032558. pdf. Retrieved: 18 April 2014. Ortiz, A. & Macedo, J.C. 2010. A ‘Systemic Theories of Change’ Approach for Purposeful Capacity. Development. IDS Bulletin, 41:3: 87-99. Owen, J.M., & Rogers, P.J. 1999. Program evaluation: Form and approaches. London: Sage. Patton, M. 1997. Utilization-Focused Evaluation. The new century text. Thousand Oaks, CA: Sage  Publications. Reeler, D. 2007. A Three-fold Theory of Social Change and Implications for Practice, Planning, Monitoring and Evaluation. http://www.cdra.org.za/articles/A%20Theory%20of%20Social%20 Change%20by%20Doug%20Reeler.pdf. Retrieved: 13 February 2014. Republic of Mozambique. 2011. PARP 2011-2014 (Report No. 11/132). Washington, DC: International Monetary Fundhttp://www.imf.org/external/pubs/ft/scr/2011/cr11132.pdf. Retrieved: 18 April 2014. Retolaza, I. 2011. Theory of Change: A thinking and action approach to navigate in the complexity of social change processes. Hivos/ UNDP/ Democratic Dialogue. http://www. democraticdialoguenetwork.org/app/documents/view/en/1811. Retrieved: 17 April 2014. Rogers, P.J. 2008. Using Programme Theory for Complicated and Complex Programmes. Evaluation, 14(1): 29-48. Rossi, P., Lipsey, M.W. & Freeman, H.E. 2004. Evaluation: A systematic approach. 7th edition. 
Thousand Oaks, CA: Sage. Spreckley, F. 2009. Results-based Monitoring and Evaluation Toolkit. www.locallivelihoods.com. Retrieved: 13 February 2014.


Chapter 3 Stachowiak, S. 2013. Pathways for Change: 10 Theories to Inform Advocacy and Policy Change Efforts. http://www.evaluationinnovation.org/sites/default/files/Pathways%20for%20 Change.pdf. Retrieved: 14 February 2014. Stein, D. & Valters, C. 2012. Understanding a ‘Theory of Change’ in International Development: A Review of Existing Knowledge. London: DiFID, Asia Foundation & LSE. Suchman, E. 1967. Evaluative research: Principles and practice in public service and social action programs. New York: Russell Sage Foundation. Taplin, D. & Clark, H. 2012. Theory of Change Basics: A Primer on Theory of Change. New York: Acknowledge. Taplin, D.H. & Clark, H. 2012. Theory of Change Basics. Actknowledge. http://www.seachangecop. org/sites/default/files/documents/2012%2003%20AK%20-%20Theory%20of%20Change%20 Basics.pdf. Retrieved: 13 February 2014. Taplin, D., Clark, H., Collins, E. & Colby, D. 2013. Technical Papers: A Series of Papers to support Development of Theories of Change Based on Practice in the Field. New York: Actknowledge and The Rockefeller Foundation. Taylor-Powell, E. & Henert, E. 2008. Developing a Logic Model: Teaching and Training Guide. Madison Wisconsin: University of Wisconsin Extension, Madison Wisconsin. The Presidency. 2007. Policy framework for the Government-wide Monitoring and Evaluation Systems. http://www.info.gov.za/view/DownloadFileAction?id=94547). Retrieved: 24 July 2011. THP. 2013. The Hunger Project: Outcome Evaluation Pilot Project (OEPP) Measuring Outcomes of THP’s Epicenter Strategy. http://www.thp.org/?gclid=CI2Ii6SB6r0CFSUUwwodpKAAXg. Retrieved: 17 April 2014. United Nations Development Programme. 2011. Human Development Report. http://hdr.undp. org/en/reports/global/hdr2011/download/. Retrieved: 18 April 2014. University of Wisconson. Undated. Enhancing Program Performance with Logic models. http:// www.uwex.edu/ces/lmcourse/interface/coop_M1_Overview.htm. Retrieved: 23 July 2011. Vogel, I. 2012. Review of the use of ‘Theory of Change’ in international development. UK Department of International Development. http://r4d.dfid.gov.uk/pdf/outputs/mis_spc/DFID_ToC_ Review_VogelV7.pdf. Retrieved: 14 February 2014. Vogel, I. & Stephenson, Z. 2012. Examples of Theories of Change: Annexure 3, in Vogel, I. 2012. Review of the use of ‘Theory of Change’ in international development. UK Department of International Development. http://r4d.dfid.gov.uk/pdf/outputs/mis_spc/DFID_ToC_Review_VogelV7.pdf. Retrieved: 14 February 2014. Weiss, C. 1995. Nothing as Practical as Good Theory: Exploring Theory-Based Evaluation for Comprehensive Community Initiatives for Children and Families, in Connell, J., Kubisch, A., Schorr, L. & Weiss, C. (Eds.). New Approaches to Evaluating Community Initiatives: Concepts, Methods and Contexts. New York: Aspen Institute: 65-92. Weiss, C.H. 1997. Theory-based evaluation: Past, present, and future. New Directions for Evaluation, 76: 41-55. Weiss, C.H. 1998. Have We Learned Anything New About the Use of Evaluation? American Journal of Evaluation, 19: 21-33. Wholey, J. 1979. Evaluation: Promise and performance. Washington, D.C.: Urban Institute Press.


Chapter 4
Evaluation Models, Theories and Paradigms
Babette Rabie

4.1 Introduction
Within the expanding evaluation theory field and profession, there are numerous competing and supporting concepts, models, theories and paradigms advocating what evaluation does or should focus on and how it is or should be done. For purposes of this chapter we use the distinctions among these and related concepts that have been used by De Coning, Cloete and Wissink (2011:32):
• "A concept is an abstract idea (frequently controversial) that serves as a thinking tool to illustrate specific attributes of intangible phenomena.
• A model is a representation of a more complex reality that has been oversimplified in order to describe and explain the relationships among variables, and even sometimes to prescribe how something should happen. Models can therefore be used in a neutral, descriptive way, or they can be used in a normative way, expressing a preference for a particular value judgement. Models are built around specific concepts.
• A theory is a comprehensive, systematic, consistent and reliable explanation and prediction of relationships among specific variables (e.g. the theories of democratic and revolutionary change). It is built on a combination of various concepts and models, and attempts to present a full explanation and even prediction of future events. Theories can also be used descriptively or prescriptively. While theories are normally assessed in terms of their predictive validity, models are normally assessed in terms of their utility in accurately reflecting reality.
• A paradigm is a collection or pattern of commonly held assumptions, concepts, models and/or theories constituting a general intellectual framework or approach to scientific activities (e.g. ideologies like liberalism, Marxism, nationalism, apartheid, fascism, feminism, globalism, environmentalism and Darwinism). A paradigm is dominant if it is widely accepted in the scientific community concerned."

Throughout this text the concept "theory" will include the concept "model", except when it is explicitly used separately. Various attempts have been made to classify alternative competing evaluation concepts, models, theories and paradigms based on their similarities and differences.

4.2 Evaluation Theory: A Classification System

During its relatively short history, the evaluation profession has already become characterised by a variety of philosophies, approaches, models, traditions and practices.


The first evaluation studies tested bold new reform approaches, while ignoring the effect of small changes to existing programmes or local practices for local goals. Over time, evaluation approaches changed and diversified to reflect accumulating practical experience (Shadish, Cook & Leviton, 1991:32). Various evaluation theories emerged that tried to "describe and justify why certain evaluation practices lead to particular kind of results" (Shadish, Cook & Leviton, 1991:30). While early theories focused on methods for doing evaluations in natural field settings, later theories focused on the politics of applying methods in field settings, and on how research fits into social policy (Shadish & Luellen, 2004:83). Various attempts have been made to classify these theories and models, signalling a natural growth in the evaluation discipline to assist better evaluation theory and practice (Mathison, 2005:258). Some of the more prominent classification attempts are summarised below.

Shadish, Cook and Leviton (1991) classified theories and theorists into three "stages". The theories in stage one introduced science and experiments as a means to address social problems. Some approaches (for example those of Campbell and Scriven) adopted a positivist approach to evaluation research and supported methodologies that allowed for social experimentation that "proves" that certain results have been achieved. The stage two theories and theorists questioned the usefulness of the experimental approaches in providing relevant data for decision-making purposes. Pawson and Tilley (1990) and Pawson (2006) proposed that reality is socially constructed, and that the perspective of an individual may influence his or her interpretation of "success" or "failure". The approaches in this stage emphasised the use and pragmatism of evaluations, with methodological rigour as only one consideration in designing the evaluation. Emerging from the stage two theories, the stage three approaches emphasised that, since more than one interpretation of reality existed, this may influence the political and programme decisions that emerge from the evaluation. The evaluator therefore needs to focus on the potential influences of the current social reality that may skew evaluation findings in favour of those in power, while marginalising the poor and powerless (see for example Mertens, 2004; House & Howe, 2000; Greene, 2006; Fetterman, 1996).

The categories proposed by Guba and Lincoln (1989) emphasised the changing role of the evaluator in what were proposed as "four generations" of evaluators (see Stockmann, 2011:34-35). In this system, the first generation of evaluators focused on measuring, with the evaluator as an external data gatherer. The second generation of evaluators complemented measuring with neutral descriptions of the phenomenon, with the evaluator as a neutral observer. The third generation of evaluators, in addition to measuring and describing, also had to assess or judge objectively, thereby focusing on the construction of criteria and values for judgement. The final generation questioned whether the evaluator should be the judge, and therefore emphasised the involvement of various stakeholders in the evaluation process, with the evaluator adopting the role of negotiator and moderator.

Chen presents four types of evaluation strategies linked to the purposes of the evaluation.
He distinguishes between evaluation strategy (the general direction taken by the evaluator to meet a particular purpose) and evaluation approaches (the systematic set of procedures and principles guiding evaluators, including the conceptualisation of problems, research method application and the interpretation of data) (Chen, 2005:144).


Four types of strategies are identified, namely assessment strategies that provide information on the performance of the intervention; development strategies that assist in planning the intervention; enlightenment strategies that examine underlying assumptions and mechanisms that mediate observed effects; and partnership strategies that involve stakeholders in planning and implementing interventions (Chen, 2005:144-148).

Alkin and Christie (2004:12) developed an evaluation tree with three main branches, namely use, methods and valuing. Various evaluation theorists were sorted onto the three branches on the basis of their (most) valued contributions to the evaluation field. Cardin and Alkin (2012) added more evaluation theorists to what they called a Modified Evaluation Theory Tree. The Evaluation Tree was further updated in Alkin (2013).

Figure 4.1: The Evaluation Tree

[Figure 4.1 depicts the Evaluation Tree: three branches labelled methods, use and valuing grow from roots in social inquiry, epistemology and social accountability. Theorists and approaches such as Tyler, Campbell, Scriven, Stufflebeam, Stake, Cronbach, Rossi, Cook, Boruch, Chen, Weiss, Wholey, Alkin, Patton, Cousins, Fetterman, Preskill, King, House, Greene, Mertens, Lincoln and Guba, Pawson and Tilley, as well as approaches such as Most Significant Change, Developmental Evaluation, Outcome Mapping, Citizen Report Cards, LFA/RBM and the APRM, are placed along the branches according to their primary contribution.]

(Source: Cardin & Alkin, 2012:114)

Following the metaphor initiated by Alkin and Christie, Chilisa and Malunga (2012:12-13, 32) present a first tentative effort to develop an Africa Rooted Evaluation Tree with two initial branches (see Case 7 in the Annexures below for a visual depiction of the Africa Rooted Evaluation Tree). The first branch focuses on "decolonizing and indigenizing evaluation … to recognize the adaptation of the accumulated Western theory and practice on evaluation to serve the needs of Africans", whilst the second "'relational evaluation branch' … draws from the concept of 'wellness' as personified in African greetings and the southern African concept of 'I am because we are'. The wellness reflected in the relationship between people extends also to non-living things, emphasizing that evaluation from an African perspective should include a holistic approach that links an intervention to the sustainability of the ecosystem and environment around it".


For evaluation in Africa to be "rooted" it should include an analysis of the intervention's contribution towards community wellness, and balance both Western and African priorities and indicators.

Rossi, Lipsey and Freeman (2004) developed classification systems that link evaluation to the programme life cycle. Within this approach, they proposed an evaluation hierarchy that evaluates various parts of the programme, namely the need for the programme, its design and theory, the implementation process, the outcome or impact and, finally, the cost and efficiency of the programme. Table 4.1 summarises the purposes and typical evaluation questions at the respective programme stages.

Table 4.1: Aim of evaluation at various stages of programme development

Stage of programme development | Potential questions | Evaluation forms
Assessment of social problems and needs | To what extent are community needs and standards met? | Needs assessment; problem description
Determination of goals | What must be done to meet those needs and standards? | Needs assessment; service needs
Design of programme alternatives | What services could produce the desired changes? | Assessment of programme logic or theory
Selection of alternatives | Which of the possible programme approaches is best? | Feasibility study; formative evaluation
Programme implementation | How should the programme be put into operation? | Implementation assessment
Programme operation | Is the programme operating as planned? | Process evaluation; programme monitoring
Programme outcomes | Is the programme having the desired effects? | Outcome evaluation
Programme efficiency | Are programme effects attained at a reasonable cost? | Cost-benefit analysis; cost-effectiveness analysis

(Source: Rossi, Lipsey & Freeman, 2004:40)

The Department of Performance Monitoring and Evaluation translates this into six types of evaluation, presented in Table 4.2. Owen (2006:41-54) distinguishes between proactive evaluation aimed at synthesising previous evaluation findings, clarificative evaluation to clarify the underlying logic and intended outcomes of the intervention, interactive evaluation to improve the evaluation design, monitoring evaluation to track progress and refine the programme and, finally, impact evaluation for learning and accountability purposes. Stufflebeam and Shinkfield (2007) identified a list of 26 approaches to evaluation classified into five categories: Pseudo-evaluations, Questions- and Methods-Oriented Evaluation Approaches (also seen as Quasi-Evaluation Studies), Improvement- and Accountability-Oriented Evaluation Approaches, Social Agenda and Advocacy Approaches, and Eclectic Evaluation Approaches.


Table 4.2: Evaluation studies commissioned during an intervention life cycle

Type of evaluation | Covers | Timing
Diagnostic evaluation | This is preparatory research (often called ex-ante evaluation) to ascertain the current situation prior to an intervention and to inform intervention design. It identifies what is already known about the issues at hand, the problems and opportunities to be addressed, causes and consequences, including those that the intervention is unlikely to deliver, and the likely effectiveness of different policy options. This enables the drawing up of the theory of change before the intervention is designed. | At key stages prior to design or planning
Design evaluation | Used to analyse the theory of change, inner logic and consistency of the programme, either before a programme starts, or during implementation to see whether the theory of change appears to be working. This is quick to do, uses only secondary information and should be used for all new programmes. It also assesses the quality of the indicators and the assumptions. | After an intervention has been designed, in the first year, and possibly thereafter
Implementation evaluation | Aims to evaluate whether an intervention's operational mechanisms support achievement of the objectives or not, and to understand why. Looks at activities, outputs and outcomes, use of resources and the causal links. It builds on existing monitoring systems, and is applied during programme operation to improve the efficiency and efficacy of operational processes. It also assesses the quality of the indicators and assumptions. This can be rapid, primarily using secondary data, or in-depth with extensive fieldwork. | Once or several times during the intervention
Impact evaluation | Seeks to measure changes in outcomes (and the well-being of the target population) that are attributable to a specific intervention. Its purpose is to inform high-level officials on the extent to which an intervention should be continued or not, and whether any potential modifications are needed. This kind of evaluation is implemented on a case-by-case basis. | Designed early on, baseline implemented early, impact checked at key stages, e.g. 3/5 years
Economic evaluation | Economic evaluation considers whether the costs of a policy or programme have been outweighed by the benefits. Types of economic evaluation include: (a) cost-effectiveness analysis, which values the costs of implementing and delivering the policy and relates this amount to the total quantity of outcome generated, to produce a "cost per unit of outcome" estimate (e.g. cost per additional individual placed in employment); and (b) cost-benefit analysis (CBA), which goes further in placing a monetary value on the changes in outcomes as well (e.g. the value of placing an additional individual in employment). | At any stage
Evaluation synthesis | Synthesising the results of a range of evaluations to generalise findings across the government, e.g. a function such as supply chain management, a sector, or a cross-cutting issue such as capacity. DPME will undertake evaluation synthesis based on the evaluations in the national evaluation plan and do an annual report on evaluation. | After a number of evaluations are completed

(Source: DPME, 2011:9)


The various approaches, and the classification attempts that try to make sense of them, may seem overwhelming to new evaluators, especially with the addition of further approaches and models as the evaluation field expands to incorporate evaluators from various disciplines. However, while classification schemes are usually criticised as soon as they are published on the basis of what is included and excluded, the charting of evaluation approaches has a pragmatic purpose, as it provides evaluation practitioners with the details to make a choice among various evaluation approaches on the basis of their inherent parameters, purposes and processes (Mathison, 2005:257). The function of the various evaluation approaches and models is to provide a "guideline for the research-methodological operationalization of evaluation aims, facilitate the conceptual elaboration of evaluation questions and their implementation in concrete operations" (Stockmann, 2011:30). However, the discipline of the evaluator, which informs his or her conception of what is regarded as scientific research, often with specific associated methodologies, as well as ever-emerging new evaluation questions, aims and objects of evaluation, drives the development of further evaluation approaches (Stockmann, 2011:31; see also Fitzpatrick et al., 2004:59).

The classification system proposed here builds on the previous classification schemes above in attempting to address some of their identified limitations. It uses three main classification categories, based on the main underlying conceptual, theoretical or practical approach to, or driving force behind, the evaluation: the scope of the evaluation, the approach or underpinning philosophy of the evaluation, and the evaluation design and methodology which provide the parameters for collecting and assessing data to inform the evaluation. The classification system thus accounts for the scope of the evaluation study as determined by the parameters that delimit the study; the philosophical or evaluation approach as informed by the objectives of the evaluation study, which in turn informs the implicit or explicit normative or value frameworks underlying the evaluation exercise; and alternative designs and methodologies for evaluation that are most appropriate to the evaluation questions and data sources. The classification system thus adopts a very pragmatic approach: the first category assists evaluators in delimiting what will be evaluated, approaches in the second category assist evaluators in clarifying the purpose of the evaluation, while the last category answers questions on how this will be evaluated. The various evaluation types and approaches presented in the literature will be discussed within the three categories of the proposed classification system.

4.3 Evaluation approaches based on scope

The functional, geographic or behavioural parameters of the evaluation determine and delimit the focus of the evaluation. The evaluation may be very broad, encompassing several of the dimensions of performance (e.g. during an organisational performance review or an integrated evaluation). Alternatively, the evaluation may focus only on one policy, programme or project intervention or be limited to a particular development sector (e.g. the environmental, educational, transport or health sector, a geographical area or community), or confined to a particular programme or project phase or stage of an intervention (such as its inputs, resource conversion or management processes, outputs, outcomes or impacts).


Evaluation approaches based on scope include:
• Systemic evaluation
• Policy evaluation
• Programme and project evaluation
• Evaluability assessment
• Community evaluation
• Meta-evaluation
• Input evaluation or feasibility studies
• Process or ongoing evaluation
• Output evaluation
• Outcome evaluation
• Impact evaluation or impact assessment

Only organisational evaluations (and not individual performance evaluations) are dealt with further below, subdivided into two categories, namely evaluation focused on a particular intervention in its entirety, be that a policy, a programme, a project or a product; and evaluation focused on all aspects or a particular element of the intervention, such as its inputs, processes, outputs, outcomes or impact. Evaluations may be focused on the following types of interventions.

4.3.1 Evaluating the entire intervention

Systemic evaluation

The focus of the evaluation may be on an entire system or subsystem, which includes both the specific intervention under consideration and all aspects in the immediate and broader environment that may influence or be influenced by the intervention (e.g. a comprehensive organisational evaluation of an NGO, municipality or government agency against the background of the context or environment within which it operates). Systemic evaluation was developed for operations research and is based on critical systems thinking. It rests on the following principles:
• What lies beyond the inquiry – both inside and outside of the system under evaluation – has important significance for the inquiry.
• As the evaluator is part of the system, the evaluation's purpose, values, methods, analysis and conclusions must be openly debated and clarified.
• Practical improvement is inherent in systemic evaluation. This necessitates ethical clarification on what is "in" and "out" of the evaluation, and for whom there is improvement (Rogers & Williams, 2006:88).

Policy evaluation

Owen (2006:23) argues that "(p)olicies can be considered as the most pervasive form of social intervention". Two investigatory activities may be applied to policies, namely policy analysis, which describes the process of developing the policy, including the alternative options available and the assumptions upon which decisions are made, and policy research, which aims to determine policy impact to inform further policy decisions (Owen, 2006:26). Policy evaluation assesses alternative public policies, both in terms of their individual ability to deliver on stated outcomes and through comparison between policies to determine which result is more desirable in terms of effectiveness, efficiency and peripheral consequences.


One of the first policy scientists, Nagel (2002:xi), stated that "(p)ublic policy evaluation involves deciding among alternative ways of resolving controversies regarding what should be done to deal with economic, technological, social, political, international, and legal problems at the societal level. Systematic evaluation involves processing (a) goals to be achieved, (b) alternatives available for achieving them, and (c) relationships between the goals and the alternatives to decide on the best alternative, the best combination of alternatives, or the best allocation among the alternatives. Win-win evaluation involves choosing policy alternatives that can enable conservatives, liberals, and other major groups to simultaneously achieve results that are better than their best initial expectations".

Policy evaluation may be applied to several policy sectors simultaneously to evaluate the combined effects and consequences of these policies. Examples are the South African Presidency's Ten Year Review (Presidency, 2003), its Fifteen Year Review (Presidency, 2008) and Twenty Year Review (Presidency, 2014), or the Ten Year Review of the African Peer Review Mechanism completed by the South African Institute of International Affairs (Turianskyi, 2013). Alternatively, policy evaluation may limit its scope to a sectoral evaluation that focuses on one policy sector only (e.g. the social, environmental or trade union sector).

Programme monitoring and programme evaluation

Programme monitoring "is the systematic documentation of aspects of programme performance that are indicative of whether the programme is functioning as intended or according to some appropriate standard. Monitoring generally involves programme performance related to programme process, programme outcomes, or both" (Rossi, Lipsey & Freeman, 2004:431). Programme evaluation "entails the use of social research methods to systematically investigate the effectiveness of social intervention programmes in ways that are adapted to their political and organisational environments to inform social actions that may improve social conditions" (Rossi, Lipsey & Freeman, 2004:431). Programme evaluation "is the systematic assessment of programme results and … the systematic assessment of the extent to which the programme caused those results" (Wholey, Hatry & Newcomer, 2004:xxxiii). Project evaluation entails the same, with a particular project, not a programme, as the focus of the evaluation.

Evaluability assessment

An evaluability assessment may be undertaken formatively to determine the information needs of policy makers and managers, the feasibility and cost of answering alternative evaluation questions and the likelihood of results being used (Wholey, Hatry & Newcomer, 2004:2). The evaluability assessment determines whether minimal preconditions for evaluation have been met before the actual evaluation can take place. An evaluation is regarded as evaluable if the programme goals are well defined and plausible, the relevant data can be reasonably obtained, and the intended users have agreed on how they will use the information generated through the evaluation (Wholey, Hatry & Newcomer, 2004:34).


Interventions should meet the following criteria before an evaluation study can be undertaken successfully:
• The intervention goals, objectives, side effects and information needs must be well defined.
• The programme goals and objectives must be plausible.
• The relevant performance data must be accessible.
• The intended users of the evaluation results must have agreed on how the evaluation results will be used (Wholey in Rossi, Lipsey & Freeman, 2004:137).

Dahler-Larsen (2012:34), however, reiterates the importance of the earlier writings of Wholey (1987) and Shadish, Cook and Leviton (1991): evaluability assessment does not only determine whether an evaluation can be done, but also whether it is sensible to do the evaluation, given the circumstances and the possible repercussions that may flow from such an exercise. To further evaluability assessments, a comprehensive list of nineteen questions pertaining to the characteristics of the evaluand, the availability of alternative knowledge streams, the consequences of the evaluation and the democratic nature of the evaluation system is proposed (see Dahler-Larsen, 2012).

Community evaluation

Community-based evaluation is focused on a particular community, which may be geographically based, or spatially spread but with similar characteristics such as ethnicity, interest or ideology. Evaluations are conducted in partnership with the community, with community-sensitive evaluation methods and measures, and community-focused reporting and dissemination (Conner in Mathison, 2005:69-70).

Meta-evaluation

Meta-evaluations comprise the evaluation (not the mere description of what the process entailed) of evaluations and evaluators (Scriven in Mathison, 2005:249-251). A particular evaluation study thus becomes the focus of another evaluation. "Triangulation and meta evaluation should be major parts of the methodology. Interpretations by evaluators and others should be scrutinized by colleagues and selected stakeholders … to identify shortcomings in design and poor interpretations" (Stufflebeam interpreted by Stake, 2004:215).

4.3.2 Evaluating parts of an intervention

An evaluation study may be confined to only certain parts of an intervention, such as the availability of resources, the implementation processes, or the outputs or outcomes of the intervention, whereas an integrated evaluation combines the various focus areas into an overall assessment.

Input evaluation or feasibility studies

Input evaluation or feasibility studies assist decision makers to examine alternative strategies for addressing the assessed needs of targeted beneficiaries and to develop an appropriate plan and budget, thereby preventing failure or waste of resources (Stufflebeam, 2004:338-339). Criteria include:


• Responsiveness to assessed needs of targeted beneficiaries
• Responsiveness to targeted organisational problems
• Potential effectiveness
• Availability of financial resources
• Political viability
• Administrative feasibility
• Potential for important impacts outside the local area

Feasibility studies are often undertaken to explore alternative implementation options for a policy or intervention. They involve the identification of the potential costs, benefits, constraints and predicted impacts of alternative options (Shafritz, 2004). Feasibility studies make use of various statistical and other trend identification, assessment and projection techniques, modelling, cost-benefit analysis and alternative scenario-building approaches to test the feasibility of alternative options and inform the final decision on the most desirable course of action.

Process evaluation

Process monitoring and evaluation is the "systematic and continual documentation of key aspects of programme performance that assess whether the programme is operating as intended or according to appropriate standards. The focus is on the integrity of the programme operations and actual service delivery to the target audience" (Rossi, Lipsey & Freeman, 2004:431, 171). It investigates the operation of the programme, including whether the administrative and service objectives of the programme are being met; whether services are delivered in accordance with the goals of the programme; whether services are delivered to appropriate recipients and whether eligible persons are omitted from the delivered service; whether clients are satisfied; whether the administrative, organisational and personnel functions are well administrated; and whether service delivery is well organised and in line with programme design and other specifications (Rossi, Lipsey & Freeman, 2004:56-57, 78, 171). Cloete sees progress monitoring as necessary "to keep track of the time frame, the spending programme, the progress towards objectives and the quality and quantity of outputs … through project management techniques. The focus is primarily on the effectiveness, efficiency and levels of public participation in the implementation process" (Cloete, 2008:6).

One method, "cost-benefit analysis", comprises an efficiency assessment that provides a frame of reference for relating costs to programme results. Cost-benefit analysis compares the direct and indirect benefits (outcomes) of the programme with the direct and indirect costs (inputs). The process involves attaching financial values to the costs, benefits, loss of opportunity and externalities of the programme, expressing the result of the analysis in monetary terms. Three perspectives may be calculated, namely the cost benefit to the individual, to the programme sponsor and/or to the community at large (Rossi, Lipsey & Freeman, 2004:332-357; see also Kee, 2004:506). A second method, "cost-effectiveness analysis", compares the cost and result implications of alternative programmes with the same end-goal. It expresses the respective efficiencies of different programmes in substantive terms, thereby determining which programme(s) are more efficient at achieving the stated goal (Rossi, Lipsey & Freeman, 2004:332, 363). Cost-effectiveness analysis assesses the ability to achieve objectives and outcomes at a reasonable cost by calculating the cost per unit or the cost per beneficiary of a particular service (Save the Children, 1995:194; Kee, 2004:506).
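To make the arithmetic behind these two methods concrete, the short sketch below works through a hypothetical example; the programme names and all figures are invented for illustration and are not drawn from the sources cited above. Cost-effectiveness analysis divides total programme cost by the units of outcome achieved, while cost-benefit analysis subtracts total (monetised) costs from total (monetised) benefits.

```python
# Hypothetical worked example of cost-effectiveness and cost-benefit calculations.
# All figures are invented for illustration only.

programmes = {
    # name: (total cost in Rand, outcome units e.g. participants placed in jobs,
    #        monetised benefit in Rand of all outcomes achieved)
    "Job-training A": (2_000_000, 400, 3_200_000),
    "Job-training B": (1_500_000, 250, 2_100_000),
}

for name, (cost, outcome_units, monetised_benefit) in programmes.items():
    cost_per_unit = cost / outcome_units           # cost-effectiveness: Rand per person placed
    net_benefit = monetised_benefit - cost         # cost-benefit: benefits minus costs
    benefit_cost_ratio = monetised_benefit / cost  # greater than 1 means benefits outweigh costs
    print(f"{name}: R{cost_per_unit:,.0f} per placement, "
          f"net benefit R{net_benefit:,.0f}, benefit-cost ratio {benefit_cost_ratio:.2f}")
```

On these invented figures, Programme A is the more cost-effective option (R5 000 per placement against R6 000) and also yields the larger net benefit, but the two methods need not agree: a programme can be cheaper per unit of outcome yet produce a smaller total net benefit if it operates at a much smaller scale.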


Output evaluation

Outputs are the tangible products that result from activities. Output evaluation typically measures the quantity, quality and diversity of services delivered by a specific intervention, with the typical evaluation question being: what did we do? It may therefore also focus on the number of service recipients or the types of services rendered (Mark in Mathison, 2005:287). As bureaucratic accountability normally emphasises the delivery of planned outputs, output measures often take precedence in the internal monitoring systems of government organisations. A form of output evaluation is product evaluation, which entails evaluating the product against quality assurance standards. In the social context, product evaluation measures, interprets and judges achievements to ascertain the extent to which the evaluand met the needs of the rightful beneficiaries. Product evaluation should assess intended and unintended, as well as positive and negative outcomes, both short and long term (Stufflebeam & Shinkfield, 2007:344-345).

Outcome evaluation

Outcome evaluation provides information on important programme outcomes or end results to assess the effectiveness or benefits of the programme for the target group (Rossi, Lipsey & Freeman, 2004:224-225; Chen, 2005:35; Weiss, 1998:8). As such, it "entails the continual measurement and reporting of indicators of the status of social conditions a program is accountable for improving" (Rossi, Lipsey & Freeman, 2004:430). "The aim is to verify whether clients are better off after receiving the services" (Chen, 2005:184). Typical questions included in an outcome evaluation are:
• Are the outcome goals and objectives being achieved?
• Do the services benefit the recipients? Are there adverse consequences?
• Is the problem addressed or improved? (Rossi, Lipsey & Freeman, 2004:78).

In measuring outcomes, distinctions must be made between the:
• outcome level, which refers to the actual outcome measurement at a particular point in time
• outcome change, reflecting the difference between outcome measurements at different points in time
• programme effect, determining which portion of an outcome change is attributable to the programme and not to external factors (externalities)
• proximal outcomes that manifest immediately or soon after completion of the intervention (e.g. the knowledge gain reflected by training participants) versus distal outcomes that manifest only in the medium (6 months to 2 years) or long term (2 to 20 years) after the completion of the intervention (e.g. the improvement in practices as a result of the training)
• intended outcomes of the intervention versus the unintended positive or negative outcomes of the intervention. The multiple outcomes pursued by interventions complicate evaluation results, especially when the intervention renders positive changes for some outcomes, but neutral or even negative changes for other outcomes
• direct versus indirect beneficiaries of the attained outcomes. As with multiple outcomes, the spill-over effects of interventions may influence the final evaluation findings. For example, the training of laid-off workers from the Johannesburg clothing industry in the late 1990s in basic business skills rendered better results when these workers shared their knowledge with non-South African residents (unintended beneficiaries), which enabled these entrepreneurs to move into a market gap for unique African-style clothing.


Outcome evaluations may focus on "the individual level (changes in knowledge, skills, attitudes), organisational level (changes in policies, practices, capacity), community level (changes in employment rates, school achievement, recycling), and the policy or government level (changes in laws, regulations, sources of funding)" (Mathison, 2005:287).

Impact assessment

Impact assessment determines "the extent to which a program produces the intended improvements in the social conditions it addresses". It may either refer to long-term outcomes or describe the effect of a programme on the wider community (Weiss, 1998:8). It tests whether the desired effect on the social conditions that the programme intended to change was attained, and whether those changes included unintended side effects (Rossi, Lipsey & Freeman, 2004:58, 427). Owen (2006:255) elaborates by explaining that impact evaluation tests whether the objectives or needs have been met, whether there were unintended benefits, whether the implementation process and outcomes are responsible for the result, whether the same effect will be attained under different circumstances, and whether there are more efficient alternatives. It differs from outcome evaluation in that it has a longer-term focus, and its focus extends beyond the direct beneficiaries of the intervention to society as a whole.

Impact assessment studies are also interested in determining causality: whether the observed changes can accurately be attributed to the intervention in question, and not to some other external cause (e.g. Becker, 2003 and Vanclay, 2003). As such, impact assessment is the most difficult of evaluation studies, frequently requiring rigorous methods and pre- and post-test designs as proposed by the quasi-experimental tradition. Impact assessment studies often necessitate large-scale sample surveys with randomised pre-test and post-test evaluations of the target population and control groups, quasi-experimental designs with before and after comparisons of project and control populations, ex-post comparison of project and non-equivalent control groups, and rapid assessment ex-post impact evaluations and participatory appraisals where estimates of impact are obtained from combining group interviews, key informants, case studies and available secondary data (Kusek & Rist, 2004:22-24).
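The distinction drawn above between outcome change and programme effect is essentially arithmetic, and a simple before-and-after comparison of a project group and a control group (the difference-in-differences logic behind many quasi-experimental designs) illustrates it. The sketch below uses invented survey means and is only a minimal illustration of the reasoning, not a substitute for a properly designed study:

```python
# Hypothetical before/after outcome means (e.g. average monthly household income in Rand)
# for a project group and a non-equivalent control group. All figures are invented.
project_before, project_after = 1_800, 2_600
control_before, control_after = 1_750, 2_150

outcome_change_project = project_after - project_before  # raw change observed in the project group
outcome_change_control = control_after - control_before  # change that occurred without the intervention
programme_effect = outcome_change_project - outcome_change_control  # change attributable to the programme

print(f"Outcome change (project group): {outcome_change_project}")  # 800
print(f"Outcome change (control group): {outcome_change_control}")  # 400
print(f"Estimated programme effect:     {programme_effect}")        # 400
```

On these figures the project group improved by 800, but half of that improvement also occurred in the control group, so only 400 of the change can plausibly be attributed to the intervention; the credibility of even this estimate still depends on how comparable the control group is and on the other design considerations discussed above.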


Integrated evaluation

Integrated evaluation combines the various focus areas into an overall assessment. A useful approach is the CIPP model developed by Stufflebeam (2004), which includes the following four aspects in a holistic evaluation:
• Context: Analysing needs, problems, assets and opportunities to define goals and priorities;
• Inputs: Considering alternative approaches, competing plans and budgets for feasibility and potential goal effectiveness;
• Process: Tracking the implementation of plans; and
• Product: Identifying and assessing intended/unintended, short/long-term outcomes (Stufflebeam & Shinkfield, 2007:326; Stufflebeam, 2004:246).

As part of the CIPP model, Stufflebeam developed various checklists that assist the evaluator in ensuring a thorough evaluation covering all critical aspects. A series of these checklists is available online at http://www.wmich.edu/evalctr/checklists/checklistmenu.htm#models.

4.4 Evaluation approaches based on an explicit philosophy or formal substantive theory

Various philosophical approaches to evaluation are purposively adopted and pursued in evaluation studies. These approaches range from "a largely positivistic perspective on the one hand where quantitative approaches are used to generate information that is analysed along so-called scientific criteria, to the more interpretative and constructivist approaches on the other hand which privileges the generation of local knowledge, learning and use. This has presented itself as a dualism that has distinguished adherents into two camps, the quantitative or scientific versus the qualitative or interpretative, the former being seen as closer to the pure sciences and the latter to the social sciences" (Naidoo, 2007:31).

Some of the previous classification attempts distinguish between value-driven and use-driven evaluation approaches. Problematic about this distinction is that all evaluations inherently entail a value judgement (good or bad, in Scriven's simple distinction) and that all evaluations are conducted with a particular end-use or purpose in mind. It is, however, useful to differentiate between approaches that are theory-driven and that pursue, through a more scientific approach, the aim of expanding knowledge about the evaluand, and the participatory evaluation philosophies that favour a more social science approach to evaluation research with the general aim of empowerment and creating shared understanding.

4.4.1 Theory-driven evaluation approaches

Theory-based evaluation entails the identification of the critical success factors of the evaluation, linked to an in-depth understanding of the workings of a programme or activity (the "theory of change", "programme theory" or "programme logic"). Theory-driven evaluation is therefore "the systematic use of substantive knowledge about the phenomena under investigation and scientific methods to improve, to produce knowledge and feedback about, and to determine the merit, worth and significance of evaluands" (Donaldson & Lipsey, 2006:67). The approaches in this category are all based on an implicit "theory of change" (e.g. how to reduce crime, poverty and disease and achieve growth and development), which links the evaluation with intended improvements in practice (Rogers & Williams, 2006:77). It does not assume simple linear cause-and-effect relationships, but allows for the mapping and design of complex programmes.


Where evaluation data indicates that critical success factors of a programme have not been achieved, it is concluded that the programme will be less likely to succeed (Kusek & Rist, 2004:10).

Selected evaluation approaches in this sub-category are:
• Goal-free evaluation
• Clarificatory evaluation
• Illuminative evaluation
• Realist evaluation
• Cluster evaluation and multisite evaluations
• Developmental evaluation

Goal-free evaluation

This is an inductive approach standing in opposition to utilisation-focused evaluation (see next section), the usefulness of which was promoted by Scriven (1974). "In this approach, the evaluator purposely remains ignorant of a programme's printed goals and searches for all effects of a programme regardless of its developer's objectives. If the programme is doing what it is supposed to do, the evaluation should confirm this, but the evaluator will also be more likely to uncover unanticipated effects that the goal-based evaluations would miss because of the preoccupation with stated goals" (Stufflebeam & Shinkfield, 2007:374). Goal-free evaluation studies all aspects of the programme and notes all positive and negative aspects without focussing only on information that supports the goals (Posavac & Carey, 1997:23-27). The approach enhances the objectivity of evaluators during the evaluation process, as it does not prescribe what the evaluation should produce or focus on. It is also particularly useful in evaluations that aim to determine the unintended consequences of an intervention.

Clarificatory evaluation

Clarificatory evaluation, also known as conceptual evaluation, programme needs assessment or the assessment of programme theory, assists in clarifying or developing the programme plan and analysing the programme assumptions and theory to determine its reasonability, feasibility, ethicality and appropriateness, and to improve coherence (Chen, 2005:127; Rossi, Lipsey & Freeman, 2004:55, 93; Owen, 2006:191). Evaluators become part of the team that "together, interpret findings, analyse implications and apply results to the next stage of development" (Patton in Mathison, 2005:116). Typical questions to be answered with the evaluation include:
• What are the nature and magnitude of the problem?
• What are the characteristics and needs of the population?
• What services are required? How much? When? Through which delivery mechanisms? What is the ideal organisation of the programme?
• What resources are required? (Rossi, Lipsey & Freeman, 2004:18, 77).

Clarification evaluation tests the logic of the intervention and the feasibility of the design, encourages consistency between design and implementation, and provides the foundation for monitoring and impact evaluation (Owen, 2006:192). A useful approach is to draw the theory of change or "logic model" for the intervention to provide a picture of how it is believed the intervention will work to bring about desired results through a specific sequence of activities (Kellogg Foundation, 2004:10).


The logic model presents a systematic and visual illustration of the relationship between the resources of a programme; the activities; the tangible deliverables it produces; and the changes or results it wishes to achieve. Clarifying programme theory helps to test the validity of the assumptions that certain activities and outputs will deliver the envisioned outcomes and impact. The developed programme theory provides the framework for the development or adoption of indicators that will measure the various stages of the implementation or results of the process. The basic components of the logic model are depicted in Figure 4.2.

Figure 4.2: The Logic Model

[Figure 4.2 depicts the logic model as a linear chain: Resources/Inputs → Activities → Outputs → Outcomes → Impacts, with the first two elements labelled "your planned work" and the last three labelled "your intended results".]

(Source: Kellogg Foundation, 2004:9)
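As a minimal illustration of how the chain in Figure 4.2 can be written down for a specific intervention, the sketch below captures a hypothetical skills-training programme as a simple data structure. The programme and all entries are invented, and the Kellogg Foundation material does not prescribe any particular notation; this is merely one convenient way to make the chain explicit.

```python
# A hypothetical logic model for an invented skills-training programme,
# following the resources -> activities -> outputs -> outcomes -> impacts chain.
logic_model = {
    "resources":  ["3 trainers", "R1.2 million grant", "municipal training venue"],
    "activities": ["recruit 300 unemployed youth", "run 10-week business-skills course"],
    "outputs":    ["300 participants trained", "300 business plans drafted"],
    "outcomes":   ["participants start or formalise small enterprises within 12 months"],
    "impacts":    ["reduced youth unemployment in the target community"],
}

# "Planned work" versus "intended results", as labelled in Figure 4.2
planned_work = {k: logic_model[k] for k in ("resources", "activities")}
intended_results = {k: logic_model[k] for k in ("outputs", "outcomes", "impacts")}

for stage, items in logic_model.items():
    print(f"{stage:>10}: {'; '.join(items)}")
```

Writing the model out this explicitly makes the assumptions visible: each arrow in the chain is a claim (for example, that drafting a business plan leads to starting an enterprise) that the clarification evaluation can test, and for which indicators can be set.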

In this model, "resources include the human, financial, organizational, and community resources a programme has available to direct toward doing the work …, activities are the processes, tools, events, technology, and actions that are an intentional part of the programme implementation …, outputs are the direct products of programme activities and may include types, levels and targets of services to be delivered by the programme …, outcomes are the specific changes in programme participants' behaviour, knowledge, skills, status and level of functioning … [and] impact is the fundamental intended or unintended change occurring in organizations, communities or systems as a result of programme activities" (Kellogg Foundation, 2004:10). Contextual factors that may influence the observed outcomes include antecedent variables (characteristics of the beneficiaries or environment present at the start of the programme) or mediating factors (external influences that emerge during programme implementation) (McLaughlin & Jordan, 2004:10).

To assist evaluators in thinking through the logic of a programme, Kusek and Rist developed a "CORAL questionnaire" with guiding questions within each of the following areas:
• Identifying concerns of stakeholders
• Desired outcomes of solutions
• Identifying known or likely risks
• Credibility of the assumptions on which the logic is based
• Enabling feedback of new programme logic and knowledge into implementation systems (Kusek & Rist, 2009:190).

Another tool that assists in clarifying the intervention logic is the "logframe", or logical framework approach, originating from performance management efforts in the US Navy in the 1960s and still commonly used in international development (Rogers in Mathison, 2005:235).


Logical Framework Analysis is a widely used tool that assists in "testing the logic of a plan of action by analysing it in terms of means and ends. This helps to clarify how the planned activities will help to achieve the objectives [and assesses] the implications of carrying out the planned activities in terms of resources, assumptions and risks" (Save the Children, 1995:178). The logframe consists of a narrative summary of the programme logic, divided into four levels: the goals to be achieved; the purpose of the project; the outputs and activities to be produced; and sometimes the required inputs as well (Rogers in Mathison, 2005:235; Valadez & Bamberger, 1994:85). "The logical sequence of these activities is stated in the following way:
• If INPUTS are provided at the right time and in the right quantities, then OUTPUTS will be produced.
• If OUTPUTS are produced, then PURPOSE (impact/benefits) will be obtained.
• If PURPOSE is obtained, then GENERAL GOALS will be achieved" (Valadez & Bamberger, 1994:85-86).

The strength of the statistical association between elements may be assessed through multiple regression analysis (Valadez & Bamberger, 1994:92-94), whilst verifiable indicators are also set for each level (Rogers in Mathison, 2005:235). It is useful to present the logical framework as a table or matrix, as illustrated by Rakoena (2007: slide 13) in Table 4.3. Rogers warns that there is a risk in using logic models "excessively focused on intended processes and outcomes, as they can lead to evaluations that search only for confirming evidence and not for evidence of unintended outcomes and the influence of other factors" (in Mathison, 2005:234). Developing the theory of change, however, allows evaluators to obtain a more in-depth understanding of the workings of a programme or activity. Whilst the simple version of the logic model presents a linear cause-and-effect relationship, recent research by Funnell and Rogers (2011) provides for the development of complex and complicated theories of change, acknowledging that interventions pursue multiple outcomes and multiple outputs through multiple delivery chains (please refer to Chapter 3 for a more detailed discussion on theories of change). During clarification evaluation, evaluators will scrutinise the envisioned theory of change to clarify how the intervention will work and expand the developed theory of change, where necessary, to accurately reflect the complexities of the intervention. The developed logic model also forms the basis for the development of measures and evaluation questions at each stage of the intervention, which may assist in guiding implementation processes and in conducting ex-post evaluation studies on the actual implementation of and results from the intervention.


Table 4.3: Guidelines for completing a logframe matrix

Narrative Summary | Measurable Indicators | Means of Verification | Important Assumptions
Goal (or overall objective): What are the wider problems which the project will help to resolve? | What are the quantitative or qualitative ways of judging whether the broad objectives have been achieved? (Quantity, quality and time) | What sources of information exist, or can be provided cost-effectively? | (Sustainable long-term objectives) What external factors are necessary for sustaining the project objectives in the long run?
Purpose (or specific objectives): What are the intended immediate effects on the project area or target group? What are the extended benefits and to whom are they intended? | What are the quantitative measures, or qualitative evidence, by which achievement and the distribution of effects and benefits can be judged? (Quantity, quality and time) | What sources of information exist, or can be provided cost-effectively? Does provision for information collection need to be made under the Inputs/Resources? | (Purpose to Goal) What are the conditions external to the project necessary if the achievement of the project's purpose is to contribute to the attainment of the project's goal?
Outputs (or results): What are the project outputs to be produced by the project in order to achieve the project purpose? | What kind and quantity of outputs, and by when will they be produced? (Quantity, quality and time) | What are the sources of information? | (Output to Purpose) What are the factors not within the control of the project which, if not present, are liable to hamper the project from achieving its purpose?
Activities (or work elements): What are the work elements that should be undertaken in order to accomplish the outputs, and when? | Inputs/Resources: What materials/equipment or services (e.g. training) are to be provided to achieve the outputs, at what cost and over what period of time? | What are the sources of information? | (Activity to Output) What external factors must be realised to obtain the planned outputs on time?

(Source: Rakoena, 2007: slide 13)
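Because the matrix in Table 4.3 is essentially a fixed grid of questions, it lends itself to a simple structured representation. The sketch below fills in a deliberately abbreviated, invented logframe for a hypothetical rural water project and checks that every level answers all four columns; this is only an illustrative convention, not part of Rakoena's or Valadez and Bamberger's guidance.

```python
# A heavily abbreviated, invented logframe for a hypothetical rural water project.
COLUMNS = ("narrative", "indicators", "verification", "assumptions")

logframe = {
    "goal":       {"narrative": "Improved community health",
                   "indicators": "20% drop in waterborne disease within 3 years",
                   "verification": "clinic records",
                   "assumptions": "no major drought"},
    "purpose":    {"narrative": "Reliable access to safe water",
                   "indicators": "90% of households within 500 m of a working borehole",
                   "verification": "household survey",
                   "assumptions": "municipality maintains pumps"},
    "outputs":    {"narrative": "15 boreholes drilled and equipped",
                   "indicators": "boreholes functional by month 12",
                   "verification": "site inspection reports",
                   "assumptions": "groundwater yield as predicted"},
    "activities": {"narrative": "Site surveys, drilling, pump installation",
                   "indicators": "inputs: R4.5 million, 2 drilling teams, 12 months",
                   "verification": "project and financial reports",
                   "assumptions": "equipment delivered on time"},
}

# Check that the matrix is complete: every level must address every column.
for level, cells in logframe.items():
    missing = [c for c in COLUMNS if not cells.get(c)]
    print(f"{level:>10}: {'complete' if not missing else 'missing ' + ', '.join(missing)}")
```

Read downwards, the four levels encode the if-then chain quoted above (inputs → outputs → purpose → goal); read across, each row supplies the indicators, sources of verification and external assumptions against which that level will later be monitored and evaluated.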

Illuminative evaluation

Illuminative evaluation is an inductive approach that refrains from adopting a particular philosophy or model. Its primary concern is with description and interpretation rather than with measurement and prediction. Hamilton explains that its aim is "to study the innovatory programme, its significant features, recurring concomitants and critical processes". Hamilton refers to three overlapping stages of illuminative evaluation: observation, further inquiry and seeking to explain. "Overall illuminative evaluation concentrates on the information gathering rather than the decision-making component of evaluation. The task is to provide a comprehensive understanding of the complex reality surrounding a programme: in short, to 'illuminate'" (Hamilton in Mathison, 2005:191-194).

Realist or realistic evaluation

Realist evaluation, advocated by Pawson and Tilley (1997), extends the experimental tradition of Campbell and Stanley, but takes a different view of what constitutes experimentation (Mouton, 2007:504).


Tilley (in Mouton, 2007:507) explains that "in the case of social programmes we are concerned with change. Social programmes are concerned with effecting a change in a regularity … [which initially] is deemed … to be problematic … The aim of a programme is to alter these regularities. Thus, where science is concerned with understanding regularities, evaluation of programmes is concerned with understanding how regularities are altered". Realist evaluation tries to establish why, where, and for whom programmes work or fail by identifying the mechanisms that produce observable programme effects and testing the mechanisms, as well as other contextual factors, that may have caused the observed effect (Henry in Mathison, 2005:359). It thus tests whether there is an unequivocal causal relationship between a programme and its outcomes to establish beyond doubt that it was the actual programme which caused the measurable change, and not some other, unidentified, variable. It poses the question whether the same programme will be successful everywhere. The aim of realist evaluation is to determine the causal factors in the context which cause the perceived evaluation result (Mouton, 2008). The evaluation is based on a CMOC framework, which describes the context-mechanism-outcome configuration of the intervention, where:
• Context refers to the prior set of social rules, norms, values and interrelationships
• Mechanism refers to the programme activities and outputs
• Outcome refers to the perceived change that must be explained by the evaluation

Realist evaluation develops CMOC theories that explain the particular aspects of the mechanism and context that produce the final outcome. Figure 4.3 depicts the relationship and experimental design that underpin this evaluation approach.

Figure 4.3: CMOC Framework

[The figure depicts a cyclical process: a hypothesis (theory of change) guides data collection on appropriate mechanisms and contexts; data analysis of outcome patterns shows what can and cannot be explained; and the understanding of the CMO framework is revised, feeding back into the hypothesis.]

(Source: Mouton, 2007:508)
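One way to see what a context-mechanism-outcome configuration adds over a single overall effect estimate is to break the outcome data down by context. The fragment below is a purely illustrative sketch with invented data, not a method prescribed by Pawson and Tilley or by Mouton; it simply shows the kind of pattern a realist evaluator looks for.

```python
from collections import defaultdict

# Invented results for the same crime-prevention mechanism (improved street lighting)
# implemented in different contexts; the outcome is the observed change in incidents.
observations = [
    {"context": "busy commercial area",     "mechanism": "street lighting", "outcome": -30},
    {"context": "busy commercial area",     "mechanism": "street lighting", "outcome": -25},
    {"context": "isolated industrial area", "mechanism": "street lighting", "outcome": 5},
    {"context": "isolated industrial area", "mechanism": "street lighting", "outcome": 0},
]

# Group outcomes by (context, mechanism) configuration and compare averages.
groups = defaultdict(list)
for obs in observations:
    groups[(obs["context"], obs["mechanism"])].append(obs["outcome"])

for (context, mechanism), outcomes in groups.items():
    avg = sum(outcomes) / len(outcomes)
    print(f"{mechanism} in {context}: average change {avg:+.1f}")
```

Pooled together, these invented figures would suggest a modest overall reduction; disaggregated by context, they suggest the mechanism works where there are people around to benefit from the lighting and not elsewhere, which is exactly the "what works, for whom, in what circumstances" question that realist evaluation asks.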


Cluster evaluation

"Cluster evaluation seeks to determine impact through aggregating outcomes from multiple sites or projects, whereas multisite evaluation seeks to determine outcomes through aggregating indicators from multiple sites. It looks across a group of projects to identify common threads and themes that, having cross-project confirmation, take on greater significance" (Russon in Mathison, 2005:66-67). Although cluster evaluation may need input from the programme managers and stakeholders in the various localities, these role players do not determine the evaluation questions, process and methods, and thus it does not reside under the participatory approaches to evaluation. The aim of cluster evaluation is to clarify and verify the validity of the theory of change.

Developmental evaluation

Emerging from Patton's earlier work on participatory and use-driven evaluation (see utilisation-focused evaluation in the next section), the developmental evaluation approach rejects the notion that a preordained theory of change will meet the emerging needs of those who need to respond to the evaluation findings. The developmental evaluation approach supports continuous innovation and development of the theory of change to enable the programme to adapt to emergent and dynamic realities in complex environments. "Informed by systems thinking and sensitive to complex nonlinear dynamics theory, developmental evaluation supports social innovation and adaptive management. Evaluation processes include asking evaluative questions, applying evaluation logic, and gathering real-time data to inform ongoing decision making and adaptations" (Patton, 2011:1).

4.4.2 Participatory evaluation approaches

Participatory evaluation is an overarching term for any evaluation approach that "involves programme staff or participants actively in decision making and other activities related to the planning and implementation of evaluation studies" (King in Mathison, 2005:291-294). In participatory or inclusive evaluation the evaluation team consists of the evaluator (either as team leader or as supportive consultant) and representatives from stakeholder groups, who together plan, conduct and analyse the evaluation. The degree of participation can range from shared evaluator-participant responsibility for evaluation questions and activities, to participants' complete control of the evaluation process. With shared responsibility, the evaluator is responsible for the quality of the process and the outcomes, but designing and conducting the evaluation is done in collaboration with stakeholders. In evaluations where participants control the evaluation, the evaluator becomes a coach or facilitator who offers technical skills where needed. Decisions on important evaluation considerations like the preferred design and methodology become the joint decision of the evaluator and the included stakeholders (Mertens in Mathison, 2005:187-198). In a sense, all evaluations have some participation from stakeholders, as evaluators need to interact with stakeholders to obtain information. However, a study has a participatory philosophy when the relationship between the evaluator and the participants provides participants with a substantial role in making decisions about the evaluation process (King in Mathison, 2005:291-294).


Participatory evaluation approaches provide great benefit in settings where there is low capacity or where buy-in needs to be established to ensure the utilisation of results. Selected evaluation approaches in this sub-category are:
• Critical theory evaluation
• Naturalistic, constructivist, interpretivist or fourth-generation evaluation
• Appreciative inquiry
• Evaluative inquiry
• Responsive evaluation
• Democratic evaluation
• Empowerment evaluation
• Utilisation-focused evaluation

Critical theory evaluation

Critical theory evaluation aims to determine the merit, worth or value of something by unveiling false culturally-based perspectives through a process of systematic inquiry. The evaluation is informed by a critical social science epistemology and tries to reveal structural injustices to generate action that may address them (Greene, 2006:129). "In positioning stakeholders as reflective and dialogic agents in discerning what is needed, what is good, and why this is so, critical theory evaluation seeks to change the way things are by challenging the way we make sense of things." MacNeil in Mathison (2005:92-94) states that critical theory evaluation is based on the premise that "we operate beneath layers of false consciousness" of our own perceptions of the world and that we should change the state of affairs through critical reflection. "Critical theory evaluation seeks to engage evaluation participants in a dialectic process of questioning the history of their ideas and thinking about how privileged narratives of the past and present will influence future value judgements. Recognizing how power presents itself and the situated position of power during an evaluation is a central characteristic of critical theory evaluation … The primary roles of the critical theory evaluator are that of educator and change agent" (MacNeil in Mathison, 2005:92-94).

Feminist evaluation is a specific form of critical theory evaluation, aiming to promote social justice, particularly for women (Seigart in Mathison, 2005:154-157). Another related approach is transformative or inclusive evaluation, which also actively tries to include the least advantaged. Special effort is made in the inclusive approach to include those who have been traditionally under-represented. While traditionally included groups are not excluded during the inclusive process, the process explicitly recognises that certain groups and viewpoints have been absent or misrepresented in the past and that the inclusion of these voices is necessary for a rigorous evaluation (Mertens, 1999:6).

Naturalistic, constructivist or fourth-generation evaluation

Naturalistic evaluation, also referred to as constructivist, interpretivist or fourth-generation evaluation, attempts to blend the evaluation process into the lives of the people involved. Fourth-generation evaluation incorporates three previous eras of evaluation (description, judgement and expanded stakeholders) to focus both on the tangible, countable reality and the intangible socially-constructed reality (what people believe to be real) (Lincoln & Guba, 2004:228).


the intangible socially-constructed reality (what people believe to be real) (Lincoln & Guba, 2004:228). Naidoo (2007:24) explains: "Since context produced the issues and problems, it is only by returning to the same contexts that problems can be solved". The main objective is "to judge the merit or worth of the evaluand in ways natural to the setting, expectations, values, assumptions, and dispositions of the participants, with minimal modifications due to the inquiry processes used and assumptions held by the evaluator" (Williams in Mathison, 2005:271). Naturalistic evaluations are based on the following assumptions (see Lincoln in Mathison, 2005:161-164):
• Stakeholders respond not only to the physical reality, but also to their social-psychological constructs, including values and beliefs used to make sense of reality. During the evaluation process, it is of equal importance to collect information on these intangible realities, in addition to the tangible reality.
• The original programme objectives are not the only focus of the evaluation process, but should be expanded with stakeholders' claims, concerns and issues.
• Quantitative research methods using controlled experiments are unlikely to uncover the social constructs of stakeholders and should be augmented with qualitative research methods.
• Values are assigned a central role in the evaluation, as they provide the basis for determining merit. The values of stakeholders, values inherent to the setting and conflict in values are critical in formulating judgements and conclusions about the evaluand.
The two phases of constructivist evaluation are discovery and assimilation, which may be carried out sequentially or simultaneously. "The discovery phase of constructivist evaluation represents the evaluator's effort to describe 'what's going on here,' the 'here' being the evaluand and its context. The assimilation phase of constructivist evaluation represents the evaluator's effort to incorporate new discoveries into the existing construction or constructions" (Guba & Lincoln, 2001:2). A checklist explaining the approach and key elements of constructivist evaluation as advocated by Guba and Lincoln may be accessed at http://www.wmich.edu/evalctr/checklists/checklistmenu.htm#models.
The interpretivist or constructivist evaluation approaches emphasise that the values that inform the evaluation may differ from one context to the next. In an extension of this acknowledgement, there is a growing awareness that evaluation within an African cultural context should be approached from a different value departure point than similar evaluations in a western or eastern context. The recent writings of Chilisa (2012), Chilisa, Mertens and Cram (2013), Malunga (2009a; 2009b) and Malunga and Banda (2011) provide early direction for this emerging debate, and "African" evaluation will be followed with great interest as it further develops and matures.
Appreciative and Evaluative inquiry
Appreciative inquiry focuses on the strengths of a particular organisation or intervention with the assumption that focusing attention on the strengths will strengthen them further. Appreciative inquiry is based on the social constructivist concept that "what you look for is


Chapter 4 what you will find, and where you think you are going is where you will end off” (McClintock, 2004:15). It is based on five principles, namely: • Understanding the organisation is directly linked to the future of the organisation. • Inquiry always leads to change and cannot be separated from change. • Improving the organisation comes from the collective imagination of the stakeholders within the organisation. • Human resources inside the organisation determine the future of the organisation. • Momentum for change requires positive affect and social bonding of personnel in the organisation. (Preskill in Mathison, 2005:18-19) Evaluative inquiry responds to a range of information needs of decision makers and determining the worth of the programme may be one of these needs (Owen, 2006:17). Followers of this approach believe that the ultimate aim of evaluations is to produce useful findings that inform decision-making to bring about change and that the expected use should guide the evaluation’s design and implementation (Preskill, 2004:345). House (1993) has suggested that evaluative inquiry consists of collecting data, including relevant variables and standards, resolving inconsistencies in the values, clarifying misunderstandings and misrepresentations, rectifying false facts and factual assumptions, distinguishing between wants and needs, identifying all relevant dimensions of merit, finding appropriate measures of these dimensions, weighing the dimensions, validating the standards and arriving at an evaluative conclusion (House in Owen, 2006:17). Evaluative inquiry places emphasis on the importance of individual, team and organisational learning as a result of participating in the evaluation process. It therefore tries to incorporate evaluation into the normal operations of the organisation, supports individual evaluation efforts, encourages stakeholder participation and embraces diversity in perspectives, values and knowledge. The underlying assumption is that stakeholders develop new perceptions of the organisation and themselves during the evaluation process and the evaluation process should respond to the changing needs of the stakeholders. Institutional self-evaluation, a form of evaluative inquiry, entails members of the organisation describing their activities and actions to each other and the external interested parties. The aim is to obtain “information and members’ judgements about the worth of the activities and actions” (Mathison, 2005:201). Each of the three inquiry phases, that is, focusing the inquiry (deciding what to evaluate), carrying out the inquiry (with appropriate research design and methodology) and applying the learning (the evaluation team develops, implements and monitors strategies and action plans to address the evaluation concerns), tries to enhance team member participation, open dialogue and trust to maximise insight and understanding about the organisation’s context (Preskill in Mathison, 2005:143-146; Preskill, 2004:349). Responsive evaluation Responsive evaluation has emerged from the writings of Stake (1974). House (1980) regards the evaluation function as more formative than summative in nature, with no single “right” answer.


"Responsive evaluation is an orientation … or disposition that favours personal experience. It draws on and disciplines the ordinary ways people perceive quality and worth … The essential intellectual process is responsiveness to key issues or problems, especially those recognised by people at the sites. It is not particularly responsive to programme theory or stated goals but more to stakeholder concerns … Evaluators must become well acquainted with programme activity, stakeholder aspirations and social and political contexts." (Stake & Abma in Mathison, 2005:376-379)
Responsive evaluation helps the client to understand problems and uncover strengths and weaknesses in the programme. The responsive evaluator searches for pertinent issues and questions throughout the study and attempts to respond in a timely manner by collecting and reporting useful information, even if the need for such information was not anticipated at the start of the study (Stufflebeam & Shinkfield, 2007:415). It places emphasis on the context within which the evaluation takes place and on the ultimate needs of stakeholders and the usefulness of the evaluation to them (Mouton, 2008; Mouton, 2007:502). Stake's "responsive clock" (see Figure 4.4) reflects the importance he placed on initial stakeholder analysis before the evaluation process commences.
Figure 4.4: Stake's responsive clock

The clock depicts twelve recurrent tasks between which the responsive evaluator moves as the study requires: talking with clients, programme staff and audiences; identifying programme scope; overviewing programme activities; discovering purposes and concerns; conceptualising issues and problems; identifying data needs in relation to issues; selecting observers, judges and instruments, if any; observing designated transactions and outcomes; thematising and preparing portrayals and case studies; validating, confirming and attempting to disconfirm; preparing findings in formats for audience use; and assembling formal reports, if any. (Source: Mouton, 2008)

Stufflebeam and Shinkfield provide the following comparison (Table 4.4) between evaluations with specific, predetermined theories of change that are either proved or disproved in the evaluation, and responsive evaluation. During preordinate evaluation, the evaluator predetermines the evaluation plan, based on the programme goals, which is then imposed on the programme. Responsive evaluation orients the evaluation to the programme activities, as opposed to the goals, thereby responding to various information needs and values with appropriate methods that emerge during the course of the programme implementation (Stake in Shadish, Cook & Leviton, 1991:270).


Table 4.4: Preordinate evaluation versus responsive evaluation (Adapted from Stufflebeam & Shinkfield, 2007:416)
• Purpose – Preordinate: determine goal achievement; Responsive: help address strengths and weaknesses
• Services – Preordinate: meet predetermined information requirements; Responsive: respond to audiences' information requirements
• Design – Preordinate: pre-specified; Responsive: emergent
• Methodology – Preordinate: research model (intervene and observe); Responsive: observation of natural behaviour (particularise)
• Techniques – Preordinate: experimental design, hypothesis, random sampling, tests, statistics; Responsive: case study, purposive sampling, observation, expressive reporting
• Communication – Preordinate: formal and infrequent; Responsive: informal and continuous
• Value basis – Preordinate: pre-stated objectives; Responsive: values of people at hand
• Key trade-offs – Preordinate: sacrifice direct service to the programme to ensure objectivity; Responsive: sacrifice precision in measurement to increase usefulness

Democratic evaluation
Democratic evaluation "is an approach to evaluation that uses concepts from democracy to arrive at justifiable evaluative conclusions … by considering all relevant interests, values, and perspectives" in order to arrive at conclusions that are impartial with respect to values (House & Howe, 2000; House, 2004:220). Democratic evaluation allows the multiple realities of a programme to be portrayed, providing decision makers with a variety of perspectives and judgements to consider (MacDonald (1979) in Alkin & Christie, 2004:40). House (1991, 1993) argues that "evaluation is never value neutral; it should tilt in the direction of social justice by specifically addressing the needs and interests of the powerless", thereby promoting social justice for the poor and marginalised through the evaluation process (Alkin & Christie, 2004:41). Mouton (2008) explains that while experimental design is value neutral, empowerment evaluation focuses on promoting certain values in designing the evaluation. House (as explained by Mouton, 2008) proposes that evaluation leads to enhancing social justice through improved interventions: Who is the targeted group? Do they benefit more than others? Evaluation thus becomes a democratising force, with evaluators advocating on behalf of disempowered groups (Mouton, 2007:502). Democratic evaluation incorporates democratic processes within the evaluation to assure valid conclusions amid conflicting views. It extends impartiality by including relevant interests, values and views so that conclusions are unbiased in values as well as facts. Although all value positions are included during the evaluation process, they are subject to criticism, like any other findings of the evaluation. The guiding principles of the approach are the inclusion of the interests, values and views of major stakeholders; extensive dialogue with and between stakeholder groups; and extensive deliberation to discover the true interests of stakeholders before arriving at conclusions (House & Howe, 2000:1). A checklist for deliberative democratic evaluations may be accessed at http://www.wmich.edu/evalctr/checklists/checklistmenu.htm#models.


Democratic evaluation can be distinguished from bureaucratic evaluation, where the evaluator provides an unconditional service to government agencies and the results of the evaluation are the property of the commissioning agency and not available for public knowledge. It is also distinguished from autocratic evaluation, where the evaluator provides a conditional service to government agencies but retains ownership of the evaluation products. What differentiates democratic evaluation is the need for accessibility of evaluation methods and results to multiple, non-specialist audiences (Greene, 2006:119-120).
Empowerment evaluation
In an attempt to optimise the usage of evaluation results, authors such as Greene (1988), Mark and Shotland (1985) and Fetterman (1996) advocated greater stakeholder participation in the design and implementation of the evaluation study (Mouton, 2007:500-501). Empowerment evaluation uses the evaluation process to foster self-determination, with the evaluator acting as coach or critical friend. The evaluator helps the group to determine its mission, to take stock of the current reality through evaluation tools and to set goals and strategies based on the self-assessment (Fetterman, 2004:305). In empowerment evaluation, the evaluator's role also includes building the capacity of stakeholders to enable them to conduct independent evaluations (Rossi, Lipsey & Freeman, 2004:51). The main advantages of the approach include that:
• The evaluation is useful to the stakeholder group
• The evaluation promotes a sense of ownership
• Participants are able to use the evaluation findings throughout the project, not just after completion of the evaluation (Mouton, 2007:501)
• The evaluators become facilitators and coaches, not judges in the process (Mouton, 2007:502)
• It builds capacity and provides illumination and liberation for those involved in the evaluation (Mouton, 2007:502).
An important addition to the above advantages is that empowerment evaluation alters the balance of power in the programme context by enhancing the influence of stakeholders (Rossi, Lipsey & Freeman, 2004:51). Fetterman (1996) regards the goal of empowerment evaluation as fostering self-determination through the capacitation and illumination of programme participants and clients so as to enable them to conduct their own evaluations. The evaluator becomes a coach or facilitator who assists the client in the process (Alkin & Christie, 2004:55).
Utilisation-focused evaluation
Utilisation-focused evaluation (Patton, 1980) begins with the premise that evaluations should be judged by their utility and actual use; therefore, evaluators should facilitate the evaluation process and design any evaluation with careful consideration of how everything that is done, from beginning to end, will affect use (Patton in Mouton, 2007:504). Patton describes it as "evaluation done for and with specific intended primary users for specific, intended uses" (Patton, 2008:37). Owen describes 10 dimensions that should be considered in


developing the evaluation plan. These include specifying the evaluand, the purpose of the evaluation, the primary audiences, available evaluation resources, the focus (elements) of the evaluation, the evaluation process (data collection, management and analysis) for each evaluation question, reporting strategies, codes of behaviour, budget and time deadlines, and other issues that may arise from the negotiations (2006:73-74). Alkin (2004:299) distinguishes between five areas of evaluation activity, namely framing the context of use (client group, primary users, broader stakeholders); negotiating agreement on measures and procedures; establishing a framework for judging results; data collection and reporting; and interpretation and facilitation of use. These authors reflect a growing recognition by the evaluation community that stakeholders regard evaluations as useful if they generate information not only on how well the policy, programme or project has done, but also on what ought to be changed to improve the success of the intervention (Chen, 2004:134). Evaluations must thus not only find problems, but also propose solutions to them. Owen confirms that the role of evaluators is changing from that of an independent judge to that of a collaborative consultant offering not only descriptions and judgements, but also prescriptions and recommendations (in Alkin & Christie, 2004:54). Wholey describes the new evaluator as one who believes in the organisation and helps it to succeed by starting with the traditional role of critic, but moving beyond that to assist the organisation to improve its performance (in Shadish, Cook & Leviton, 1991:234).
The evaluation report should be logically structured and meet the needs of both the evaluation contractors and the main stakeholders (EDRP, 2009). Table 4.5 presents the basic format of the evaluation report, constructed from the guidelines of the EDRP (2009; see also NZAID, 2009; and the UN Population Fund in Wildschut, 2004:26).
Table 4.5: Structure of the evaluation report
Title page:
▪ Title and nature of evaluation
▪ Title of programme, phase, duration
▪ Identification of author, affiliation and designation
▪ Date and place of submission
Table of contents:
▪ Main headings and sub-headings
▪ Index of tables, figures and graphs
Executive summary (an overview of the entire report in no more than five pages), which should include:
▪ A brief background of why the review or evaluation was carried out
▪ A succinct description of the methodology used and of project/programme stakeholder participation in the evaluation
▪ Key findings, including intended and unintended changes/impacts, as well as a description of how primary stakeholders perceive the changes brought about by the intervention(s)
▪ Value for money of the intervention
▪ Recommendations and suggested follow-up action
Introduction:
▪ Overview of the purpose of the evaluation and the structure of the report
▪ Description of the programme in terms of needs, objectives, aims and delivery systems
▪ The context in which the programme operates
▪ Purpose of the evaluation in terms of scope and main evaluation questions
▪ Main users of the findings/report
▪ Description of other similar studies that have been done
Research methodology:
▪ Design of research
▪ Implementation of research and collection of data
▪ Methodology used (including who participated, how and at what stage)
▪ The timing of the review or evaluation
▪ Analysis of data
Evaluation results – findings and conclusions:
▪ What changes have been brought about by the intervention – positive and negative, intended and unintended, qualitative and quantitative?
▪ What have been the differential effects of the intervention on men and women?
▪ What has been the cost of the intervention(s) compared to the programme results? Is it value for money?
▪ Discussion of the reasons for successes and failures, especially the constraints and enabling factors
▪ Other cross-cutting issues (e.g. human rights)
▪ Implications of the findings for future activities. Based on the evaluation findings, and drawing from the evaluator(s)' overall experience in other contexts, provide lessons learned (both the best and worst practices) that may be applicable in other situations as well. Include both positive and negative lessons.
Recommendations:
▪ Base recommendations on the conclusions and lessons learned, and discuss their anticipated implications
▪ List proposals for action to be taken (short and long term) by the person(s), unit or organisation responsible for follow-up, in priority order
Annexes:
▪ Terms of reference of the evaluation
▪ References and sources
▪ Glossary of acronyms used
▪ Diagrams, drawings and photographs generated through the participatory processes
▪ Names of evaluators and their companies (CVs should also be included, but summarised and limited to one page per person)
▪ Methodology applied for the study (phases, methods of data collection, sampling, etc.)
▪ Logical framework matrices (original and improved/updated)
▪ List of persons and organisations consulted, literature and documentation other than technical annexes (e.g. statistical analyses)
(EDRP, 2009)

As evaluation findings are often presented to different stakeholders, it may be advisable to develop a plan and schedule for conveying the needed reports to the different audiences, e.g., the client, the programme staff, a pertinent policy board, beneficiaries, and the general public (Stufflebeam, 2004:4). As such, Stufflebeam recommends “dividing final reports into three sub-reports: Programme Antecedents (for those who need background information), Programme Implementation (for those who might want to replicate the programme), and Programme Results (for all members of the audience)”. The resumés of the various evaluators, data collection methods and instruments, a log of data collection activities, data tables and interim reports, summary of costs of different evaluative activities, summary of problems that were encountered and


addressed during the evaluation and a summary of the professionalism and standard of the evaluation are required (Stufflebeam, 2004:4).
As with the other participatory approaches, the utilisation approach regards the evaluator not as a distant judge in this process, but rather as a catalyst who brings together the viewpoints, interests and values of alternative users of the evaluation findings. Through a participatory approach, the group's values (not the evaluator's) determine the nature of recommendations arising from the evaluation (Stufflebeam & Shinkfield, 2007:434, 440). The process therefore consists of identifying stakeholders, obtaining commitment, involving stakeholders in data design and collection, judgement and dissemination of results, and further decision making (Patton in Alkin & Christie, 2004:48). Potential stakeholders or users of the evaluation data who could participate in the evaluation process include policy makers and decision makers, programme sponsors (funders), evaluation sponsors (who may also be programme sponsors), target participants, programme managers, programme staff, programme competitors (competing for the same resources), contextual stakeholders in the environment, and the evaluation and research community (credibility of evaluation or interest field) (Rossi, Lipsey & Freeman, 2004:48-49). Patton argues that, as evaluation cannot be value-free, "utilisation-focused evaluation answers the question of whose values will frame the evaluation by working with clearly identified, primary intended users who have the responsibility to apply evaluation findings and implement recommendations" (Patton, 2004:277). The utilisation-focused approach may be used as part of formative or summative, qualitative or quantitative, responsive, naturalistic or experimental evaluations. Thus, while utilisation-focused evaluation may incorporate any evaluation approach, it tends to be more responsive and interactive than preordinate and independent (Stufflebeam & Shinkfield, 2007:439). Possible limitations of the approach include:
• In cases of deep conflict and hostility between stakeholders, it may be impossible to reconcile political and ideological values (Rossi, Lipsey & Freeman, 2004:43).
• Programme stakeholders owe primary allegiance to their own positions and political alignments, which may lead to criticism of evaluation results, despite participation in the process (Rossi, Lipsey & Freeman, 2004:43).
• Quality is sacrificed for usability – the evaluator should build the capacity of users to understand cost/accuracy trade-offs (Mouton, 2008).
• The approach "blurs" the lines between the evaluator and programme staff by providing advice during the process, and therefore there is less objectivity (Mouton, 2008).
• Over-focus on the needs of immediate stakeholders (e.g. the evaluation contractors or the direct recipients) may distract from the needs of the distal taxpayer or policy maker (Weiss, 2004:188).

4.5

Evaluation design

The design of the evaluation describes the nature and process of data collection and analysis for the study. Advances in social research methods since the 1950s present the evaluation field with various options in designing studies to collect and analyse data that informs the evaluation process. Evaluation research studies may adopt either a quantitative, a


Evaluation Models, Theories and Paradigms qualitative or mixed-methods design, as the evaluator tries to find a workable balance between the emphasis placed on procedures that ensure the validity of findings and those that make findings timely, meaningful and useful to various users. Where that balancing point will be will depend on the purposes of the evaluation, the nature of the programme and the political or decision-making context (Rossi et al., 2004:25). Rossi refers to this as the “good-enough” rule, which entails choosing the best possible design, taking into account practicality and feasibility (paraphrased by Shadish, Cook & Leviton, 1991:377). While a particular evaluation approach such as the classic experimental study may be ideal, it may not be feasible. Lee Cronbach concluded in 1982 that “evaluation studies should be judged primarily by its contribution to public thinking and to the quality of service provided subsequent to the evaluation ... An evaluation should inform and improve the operations of the social system with timeous feedback (not necessarily perfect information)” (in Rossi et al., 2004:23‑24). Given the advantages and disadvantages of different approaches, the OECD argues for “the use of a plurality of approaches that are able to gain from the complementarities in the information they can provide” (OECD, 2007:24). Selected evaluation designs appropriate for evaluation studies are: • Quantitative evaluation designs, including the classic experimental design and quasiexperimental evaluation • Qualitative evaluation (non-experimental) designs, including surveys, case studies, interviews or participatory action research • Mixed-method evaluation designs

4.5.1 Quantitative approaches Quantitative designs are ideal when the study aims to answer “what happened” oriented questions. Quantitative designs may include either experimental or quasiexperimental designs. Experimental design David Campbell stated in 1969 that “policy and programme decisions should emerge from continual social experimentation that tests ways to improve social conditions … Social research [is] feasible to extend the experimental model to evaluation research to create an experimenting society”. He therefore advocated “for an experimental approach to social reform … in which we retain, imitate, modify, or discard social programmes on the basis of apparent effectiveness on the multiple imperfect criteria available” (Rossi, Lipsey & Freeman, 2004:23-24). When a clear statement of the programme objective to be evaluated has been explicated, the evaluation may be viewed as a study of change. The programme to be evaluated constitutes the causal or independent variable, and the desired change is similar to the effect or dependent variable … the project may be formulated in terms of a series of hypotheses that state that activities A, B and C will produce results X, Y and Z (Stufflebeam & Shinkfield, 2007:277,281). The classic experimental design entails the random assignment of subjects to treatment and non-treatment conditions, and the pre- and post- measurement of both groups. The impact of programmes is determined by comparing the outcomes of the groups to determine whether the intervention has produced the desired outcome (Mouton, 2007:495; OECD, 2007:22).
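The logic of this design can be illustrated with a small, hypothetical calculation. The Python sketch below is purely illustrative: the data are simulated and the variable names are assumptions of the example, not drawn from any study cited in this chapter. It shows how the average programme effect could be estimated by comparing the mean change in outcome scores of a randomly assigned treatment group with that of a control group.

# Illustrative only: simulated data for a randomised pre-test/post-test design.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Baseline (pre-test) scores for 200 subjects, randomly assigned to two groups
pre = rng.normal(50, 10, size=200)
assignment = rng.permutation(np.repeat(["treatment", "control"], 100))

# Post-test scores: the simulated programme adds roughly 5 points for treated subjects
post = pre + np.where(assignment == "treatment", 5.0, 0.0) + rng.normal(0, 5, size=200)

change = post - pre
treated_change = change[assignment == "treatment"]
control_change = change[assignment == "control"]

# Estimated programme impact: difference in mean change between the two groups
impact = treated_change.mean() - control_change.mean()
result = stats.ttest_ind(treated_change, control_change)

print(f"Estimated impact: {impact:.2f} points")
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")

Because assignment is random, the control group's mean change approximates the counterfactual, so the difference between the two mean changes can be read as an estimate of the programme's impact.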


Campbell's commitment to the experimenting society opened up possibilities for scientific, rational experiments to address intangible social problems. However, in later years his approach was described as "utopian" and too narrow, opening the door to other approaches to conducting social research (Shadish & Luellen, 2004:83). Experimental design may still be regarded as the benchmark for evaluation studies, as its systematic approach allows for the strongest possible causal connection between the treatment and the observed outcomes (Pierre, 2004:151). The complexity and cost of randomised trials warrant their use only for severe problems with multiple potential solutions, and where there is commitment to use the evaluation findings (Boruch, 2004:119-120).
Quasi-experimental evaluation
Quasi-experimental evaluation has its roots in the experimental tradition of the Natural and Social Sciences (including Psychology, Social Work and Sociology), but, given the problems with "randomly assigning participants to interventions in real life – as opposed to laboratory conditions", the experimental tradition was extended to several quasi-experimental designs (Mouton, 2007:495). Lee Cronbach (in Rossi, Lipsey & Freeman, 2004:23-24) advocated in 1982 that "evaluation should be orientated toward meeting the needs of programme decision makers and stakeholders. Whereas scientific studies strive principally to meet research standards, evaluations should provide maximally useful information within the political circumstances, programme constraints and available resources". The term "quasi-experimental" refers to approximations of randomised experiments (Campbell in Shadish, Cook & Leviton, 1991:120). Like their experimental counterparts, quasi-experimental designs entail comparisons across conditions and may include before-after or other comparisons. Although their control of internal validity is not as reliable as that of the true experimental design, they nevertheless provide valuable answers to cause-and-effect questions (Mark & Henry, 2006:323). Factors that may undermine the validity of the quasi-experiment include historical or seasonal events that influence observed results, maturation of the subjects, the effect of the test or instruments used on the subjects' behaviour, attrition of subjects from the programme, and statistical regression that would have occurred naturally without any intervention (Reichardt & Mark, 2004:128-129). Various quasi-experimental designs, which can more realistically be applied to evaluate programmes in real life, emerged to complement the true experimental design, including the pre-test–post-test non-equivalent comparison group design, the pre-test–post-test no comparison group design, interrupted time-series designs, comparison group designs, and the regression-discontinuity design, where the conditions for being part of the experimental group are known and therefore "controllable" (see Reichardt & Mark, 2004). Although the quasi-experimental design provides better control over the validity of findings, "it is also important to note that the quasi-experimental traditions – because of its emphasis on the logic of experimentation (cause and effect) – automatically ended up focusing exclusively on outcome or impact evaluations with little regard for process and implementation evaluation questions" (Mouton, 2007:495).
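As an illustration of how a pre-test–post-test non-equivalent comparison group design might be analysed in practice, the sketch below applies a simple difference-in-differences calculation. It is a minimal, hypothetical example: the group means are invented for the illustration, and this calculation is only one of several ways such designs can be analysed.

# Hypothetical difference-in-differences calculation for a pre-test/post-test
# non-equivalent comparison group design. The group means are invented.
programme_group = {"pre": 42.0, "post": 55.0}   # sites that received the intervention
comparison_group = {"pre": 45.0, "post": 50.0}  # similar sites without the intervention

programme_change = programme_group["post"] - programme_group["pre"]     # 13.0
comparison_change = comparison_group["post"] - comparison_group["pre"]  # 5.0

# The comparison group's change approximates what would have happened anyway
# (maturation, seasonal events, etc.); the remainder is attributed to the programme.
estimated_effect = programme_change - comparison_change
print(f"Programme group change:  {programme_change:.1f}")
print(f"Comparison group change: {comparison_change:.1f}")
print(f"Estimated programme effect (difference-in-differences): {estimated_effect:.1f}")

The comparison group's change absorbs some of the validity threats noted above, but because the groups are not randomly assigned, the estimate remains weaker than that of a true experiment.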


Other useful quantitative research methods used by evaluators include:
• Benchmarking
• Backward and concept mapping
• Correlation and regression analysis
• Field experiments
• Panel studies
• Pre-post design
• Standardised tests
• Statistics
• Surveys
• Time series analysis and longitudinal studies

4.5.2 Qualitative approaches
Qualitative approaches represent a move away from positivism in evaluation. Developed as a critique of the "black box" mentality of the quantitative approach, qualitative evaluation focuses on the constructed nature of social programmes, the contextuality of social interventions and the importance of focusing on processes of implementation, in addition to assessing programme outcomes and effects (Mouton, 2008). In trying to address this mentality, Love (in Wholey, Hatry & Newcomer, 2004:66) presents a "transparent" box for evaluations, depicted in Figure 4.5.
Figure 4.5: The transparent box paradigm (Love, 2004:66)

The transparent box situates a programme within its environment (socio-economic conditions, community needs, stakeholder commitment, interest groups, regulations, and public, political and media support) and its organisational context (management and leadership, decision making, communication and coordination). Inputs pass through the programme technology – clear outcomes, an inspiring mission, a valid theory, stable goals, relevant activities, adequate staffing, cohesive teams and financial resources – to outputs and outcomes, with evaluation feedback looping back into the organisation. (Source: Love, 2004:66)

Decision makers, when presented with programme outcomes, are likely to want to know why the outcomes have been realised and how performance can be improved, leading to "why" and "how" evaluation studies (Wholey, 2004:269). "Understanding the quality of the


programme requires understanding programme activities in considerable detail. The measurement of outcomes and impact … is often simplistic and of low validity" (Cronbach interpreted by Stake, 2004:215).
Qualitative evaluation is ideal when non-causal questions form the basis for the evaluation; when contextual knowledge, perspectives and values of the evaluand are required before finalising the evaluation design; when the focus is on implementation rather than outcomes; when the purpose of the evaluation is formative; and when it is important to study the intervention in its natural setting by means of unobtrusive measures (Pierre, 2004:151; Mouton, 2007:497). The OECD confirms that qualitative evaluation is "more likely to rely upon the opinions of programme stakeholders including managers and beneficiaries about the functioning and impact of the programme through techniques including surveys, case studies and peer reviews" (OECD, 2007:22). One example of this approach is the rapid appraisal, which comprises "a repertoire of informal techniques used to collect and analyse data about local situations and people". To ensure validity, the approach relies on triangulation: information is cross-checked by talking to various stakeholders in the community, by using different data-gathering methods and by using a diverse research team (Dart in Mathison, 2005:357-358). Methods used to collect information include key informant interviews, focus group discussions, community group interviews, direct observations and mini-surveys (Kusek & Rist, 2004:15).
Surveys
Surveys are well suited for descriptive, explanatory and exploratory studies of populations that are too large to observe directly. They involve the development, administration and analysis of questionnaires completed by a selected sample of the population. The approach is well suited for qualitative evaluation as participants' viewpoints, opinions and observations are tested in the questions (Babbie & Mouton, 1998:232, 265).
Case study evaluation
The case study approach sees the evaluator analysing the goals, plans, resources, needs and problems of the case in its natural setting (as opposed to imposed experimental conditions) to prepare an in-depth report on the case, with descriptive and judgemental information, perceptions of various stakeholders and experts, and summary conclusions (Stufflebeam & Shinkfield, 2007:309-310). The approach blends particularly well with qualitative evaluation objectives, as well as with participatory and responsive evaluations. In the case study approach, the role of the evaluator is to:
• Bind the case and conceptualise the object of study
• Select phenomena, themes or issues (research questions)
• Seek patterns of data to develop issues
• Triangulate key observations and bases for interpretation
• Select alternative interpretations to pursue
• Develop assertions or generalisations about the case (Stufflebeam & Shinkfield, 2007:314-315).


The success case method tries to establish the success of a particular intervention by identifying the best and worst programme participants through a survey. By using a process of self-reports and selective interviews, the evaluator tries to uncover what parts of the intervention the participants applied, how successful (or not) they were in these endeavours, the value of their successes or failures and the environmental factors that may have influenced their results positively or negatively. Comparison of the stories of successful and unsuccessful participants allows for the identification of several key factors that allowed successful participants to benefit from a particular intervention (Brinkerhoff in Mathison, 2005:401-402; Rogers & Williams, 2006:88).
Interviews
Interviews may be conducted either individually or with focus groups. The process entails a conversation between the interviewer and interviewee(s) based on a broad series of questions or specific topics, where individual perceptions and opinions are elicited without the restrictions of a questionnaire that only pursues the researcher's preconceived options captured in the various questions and answers. In addition to the interviewees' communicated perceptions and opinions, depth interviews also investigate how these perceptions or opinions were formed, whilst focus groups allow for the development of additional ideas and perceptions through the sharing of ideas in the group setting (see Babbie & Mouton, 1998:288-292).
Participatory action research (PAR)
Participatory action research combines the investigative research process with the education of less powerful stakeholders and subsequent action on the research results. The cycle starts with observation and reflection, which leads to a plan of change to guide action. As a result of this action orientation, the approach is best suited to address action-oriented evaluation questions (Rogers & Williams, 2006:83, 84). Principles of the approach include:
• The social, political, economic, cultural and spiritual context is important and should be understood.
• The question of who creates and controls the production of knowledge is essential.
• Popular knowledge is as valid as scientific knowledge.
• Research should be conducted in collaboration with participants, with constant dialogue.
• Critical reflection is an integral part of doing research.
• Research needs to lead to actions and social transformation (Whitmore in Mathison, 2005:291).
Participatory action research is an ideal data-gathering method with most of the participatory approaches to evaluation. Other useful qualitative research methods used by evaluators include:
• Archives and document analysis
• Checklists
• Content analysis
• Comparative or cross-case analysis
• Delphi technique and expert opinions


• Individual interviews, group interviews and focus groups
• Narrative analysis and narrative storytelling
• Natural experiments
• Observation
• Unobtrusive measures and technology-aided methods

Table 4.6: Quantitative vs qualitative evaluation approaches
Qualitative evaluation methodologies – advantages:
• Engages participants in policy learning
• Can vary the scale and hence
• Deeper understanding of processes leading to impacts
• Can assess against a wide range of evaluation criteria
• Picks up unintended consequences
• Better understanding of policy options and alternatives
Qualitative evaluation methodologies – disadvantages:
• Respondents and interviewers may be biased or poorly informed
• Rarely provides a clear answer
• Tends to "describe" rather than "evaluate"
• Risks including "unrepresentative" groups
• No opportunity for independent verification
• Hard to judge efficiency and effectiveness
• Hard to establish cause and effect
Quantitative evaluation methodologies – advantages:
• Clear answers on impact
• If well done, will get close to true impact
• Can be independently verified
• Should be easy to interpret
Quantitative evaluation methodologies – disadvantages:
• Cost of data collection and technical demands
• Lacks information on context and mechanisms behind policy impacts
• Absence of pure control groups
• Possible false impression of precision
• Narrow focus on effectiveness and efficiency
• Difficult to use on indirect interventions that seek to influence the business environment
(Source: OECD, 2007:23)

4.5.3 Mixed-method approaches
Given the various advantages and disadvantages associated with pure quantitative or pure qualitative designs (see Table 4.6), recent practice in evaluation research design favours "mixed-methods" designs. Whilst measured statistical outcomes might be the result of the combined effects of several factors, adding qualitative research methods to quantitative methods helps to overcome the limitations of pure quantitative methods. Combining different designs and data sources allows for:
• triangulation that tests the consistency of findings obtained through different instruments to ascertain multiple causes influencing results;
• complementarity that clarifies and illustrates results from one method with the use of another method;
• development or improvement of methods where one method shapes subsequent methods or steps in the research process;


• initiation of new research questions or challenges, where the results obtained through one method provide new insights on how the programme has been perceived and valued across sites; and
• expansion of the richness and detail of the study, exploring specific features of each method (see Greene, Caracelli & Graham, 1989:259).
The value of mixed-method designs, which encourage the simultaneous adoption of multiple paradigms, methods and methodologies, lies in the more holistic view they provide, incorporating the perspectives, values, views and interests of different stakeholders (Green, 2010). A mixed-method design becomes particularly critical given the ever-increasing complexity of development policies, programmes and projects. Rugh & Bamberger (2012:4-6) conclude that mixed-method designs combine the strengths of qualitative and quantitative methods to present alternatives to traditional counterfactual designs when complexity inhibits the use of such designs (see Figure 4.6 and also Bamberger, Rugh & Mabry, 2012:4-7).
Figure 4.6: Applying mixed methods to develop alternative counterfactual designs
Counterfactual designs for assessing the effects of development interventions (attribution analysis, contribution analysis and substitution analysis) can draw on:
A. Theory-driven approaches – logic models, historical analysis, general elimination theory
B. Quantitative approaches – experimental and quasi-experimental designs, pipeline design, concept mapping, statistical analysis of comparator countries, citizen report cards and consumer surveys, social network analysis
C. Qualitative approaches – realist evaluation, PRA techniques, qualitative analysis of comparator countries, comparison with other sectors, expert judgment, key informants, public sector comparisons, public expenditure tracking
D. Mixed-method designs
E. Rating scales
F. Techniques for strengthening counterfactual designs – disaggregating complex programmes into evaluable components, portfolio analysis, reconstructing baseline data, creative use of secondary data, drawing on other studies, triangulation
(Source: Rugh & Bamberger, 2012:6)

The adoption of a mixed-method design often assists in overcoming the constraints in a “real-world” evaluation. The RealWorld Evaluation Approach was developed to assist evaluators to conduct evaluations with budget, time, data and political constraints. It comprises seven steps, commencing with the planning and scoping of the evaluation in terms of the needs, the programme theory, constraints and the ideal design. The next four steps assist the evaluator to address budget, time, data and political constraints. Step 6 aims to strengthen the evaluation design and validity of conclusions by identifying and


Chapter 4 addressing potential threats. Finally, step 7 assists clients in using the evaluation through communication and building evaluation capacity (Bamberger, 2009:200,203). Budgetary, time, data and political constraints may compromise the viability of an “ideal” evaluation design, as well as the validity of the conclusions that may be drawn as a result of these influences. To overcome these limitations, a mixed-method design adds further data and perspectives, thereby overcoming the constraints and strengthening the validity of findings despite these constraints.

4.6

Implications for management practice

Identifying what needs to be evaluated requires a clear understanding of the problem area of interest and of what one hopes to gain from the evaluation study. In social research, this is similar to the identification of the research problem, "a clear and unambiguous statement of the objective of the study (the unit of analysis) and the research objectives" (Mouton, 2001:48). The evaluation may be focused on improving or testing the conceptualisation or design of an intervention, on monitoring and improving the implementation of the intervention, or on the assessment of the effectiveness or efficiency of the intervention in delivering its results (Babbie & Mouton, 1998:339-340). To ensure that the evaluation question focuses on the area of greatest concern to important stakeholders and decision makers, it is important to engage these stakeholders during the conceptualisation phase of the evaluation to help formulate appropriate questions (Rossi, Lipsey & Freeman, 2004:68-69). The formulated research questions may be empirical questions, such as exploratory, descriptive, causal, evaluative, predictive or historical questions, or non-empirical questions, such as meta-analytical, conceptual, theoretical or philosophical and normative questions (Mouton, 2001:53-55).
Rossi, Lipsey and Freeman (2004:70-75) provide the following criteria for evaluation questions. Firstly, it is important that the formulated evaluation questions are appropriate and reasonable to the scope and context of the intervention that is being evaluated. Furthermore, the evaluation question should be answerable, which means credible evidence that responds to the question should be realistically obtainable. Finally, the evaluation question should also include the required performance criteria that provide a basis for the determination of merit, success or failure. Typical evaluation questions that may inform evaluation studies are, for example:
• What are the needs of the population?
• What services should be provided?
• Is the intervention properly conceptualised?
• Is the intervention ready for implementation (feasibility exercise)?
• Is the intervention being implemented according to design?
• Are the intended services being delivered to the intended persons?
• Do all members of the target group (intended beneficiaries) receive the intervention?
• Are the intended immediate outcomes being realised?
• Do the services have beneficial or adverse effects on the recipients?


Evaluation Models, Theories and Paradigms • Is the cost reasonable in relation to the magnitude of the benefits? (See Wildschut, 2004:5-6; and Rossi Lipsey & Freeman, 2004:77-78). The evaluation question(s) will determine the appropriate quantitative or qualitative data-gathering techniques, which will inform the design of the study in addition to the stated goals of the evaluation. As the different approaches emphasise different aspects of the evaluand, it can be argued that a combination of approaches will provide “richer” evaluation data through a multifaceted evaluation focus. However, each additional approach implies added resources to bring it to fruition. It is the evaluator’s task to select the right balance of approaches to ensure the most accurate evaluation results within the limited resources available.

4.7

Case: Do Managers Use Evaluation Reports? A Case Study of a Process Evaluation for a Grant-Making Organisation1 Asgar Bhikoo and Joha Louw-Potgieter

Programme evaluation aims to bring about programme improvement by assessing how a programme was implemented and whether the implementation has led to the desired outcomes. Often evaluators invest much time in designing and implementing an evaluation that will yield credible results, but the findings are not always used by stakeholders for programme improvement. In order to obtain better usage of programme evaluation results, evaluators need to design and present their evaluations so that programme staff and stakeholders understand their programmes better and are able to appreciate the implications of the evaluation results.
The aim of this case study is to illustrate whether an organisation which awards grants to non-governmental organisations (NGOs) implemented the recommendations of an earlier process evaluation of its capacity-building programme (CBP). The CBP is intended for NGOs that have previously received grants from the grant-making organisation and also for NGOs that were unsuccessful in their grant applications. The programme focuses on building capacity within these NGOs in order to attract further funding or first-time funding. The programme activities consist mainly of training modules to improve the NGOs' organisational, financial and social sustainability. Once sustainability has been attained, the NGOs are assisted to submit improved grant applications.
In 2006, a consulting firm conducted a process evaluation of the CBP and made the following recommendations for improvement (see Table 4.7 for details):
1. Obtain Sector Education and Training Authority (SETA) accreditation for the programme
2. Improve the training manual
3. Improve training delivery
4. Formalise internal processes
1 Bhikoo and Louw-Potgieter (2013).


5. Improve marketing
6. Improve communication
7. Monitor and evaluate the CBP on an on-going basis
In 2010, the evaluators of this case study assessed whether these improvements had been implemented by the grant-making organisation. The evaluators used the recommendations contained in the 2006 process evaluation to design a 22-item questionnaire to measure improvements made to the programme (see Table 4.7). The response categories of these items were based on Brinkerhoff's (2005) Success Case Method (tried this and had clear and positive results; tried this but had no clear results yet; tried this somewhat but do not expect any results; tried this and it backfired; have not tried this at all). The questionnaire also comprised open-ended questions which assessed whether other improvements (apart from those suggested in the process evaluation) were made. The programme manager completed the questionnaire and his responses were verified against existing programme records.
A second questionnaire was completed by the programme manager and the evaluator of the 2006 process evaluation. This questionnaire assessed evaluation use from the perspectives of these two role players. The questionnaire contained 25 items with a five-point Likert-scale response format (1 = strongly disagree; 5 = strongly agree) (see Table 4.8).
The results of the first questionnaire showed that the programme manager indicated that 17 out of the 22 improvements were implemented. The evaluators verified this against existing programme records. The results and verification are reflected in Table 4.7. Results from the open-ended section of the questionnaire showed that five additional improvements (not recommended in the process evaluation) were made. These were: a name change for the CBP to Basics in Organisational Development, improved application and enrolment forms, and the inclusion of three new courses (two in leadership and one in volunteer management).
The results of the second questionnaire showed that there was a significant difference between the perceptions of the programme manager and the evaluator of the 2006 process evaluation regarding evaluation usage. These results are shown in Table 4.8: mean score of the programme manager = 3.84; mean score of the evaluator = 4.04; t = 3.76; p = .047; p < .05 (two-tailed). The evaluator had significantly more positive perceptions of usage than the programme manager. This is understandable, as the evaluator was the author of the 2006 process evaluation and self-interest might have played a part in these results. On the other hand, despite this difference, it should be kept in mind that both the evaluator and the programme manager scored toward the high end of the 5-point scale, indicating positive perceptions of the use of the 2006 process evaluation.
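One way such a comparison could be computed – assuming item-level ratings are available for both respondents – is a paired t-test across the 25 questionnaire items, as in the minimal Python sketch below. The ratings shown are invented for illustration and are not the study's data, and the variable names are assumptions of the example.

# Hypothetical comparison of two respondents' ratings of the same questionnaire
# items (1 = strongly disagree ... 5 = strongly agree) using a paired t-test.
# The ratings below are invented for illustration and are not the study's data.
from scipy import stats

programme_manager = [4, 3, 4, 4, 3, 4, 4, 3, 4, 4, 4, 4, 3, 4, 4, 3, 4, 4, 4, 4, 3, 4, 4, 4, 4]
evaluator_2006 = [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 4, 4, 5, 4, 4]

mean_manager = sum(programme_manager) / len(programme_manager)
mean_evaluator = sum(evaluator_2006) / len(evaluator_2006)

# Paired test: each of the 25 items is rated by both respondents
result = stats.ttest_rel(evaluator_2006, programme_manager)

print(f"Mean (programme manager): {mean_manager:.2f}")
print(f"Mean (2006 evaluator):    {mean_evaluator:.2f}")
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")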


Table 4.7: Programme Manager's Perceptions of Recommendations Implemented plus Verifications
(Each entry lists the recommendation, the programme manager's perception of implementation and, in brackets, the document evidence.)
SETA accreditation:
• Business plan – No
• Training venues comply – Yes (Strategy goal 5)
• Facilitators accredited – Yes (Accreditation certificates)
• SETA accreditation obtained – No
Course manual:
• Page numbers in manual – Yes (New manual)
• Content page in manual – Yes (Agenda 2010)
• Glossary in Afrikaans and Xhosa – Yes (Agenda 2010)
• More examples and case studies in manual – Yes (Agenda 2010)
Training delivery:
• More examples and case studies in training – Yes (Agenda 2010)
• More interactive training – Yes (Agenda 2010)
Formalising internal processes:
• More rural NGOs received funding – Yes (Newsletter)
• Referral to other service providers – No
• More capacity programmes included – Yes (Training calendar)
• Increased training in rural areas – Yes (Annual report 2009/10)
• Selection criteria for participants – No
• Integrated database to track attendance – Yes (MS Access printout)
Marketing:
• Marketing strategy changed – Yes (2010 Pamphlet)
Communication:
• Interdepartmental meetings and reports – Yes (Meeting agendas 2009)
• Improved statistics monitoring – Yes (MS Excel spreadsheet)
• Improved networking opportunities created – Yes (Training calendar)
• Shared services with similar organisations – Yes (No evidence)
Monitoring and evaluation:
• Continuous external monitoring and evaluation – No
Total recommendations implemented: 17 Yes; 5 No


Table 4.8: Evaluation Use Perceptions of Programme Manager and 2006 Evaluator as Measured by Evaluation Use Questionnaire

Questionnaire items:
1. The 2006 process evaluation was tailored to the needs of the grant-making organisation.
2. The programme staff of the grant-making organisation was included in the 2006 evaluation process.
3. The improvements suggested in the 2006 process evaluation were relevant to the grant-making organisation.
4. The evaluation findings and improvements were communicated in a manner that was easy to understand.
5. The results of the evaluation were readily accepted by the staff of the grant-making organisation.
6. The evaluation report was delivered at the appropriate time for the grant-making organisation to make appropriate changes to their programme.
7. The evaluator respected the programme staff of the grant-making organisation.
8. Utilisation of evaluation findings was included as part of the evaluation design for the 2006 evaluation.
9. The evaluation changed how the programme staff thought about the implementation of the capacity-building programme.
10. The evaluation changed the state of functioning of the grant-making organisation.
11. The 2006 process evaluation helped programme staff understand their programme better.
12. By allowing their programme to be evaluated, the grant-making organisation increased the likelihood that organisational development would occur.
13. The 2006 process evaluation was conducted to demonstrate that those involved in the delivery of the capacity-building programme were accountable for it.
14. The 2006 process evaluation was conducted to improve the service delivery and implementation of the overall capacity-building programme.
15. Evaluating the capacity-building programme has allowed the grant-making organisation to improve its data-collection processes.
16. The interaction between the programme staff has changed as a result of the 2006 process evaluation.
17. By allowing the capacity-building programme to be evaluated, the grant-making organisation has encouraged similar service providers to think differently about their respective programmes.
18. Programme staff has developed the necessary skills to continue with ongoing monitoring and evaluation of the 2006 process evaluation.
19. Funders have become more interested in the implementation of the capacity-building programme as a result of the 2006 process evaluation.
20. The evaluator and programme staff were continuously liaising and communicating with each other during the evaluation process.
21. Enough post-evaluation technical support was provided to the programme managers and staff to ensure that the improvements suggested would be implemented.
22. The improvements suggested could be implemented realistically in a short space of time after the evaluation.
23. The grant-making organisation's strategic interests were taken into account when devising the evaluation improvements.
24. The grant-making organisation's functional capacity was taken into account when devising the evaluation improvements.
25. Changing how the programme staff have thought about the evaluation has resulted in positive changes to the implementation of the capacity-building programme.
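As a hedged sketch only: the chapter does not reproduce the response format of the questionnaire here, but if each item were rated on, say, a five-point agreement scale by the programme manager and the 2006 evaluator, their perceptions could be compared item by item along the following lines. The respondent names, ratings and scale are assumptions made for this illustration.

```python
# Assumed five-point agreement ratings (1 = strongly disagree, 5 = strongly agree)
# for a few questionnaire items; both sets of values are illustrative only.
manager_ratings = {1: 5, 2: 4, 8: 2, 14: 5}
evaluator_ratings = {1: 4, 2: 4, 8: 4, 14: 5}

# Flag items where the two respondents differ by more than one scale point.
divergent_items = [
    item for item in manager_ratings
    if abs(manager_ratings[item] - evaluator_ratings[item]) > 1
]
print("Items needing discussion:", divergent_items)  # -> [8]
```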


The results raise the following questions of interest:
1. Why have we found such high usage of evaluation recommendations in the case of the grant-making organisation?
2. What can evaluators do to improve evaluation usage?
3. What can managers who commission evaluations do to ensure usable evaluations?
Each of these questions is answered in more detail below.

4.7.1 Uncharacteristically high usage of the 2006 process evaluation

The majority of the improvements (17 out of 22) contained in the 2006 process evaluation were implemented by 2010. These results are not consistent with what the literature on evaluation use suggests: evaluators are on record regarding poor use of evaluation results (Christie, 2007; Cousins, 2003; Cousins & Leithwood, 1986; Henry & Mark, 2003; Johnson, Greenseid, Toal, King, Lawrenz & Volkov, 2009; Kirkhart, 2000; Leviton & Hughes, 1981; Patton, 1997; Mark & Henry, 2004; Preskill & Torres, 2000; Weiss, 1998).

There are a number of possible explanations for the high usage of the 2006 process evaluation results. Firstly, the recommended improvements involved easily implementable process changes and not fundamental changes to programme theory, outcomes or impact (see Alkin and Taut (2003) on this issue). Had these programme aspects been involved, it is doubtful whether the same usage results would have been obtained. Secondly, programme staff were included during the 2006 evaluation process and informed about the objectives of the evaluation. Their knowledge of process evaluation and how it applied to their programme increased during the evaluation. These factors are important as they encourage evaluation use (Rossi, Lipsey & Freeman, 2004). Thirdly, the evaluation was commissioned by and tailored for the funders of the CBP. Funders are generally influential in bringing about positive changes in programme implementation, and in this instance this seems to have been the case.

4.7.2 What can evaluators do to improve evaluation usage? Rossi et al. (2004) indicated that an evaluation has three utilisations, namely direct, conceptual and persuasive. Direct utilisation refers to how programme staff use evaluation findings and recommendations to make changes to programme implementation. Patton (1997), in his utilisation-focused approach to evaluation, agrees that using evaluation findings for programme improvement involves making direct changes to programme implementation. Henry and Mark (2004) use the idea of individual influence of evaluation results and mention skills acquisition and behaviour change of programme staff as two such individual influence mechanisms. These two mechanisms refer to how programme delivery is changed as a result of the evaluation findings. According to Rossi et al. (2004), conceptual utilisation refers to programme staff using evaluation findings to think about their programme differently than prior to the evaluation. Patton (1997) calls this generating knowledge, while Kirkhart uses the concept, direct intentional influence, to explain the extent to which evaluation findings change the mind set of programme staff and stakeholders regarding future implementation and outcomes.


Chapter 4 Henry and Mark (2003) indicate that elaboration, an individual level influence mechanism, is concerned with how programme staff henceforth think about implementation and achieving results in their particular programme. The third type of utilisation, namely persuasive utilisation, refers to how evaluation findings are used to challenge the current state of political affairs regarding social policy (Rossi et al., 2004). Kirkhart’s (2000) notion of unintentional influence refers to evaluation findings changing the way in which stakeholders outside the programme (such as government and regulatory bodies) think about a programme and how this change in thinking could result in policy change. For Henry and Mark (2003) policy change, a collective level type of influence, refers to evaluation findings being used to influence policy makers’ attitudes towards a particular social problem. Through existing information and evaluation findings, policy makers are then able to make a better assessment of the state of affairs of a particular social problem and address it accordingly by means of a change in policy. Apart from these three utilisations, programme evaluators agree that involvement of programme staff in the evaluation process is crucial for usage of evaluation results. Weiss (1998:30) indicated that the best way to encourage evaluation use is to involve potential users “in defining and interpreting results and through reporting the results to them”. Johnson et al. (2009) added stakeholder involvement as a new category for use to their two other categories, namely evaluation implementation and policy setting. In summary, programme evaluators can insert this knowledge of evaluation usage throughout the evaluation process by alerting stakeholders to programme improvements, knowledge acquisition and application of this new knowledge and evaluation results that contradict policy. Thus usage becomes part of the debate between evaluators and programme stakeholders while the evaluation is being carried out.

4.7.3 What can managers who commission evaluations do to ensure usable evaluations?

While sufficient literature exists on how evaluators should ensure useful evaluations, little attention has been paid to how managers or other agents who commission evaluations can improve the use of evaluation results. The current evaluators offer the following suggestions; a simple pre-commissioning checklist based on them is sketched after the list:

1. Clarify the purpose of the evaluation. Are the evaluation results intended for programme improvement (formative evaluation), judgements regarding the continuation of the programme (summative evaluation) or programme accountability (how resources were used during programme implementation)?
2. Clarify the level of evaluation that needs to be used. Is this a process/implementation evaluation (an evaluation which assesses whether the programme has been implemented as intended) or an outcome evaluation (an evaluation which judges whether the programme has brought about the intended change)?
3. Have realistic expectations about the level of evaluation and the length of time that the programme has been running. Process/implementation evaluations can be commissioned for programmes that run one full cycle or more than one cycle. Outcome evaluations require that sufficient time has passed for changes in participant behaviour to have occurred; quite often, this means measuring change 6 to 18 months after the programme has been implemented. Impact evaluations that deal with the effect of the specific programme on distal or long-term outcomes require even more time than outcome evaluations.
4. Ascertain that sufficient monitoring data exist. For outcome and impact evaluations, measures of outcome indicators before and after the programme are required. A programme without sufficient data will not result in a useful evaluation.
5. Require that programme staff are involved in the evaluation process. This is a time-consuming and challenging process, as programme staff may not have the necessary evaluation knowledge and skills or may be reluctant to expose "their" programme to evaluation. Sufficient time should be allowed to foster programme staff's commitment to and participation in the evaluation.
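The checklist referred to above can be expressed as a minimal sketch, assuming a hypothetical terms-of-reference summary. The field names, the function and the data structure are illustrative devices; only the checks themselves follow the five suggestions in the text.

```python
from dataclasses import dataclass

@dataclass
class EvaluationBrief:
    """Hypothetical terms-of-reference summary for a commissioned evaluation."""
    purpose: str                        # "formative", "summative" or "accountability"
    level: str                          # "process" or "outcome"
    months_since_implementation: int
    completed_cycles: int
    has_baseline_and_followup_data: bool
    staff_involvement_planned: bool

def readiness_issues(brief: EvaluationBrief) -> list[str]:
    """Return commissioning issues suggested by the checklist in section 4.7.3."""
    issues = []
    if brief.purpose not in {"formative", "summative", "accountability"}:
        issues.append("Clarify the purpose of the evaluation.")
    if brief.level == "process" and brief.completed_cycles < 1:
        issues.append("A process evaluation needs at least one full programme cycle.")
    if brief.level == "outcome" and brief.months_since_implementation < 6:
        issues.append("Outcomes are usually measured 6-18 months after implementation.")
    if brief.level == "outcome" and not brief.has_baseline_and_followup_data:
        issues.append("Outcome evaluations need before-and-after indicator data.")
    if not brief.staff_involvement_planned:
        issues.append("Plan for programme staff involvement in the evaluation.")
    return issues
```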

4.7.4 Case Conclusions

This case study of an evaluation-of-an-evaluation has provided a new direction in assessing the use of evaluation results in South Africa. It is the first of its kind and offers a framework in which to conduct similar studies in future. It also presents an opportunity for other evaluators to use this approach in order to follow up on evaluation usage. Furthermore, it provides practical guidelines for commissioning managers to ensure evaluation usage and thus establish a culture of evaluation within an organisation. Finally, the current evaluators would like to suggest adding the following evaluation question for others who wish to explore evaluation usage and its results. In the case of the grant-making organisation the question would be: Did implementation of the 2006 recommendations lead to improved organisational development and service delivery of the grant-making organisation?

4.8 Conclusions

Due to the complexity of evaluation studies in practice, studies do not take "one" approach to evaluation. The three categories of evaluation approaches are not mutually exclusive, in the same sense that approaches within the three categories are not competing, but rather complementary to each other. The approach or combination of approaches most suited for a particular evaluation study will be determined by the specific evaluation question and objectives adopted for the study. Dahler-Larsen views the diversity in approaches as an asset, as it sparks constant debate and new practices with regard to new and old problems (2006:157). It also reinforces the holistic complexity of the social phenomena that we try to understand, and the fact that our current measuring instruments are still primitive and only able to provide us with approximations of the real nature of these phenomena. In order to get the most accurate perspective of the policy we are evaluating, it is necessary to consider and apply different approaches. Thus, an outcome evaluation study at local government level may take a participatory approach to clarify the multiple aims and intended uses of the evaluation results, followed by a more theory-driven approach in the summative evaluation to determine whether the predetermined goals were reached, as well as to identify potential unintended consequences.

From the different perspectives, the evaluation design provides the "blueprint" of the study, which defines the nature of the study in terms of being empirical or non-empirical, drawing on existing or primary data, using numeric or textual data, and the degree of structure and control that the researcher has in the study (Mouton, 2001:146). The evaluation design should aim to ensure both internal and external validity. Internal validity assures that there is a causal relationship between the intervention and the observed outcome and that no other plausible alternative could have caused the effect. Eliminating other potential contributing factors requires a combination of experimental designs, inferential statistics, empirical observations and substantive theory. External validity tests the stability of the cause-effect relationship across persons, settings, times and implementation styles and mediums (Cook, 2004:88-89). The ideal design achieves both high scientific credibility (the extent to which the evaluation is guided by scientific principles) and high stakeholder credibility (the extent to which stakeholders perceive the evaluation to incorporate their views, concerns and needs) (Chen, 2004:134-135). To achieve this balance, a helpful strategy is "to pursue stakeholder credibility in the earliest phases of evaluation design but to yield to scientific principles later in the process" (Chen (1990) in Chen, 2004:135).

The next chapter of this book provides a detailed discussion of evaluation designs, whilst Ammons (2002), Mathison (2005), Mouton (2001), Wildschut (2004), Save the Children (1995), McClintock (2003), IFAD (2002) and Wholey, Hatry and Newcomer (2004) also provide useful discussions in the evaluation context.

References Alkin, M.C. (ed.). 2004. Evaluation Roots: Tracing Theorists’ Views and Influences. Thousand Oaks, CA: Sage. Alkin, M.C. (ed.). 2013. Evaluation Roots: A Wider Perspective of Theorists’ Views and Influences. 2nd Edition. London: Sage. Alkin, M.C. & Christie, A.C. 2004. An Evaluation Theory Tree, in Alkin, M.C. (ed.). Evaluation Roots: Tracing Theorists’ Views and Influences. Thousand Oaks, CA: Sage. Alkin, M.C. & Taut, S.M. 2003. Unbundling evaluation use. Studies in Educational Evaluation, 29: 1-12. Ammons, D. 2002. Performance Measurement and Managerial Thinking. Public Performance and Management Review, 25(4): 344-347. Babbie, E. & Mouton, J. 1998. The practice of social research. Oxford: Oxford University Press. Bamberger, M. 2009. RealWorld Evaluation: conducting evaluations under budget, time, data and political constraints, in Segone, M. Country-led monitoring and evaluation systems. Better evidence, better policies, better development results. UNICEF Evaluation Working Papers. Geneva: UNICEF. Bamberger, M., Rugh, J. & Mabry, L. 2012. RealWorld Evaluation: Working Under Budget, Time, Data, and Political Constraints. Second Edition. London: Sage Publications. Becker, H. 2003. Theory formation and application in social impact assessment, in Becker, H.A. & Vanclay, F. (eds.). The international handbook of social impact assessment: Conceptual and methodological Advances. Cheltenham, U.K.: Edward Elgar. Bhikoo, A. & Louw-Potgieter, J. 2013. Do Managers Use Evaluation Reports? A Case Study of a Process Evaluation for a Grant-Making Organisation. Case Study commissioned for this book. Boruch, R.F. 2004. A trialist’s notes on evaluation theory and roots, in Alkin, C.M. (ed.). Evaluation roots, tracing theorists’ views and influences. Thousand Oaks CA: Sage.


Evaluation Models, Theories and Paradigms Brinkerhoff, R.O. 2005. The Success Case Method: A strategic evaluation approach to increasing the value and effect of training. Advances in Developing Human Resources, 7: 86-101. Campbell, D.T. 1969. Reforms as experiments. American Psychologist, 24: 409-429. Cardin, F. & Alkin, M.C. 2012. Evaluation Roots: An International Perspective. Journal of Multidisciplinary Evaluation, 8(17): 102-118. Chen, H. 2004. The roots of theory-driven evaluation: Current views and origins, in Alkin, C.M. (ed.). Evaluation roots: Tracing theorists’ views and influences. Thousand Oaks CA: Sage. Chen, H. 2005. Practical Program Evaluation: Assessing and Improving Planning, Implementation and Effectiveness. London: Sage Publications. Chilisa, B. 2012. Indigenous Research Methodologies. Berkeley: Sage. Chilisa, B. & Malunga, C. 2012. Made in Africa Evaluation: Uncovering African Roots in Evaluation Theory and Practice, in African Thought Leaders Forum on Evaluation for Development Expanding Thought Leadership in Africa. The Bellagio Centre, 14-17 November 2012. Chilisa, B., Mertens, D. & Cram, F. 2013. Indigenous Pathways into Social Research. Walnut Creek, CA: Left Coast Press. Christie, C.A. 2007. Reported influence of evaluation data on decision-makers’ actions: An empirical examination. American Journal of Evaluation, 28: 8-25. Cloete, F. 2008. Evidence-based policy analysis in South Africa: Critical assessment of the emerging government-wide monitoring and evaluation system. Paper presented at the 2008 SAAPAM conference on Consolidating State Capacity, 29-30 October 2008 in Bloemfontein, South Africa. Cook, T.D. 2004. Causal generalization: How Campbell and Cronbach influenced my theoretical thinking on this topic, in Alkin, C.M. (ed.). 2004. Evaluation roots: Tracing theorists’ views and influences. Thousand Oaks CA: Sage. Cousins, J.B. 2003. Utilization effects of participatory evaluation, in Kelleghan, T. & Stuafflebeam, D.L. (eds.). International handbook of educational evaluation. Dordrecht: Kluwer Academic Publishers: 245-265. Cousins, J.B., & Leithwood, K.A. 1986. Current empirical research on evaluation utilization. Review of Educational Research, 56: 331-364. Cronbach, L. and Associates. 1981. Towards Reform of Program Evaluation. San Francisco: Jossey-Bass. Dahler-Larsen, P. 2006. Evaluation after Disenchantment: Five Issues Shaping the Role of Evaluation in Society, in Shaw, I.F., Greene, J.C. & Mark, M.M. (eds.). 2006. The Sage Handbook of Evaluation. London: Sage Publications. Dahler-Larsen, P. 2012. Evaluation as a situational or a universal good? Why evaluability assessment for evaluation systems is a good idea, what it might look like in practice, and why it is not fashionable, Scandinavian Journal of Public Administration, 16(3): 29-46. De Coning, C., Cloete, F. & Wissink, H. 2011. Theories and Models for Analysing Public Policy, in Cloete, F. & De Coning, C. (eds.). Improving Public Policy: Theory, Practice & Results. Pretoria: Van Schaik: 32-65. DPME. 2011. National Evaluation Policy Framework. Department: Performance Monitoring and Evaluation. Pretoria: Government Publications. Donaldson, S.I. & Lipsey, M.W. 2006. Roles for Theory in Contemporary Evaluation Practice: Developing Practical Knowledge, in Shaw, I.F., Greene, J.C. & Mark, M.M. (eds.). 2006. The Sage Handbook of Evaluation. London: Sage Publications. EDRP. 2009. Ex-detainees Rehabilitation Program. 2009. 
www.edrp.gov.ps/Evaluation_Files/Tendering/6-Evaluation%20Report%20Structure.pdf. Retrieved: 21 April 2014. Fetterman, D.M. 1996. Empowerment Evaluation: An Introduction to Theory and Practice, in Fetterman, D.M., Kaftarian, S.J., & Wandersman, A. (eds.). Empowerment Evaluation: Knowledge and Tools for Self-Assessment & Accountability. Thousand Oaks, CA: Sage: 3-27. Fetterman, D.M. 2004. Branching Out or Standing on a Limb: Looking to Our Roots for Insight, in Alkin, C.M. (ed.). Evaluation Roots, Tracing Theorists’ Views and Influences. Sage Publications, California.


Chapter 4 Fitzpatrick, J.L., Sanders, J.R. & Worthen, B.R. 2004. Program evaluation: Alternative approaches and practical guidelines. 3rd ed. Boston: Pearson. Funnell, S.C. & Rogers, P.J. 2011. Purposeful Program Theory: Effective use of Theories of Change and Logic Models. San Francisco: Jossey Bass. Greene, J. 2006. Evaluation, democracy, and social change, in Shaw, I.F., Greene, J.C. & Mark, M.M. (eds.). The Sage Handbook of Evaluation. London: Sage. Greene, J. 2010. Keynote Address at the 2010 Conference of the European Evaluation Society in  Prague. Greene, J.C., Caracelli, V.J. & Graham, W.F. 1989. Toward a Conceptual Framework for MixedMethod Evaluation Designs. Educational Evaluation and Policy Analysis, 11(3): 255-274. Guba, E.G. & Lincoln, Y.S. 1989. Fourth Generation Evaluation. Newbury Park, CA: Sage Publications. Guba, E.G. & Lincoln, Y.S. 2001. Guidelines and checklist for Constructivist (a.k.a. Fourth generation) evaluation. http://www.wmich.edu/evalctr/checklists/checklistmenu.htm#models. Retrieved: 22 April 2014. Henry, G.T. & Mark, M.M. 2003. Beyond use: Understanding evaluation’s influence on attitudes and actions. American Journal of Evaluation, 24: 293-313. House, E.R. 1993. Professional evaluation: Social impact and political consequences. Thousand Oaks, CA: Sage House, E.R. 2004. Intellectual History on Evaluation, in Alkin, C.M. (ed.). 2004. Evaluation Roots, Tracing Theorists’ Views and Influences. Sage Publications, California. House, E.R. & Howe, K.R. 2000. Deliberative democratic evaluation checklist. http://www.wmich.edu/ evalctr/checklists/checklistmenu.htm#models. Retrieved: 28 April 2009. IFAD. 2002. Managing for impact in rural development: A guide for Project M&E. International Fund for agricultural development. www.ifad.org. Retrieved: 1 April 2014. Johnson, K., Greenseid, L.O., Toal, S.A., King, J.A., Lawrenz, F., & Volkov, B. 2009. Research on evaluation use: A review of the empirical literature from 1986 to 2005. American Journal of Evaluation, 30: 377-410. Kee, J.E. 2004. Cost-effectiveness and cost-benefit analysis, in Wholey, J.S., Hatry, H.P. & Newcomer, K.E. (eds.). Handbook of practical program valuation. 2nd edition. San Francisco: Jossey-Bass, John  Wiley. Kellogg Foundation. 2004. Using Logic Models to Bring Together Planning, Evaluation, and Action. Logic Model Development Guide. http://www.wkkf.org/resource-directory/resource/2006/02/ wk-kellogg-foundation-logic-model-development-guide. Retrieved: 21 April 2014. Kirkhart, K.E. 2000. Reconceptualising evaluation use: An integrated theory of influence. New Directions for Evaluation, 88: 5-23. Kusek, J.Z. & Rist, R.C. 2004. Ten Steps to a Results-Based Monitoring and Evaluation System. Washington D.C: The World Bank. Leviton, L.C., & Hughes, E.F.X. 1981. Research on the utilization of evaluation: A review and a synthesis. Evaluation Review, 5: 525-549. Lincoln, Y.S. & Guba, E.G. 2004. The Roots of Fourth Generation Evaluation: Theoretical and Methodological Origins, in Alkin, C.M. (ed.). 2004. Evaluation Roots, Tracing Theorists’ Views and Influences. Sage Publications, California. Love, A. 2004. Implementation Evaluation, in Wholey, J.S., Hatry, H.P. & Newcomer, K.E. (eds.). 2004. Handbook of practical program evaluation. 2nd edition. San Francisco: Jossey-Bass, John Wiley. 63-97. Malunga, C. 2009a. Making Strategic Plans Work: Insights from Indigenous African Wisdom. London: Adonis & Abbey Publishers. Malunga, C. 2009b. Understanding Organisational Leadership through Ubuntu. 
London: Adonis & Abbey  Publishers. Malunga, C. & Banda, C. 2011. Understanding Organisational Sustainability through African Proverbs: Insights for Leaders and Facilitators. Rugby, UK: Practical Action Publishers.


Evaluation Models, Theories and Paradigms Mark, M.M., Greene, J.C. & Shaw, I.F. 2006. Introduction: The evaluation of policies, programs and practices, In Shaw, I.F., Greene, J.C. & Mark, M.M. (eds.). 2006. The Sage Handbook of Evaluation. London: Sage Publications. Mark, M.M., & Henry, G.T. 2004. The mechanisms and outcomes of evaluation influence. Evaluation, 10: 35-57. Mark, M.M. & Henry, G.T. 2006. Methods for Policy-Making and Knowledge Development Evaluations, in Shaw, I.F., Greene, J.C. & Mark, M.M. (eds.). 2006. The Sage Handbook of Evaluation. London: Sage Publications. Mark, M.M. & Shotland, R.L. 1985. Stakeholder-based evaluation and value judgments. Evaluation Review, 9: 605-626. Mathison, S. (ed.). 2005. Encyclopedia of Evaluation. Sage Publications. California. McClintock, C. 2004. Using Narrative Methods to Link Program Evaluation and Organizational Development. The Evaluation Exchange, IX(4): 14-15. McLaughlin, J.A. & Jordan, G.B. 2004. Using logic models, in Wholey, J.S., Hatry, H.P. & Newcomer, K.E. (eds.). Handbook of practical program evaluation. 2nd edition. San Francisco: Jossey-Bass, John  Wiley. Mertens, D.M. 1999. Inclusive evaluation: Implications of transformative theory for evaluation. American Journal of Evaluation, 20(1): 1-14. Mertens, D.M. 2004. Research and Evaluation in Education and Psychology: Integrating Diversity with Quantitative, Qualitative and Mixed Methods. London: SAGE Publications. Mertens, D.M. 2009. Transformative Research and Evaluation. New York: Guilford Press. Mouton, J. 2001. How to succeed in your Master’s and Doctoral Studies. A South African guide and resource book. Pretoria: Van Schaik. Mouton, J. 2007. Approaches to programme evaluation research. Journal of Public Administration. Vol. 42(6): 490-511. Mouton, J. 2008. Class and slide notes from the ‘Advanced Evaluation Course’ presented by the Evaluation Research Agency in Rondebosch, 20-24 October 2008. Nagel, S.S. (ed.). 2002. Handbook of public policy evaluation. Thousand Oaks CA: Sage. Naidoo, I.A. 2007. Unpublished research proposal submitted to the Graduate School of Public and Development Management, University of Witwatersrand. OECD. 2007. OECD Framework for the Evaluation of SME and Entrepreneurship Policies and Programmes. Paris, France: OECD. Owen, J.M. 2006. Program Evaluation: Forms and Approaches. 3rd edition. The Guilford Press. New  York. Patton, M.Q. 1980. Qualitative evaluation methods. London: Sage Patton, M.Q. 1987. Creative evaluation, 2nd edition. London: Sage. Patton, M.Q. 1997. Utilisation-focused evaluation: The new century text. Thousand Oaks, CA: Sage Publications Inc. Patton, M.Q. 2004. The roots of utilization-focused evaluation, in Alkin, C.M. (ed.). Evaluation roots, tracing theorists’ views and influences. Thousand Oaks CA: Sage. Patton, M.Q. 2008. Utilization-focused evaluation. 4th edition. Thousand Oaks CA: Sage. Patton, M.Q. 2011. Developmental Evaluation: Applying Complexity Concepts to Enhance Innovation and Use. New York: Guilford Press. Pawson, R. 2006. Evidence-Based Policy: A Realist Perspective. London: Sage. Pawson, R. & Tilley, N. 1997. Realistic Evaluation. Berkeley: Sage. Pierre, R.G. 2004. Using Randomized Experiments, in Wholey, J.S., Hatry, H.P. & Newcomer, K.E. (eds.). Handbook of Practical Program Evaluation. 2nd edition. San Francisco:.Jossey-Bass, John Wiley & Sons Inc. Posavac, E.J. & Carey, R.G. 1997. Program evaluation. Methods and Case Studies. 5th edition. New Jersey: Prentice Hall. Presidency. 2003. Towards a Ten Year Review. 
Synthesis Report on Implementation of Government Programmes. www.info.gov.za/otherdocs/2003/10year.pdf. Retrieved: 30 January 2014.


Chapter 4 Presidency. 2008. The Fifteen Year Review. Pretoria: The Presidency. http://www.thepresidency.gov. za/pebble.asp?relid=535 . Retrieved: 22 April 2014. Presidency. 2014. The 20 Year Review. Pretoria: The Presidency. http://www.thepresidency-dpme. gov.za/news/Pages/20-Year-Review.aspx. Retrieved: 15 April 2014. Preskill, H. 2004. The transformational power of evaluation: Passion, purpose and practice, in Alkin, C.M. (ed.). Evaluation roots: Tracing theorists’ views and influences. Thousand Oaks CA: Sage. Thousand Oaks CA. Preskill, H.S. & Torres, R.T. 2000. Learning dimensions of evaluation use. New Directions for Evaluation, 88: 25-38. Rakoena, T. Project implementation planning, monitoring and evaluation. Presentation distributed via the SAMEA discussion forum. www.samea.org.za. Retrieved: 14 August 2007. Reichardt, C.S. & Mark, M.M. 2004. Quasi-Experimentation in Wholey, J.S., Hatry, H.P. & Newcomer, K.E. (eds.). Handbook of Practical Program Evaluation. 2nd edition. San Francisco: Jossey-Bass, John Wiley & Sons Inc. Rogers, P.J. & Williams, B. 2006. Evaluation for Practice Improvement and Organizational Learning, in Shaw, I.F., Greene, J.C. & Mark, M.M. (eds.). 2006. The Sage Handbook of Evaluation. London: Sage Publications. Rossi, P.H., Lipsey, M.W. & Freeman, H.E. 2004. Evaluation. A Systematic Approach. Seventh Edition. London: Sage Publications. Rugh, J. & Bamberger, M. 2012. The Challenges of Evaluating Complex, Multi-component Programs. In European Evaluation Society Newsletter: Evaluation Connections. May 2012: 4-7. Save the Children. 1995. Toolkits: A practical guide to assessment, monitoring, review and evaluation. London: Save the Children. Scriven, M. 1967. The methodology of evaluation, in Tyler, R.W., Gagné, R.M. & Scriven, M. (eds.). Perspectives of curriculum evaluation, Chicago, IL: Rand McNally. 39-83. Scriven, M. 1974. Evaluation perspectives and procedures, in Popham, W.J. (ed.). Evaluation in education. Berkeley: McCutchan. Shadish, W.R., Cook, T.D. & Leviton, L.C. 1991. Foundations of Program Evaluation. Theories of Practice. Sage Publications. Shadish, W.R. & Luellen, J.K. 2004. Donald Campbell: The accidental evaluator, in Alkin, C.M. (ed.). 2004. Evaluation roots: Tracing theorists’ views and influences. Thousand Oaks CA: Sage Shafritz, J.M. 2004. The Dictionary of Public Policy and Administration. Boulder, Co: Westview Press. Stake, R. 2004. Stake and Responsive Evaluation in Alkin, C.M. (ed.). Evaluation Roots, Tracing Theorists’ Views and Influences. Sage Publications. Stockmann, R. (ed.). 2011. A Practitioner Handbook on Evaluation. Cheltenham: Edward Elgar. Stufflebeam, D.L. 1971. The Use of Experimental Design in Educational Evaluation. Journal of Educational Measurement, 8(4): 267-274. Stufflebeam, D.L. 2004. The 21st-Century CIPP Model: Origins, Development, and Use, in Alkin, M.C. (ed.). Evaluation roots. Thousand Oaks: Sage. 245-266. Stufflebeam, D.L. & Shinkfield, A.J. 2007. Evaluation Theory, Models & Applications. San Francisco: Jossey-Bass. Turianskyi, Y. 2013. The African Peer Review Mechanism Ten Years On: How Can It Be Strengthened? SAIIA Policy Note 2. http://www.saiia.org.za/doc_download/459-policy-note-02-the-africanpeer-review-mechanism-ten-years-on-how-can-it-be-strengthened. Retrieved: 30 January 2014. Valadez, J. & Bamberger, M. 1994. Monitoring and Evaluating Social Programs in Developing Countries. A Handbook for Policymakers, Managers and Researchers. EDI Development Studies. Washington: World Bank. Vanclay, F. 2003. 
Conceptual and methodological advances in social impact assessment, in Becker, H.A. & Vanclay, F. (eds.). 2003. The International Handbook of Social Impact Assessment. Conceptual and Methodological Advances. Cheltenham, UK: Edward Elgar. Weiss, C. 1977. Research for Policy’s Sake: The Enlightenment Function of Social Science Research. Policy Analysis, 3(4): 531-545.


Evaluation Models, Theories and Paradigms Weiss, C. 1991. Policy research as advocacy: Pro and con. Knowledge & Policy, 4 (1/2): 37-56. Weiss, C.H. 1998. Evaluation. Methods for studying programs and policies. 2nd edition. New Jersey: Prentice Hall. Weiss, C.H. 2004. Rooting for evaluation: A cliff notes version of my work, in Alkin, C.M. (ed.). Evaluation roots: Tracing theorists’ views and influences. Thousand Oaks CA: Sage. Wholey, J.S. 1987. Evaluability Assessment: Developing Program Theory, in Bickman, L. (ed.). Using Program Theory in Evaluation. New Directions for Program Evaluation: 33. Wholey, J.S. 2004. Using Evaluation to Improve Performance and Support Policy Decision Making, in Alkin, C.M. (ed.). 2004. Evaluation Roots, Tracing Theorists’ Views and Influences. Sage Publications. Wholey, J.S., Hatry, H.P. & Newcomer, K.E. (eds.). 1994. Handbook of practical programme evaluation. San Francisco, CA: Jossey-Bass. Wholey, J.S., Hatry, H.P. & Newcomer, K.E. (eds.). 2004. Handbook of practical programme evaluation. San Francisco: Jossey-Bass. Wildschut, L. 2004. Workshop 10: Guide to Contracting Evaluation. Workshop presented at the Third Conference of the African Evaluation Association, Cape Town, South Africa. 29 November 2004.


Chapter 5
Programme Evaluation Designs and Methods
Johann Mouton

5.1 Introduction: Evaluation design and evaluation methodology

Programme evaluation studies are a distinctive type of applied social science. One can typically distinguish between various types of social research, such as surveys, case studies, comparative studies, ethnographic studies, participatory action studies, experimental studies, secondary data analysis, textual and discourse analysis studies and so on. Each of these design types or types of social research has its own distinctive logic and methodological configuration (Babbie & Mouton, 2001). This section will show that the same applies to programme evaluation studies, where the same research designs and methods are used, with the one important difference that the purpose of the evaluation sometimes differs from that of the academic research project. Academic research is normally interested in gaining new knowledge about or insights into social phenomena, explaining or predicting social phenomena in terms of existing models, theories or paradigms, testing an existing model, theory or paradigm, or developing new ones. Evaluations may also have these aims, but are usually more focused on policy, programme or project interventions through which an organisation attempts to address an existing or potential problem. They can also frequently have additional aims, such as empowering the stakeholders in such an intervention or conducting it in a participatory, democratic way. Table 5.1 summarises the main differences:

Table 5.1: Evaluation and Research Attributes

Evaluation | Research
Utility – intended for use: improvement of programmes | Academic – production of knowledge to clarify, understand and even predict
Programme-derived questions | Research-derived questions
Judgemental nature: measuring what is, against what should normatively be | Analytical nature: determining what is and assessing the implications
Action setting: programme crucial | Research setting: design and methodology requirements crucial
Potentially conflicting impartial evaluation and judgemental roles | Normative judgement mostly absent, reducing potential role conflicts
Often not published and reported to non-research audience | Published and disseminated to research community
Accountable to funders, policy makers, managers, practitioners and programme participants | Accountable to funders and scholarly principles and codes of conduct for generating knowledge

(Source: Rabie & Burger, 2014)


Programme Evaluation Designs and Methods One of the standard definitions of evaluation research is found in the classic text by Rossi, Freeman and Lipsey (2004), where they define evaluation research as “the systematic application of social research procedures for assessing the conceptualization, design, implementation and utility of social intervention programmes”. This definition emphasises three key aspects of evaluation research: • That we systematically apply the whole range of social research methods and procedures (methods of sampling, measurement, data-collection and data-analysis) when we conduct evaluation studies. • That we always make value judgments (assessments) when we do evaluation research. • That we evaluate various dimensions or aspects (the conceptualisation, the design, the implementation or delivery and the utility (effectiveness, efficiency and sustainability) of social intervention programmes. Box 5.1: Why genuine evaluation must be value-based

“One thing that makes something genuinely an e-valu-ation is that it involves asking and answering questions about quality or value. For example:
• It is not just measuring outcomes; it’s saying how substantial, how valuable, how equitable those outcomes are.
• It’s not just reporting on implementation fidelity (did it follow the plan?); it’s saying how well, how effectively, how appropriate the implementation was.
• It’s not just reporting whether the project was delivered within budget; it’s asking how reasonable the cost was, how cost-effective it was, and so forth.

Who actually identifies and applies ‘values’ in an evaluation? Are these tasks always done by ‘a’ person, ‘the’ evaluator? Of course not. In a participatory, empowerment, or other collaborative evaluation, the quality and value questions are asked and answered, in the end, by those who are participating in running the evaluation. Hopefully this is done with the guidance of an evaluator who uses his or her knowledge of evaluation logic and methodology to help the participants:
a. identify what the ‘valuable outcomes’ and dimensions of ‘high quality programming’ are or might be;
b. develop robust definitions of ‘how good is good’ on those criteria;
c. gather a good mix of evidence;
d. interpret that evidence relative to the definitions of ‘how good is good’; and
e. bring all the findings together to consider the ‘big picture’ evaluation questions.”

(Extract from Davidson, 2010)

The distinctive nature of evaluation studies, then, derives from the unique focus of such studies: to make value judgments about the quality, worth or value of intervention programmes. This will be referred to here as the distinctive “logic of evaluation studies”. The term “logic” emphasises the fact that in evaluation research we employ rules of reasoning that are different from other forms of social research. The “logic of evaluation studies” is clearly illustrated in Box 5.1 where Davidson (2010) argues convincingly that all evaluation work involves making value-judgments whether the evaluator assesses the implementation or delivery of a programme, or whether one evaluates the outcomes and ultimate impact of a programme.


Chapter 5 Making a value-judgment presupposes that one has an explicit or implicit criterion or standard against which to judge something. In everyday life we make value-judgments when we evaluate cars (in terms of criteria such as “fuel efficiency” or “safety” or “comfort”) or we evaluate homes (in terms of “geographical location” or “affordability”). Even socalled “goal-free” evaluations that use an inductive, exploratory research design instead of a deductive theory-driven one, assesses in the end to what extent the intervention had any value or merit, and for that purpose standardised criteria for judging the value or merit of the intervention have to exist. In evaluation studies we utilise a wide range of “standards” or “criteria” of worth or utility when evaluating interventions. Following from the Rossi, Freeman and Lipsey definition, we compiled Table 5.2 to illustrate how different evaluation criteria are applied when evaluating different aspects of intervention  programmes. .

Table 5.2: Intervention dimensions and evaluation criteria

Intervention dimension | Evaluation criteria
Conceptualisation | • Clarity of programme goals and objectives • Logical consistency between programme objectives and programme activities (horizontal alignment in the logic model) • Logical coherence of different levels of programme activities, outputs and outcomes (vertical alignment in the logic model)
Design | • Appropriateness of the intervention given the target group needs and expectations • Feasibility of the design (given resource constraints)
Implementation (Delivery) | • Appropriateness of implementation (Was the programme implemented as designed and planned?) • Adequate coverage of the programme (Did all the intended beneficiaries receive the intervention?) • Sufficient dosage (Did the intended beneficiaries receive the minimum “dosage” of the intervention as intended?) • Standardisation of delivery (Was the programme implemented in the same way across multiple sites?)
Outcome and impact | • Effectiveness of programme (Were the desired outcomes achieved?) • Impact of the programme (Was the desired and expected impact achieved?) • (Cost-)Efficiency of outcomes (value-for-money criterion) • Sustainability of programme impact
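As a hedged illustration (not from the text), the criteria in Table 5.2 can be operationalised as a simple rubric in which each criterion is rated against an agreed standard and the ratings are synthesised into a judgement per intervention dimension. The criterion strings follow the table; the 1-5 scale, the threshold and the function are assumptions made for this sketch.

```python
# Hypothetical rubric: rate each criterion from Table 5.2 on a 1-5 scale against
# agreed standards, then synthesise a per-dimension judgement. The scale and the
# 3.5 threshold are illustrative assumptions, not from the chapter.
RUBRIC = {
    "Implementation (Delivery)": [
        "Appropriateness of implementation",
        "Adequate coverage of the programme",
        "Sufficient dosage",
        "Standardisation of delivery",
    ],
}

def judge_dimension(dimension: str, ratings: dict[str, int], threshold: float = 3.5) -> str:
    """Synthesise criterion ratings (1-5) into a judgement for one intervention dimension."""
    scores = [ratings[criterion] for criterion in RUBRIC[dimension]]
    mean_score = sum(scores) / len(scores)
    return f"{dimension}: mean {mean_score:.1f} -> " + (
        "meets standard" if mean_score >= threshold else "below standard"
    )

print(judge_dimension(
    "Implementation (Delivery)",
    {"Appropriateness of implementation": 4, "Adequate coverage of the programme": 3,
     "Sufficient dosage": 4, "Standardisation of delivery": 5},
))
```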

It is important to emphasise that our discussion thus far has focused exclusively on the underlying logic or principles of reasoning that characterises evaluation studies. This is the “logic of evaluation design”. “Evaluation methods”, on the other hand, refers to the execution of an evaluation design in a specific study. The term “methodology” always refers to the “how” of a study: how we go about selecting the cases or “instances” of the programme to be evaluated, how we design instruments to measure aspects of programme implementation and outcome, how we collect data about the programme and how we analyse the data collected in support of our final conclusions. Whereas “evaluation design” refers to the principles of reasoning when designing an evaluation study, the term “evaluation methodology” refers to the actual methods employed in the process of conducting an evaluation study. We usually distinguish between four categories of evaluation methodology (these generic applied research methods are not dealt with in more detail in this book):


Programme Evaluation Designs and Methods • Methods of case selection or sampling (How we select the cases (actual interventions) to be evaluated) • Methods of measurement construction (How we construct the instruments and measures used in the measurement of the programme) • Methods of data collection (The methods employed in collecting data about the intervention including observational methods, self-reporting methods [interviews and questionnaires] and documentary analysis methods) • Methods of data analysis (Statistical and qualitative data-analysis methods) It is also important to note that the terms “evaluation” and “assessment” are synonyms, but that assessment has in some sectors historically been linked to specific types of evaluative activities, like “social impact assessment” or “environmental impact assessment”. These “assessments” are nothing else than “evaluations” that have developed specific approaches and methodologies over time in specific sectors. In the following section we look more closely at the concrete steps that are followed in designing evaluation studies.

5.2 The logic and steps in designing evaluation studies

I have elsewhere argued that all research involves a clear sequence of logical steps when designing a particular study (Babbie & Mouton, 2001). This applies to evaluation studies as well. All research commences with a deliberate decision to undertake a study. In the field of programme evaluation such decisions could in principle be taken by the evaluator (selfinitiated), but it is more likely to be the result of a commission or contract. The vast majority of evaluation studies – certainly on the African continent – have their origins in commissions (requests for evaluations) by national and local governments, NGOs, international and national development agencies and so on. The fact that many evaluation studies are in fact commissioned studies is quite important for any discussion on evaluation design, as such commissions usually already contain a fairly detailed “brief” or “terms of reference” (TOR) that embody the expectations of the commissioning agency about the type of study to be conducted. Evaluation TORs very often already indicate and even specify what kind of evaluation study is required and which evaluation questions should be answered. Having said this, it is still important for the evaluator to work through the logic of evaluation design (see Figure 5.1). This entails three further main phases: • Statement of the evaluation problem • Selecting or deciding upon the most appropriate type of evaluation design • Implementing the design, i.e. making the appropriate methodological decisions


Figure 5.1: The logic of evaluation design
1. Decision to undertake the evaluation study (self-initiated or commissioned brief)
2. Statement of the evaluation problem (identify the unit of evaluation (evaluand); define evaluation purposes/aims)
3. Select the appropriate evaluation design (clarificatory evaluation, process evaluation, outcome and impact evaluations)
4. Implement the design: evaluation methodology (sampling, instrumentation, data collection, analysis, reporting)

Defining the evaluation problem involves two subsidiary questions:
• Identifying the evaluand (which intervention/programme must be evaluated?). This question also involves being clear about the intended beneficiaries of a programme.
• What is the purpose of the evaluation? Various typologies have been developed to distinguish between the different purposes of evaluation studies.

In the following section, each of these issues is discussed and elaborated on in more detail. Before any decision is taken about the most appropriate design and subsequent methodological decisions, the evaluator has to address two prior questions:
• What exactly is the object of the evaluation?
• What is the purpose (or sometimes purposes) of the evaluation?

5.2.1 Identifying the evaluand (unit of evaluation) in distinction from the “unit of observation” In social research methodology it is commonplace to distinguish between the “unit of analysis” and the “unit[s] of observation” in a study. This is a common distinction that all research students are taught. The “unit of analysis” is defined as the “object” or “phenomenon” that we wish to analyse or understand through our study (for example – “poverty in a rural area”). The “unit[s] of observation” refer to the different sources of data that we need to mobilise in order to analyse and understand the unit of analysis


Programme Evaluation Designs and Methods (for example – documentary sources, interviews with key informants, site visits and observations, tests and scales, existing records and so on). A similar distinction holds in the field of programme evaluation research. We distinguish between the “unit of evaluation” (the programme or project or intervention) that we wish to evaluate or assess and the “units of observation” (the sources of information and data about the intervention that we need to mobilise in order to come to a final judgment on the  intervention). The programme (at whatever level of aggregation, e.g. sub-programmes, projects) is the evaluand (a term introduced by Michael Scriven as a short-hand term for the “unit of evaluation”). The aim is to assess its design, delivery, implementation, deliverables, outcomes and impact. We gather evidence (from various sources) in order to come to a rigorous and systematic assessment of these dimensions of the intervention. Therefore, our final conclusions are always about the programme (as the unit of evaluation). We may find that the programme was not properly implemented or delivered. We may conclude that the programme outcomes did not materialise or were only partially achieved. We may find that there was some programme impact but that this impact disappeared over time. In short: the programme is our unit of evaluation – not the programme staff nor the intended beneficiaries nor other stakeholders (such as the funders). But in order to make robust and systematic judgments about the programme, we have to gather data from various sources, the so-called “observation units”. In programme evaluation studies, it is recognised that we typically employ a whole host of qualitative and quantitative methods of data collection in order to gather data and information from programme staff, the beneficiaries and all relevant stakeholders. It is also important to distinguish between the sources of information (units of observation) and the methods of collecting data or information from these sources. This distinction is illustrated in Figure 5.2. Figure 5.2: On methods and sources for data collection in evaluation studies

Data collection methods:
• We collect data by interviewing people (using individual and group interview techniques) and by observing people in situ.
• We capture and analyse data from documentary and statistical sources.

Sources of evidence:
• People/groups doing things/activities (training, cataloguing, designing courses, managing)
• Documentary sources: reports, books and brochures; manuals and training resources; personal documents (journals); procurement and visual records
• Statistical/data sources: user statistics and attendance records; loan records; access/download statistics

Chapter 5 Although it may seem straightforward, identifying and being clear about the “unit of evaluation” is not always easy. Interventions differ in many ways: aims and objectives, mechanisms of change, target groups, duration, and so on. More specifically programmes differ from each other in the following ways: 1. Interventions are often layered (the problem of selecting the right level of the intervention). E.g. educational programmes have different components operating at different levels (district/school/classroom/individual teacher). 2. Interventions are organisationally and institutionally embedded (the problem of disentangling the effects of the organisational context from programme effects). 3. Interventions often change goals and objectives (and even target groups) over the course of its life cycle – the issue of goal drift. This means that the timing of the evaluation is crucial. Whether the evaluator gets involved early in the life cycle of the programme or mid term, has huge design implications. 4. The duration and frequency of interventions typically differ. So the evaluator needs to ask questions such as: What is the typical life cycle of the programme and is it a once-off implementation or a recurring programme that is implemented quarterly or annually over an extended period of time? 5. Interventions may also differ in terms of the number (and levels) of target groups: Interventions can have a single target group (such as a safe-sex programme that is aimed at changing the behaviour of adolescents) or have multiple target groups (such as school interventions that target the school management team, teachers and learners). 6. Interventions also differ in terms of scope and specifically whether it is a single-site or multi-site intervention. Multi-site interventions (such as programmes that are implemented in many schools, towns, organisations) are more complex and require a more complex evaluation design. Given the history of South Africa (and other countries on the African continent) many current interventions are transformational in purpose, which imply large scale organisational and institutional change (Mouton, 2009). Equally important are the capacity-building aims of these interventions (such as whole school development, strengthening rural health care, reducing poverty and so on). These programmes are typically large in scope (multi-site interventions) and implemented over longer time frames (3-5 years), and often may have multiple target groups. The African context within which these evaluations take place further have to take indigenous values and practices into account in order to maximise the relevance, validity and policy applications of the evaluation findings. These are all considerations that evaluators have to take into account when designing an evaluation study.
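The considerations listed above can be captured in a simple profile of the evaluand. The sketch below is illustrative only: the class, field names and flagging rules are assumptions made for this example, while the design considerations themselves follow the six points in the text.

```python
from dataclasses import dataclass

@dataclass
class EvaluandProfile:
    """Illustrative profile of an intervention, capturing the design considerations
    listed above (layering, embeddedness, timing, duration, target groups, scope)."""
    name: str
    levels: list[str]            # e.g. ["district", "school", "classroom"]
    host_organisation: str       # organisational/institutional embedding
    stage_in_life_cycle: str     # e.g. "start-up", "mid-term", "mature"
    recurring: bool              # once-off or repeated implementation
    target_groups: list[str]
    sites: int

    def design_flags(self) -> list[str]:
        """Return features that usually call for a more complex evaluation design."""
        flags = []
        if self.sites > 1:
            flags.append("multi-site: plan for cross-site comparison")
        if len(self.target_groups) > 1:
            flags.append("multiple target groups: separate outcome measures per group")
        if len(self.levels) > 1:
            flags.append("layered intervention: choose the level(s) to evaluate")
        return flags
```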

5.2.2 The purposes of evaluation Social interventions (such as programmes, policies, new systems, schemes) are evaluated for a number of different reasons. Since the advent of programme evaluation research, evaluations have been commissioned for purposes of programme management, programme improvement and refinement, financial accountability, in response to public demand, to meet accreditation requirements, for purposes of quality assurance and control and various other reasons.


One of the first and most influential distinctions pertaining to evaluation purpose was introduced by Michael Scriven when he distinguished in 1980 between formative and summative evaluations (Scriven, 1991:168, 340; 1981:6-7): "Evaluations may be done to provide feedback to people who are trying to improve something (formative evaluation), or to provide information for decision-makers who are wondering whether to fund, terminate or purchase something (summative evaluation)". Robert Stake's often-quoted metaphor to explain the difference between summative and formative evaluations captures the distinction well: "When the cook tastes the soup, that's formative; when the guests taste the soup, that's summative!" (Scriven, 1991:169). At about the same time, Michael Patton proposed a more elaborate distinction. He argued that the different evaluation purposes that have been discussed in the literature can be reduced to three: to make judgements of merit or worth, to improve programmes and to generate knowledge. Table 5.3 summarises these three main classes of evaluation purpose (with examples):

Table 5.3: Three primary uses/purposes of evaluation studies

Uses or purposes | Examples
Judge merit or worth | Summative evaluation; accountability; audits; quality control; cost-benefit decisions; deciding a programme's future; accreditation/licensing
Improve programmes | Formative evaluation; identifying strengths and weaknesses; quality enhancement; managing more effectively; adopting a model locally
Generate knowledge | Generalisations about effectiveness; extrapolating principles about what works; building new theories and models; informing policy

(Source: Patton, 1997:76)

Judgement-oriented evaluations Evaluations which are aimed at establishing the intrinsic value, merit or worth of a programme are judgement-oriented. This is arguably the most often-cited reason for undertaking an evaluation. Typically, this involves questions such as the following: Was the programme successful? Did it achieve its objectives? Was it effective? Did the programme attain its goals? Was the intended target group reached? Did the intended beneficiaries receive the intervention in the most effective and efficient manner? Another indication that some form of a judgement is called for, are the many references to “accountability”, “compliance” and “audits”. Summative evaluations judge the overall effectiveness of a programme and are particularly important when a decision has to be made about the continued funding of a programme.


According to Patton (1997:68) judgement-oriented evaluations typically follow a four-stage (deductive) pattern:
1. Select criteria of merit or worth
2. Set standards of performance (e.g. outcome measures)
3. Measure performance (often quantitatively)
4. Synthesise results into a judgement of value

In contrast, improvement-oriented (developmental) evaluations use more inductive strategies in which criteria are less formal, as one searches for whatever areas of strength (or weakness) may emerge from a detailed study of the programme.

Improvement-oriented evaluations

Evaluations that have an improvement purpose ask different questions: What are the programme's strengths and weaknesses? Has the programme been properly implemented? What constraints are there on proper implementation? Are the programme recipients responding positively to the intervention? If not, why not? Improvement-oriented evaluations typically involve collecting data for specific periods of time during the start-up phases of interventions (or at least early on in implementation), in order to make suggestions about improvement, to solve unanticipated problems and to make sure that participants are making the required progress towards the desired outcomes. Improvement-oriented evaluation usually, then, utilises information systems to monitor programmes, to track implementation and to provide regular feedback to programme managers.

Knowledge-oriented evaluation

Both judgement- and improvement-oriented evaluations are driven by concerns for use and application. In both cases, the end result is some decision or action: whether to terminate a programme or cease funding (judgement) or to improve and fine-tune a programme (improvement). But evaluations are also undertaken to improve our understanding of how programmes work and how people change their attitudes and behaviours because of successful interventions. In cases such as these, the purpose of an evaluation is to generate new knowledge. This knowledge can be very specific, e.g. clarifying a programme model or underlying theory, distinguishing between types of intervention or elaborating policy options. In other cases, the knowledge-oriented evaluation might have more general aims in mind, such as seeking to understand the programme better, reducing uncertainty and risk of failure, enlightening funders and other stakeholders, and so on.

The distinction between "judgement-oriented", "improvement-oriented" and "knowledge-generating" evaluations by Michael Patton is, however, not unproblematic. It seems rather obvious that the latter two must also involve some form of judgement on the part of the evaluator. In order to make decisions about improving interventions, it would be necessary that the evaluator (and other stakeholders) have reached some agreement (and hence judgement) on what counts as "strong", "weak" or "improved" interventions. Similarly, knowledge-oriented purposes presuppose that the evaluator has made judgements about what the current state of knowledge is in a particular field, whether our understanding of the underlying change mechanisms is adequate, and so on. In retrospect, Patton may have made a stronger case had he adapted the term "summative" (from Scriven, 1991:168) to describe the first purpose. This would result in a typology where it is acknowledged that evaluators make different kinds of judgement towards different kinds of purpose (summative, improvement-oriented or formative, and knowledge-generation).

In my own work I have developed a different typology which identifies four main types of "evaluation purposes". The typology consists of four quadrants based on cross-tabulating two dimensions: the Learning-Accountability continuum versus the Formative-Summative continuum. One of the values of this typology is that each resultant quadrant presents an "ideal" evaluation purpose as well as the most likely stakeholders who would be interested in studies that prioritise that evaluation purpose.
• The Learning-Formative quadrant contains studies where the purpose is improvement-oriented (formative) and aimed at learning from the evaluation. The most likely audience for studies that fall into this category is programme staff, although funders of such programmes would conceivably also be interested in learning from such studies.
• The Formative-Accountability quadrant contains studies – mostly monitoring and mid-term evaluation studies – where the intent is both formative as well as to account for funds received.
• The Summative-Accountability quadrant includes studies that have as their primary purpose to inform a decision to continue or terminate a programme. The primary stakeholders for such studies are programme funders and the boards of directors/trustees that they are accountable to.
• The Summative-Learning quadrant includes studies that are equally decision-oriented (the summative dimension), but now aimed at informing the scholarly community about the most appropriate models and mechanisms of intervention change (the learning dimension).

Figure 5.3: A typology of evaluation purpose

[The figure is a two-by-two matrix: the horizontal axis runs from FORMATIVE to SUMMATIVE and the vertical axis from LEARNING to ACCOUNTABILITY. The four quadrants are:
• A (Learning-Formative): Programme monitoring and process evaluation for the sake of improving programmes (Programme staff/director)
• B (Formative-Accountability): Monitoring and evaluation studies for the sake of quality enhancements (External regulatory agencies/compliance requirements)
• C (Summative-Learning): Evaluation studies for the sake of knowledge creation as part of contributing to scholarship (Scholars)
• D (Summative-Accountability): Evaluation (ROI/value-for-money/cost-benefit) studies to inform decision-making on continued funding/upscaling (Funders/Trustees)]
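As a purely illustrative aid (not part of the original text), the cross-tabulation behind Figure 5.3 can be expressed as a small lookup; the quadrant letters, descriptions and audiences below are taken from the figure, while the Python function itself is only a sketch.

```python
# Illustrative sketch: classify a planned evaluation into the Figure 5.3 typology
# by cross-tabulating the Formative-Summative and Learning-Accountability continua.
QUADRANTS = {
    ("formative", "learning"): ("A", "programme monitoring and process evaluation to improve programmes", "programme staff/director"),
    ("formative", "accountability"): ("B", "monitoring and evaluation studies for quality enhancement", "external regulatory agencies/compliance"),
    ("summative", "learning"): ("C", "evaluation studies for knowledge creation and scholarship", "scholars"),
    ("summative", "accountability"): ("D", "ROI/value-for-money/cost-benefit studies to inform funding decisions", "funders/trustees"),
}

def classify(purpose: str, driver: str) -> str:
    quadrant, description, audience = QUADRANTS[(purpose.lower(), driver.lower())]
    return f"Quadrant {quadrant}: {description} (primary audience: {audience})"

print(classify("summative", "accountability"))
```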


In summary, clarification of the evaluation “problem” means getting clarity about the exact object (unit of evaluation) and the primary purpose of the intended evaluation. As far as the former challenge is concerned, various aspects of the intervention (scope, duration, timing, embeddedness, frequency of implementation) need to be considered when constructing the evaluation design. As far as the latter is concerned, it is quite obvious that the evaluation purpose will have a direct bearing on what type of evaluation needs to be designed. In the next section, we move from how we formulate the statement of the evaluation problem to the actual decisions about the most appropriate evaluation design.

5.3

A decision framework for designing evaluation studies

Evaluations that assess the conceptualisation, design and implementation of interventions usually have an improvement (formative) and learning intent. This makes sense as we must aim to improve our interventions by learning from evaluation studies that critically “interrogate” the conceptualisation and design of the intervention (so-called clarificatory evaluations) and from studies that conduct rigorous and systematic assessments of the implementation and delivery of programmes. Evaluations that assess the outcomes and ultimate impact of interventions have a predominant (but not exclusively so) summative and accountability purpose. Again this is logical as it means that we want to base any decision on the future of the programme – whether it is continued or terminated, whether it is expanded or not – on evidence that there have been positive or negative benefits and impacts from the programme (see also Morrel, 2010). Based on this discussion, we present a decision framework for the selection of evaluation designs as below (Figure 5.4) which distinguishes between four main types of evaluation studies: clarificatory evaluations, process (or implementation) evaluations, outcome evaluations and impact evaluations. Figure 5.4: A decision-framework for selecting an evaluation design

[The figure is a decision tree. The starting question is “What is the purpose of the evaluation?”. An improvement-oriented purpose points to clarificatory evaluation and to process evaluation/programme monitoring; a judgement-oriented purpose points to outcome evaluation (with case study designs listed beneath it) and to impact assessment, which draws on true experimental designs and on quasi-experimental designs (time-series and multiple time-series designs, non-equivalent comparison group designs).]


These main types of studies (and their sub-types still to be discussed) ask very different evaluation questions:
• Clarificatory evaluation – Use the programme logic model framework to establish whether programme goals and objectives are well formulated, whether programme activities and outputs are clearly specified and whether expected outcomes and associated indicators are specified and theories of change, logic models or methodologies can be regarded as appropriate or valid.
• Process or implementation evaluation – Direct questions at the implementation and delivery of programmes (Is programme delivery scheduled as designed? Are programme activities being implemented properly and how are these received and experienced by the target group?). As the focus is on the implementation or delivery of the intervention as it happens, it implies that process evaluations happen in real time and are therefore ongoing. A subtype of process evaluation studies is programme (and performance) monitoring studies, where questions are about standard and repeated delivery of activities and outputs.
• Outcome evaluation – Have the anticipated programme outcomes (benefits) been achieved? To what extent have such outcomes been achieved across different target groups and different sites? Are there unintended outcomes (positive or negative) that occurred?
• Impact evaluation (assessment) – Has the intervention made the anticipated impact and can such an impact (positive changes) unequivocally be attributed to the intervention? Are there unintended impacts (positive or negative) that occurred?
Before discussing each of these designs in more detail, we can now expand our initial table (Table 5.4) and include each of the main evaluation design types in it.
Table 5.4: Intervention dimensions, evaluation criteria and evaluation design types

Intervention dimension: Conceptualisation
Evaluation criteria: Clarity of programme goals and objectives; logical consistency between programme objectives and programme activities (horizontal alignment in the logic model); logical coherence of different levels of programme activities, outputs and outcomes (vertical alignment in the logic model)
Evaluation design type: Clarificatory evaluations

Intervention dimension: Design
Evaluation criteria: Appropriateness of the intervention given the target group needs and expectations; feasibility of the design (given resource constraints)
Evaluation design type: Clarificatory evaluations

Intervention dimension: Implementation (Delivery)
Evaluation criteria: Appropriateness of implementation (Was the programme implemented as designed and planned?); adequate coverage of the programme (Did all the intended beneficiaries receive the intervention?); sufficient dosage (Did the intended beneficiaries receive the minimum “dosage” of the intervention as intended?); standardisation of delivery (Was the programme implemented in the same way across multiple sites?)
Evaluation design type: Process evaluations (including programme monitoring studies)

Intervention dimension: Outcome
Evaluation criteria: Effectiveness of programme (Were the desired outcomes achieved? Were there unanticipated outcomes?); (cost-)efficiency of outcomes (value for money criterion)
Evaluation design type: Outcome evaluations

Intervention dimension: Impact
Evaluation criteria: Impact of the programme (Was the desired and expected impact achieved? Were there unanticipated impacts?); sustainability of programme impact
Evaluation design type: Impact evaluations
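Purely as an illustration (not part of the original text), the mapping in Table 5.4 and Figure 5.4 can be restated as a small lookup so that, given the dimension of the intervention an evaluation must interrogate, the matching design type is returned.

```python
# Illustrative restatement of Table 5.4: intervention dimension -> evaluation design type.
DESIGN_FOR_DIMENSION = {
    "conceptualisation": "Clarificatory evaluation",
    "design": "Clarificatory evaluation",
    "implementation (delivery)": "Process evaluation (including programme monitoring)",
    "outcome": "Outcome evaluation",
    "impact": "Impact evaluation",
}

def recommend_design(dimension: str) -> str:
    """Return the evaluation design type matching the intervention dimension of interest."""
    return DESIGN_FOR_DIMENSION[dimension.lower()]

print(recommend_design("Implementation (Delivery)"))  # Process evaluation (including programme monitoring)
```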



5.4

Clarificatory evaluation studies1

“Clarificatory Evaluation” is a relatively new term in evaluation circles. As far as we know, the term “Clarificative Evaluation” was first used by John Owen and Patricia Rogers in their book Program Evaluation: Forms and Approaches (1999). The term refers to a set of activities which are commonly carried out by evaluators and which focuses primarily on understanding:
• the problem to be addressed by a programme;
• the needs of the target group who are experiencing a problem;
• the context in which a programme will function; and
• the conceptualisation or design of a programme.
“Clarificatory Evaluation” is defined as referring to the evaluation activities concerned with trying to understand a programme (its goals, objectives, key activities, expected outcomes) and the problem/s it is addressing (the nature, scope, incidence, prevalence and a clear indication of the target group that the programme will focus on). Clarificatory Evaluation could be carried out as part of a comprehensive evaluation study or as a “once off”, when the evaluator is trying to understand the problem and the programme that has been developed to address this problem. A needs assessment, feasibility study (to assess whether the necessary conducive environment for the intervention exists) or models mapping programme logic or theory are usually key activities in this type of evaluation. The process of Clarificatory Evaluation usually begins with a close inspection by the evaluator of all available programme documents – project proposals, funding proposals, promotional brochures, strategic and business plans and minutes of project meetings. This may lead to a variety of data-collection methods, depending on the type of evaluation activities required. Although Clarificatory Evaluation is most useful to programmes (and programme staff) if it is carried out prior to implementation of the programme, very often these activities are carried out after the implementation has begun or even when the programme has finished. Table 5.5 below indicates four key evaluation activities that form part of Clarificatory Evaluation.
Table 5.5: Key evaluation activities that form part of Clarificatory Evaluation
Key evaluation activities for Clarificatory Evaluation:
• Undertake a feasibility study to assess whether the necessary conducive environment for the intervention exists.
• Carry out an evaluability assessment prior to commencing an evaluation to establish whether a programme can be evaluated and what the barriers might be to the useful uptake of evaluation findings. A range of data-collection methods can be used to determine whether the programme has been implemented well enough to have a chance of producing its intended outcomes.
• Conduct a needs assessment study using a range of data-collection methods.
• Use the theory of change framework as unpacked in a programme logic or logframe format1 to establish whether programme goals and objectives are well formulated, whether programme activities and outputs are clearly specified and whether expected outcomes and associated indicators are specified.

1 This section is sourced from notes produced by Lauren Wildschut.


Two evaluation activities can assist evaluators in developing a clear understanding of the programme – developing a programme theory of the programme and developing a logic model of the programme. Both these activities involve making the implicit explicit and articulating often undocumented assumptions. Let us first examine the idea of a programme theory of change.

5.4.1 Understanding the Programme Theory of Change Every programme is based on some conception or set of beliefs of what must be done to bring about the intended social benefits. That set of beliefs is referred to as “programme theory of change”. It may be expressed in a detailed programme plan and rationale or be implicit in the programme’s structure and activities. When the programme theory is not articulated or documented in any way, the theory is regarded as “tacit” or “implicit” (Weiss, 1999). If this is the case, the evaluator must engage in a series of activities to make the theory explicit. This is usually done through: • reviewing programme documentation • interviewing/meetings with staff and key stakeholders • site visits and observations The role of a theory of change in evaluation has been summarised and assessed in more detail in Chapter 3. The development of a programme theory of change is an interactive process that should be explicitly undertaken in the original project proposal, but is very often begun by the evaluator who draws up a first draft of the theory and presents it to the organisation for discussion. This is done on an ongoing basis until the organisation feels that the articulation of the theory approximates their set of beliefs. The articulation of the theory could be in a narrative or graphic form as explained in Chapter 3. In summary, the end result of a systematic and intensive clarificatory evaluation process is a shared understanding – among programme staff, evaluators and other stakeholders – of the core logic of the intervention and specifically its underlying theory of change.
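Purely as an illustration (not part of the original text), the results chain that a logic model articulates can be captured in a small data structure so that unarticulated levels become easy to spot; the field names and the school-feeding entries below are assumptions made for the sketch only.

```python
# Minimal sketch of a programme logic model as a simple results chain:
# inputs -> activities -> outputs -> outcomes -> impact. Field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class LogicModel:
    inputs: list = field(default_factory=list)
    activities: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    outcomes: list = field(default_factory=list)
    impact: str = ""

    def missing_levels(self) -> list:
        """Return the levels of the results chain that have not been articulated yet."""
        levels = {"inputs": self.inputs, "activities": self.activities,
                  "outputs": self.outputs, "outcomes": self.outcomes, "impact": self.impact}
        return [name for name, value in levels.items() if not value]

feeding = LogicModel(
    inputs=["funding", "kitchen staff"],
    activities=["prepare and serve one meal per school day"],
    outputs=["meals served to enrolled learners"],
    outcomes=["improved nutritional status of learners"],
    impact="improved learner performance",
)
print(feeding.missing_levels())  # [] once every level has been articulated
```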

5.5

Process/implementation evaluation studies

Whereas clarificatory evaluation studies ideally occur pre-implementation, process or implementation evaluations are conducted once an intervention is being implemented. Process evaluation addresses issues such as the following: (1) the extent to which a programme is reaching the appropriate target population; (2) whether or not its service delivery is consistent with programme design; and (3) what resources are being expended (Rossi, Freeman & Lipsey, 2004). More specifically, questions asked during process evaluations are:
• Are the activities being done as planned?
• Do the key actors have a clear understanding of what is required of them?
• Is there sufficient capacity and resources (time/funding/equipment) to implement the key activities?
• Are the key activities being properly managed and are all (support) systems working well?


• Are there particular problems being encountered or specific barriers experienced?
• If so, what do the actors identify as possible solutions to overcome such barriers?
• How are the activities/interventions being received in terms of content, quality, relevance?
Coverage and bias: Because implementation evaluation deals with system delivery and questions whether the intervention is reaching the intended target population, one of the key issues in such studies is that of coverage. Coverage refers to the extent to which participation by the target population achieves the levels specified in the programme design. The opposite of (proper) coverage is bias. Bias refers to the fact that certain subgroups in the target population – for whatever reason – did not receive the same levels of the intervention as other groups. Rossi, Freeman and Lipsey (2004:188-191) identify three reasons why programmes fail to deliver the services as specified: first, no treatment or not enough treatment; second, the wrong treatment is delivered; and third, the treatment is unstandardised, uncontrolled or varies across target populations. The first category of non-treatment refers to the fact that a programme or some of the components of a programme never reach the intended beneficiaries. This could be simply because of poor service delivery. It could also be because members of target groups decide to withdraw themselves or not participate in a programme. A related problem is when service delivery falls short and the “treatment” is “diluted”. This happens quite often, as interventions are often delivered by service providers where the desired number of training sessions, site visits, workshops and other intervention modes are not achieved, which means that the target group does not receive the full “dosage” of the intervention as designed and required. The second category of programme failure – wrong treatment – can occur in two ways. First, the way in which the intervention is delivered may in fact negate the treatment. The second way is that the treatment requires a delivery system that is too sophisticated. The example that Rossi, Freeman and Lipsey provide of the latter is the following. They write that “interventions that work well in the hands of highly motivated and trained deliverers may end up as failures when administered by staff of a mass delivery system whose training and motivation are considerably less” (2004:190). Unstandardised treatment or uncontrolled treatment implementation arises when too much discretion in implementation exists, especially in multi-site interventions. It is also very common when interventions have to be delivered by a variety of service providers across different sites. This often leads to huge variations in programme management and system delivery across sites and may compromise the idea of a standardised and common implementation. Most classic texts on process evaluation (e.g. Rossi, Freeman & Lipsey, 2004) identify three main sources of data for process evaluations. These are records, observations and self-reports.
• Records include all forms of service documentation. Such records can be in narrative form (such as field reports) or highly structured data (when project personnel check


that services have been delivered). Morris and Fitz-Gibbon (1978), who have the most comprehensive discussion of methods for measuring programme implementation, list more than 30 types of records kept as part of programme management. These include: attendance and enrolment logs, logbooks, in-house memos, flyers and promotional brochures on programmes, legal documents, transcripts and minutes of meetings, activity rosters, and so on.
• Observational methods are usually employed to include more systematic forms of observation (which would include using observation schedules) as well as more unstructured forms of participant observation. The latter method has evolved in ethnographic studies and most methodology books have good discussions on participant observation.
• Self-report measures refer to all forms of interviewing people involved in a programme. Interviews should be conducted with project staff, as well as with all other stakeholders, e.g. board members, service deliverers, target-group members, community representatives, and so on. Standard protocols about interviewing procedures apply and again, most methodology books have adequate discussions on the principles of good interviewing.
Process or implementation evaluations focus on the implementation or delivery of programmes. The focus, therefore, is on HOW the programme is being implemented at a particular site or sites and on HOW the programme is delivered to the target group. Given that process evaluations aim to understand the key activities, processes and events in programme delivery and whether these are being implemented as designed, it is not surprising that evaluators use a variety of methods of data collection to gather information on programme implementation. But how does one structure a process evaluation? What framework could one apply in structuring the key questions that one wishes to answer? We suggest that there are at least TWO “approaches” typically used in process evaluations. Although this is an important distinction – as we will clarify – we also do NOT wish to suggest that this is the only way to classify approaches to process evaluation and therefore also not that these “approaches” are mutually exclusive:
• Theory-driven process evaluations (using the programme logic model or logframe as a framework).
• Intervention-driven process evaluations (using the generic framework of programmes as a framework for the process evaluation).

5.5.1 Theory-driven process evaluations

It is essential for an evaluator to attempt to understand, and therefore to make explicit, the implicit theory of a programme. One of the tools to do this is the Logic Model. As discussed above a logic model reconstructs the causal model or implicit theory of a programme. It makes explicit the causal connections between the various programme objectives, activities, outputs and expected outcomes. A complete Logic Model specifies in some detail which activities programme staff will engage in, in order to achieve the stated objectives; what outputs or deliverables to expect from good programme delivery and ultimately what benefits will accrue to the target group (programme impact) as explained in Chapter 3.


The value of a logic model is not only evident during the phase of clarificatory evaluation, but it provides a platform or basis for process evaluations as well. By specifying clearly the various programme components, it already suggests what questions to ask during the implementation evaluation phase.

5.5.2 Intervention-driven process evaluations

A precondition for doing a theory-driven evaluation is that the evaluation team must work with programme staff from the outset. It must be possible to engage with programme staff from the beginning in developing the logic model and subsequent programme theory. However, this is not always possible. When evaluators come on board after a programme’s implementation or roll-out has started, they have to adjust and structure their process evaluation differently. In cases such as these, no logic model or explicit programme theory is available. This means that the evaluator now has to focus more closely on the programme and intervention structure. One possible approach here is to use a rather generic framework of social programmes to guide the evaluation process.
A note on programme monitoring studies: Some evaluation textbooks view programme monitoring as a special case of implementation or process evaluation studies. In the sense that both process evaluation and programme monitoring are aimed at assessing aspects of programme delivery, this is true. But there are two distinct differences between these two types of evaluation study:
• Programme monitoring is essentially descriptive rather than evaluative in nature, as is explained in Chapter 1. This means that it aims to generate descriptive, factual data on programme delivery. The focus in programme monitoring can be on processes and their outputs (activities/key events), on programme outcomes or on both.
• Programme monitoring of mature (even standard) programmes often becomes a routine activity. Many specific interventions in a complex programme require regular monitoring – the everyday meaning of the word “monitoring”. It is important to point out that the continuous monitoring of key programme indicators of selected aspects of programme processes or activities is also a tool for effective project management, as is explained in Chapter 6. In this sense, monitoring data are often integrated into the routine information systems of a programme (or management information system).
Rossi, Freeman and Lipsey (2004:166-175) emphasise that programme monitoring serves somewhat different purposes when undertaken from the perspectives of evaluation, accountability, and programme management, although the types of data required and methods of data collection are similar. From an evaluation perspective, programme monitoring serves at least three functions:
• Evaluators need information about the extent of programme delivery in order to substantiate claims about the usefulness of the outcomes of the intervention.
• Evaluators also require information about the coverage of the intervention, i.e. what proportion of the target population has benefited from the programme.


• Evaluators also need information for programme diffusion. An intervention can only be replicated elsewhere if one knows in great detail how it was implemented in the first instance.
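A hypothetical sketch (not drawn from the original text) of how routine monitoring data might be summarised to serve the first two functions listed above, extent of delivery and coverage; the figures and parameter names are invented.

```python
# Illustrative monitoring summary: extent of delivery and coverage of an intervention.
# All numbers and parameter names are hypothetical.
def monitoring_summary(sessions_delivered, sessions_planned,
                       beneficiaries_reached, target_population):
    return {
        # proportion of planned activities actually delivered
        "delivery_rate": round(sessions_delivered / sessions_planned, 2),
        # proportion of the target population reached by the intervention
        "coverage": round(beneficiaries_reached / target_population, 2),
    }

print(monitoring_summary(sessions_delivered=38, sessions_planned=48,
                         beneficiaries_reached=1200, target_population=2000))
# {'delivery_rate': 0.79, 'coverage': 0.6}
```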

5.6

Impact evaluation (assessment) studies

Impact assessment has traditionally been defined as estimating the net effects of a programme, i.e. those effects other than extraneous factors or design effects that can reasonably be attributed to the programme. The following schematic presentation captures the basic logic of this definition of impact assessment (Rossi, Freeman & Lipsey, 2004). Figure 5.5: Definition of impact assessment

[The figure expresses the definition as a decomposition: Gross effects = Effects of the intervention (net effect) + Effects of other processes (extraneous factors) + Design effects]

This schematic presentation shows that estimating the total or gross impact of a programme involves isolating the effects that can be attributed to the intervention itself (net effect), from other processes or events that may also impact on the target group as well as any effects that are due to the evaluation design itself, such as instrumentation effects and sampling effects. However, in complex interventions it is more often than not very difficult if not impossible to isolate the effects of various interventions (other interventions by the government, district, NGOs or the role of parents in demanding better school performance, and so on). This is one of the reasons why classical impact evaluation studies – even when they employ true experimental designs – are often found not to enable unequivocal conclusions about causal attribution. This already suggests that the aim of impact evaluation or assessment studies is more than merely attempting to judge whether some impact has occurred (or to what extent), but it is also to understand how the intervention produced or (more often) contributed to the production of certain impacts. It is rarely the case that an intervention is the sole cause of changes. Therefore, “causal attribution” should not be equated with total or sole attribution (that is, the intervention was the only cause), but to partial attribution or to analysing the intervention’s contribution. It is, therefore, important to have a clear understanding of what we mean by “causality” and “causal claims” (such as making claims about causal attribution or contribution).
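The decomposition in Figure 5.5 amounts to simple arithmetic. The sketch below (with invented numbers, not from the original text) makes that explicit:

```python
# Figure 5.5 as arithmetic: the net effect is what remains of the gross (observed)
# change once extraneous factors and design effects are subtracted out.
# All values are invented for illustration.
gross_effect = 12.0       # total observed change in the outcome measure
extraneous_effects = 5.0  # change attributable to other processes and interventions
design_effects = 2.0      # change attributable to the evaluation design itself

net_effect = gross_effect - extraneous_effects - design_effects
print(net_effect)  # 5.0 -> the change reasonably attributable to the intervention
```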

5.6.1 A note on causality

Causality involves relationships between events or conditions and is often discussed in terms of necessary and sufficient conditions. In terms of this distinction, then, does it mean that when we claim that an “intervention caused something” (such as an improvement in service delivery) that the intervention was a necessary condition for the improvement to occur? In most cases the answer is probably no. Necessity means that the impacts can only be realised if there is the specific intervention. Yet most desired impacts, such


Chapter 5 as better health or education, reduced poverty, gains in learner performance, improved service delivery, etc., can potentially be realised through a variety of different types of interventions, and not only the specific intervention of interest. In fact, it would be somewhat presumptuous to say that a particular intervention is the only way possible to bring about the desired impacts. So can we claim that our intervention (to improve service delivery) is a sufficient condition for the desired impact to occur? Again, clearly, the answer is no. The reality is that we often discover that there are a number of – perhaps even many – other factors also at work that can produce the desired change. So, on its own, the intervention is not sufficient. But we do expect that the intervention, along with other influencing factors, is indeed sufficient; that collectively this set of actions and conditions, including the intervention, did bring about the impacts. And indeed, when we say X causes Y in everyday discussions, sufficiency is probably what we usually mean; that X did indeed produce or lead to Y. John Mackie (1988), an Australian philosopher, suggested that the best way of talking about causality is to refer to INUS conditions (insufficient but non-redundant parts of a condition which is itself unnecessary but sufficient for the occurrence of the effect). For example, a short circuit as a cause for a house burning down. Consider the collection of events: the short circuit, the proximity of flammable material, and the absence of fire fighters. Together these are unnecessary but sufficient to the house’s burning down (since many other collections of events certainly could have led to the house burning down, for example shooting the house with a flamethrower in the presence of oxygen etc.). Within this collection, the short circuit is an insufficient (since the short circuit by itself would not have caused the fire, but the fire would not have happened without it, everything else being equal) but a non-redundant part of a condition which is itself unnecessary (since something else could also have caused the house to burn down) but sufficient for the occurrence of the effect. So, the short circuit is an INUS condition for the occurrence of the house burning down. Causes that are neither necessary nor sufficient are called contributory causes. Thus, for example, smoking heavily is a contributory cause of lung cancer. It is not a necessary cause, since there are other sources of lung cancer, and it is not a sufficient cause, since not all such smokers suffer from lung cancer. In summary then: A cause may be classified as a “contributory cause”, if the presumed cause precedes the effect, and altering the cause alters the effect. It does not require that all those subjects which possess the contributory cause experience the effect. It does not require that all those subjects who are free of the contributory cause be free of the effect. In other words, a contributory cause may be neither necessary nor sufficient but it must be contributory. Causal attribution, however, is the ultimate goal of rigorous ideal-type scientific research, but this goal is frequently very elusive.

5.6.2 How to establish causal attribution

In their classic study on this topic, Cook and Campbell (1979) argued that three conditions must be met before we can infer that a cause-effect relationship exists:


Programme Evaluation Designs and Methods 1. Co-variation. Changes in the presumed cause must be related to changes in the presumed effect. Thus, if we introduce, remove or change the level of a treatment or programme we should observe some change in the outcome measures. 2. Temporal precedence. The presumed cause must occur prior to the presumed effect. 3. No plausible alternative explanations. The presumed cause must be the only reasonable explanation for changes in the outcome measures. If there are other factors which could be responsible for changes in the outcome measures, we cannot be confident that the presumed cause-effect relationship is correct. The first condition (co-variation) simply means that the introduction of an intervention (a school feeding programme) must be related (co-vary) to changes in the presumed effects or outcomes (increase in the nutritional status of the learners in the school). Conversely, if the programme is terminated, one would expect that these observed changes would disappear. The second condition (temporal precedence) states a fairly common-sense requirement: causes (interventions) must precede their expected effects in time (programme outcomes). The third condition often poses the biggest challenges for evaluators having to consider what other conditions or factors could have also produced the observed outcomes. In their 1979 study on Quasi-Experimentation, Cook and Campbell produced a very detailed listing of other conditions which need to be considered when ruling out alternative explanations. These would include maturation effects (beneficiaries maturing and changing over the course of an intervention), contamination effects (other interventions also contributing to the observed effects), history effects (historical events that occurred during the course of the programme that may have contributed to the observed effects) and so on. Patricia Rogers (2012) proposed a slightly different “framework” for investigating causal attribution. She suggests that the starting point should be the factual – to compare the actual results to those expected if the theory of change were true. When, where and for whom did the impacts occur? Are these results consistent with the theory that the intervention caused or contributed to the results? The second component is the counterfactual – an estimate of what would have happened in the absence of the intervention. The third component is to investigate and rule out alternative explanations. She also argues that in some cases, it will be possible to include all three components in an impact evaluation. In complex situations, it might not be possible to estimate the counterfactual, and causal analysis will need to depend on the other components. Rogers lists three possible methods for examining the factual (the extent to which actual results match what was expected): • Comparative case studies – did the intervention produce results only in cases when the other necessary elements were in place? • Dose response – were there better outcomes for participants who received more of the intervention (for example, attended more of the workshops or received more support)? • Temporality – did the impacts occur at a time consistent with the theory of change – not before the intervention was implemented? These three methods would apply to all impact as well as outcome evaluation studies (cf. next section). But methods to measure the counterfactual usually require some form of experimental and quasi-experimental design. 
These “traditional” impact evaluation


designs essentially employ two design principles to measure the counterfactual: control groups (ideally created through randomisation) and baseline measures (before and after logic). Stated simply – we compare two situations: one where one group receives an intervention and another group doesn’t. On the assumption that the two groups are reasonably equivalent (pre-intervention), significant changes in the experimental group (post-intervention) are attributed to the intervention. In the absence of the possibility of comparison groups, we take a before measure of our target group (where the counterfactual thinking is then that things will stay as similar as possible to this situation in the absence of our intervention). Even though some variation of the classic experimental design is still regarded as the most systematic and rigorous design type to measure the counterfactual, this does not mean that such designs are failsafe. Developing a credible counterfactual can be difficult in practice. It is often difficult to match individuals or communities on the variables that really make a difference. In some cases the choice of control groups is very difficult and may even raise issues of ethics. Randomised controlled trials can randomly create non-equivalent rather than equivalent groups. Other methods depend on various assumptions which might not be met. In situations of rapid and unpredictable change, it might not be possible to construct a credible counterfactual. It might be possible to build a strong, empirical case that an intervention produced certain impacts, but not to be sure about what would have happened if the intervention had not been implemented. As mentioned in Chapter 4, one can also use a series of alternative counterfactual designs for contexts where rigorous quantitative designs are not feasible. They can be qualitative or mixed-methods designs (Greene, 2007; Woolcock, 2009; Bamberger, Rao & Woolcock, 2010; Bamberger, Rugh & Mabry, 2012; Rugh & Bamberger, 2012; Sterne et al., 2012; Bamberger, 2012; Perrin, 2012; Rogers, 2012). Contribution Analysis is a specific qualitative approach that is gaining increasing popularity in both international and national evaluations as an alternative approach to rigorous attribution analysis (Rugh & Bamberger, 2012: http://betterevaluation.org/plan/approach/contribution_analysis). Having said this, experimental designs (and their different subtypes) remain the most robust designs for measuring causal attribution through focusing on the counterfactual. This has resulted in the perception among some evaluators that some of these designs comprise the so-called “gold standard” for evaluation. This view is controversial, and the application of these designs, especially in less developed contexts, is increasingly questioned. However, as a result of the continuing prevalence of this approach in many evaluation circles, the rigorous requirements for good experimental evaluation designs are summarised in more detail below.
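In its simplest form, the with/without and before/after logic sketched above reduces to a difference-in-differences style calculation: the change in the intervention group minus the change in the comparison group. The Python sketch below is illustrative only, with invented scores:

```python
# Illustrative difference-in-differences logic: compare the change in the intervention
# group with the change in the comparison group over the same period.
def estimated_effect(pre_treat, post_treat, pre_control, post_control):
    change_in_treated = post_treat - pre_treat
    change_in_controls = post_control - pre_control   # stands in for the counterfactual trend
    return change_in_treated - change_in_controls

print(estimated_effect(pre_treat=41.0, post_treat=55.0,
                       pre_control=40.0, post_control=46.0))  # 8.0
```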

5.6.3 Experimental designs and the counterfactual

The classic “typology” of quantitative evaluation studies – which distinguished between experimental and quasi-experimental designs – is found in the early works of Donald Campbell and Julian Stanley (1966) and Thomas Cook and Donald Campbell (1979). The basic premise of the Cook typology is that there are two main categories of design types that


are appropriate for assessing programme effects or impact: the classic true experimental design (which is the ideal and strongest design possible) and various quasi-experimental designs (weaker design types but still useful). We discuss these basic types below.

5.7

The classic or true experimental design

The true experiment consists of the following components (see Figure 5.6): • Randomly assigning persons, facilities or communities to experimental (or intervention)  groups; • Taking measurements both before and after the intervention (baseline and outcome measures); and • Measuring the impact as the difference between changes in outcome measures for the intervention group and the control group. The strongest of all designs, this is regarded as the gold standard among many (especially quantitatively-oriented) evaluation specialists. It does pose some challenges, however: • It is not always feasible to randomly assign subjects to experimental groups. • In order for randomised experiments to be effective, you need to be able to maintain experimental conditions during the course of the intervention (for example, minimise the number of other interventions introduced to either your experimental or control groups). The true experimental design commences with assigning subjects (or other units of evaluation) randomly from the target population (individuals, companies, organisations, schools, clinics, communities, and so on) to the experimental and control groups respectively. The thinking behind random assignment is that by randomising treatment assignment, the group attributes for the different treatments or interventions will be roughly equivalent and therefore any effect observed can be linked to the intervention and is not a characteristic of the individuals in the group. It is important to emphasise that random assignment does not guarantee that the groups are “matched” or equivalent, only that any differences are due to chance. Figure 5.6: The true experimental randomised design


[The figure shows the population of targeted beneficiaries (learners/patients/etc.) being randomly assigned to an experimental group and a control group, with outcome measures taken for both groups before and after the intervention and the impact estimated as the difference between the changes observed in the two groups.]

Chapter 5 The notion of “random assignment” is used when designing experimental studies and is not to be confused with the notion of “random sampling” or “random selection” which is a notion that is peculiar to survey studies. Random sampling or random selection of cases (a stage in survey design) is often confused with the notion of random assignment (a process in experimental studies). Random sampling is how you draw the sample of people for your study from a population. Random assignment is how you assign the sample that you draw to different groups or treatments in an evaluation study. It is possible to have both random selection and assignment in a study. Let’s say you drew a random sample of 100 clients from a population list of 1000 current clients of your organisation. That is random sampling. Now, let’s say you randomly assign 50 of these clients to get some new additional treatment and the other 50 to be controls. That’s random assignment. It is also possible to have only one of these (random selection or random assignment) but not the other in a study. For instance, if you do not randomly draw the 100 cases from your list of 1000 but instead just take the first 100 on the list, you do not have random selection. But you could still randomly assign this non-random sample to treatment versus control. Or, you could randomly select 100 from your list of 1000 and then non-randomly (haphazardly) assign them to treatment or control. And, it’s possible to have neither random selection nor random assignment. In a typical non-equivalent groups design you might non-randomly choose two fifth grade classes to be in your study. This is non-random selection. Then, you could arbitrarily assign one to get the new educational programme and the other to be the control. This is non-random (or non-equivalent) assignment. Random selection is related to sampling and therefore is mostly related to the external or validity or generalisability of one’s results. After all, we would randomly sample so that our research participants better represent the larger group from which they’re drawn. Random assignment is a principle in experimental design. Therefore, random assignment is most related to internal validity. After all, we randomly assign in order to help assure that our treatment groups are similar to each other (i.e. equivalent) prior to the treatment. Below is an example of a true experimental (randomised control trial) study from the field of school interventions. The selection of this case is intentional as evaluators often argue that social programmes, including school interventions, are so complex that it is difficult, if not impossible, to design experimental evaluation studies to assess the impact of such interventions. This evaluation study, which assessed the introduction of additional learner support materials in a number of Gauteng schools, is an excellent example of how experimental design principles can in fact – with the necessary qualifications – be used in assessing real-life and complex interventions.
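Before turning to the case study, the distinction drawn above between random selection and random assignment can be made concrete in a short sketch (illustrative only; the 1000-client example mirrors the one in the text):

```python
# Random SELECTION draws a sample from the population; random ASSIGNMENT then
# allocates that sample to treatment or control. Illustrative sketch only.
import random

population = [f"client_{i}" for i in range(1, 1001)]   # list of 1000 current clients

sample = random.sample(population, 100)                # random selection of 100 clients

random.shuffle(sample)                                 # random assignment: split the sample
treatment_group, control_group = sample[:50], sample[50:]

print(len(treatment_group), len(control_group))        # 50 50
```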

5.7.1 Case Study 1: Evaluating the impact of the introduction of additional learner books in schools on learner performance

Between January and June 2010, a consortium of evaluators undertook a randomised control trial of learning support materials in Grade 6 classes in 44 primary schools serving low-income communities in Gauteng province (Fleisch, Taylor, Herholdt & Sapire, 2011). The aim was to assess whether the provision of learner support materials, particularly custom-


Programme Evaluation Designs and Methods designed workbooks, improve primary mathematics achievement more cost effectively than providing conventional textbooks. Two sets of quality learner materials (a workbook and a textbook) were tested. Learners in the experimental group were all issued with a “workbook” consisting of a set of carefully sequenced worksheets covering the curriculum. The workbook emphasises basic skill proficiency and the four basic operations: addition, subtraction, multiplication and division. Learners in the control group of schools were all given an approved “textbook” that is widely used in South African primary schools. The research question framing the study was: Does a workbook, customised to address the lack of proficiency in basic mathematical operations by poor South African learners, produce improved learning gains over a conventional textbook? Although the impact evaluation design is described as a “randomised control trial design”, the authors qualify this by indicating that the control group did receive an intervention: “The PMRP Study used a modified standard treatment/control approach. For ethical and policy reasons, it was decided that rather than simply comparing the relative gains of intervention schools with schools who received no intervention, the control schools would receive a complete set of materials, representing not what currently exists, but what could be standard practice if the schools were properly provisioned with existing approved materials and used these regularly” (Fleisch, Taylor, Herholdt & Sapire, 2011:491). Working closely with the University of Michigan’s Capacity Building for Group Level Interventions, the study initially considered using a multisite cluster randomised field trial design. This is a novel approach, which essentially consists of working in relatively large schools and randomly assigning all classes within each school to the treatment or control group respectively, rather than using only one class per school for either treatment or control. The multisite cluster design has a number of advantages, chief of which is that fewer schools are needed to achieve the required number of classes; hence this design holds the promise of reducing costs significantly. The evaluators therefore proceeded to select schools according to the multisite cluster design, using all primary schools in Gauteng with four or more Grade 6 classes as the target population. A list of all Gauteng public primary schools was drawn from the provincial database. All schools with more than two Grade 6 classes, fewer than 30 learners in Grade 6 and all schools in quintile 5 (i.e. the most affluent schools) were excluded. The remaining schools were ordered randomly and intervention and control schools assigned alternately off the random list. Starting at the top of this list, schools were then called and informed of the study. When a school could not be contacted after repeated efforts, it was deleted from the list: of the first 53 schools on the list, nine could not be contacted. In studies of this kind it is generally desirable to have a counterfactual: this is what the subjects of the experimental evaluation would have received if the intervention had not taken place. However, it seems obvious that any group of classes which is subject to an intervention consisting of a set of materials which facilitates appropriate pacing and daily reading and writing would show improved learning, even if the materials were of a relatively mediocre quality. 
Therefore it was decided to provide control schools with


Chapter 5 additional standard materials, thus making the study a comparison of the intervention schools with control schools using “enhanced standard practice”. The evaluation team developed a new set of customised outcome measures for this study. A test covering number and operations (NCS LO1 Grade 6) was constructed by the research team. The first draft of the test consisted of 60 items covering eight skill categories – number concepts (place value, comparing numbers), fractions, addition, subtraction, multiplication, division, problem-solving and mental operations – all specified in the NCS for Grades 3 to 6. This test was administered to both project and control groups as a preand post-test. Item difficulty and discrimination indices were calculated per group for each item on the pre- and post-test. Using the results of the post-test, 40 items were selected with item means between 0.15 and 0.91 and item discrimination indices between 0.20 and 0.80. Findings: The overall findings showed no statistical differences in the gain scores between the project and control schools. Whether assessed on the percentage point gains, on the percentage gains from the pre-test baseline or the increased proportion of learners that “passed” the test, there is little evidence that either set of materials is more advantageous. Both made significant gains of roughly the same magnitude. However, when “dosage” was introduced into the equation, the picture changed. A clear correlation was found between the amount of gain recorded and the coverage of the materials in the classroom. The study tracked coverage of the materials through visits to classrooms on three separate occasions. Fieldworkers examined learner books and noted how far the class had progressed and whether all exercises had been completed up to that point. From these data, percentage coverage could be calculated for each class. The two interventions diverge in terms of the gain scores by degree of coverage at a minimum of 79% and below 79%. Within each coverage category, each group’s mean improved significantly from the pre-test to the post-test suggesting that providing programme materials makes a difference. As would be expected, the improvement is higher when 79% or more of the material is covered. These differences are striking, but their origin is unclear and a great deal more qualitative work is required to gain a better understanding of this phenomenon. In general, for both groups, gain scores increased with increasing coverage, an obvious and expected development, which distribution by group provides strong evidence in support of the hypothesis that it is use of the materials that causes learning gains.
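As a hedged illustration of the item screening mentioned above (the study reports item means between 0.15 and 0.91 and discrimination indices between 0.20 and 0.80, but not its exact formulae), the sketch below computes item difficulty as the proportion of learners answering correctly and a simple upper-minus-lower discrimination index; these are standard textbook definitions, not necessarily the ones the evaluators used, and the response data are invented.

```python
# Illustrative item analysis: difficulty = proportion answering the item correctly;
# discrimination = proportion correct among the top-scoring half of learners minus
# the proportion correct among the bottom-scoring half. Responses are invented (1/0).
def item_statistics(responses):
    """responses: list of per-learner lists of 0/1 item scores."""
    totals = [sum(r) for r in responses]
    order = sorted(range(len(responses)), key=lambda i: totals[i])
    half = len(responses) // 2
    lower, upper = order[:half], order[-half:]
    stats = []
    for item in range(len(responses[0])):
        difficulty = sum(r[item] for r in responses) / len(responses)
        discrimination = (sum(responses[i][item] for i in upper) / half
                          - sum(responses[i][item] for i in lower) / half)
        stats.append({"item": item + 1, "difficulty": round(difficulty, 2),
                      "discrimination": round(discrimination, 2)})
    return stats

toy_responses = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [1, 1, 0], [0, 0, 1], [1, 1, 1]]
for row in item_statistics(toy_responses):
    print(row)
```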

5.8

Quasi-experimental designs

There are a number of so-called quasi-experimental design types that group together those studies where random assignment of subjects or units to the intervention is neither possible nor feasible. The most common types of quasi-experimental designs are:
• Non-equivalent comparison group pre-test-post-test design
• Time-series and multiple time-series design

5.8.1 Non-equivalent comparison group pre-test-post-test design

This design is similar to the randomised pre-test-post-test control group design, except that the comparison group is created by matching rather than random assignment, which


is why it is called “non-equivalent” (Figure 5.7). The group members are not randomly assigned and do not have an equal chance of being assigned to the intervention or control group. With the non-equivalent control group pre-test-post-test design, one would:
• Create experimental groups by matching particular characteristics that are considered to be important antecedents of the outcomes sought by the programme;
• Take measurements both before and after the intervention; and
• Measure impact as the difference between changes in outcome indicators for the intervention group and the comparison group.
Figure 5.7: Non-equivalent group design

[The figure shows the population of targeted beneficiaries (learners/patients/etc.) assigned non-randomly, through matching, to an experimental group and a non-equivalent comparison group (partial coverage), each measured before and after the intervention, with statistical controls used to adjust for remaining differences between the groups.]
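A hedged sketch of the matching idea (illustrative only; real evaluations match on far richer covariates and use formal matching estimators): for each intervention school, pick the not-yet-used comparison school that is closest on a pre-intervention characteristic, here an invented baseline score.

```python
# Illustrative nearest-neighbour matching on a single baseline characteristic.
# School names and baseline scores are invented.
intervention = {"School A": 41.0, "School B": 55.5, "School C": 47.0}
candidates = {"School X": 40.5, "School Y": 56.0, "School Z": 48.5, "School W": 62.0}

matches = {}
available = dict(candidates)
for school, score in intervention.items():
    best = min(available, key=lambda name: abs(available[name] - score))
    matches[school] = best
    del available[best]          # match without replacement

print(matches)  # {'School A': 'School X', 'School B': 'School Y', 'School C': 'School Z'}
```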

5.8.2 Case Study 2: Evaluation of the Cape Teaching and Leadership Institute (CTLI) The second case illustrates an evaluation which utilised a quasi-experimental design and specifically a non-equivalent comparison group design. The purpose of the evaluation was to evaluate the process, procedures, and impact of the Cape Teaching and Leadership Institute (CTLI), an in-service teacher training centre in the Western Cape. The evaluation ultimately sought to establish whether CTLI’s training courses were making a difference in schools and to identify how CTLI could increase and sustain their impact (De Chaisemartin, 2010). The evaluation was theory-based. The underlying theory of change of the work of the Institute was captured in the following narrative statement: If the right educators are selected for training and the training makes use of high quality materials and is delivered effectively, then this will lead to improved teacher subject knowledge, improved classroom practices and improved professional practices. This in turn will produce better teaching, which will ultimately result in improved learner performance. Some experimental design principles were used in the evaluation design, such as the notions of pre-test and post-test (but only at the teacher level) and comparison schools (for learner


Chapter 5 performance data). But, as we will see below, there was no random assignment of the teachers or schools to the programme. The CTLI’s priority has always been to train educators from the weakest performing schools in the system. When the institute first opened its doors in 2003, CTLI received a clear mandate to train only the schools that performed the poorest in the provincial systemic tests. CTLI recognised that districts have different dynamics and therefore did not want to be too prescriptive about how the selection of educators should take place. In 2009, CTLI visited all 49 circuits to promote the institute and encouraged circuits to send their weakest performing schools for training. At the end of September, a circular was sent to share the training course programme for 2010 and open up the registration process. But what happened in reality is a classic case of self-selection. The different districts eventually nominated teachers to attend the CTLI training on the basis of very different considerations: “It appears that district coordinators and officials were minimally involved in the selection process. All districts reported that nominations were predominantly made by schools and the names of educators forwarded straight to the district coordinator. According to their estimates, teachers primarily nominated themselves in Metropole Central and Overberg, and principals primarily nominated their teachers in Metropole South, North, and West Coast. Very few nominations were made at the circuit level, which only took place in two districts. However, it should also be mentioned that Metropole North did provide some guidance to schools on who should be nominated to attend training and Metropole South did verify the list of names that was sent in by the schools. Selection criteria were used in the three urban districts only. Metropole South and North somewhat used the Literacy and Numeracy systemic test results, whereas Metropole Central focused more on educators’ individual needs as defined by IQMS. Surprisingly, only two of the six districts were aware that they were supposed to send seven participants to each course. The two considerations that guided that final selection where that not too many educators should come from one school at a time and if there were too many nominations priority was given to those who hadn’t previously attended training at CTLI” (De Chaisemartin, 2010:12-13). Not surprisingly then, the evaluator concluded that “the profile of schools that attend training at CTLI does not seem to correspond with CTLI’s target group of weak schools”. In fact, it seems as if the “urban districts may have been more involved in the selection process, which would have resulted in a greater number of schools coming from these areas. Metropole South also mentioned that a greater awareness of Lit Num results in their schools has contributed to a greater number of teachers being willing to enrol in training courses” (De Chaisemartin, 2010:2). The end result of the selection process was twofold: First, it was clearly not a process of random assignment but a case where the districts and schools selected (nominated) themselves for inclusion in the training programme and secondly, ironically it seems as if the stronger schools responded better and nominated more teachers than the weaker schools. So not only was there no equivalence between the different groups selected, but the weaker schools – which were the target of the CTLI – were underrepresented in the  programme. 
We referred above to the problem of attributing causality when there may be a range of factors (conditions and events) that contribute to certain effects. In evaluation studies these


are often referred to as "contamination" factors, and refer specifically to the reality that multiple interventions may co-occur in the same situation. This is especially prevalent in schooling in South Africa, where we often find a variety of interventions by government and non-government organisations being implemented simultaneously in the same schools. As it happened, this was a reality in this programme as well. The following extract from the same report is a good illustration of the effects of intervention contamination, viz. where another provincial intervention (in this case) was being implemented in many of the same project schools.

"Another training strategy being implemented by the WCED to raise the Literacy and Numeracy results in the province is known as the Literacy/Numeracy Intervention. Two hundred and fifty schools are currently participating in the programme. Every education district chose about 31 functional schools, which consisted of weak, average, and a few strong schools. The training model consists of intensive training and year-long in-school support for 1 teacher per grade per learning area (literacy and numeracy) per school. The intention is for the lead teacher that attends the training and receives the support to pass on the knowledge to the rest of the teachers in his/her grade. In 2009, half of the Lit/Num schools received training in Numeracy/Maths while the other half received training in Literacy/Language. In 2010 the schools switched. Those that had received Numeracy/Maths training the previous year received Literacy/Language and vice versa" … and "… (g)iven the amount of support these schools are receiving, it is important to ascertain how many are also sending teachers to Literacy/Language and Numeracy/Maths courses at CTLI. Doubling up on training can be problematic if the two training programmes do not necessarily align with one another and can lead to training overload".

An analysis of the data shows that 75 Lit/Num schools, or close to one third (30%), also attended training at CTLI in 2010. This overlap occurred in all the districts, but was especially pronounced in Metropole East, North and South. Thirteen schools received literacy/language training from CTLI and the Lit/Num intervention in the same year, and 27 schools received CTLI training one year after they had received it from Lit/Num. Similarly, 9 schools received Numeracy/Maths training from CTLI and Lit/Num in the same year, and 20 schools received CTLI training the year after.

"Also, 36 schools or close to half (48%) attended a School Management course, as part of the Lit/Num Intervention. Considering the investment the department makes through its Lit/Num and CTLI training, it is essential to examine further whether these two training programmes detract or reinforce one another. If they do reinforce one another, it should be considered whether the added value of attending the other training programme is worth the cost" (De Chaisemartin, 2010:19).

The De Chaisemartin report presented a range of empirical findings pertaining to classroom results, teacher outcomes and learner performance. We confine ourselves here to a brief comment on the learner performance results and what the evaluator reported in this regard. Two types of analysis were used: The first analysis measured gains made in the WCED's systemic test before and after the training, for schools that had received CTLI training in that particular subject for the first time within a specified period. Gains from this "pre" and "post" measure were then compared to gains made by all other schools in the province that had never received CTLI training in that subject. The second analysis compared the overall gains made from 2002 to 2008 in the Foundation Phase and from 2005 to 2009 in the Intermediate Phase
for schools that received CTLI training versus schools that had not. The conclusions about these measures are the following:

"Results from both analyses suggest that CTLI courses may have had a positive impact on learner performance, as schools that attended CTLI training courses showed more improvement than schools who hadn't. Moreover, evidence suggests that learner gains may increase when a larger number of teachers attend the same training course. In FP literacy, schools that had sent five or more teachers through the years to the FP literacy course improved their 2002 to 2008 literacy scores by 17.8%. In comparison, schools that had sent one to four teachers improved by 13.0% and those that had sent 0 teachers improved by only 7.8%. In FP numeracy, schools that had sent five or more teachers through the years to the FP numeracy course improved their 2002 to 2008 numeracy scores by 5.0%, whereas schools that had sent one to four teachers improved by 3.4%, and schools that had sent no teachers improved by a mere 0.5%. Similarly, schools that sent five or more teachers to the IP language course improved their 2005 to 2009 scores by 6.1%, whereas schools that had sent one to four teachers improved by 2.0%, and those that had sent none decreased their scores by 0.6%. Finally, schools that sent five or more teachers to the IP maths course improved their 2005 to 2009 scores by 8.4%, whereas schools who sent one to four teachers improved by 7.8%, and schools that sent none improved by only 3.6%. These changes are all statistically significant. Although it is possible that another factor that is common to schools who sought training at CTLI is responsible for the difference in learner gains, it is likely that the actual training course at the very least contributed to those gains. The large relative gains recorded in numeracy and mathematics for schools that sent [teachers] to CTLI is particularly significant, given the fact that provincial mean scores have hardly changed since the tests were introduced in 2002" (De Chaisemartin, 2010:7-8).

This discussion refers to the notion of the "dosage" of an intervention. The idea is sound: to compare the learner performance of those schools where more educators attended the training (five or more) with those schools that sent four or fewer. The statistical results show that relatively bigger gains were recorded where there was more intervention dosage. Again, as Rogers has pointed out, this is an important method in establishing the degree of change: to correlate the extent of change achieved with the amount (or dosage) of the intervention delivered.
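To make the dosage logic concrete, the short sketch below groups hypothetical schools into the same dosage bands used in the evaluation (no teachers, one to four, five or more) and then correlates dosage with score gains. It is a minimal illustration only: the data values, the variable names and the choice of a Spearman rank correlation are assumptions for this example and are not drawn from the De Chaisemartin report.

```python
# Illustrative dosage-response check with invented data (not from the De Chaisemartin report).
import numpy as np
from scipy import stats

# Hypothetical records per school: number of teachers trained and score gain (percentage points).
teachers_trained = np.array([0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 0, 2, 5, 6, 1])
score_gain = np.array([1.2, 0.4, 2.8, 3.1, 3.5, 3.9, 6.2, 5.8, 7.1, 6.5, 0.9, 2.5, 5.5, 6.8, 2.2])

# Group schools into the dosage bands used in the evaluation: 0, 1-4 and 5+ teachers.
bands = {
    "none (0)": teachers_trained == 0,
    "low (1-4)": (teachers_trained >= 1) & (teachers_trained <= 4),
    "high (5+)": teachers_trained >= 5,
}
for label, mask in bands.items():
    print(f"{label}: mean gain = {score_gain[mask].mean():.1f} (n = {mask.sum()})")

# A rank correlation between dosage and gain summarises the dosage-response relationship
# (Rogers' point about correlating the extent of change with the amount of intervention delivered).
rho, p_value = stats.spearmanr(teachers_trained, score_gain)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```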

5.9 Time-series and multi time-series designs

Time-series designs use pre-intervention trends as a measure of comparison (rather than a comparison or control group). Rossi, Freeman and Lipsey (2004:289-290) refer to this as the use of reflexive controls to estimate programme effects: "In studies using reflexive controls, the estimations of program effects comes entirely from information on the targets at two or more points in time, at least one of which is before exposure to the program. When reflexive controls are used, the presumption must be made that the targets have not changed on the outcome variable during the time between observations except for any change induced by the intervention".

In practice, time-series designs involve the following steps (Figure 5.8):
• With this design one takes several measurements of outcome measures for a programme's target population both before and after the intervention;
• One then uses these outcome measures (or indicators) prior to implementation to project or estimate what would have happened in the absence of a programme (assuming that this trend would have continued if a programme had not been implemented);
• One then compares the trend in indicators prior to the intervention to the post-intervention trends; and
• One then detects programme impact as a change in trends after the intervention was implemented.

Figure 5.8: Time-series design

Time series design (full coverage)
Experimental (treatment) group:  O  O  O  O  [Intervention]  O  O  O  (time →)

Figure 5.9: Vehicular Accident Rates
[Figure 5.9 plots vehicular accident rates (for all hours and days, commuting hours, and weekend nights) before and after the intervention.]
(Source: Ross, Campbell and Glass, 1970)

One of the best examples of time-series designs discussed by Rossi, Freeman and Lipsey is a study of the impact of compulsory breathalyser tests on traffic accidents in the UK. In 1967 the British Government enacted a new policy that allowed police to give breathalyser tests at the scenes of accidents. The test measured the presence of alcohol in the blood of suspects. At the same time, heavier penalties were instituted for drunken driving convictions. Figure 5.9 plots vehicular accident rates by various periods of the week before and after the new legislation went into effect (Ross, Campbell & Glass, 1970). Visual inspection of the chart clearly indicates that a decline in accidents occurred after the legislation, which affected most times of the week
but had especially dramatic effects for weekend periods. Statistical tests verified that these declines are greater than could be expected from chance components of these data.

I would argue that evaluators should consider utilising time-series designs more than is currently the case. It seems that evaluators, when designing an impact evaluation study, always turn to experimental or quasi-experimental designs where some control or comparison group is required. But, as I have argued above, there are many cases where a control group is not available. Time-series designs use the principle of reflexive control to assist in estimating programme effects and impact. They are especially useful in cases where there is a series of observations on the outcome measure before the intervention. There are many fields where this is actually the case:
• We regularly measure and monitor learner performance in schools over extended periods of time.
• We regularly monitor patient cases in hospitals and clinics.
• We regularly monitor service delivery in local and provincial government projects (water and sanitation projects, big infrastructure projects).
• We regularly measure economic behaviours and events (household consumption, job creation and so on).

However, Rossi, Freeman and Lipsey (2004:292) point out that the units of analysis in time-series studies are often highly aggregated data. Data is often available only at district, provincial or national level, and this may restrict the overall use and uptake of this design type. On the other hand, they argue that time-series studies can in fact also be used with disaggregated data: "An example is the analysis of interventions administered to small groups of persons whose behaviour is measured a number of times before, after and perhaps during program participation. Therapists, for example, have used time-series designs to assess the impact of treatments on individual clients. Thus, a child's performance on some achievement test may be measured periodically before and after a new teaching method is used with the child, or an adult's drinking behaviour may be measured before and after therapy for alcohol abuse".
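The reflexive-control logic of a time-series design can be made concrete with a simple segmented (interrupted) time-series regression, in which the pre-intervention trend serves as the comparison and the model estimates a level shift and a slope change at the point of the intervention. The sketch below is a minimal illustration with invented monthly data; the variable names and figures are assumptions and are not drawn from the breathalyser study.

```python
# Minimal sketch of a segmented (interrupted) time-series regression on invented data.
import numpy as np

months = np.arange(24)                  # 24 monthly observations
intervention_start = 12                 # intervention introduced at month 12
post = (months >= intervention_start).astype(float)
time_since = np.where(post == 1, months - intervention_start, 0.0)

# Hypothetical outcome series: a gentle upward trend, then a drop after the intervention.
rng = np.random.default_rng(42)
outcome = 100 + 0.8 * months - 15 * post - 0.5 * time_since + rng.normal(0, 2, size=months.size)

# Design matrix: intercept, pre-intervention trend, level change and slope change after the start.
X = np.column_stack([np.ones(months.size), months, post, time_since])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
intercept, trend, level_change, slope_change = coef

print(f"pre-intervention trend : {trend:.2f} per month")
print(f"level change at start  : {level_change:.2f}")
print(f"slope change afterwards: {slope_change:.2f} per month")
```

Any estimated change in level or slope is of course only suggestive; as with the visual inspection of Figure 5.9, it still has to be tested against what could be expected from chance components of the data.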

5.10 Outcome evaluations

The discussion of impact evaluations in the previous section has highlighted the design and methodological challenges that an evaluator faces when conducting a rigorous impact assessment. The greatest challenge is to satisfy the conditions that have to be met in order to make causal attribution claims. We have argued that even the most robust designs – experimental and quasi-experimental designs – do not always "guarantee" that such claims can be made with reasonable confidence. The other obvious constraint – which was highlighted a number of times in this chapter – is the reality that many (if not most) of our interventions are highly complex (see Patricia Rogers' excellent paper (2008) on the difference between simple, complicated and complex programmes) and do not lend themselves to the application of any form of experimental or quasi-experimental design. Experimental designs work best with very "contained" and rather simple interventions (such as drug trials) where a single or only a few causal strands are at work and a single target group constitutes the intended beneficiaries. Interventions in communities, schools, organisations
and so on usually do not meet these criteria, which makes it extremely difficult to achieve the goals of causal attribution. In cases such as these, it has become increasingly accepted to conduct an outcome evaluation rather than an impact evaluation. Our discussion of "outcome evaluation designs" is organised around four propositions.

5.10.1 Proposition 1: Outcome evaluation studies are a type of evaluation study distinct from impact evaluation studies

Outcome evaluations are studies that attempt to show that the expected outcomes (as described by the programme theory of change and logic) have been achieved. Having demonstrated, through measurement of appropriate outcome indicators, that the programme's anticipated effects have materialised, the evaluator would claim that an outcome evaluation has been conducted. Such studies are not traditional quantitative impact evaluations, as no statistical attempt is made to attribute the observed outcomes unequivocally or solely to the programme. The distinction between the traditional quantitative approach to "impact assessment" and "outcome evaluation" is based on the fact that the former makes strong statistical causal attribution claims whilst the latter does not always do so.

In a recent discussion of these two notions – impact assessment and outcome evaluation – Howard White (2009) describes the crucial characteristic of quantitative impact assessment in the following terms: "Impact is defined as the difference in the indicator of interest (Y) with the intervention (Y1) and without the intervention (Y0). That is, impact = Y1 – Y0 (e.g. Ravallion, 2008). An impact evaluation is a study which tackles the issue of attribution by identifying the counterfactual value of Y (Y0) in a rigorous manner".

White (2009), however, also acknowledges that there is an alternative tradition in evaluation studies where impact "refers to the final level of the causal chain (or log frame), with impact differing from outcomes as the former refers to long-term effects. Any evaluation which refers to impact (or often outcome) indicators is thus, by definition, an impact evaluation. Hence, for example, outcome monitoring can fall under the heading of impact evaluation. Many studies can be considered to be impact evaluations since they discuss outcome and impact indicators, whilst making no attempt to attribute changes in those indicators to the intervention. Indeed such studies often explicitly state that attribution is not possible".

Although White suggests a kind of compromise where we accept these two interpretations as simply two different approaches to "impact evaluation", it is more useful to hold on to the traditional definition of impact evaluation as implying strong statistical causal attribution, and to reserve the notion of "outcome evaluation" for a different type of evaluation study that relies on more qualitative causal attribution. The proposal here is to honour the traditional and strong sense in which impact assessment must statistically demonstrate that programme impact can be causally attributed to the programme (and not some other possible intervention(s)), if that is feasible.

5.10.2 Proposition 2: The assessment of programme outcome(s) has to incorporate contextual conditions

One of the basic premises of the theory of realistic evaluation (Pawson and Tilley, 1997) is that programmes produce outcomes under specific conditions. Outcome configurations, as they call it, are the result of a programme generating outcomes or effects that are determined by the specific context and circumstances of the target group or unit. This is not a counterintuitive idea. For example, it is well known and well documented that the same intervention delivered at very different sites very often produces very different results. A teacher development programme that is implemented across 50 schools very often produces very different results: in some cases making no difference, in other cases making some degree of difference but not in any systematic fashion, and in other cases producing more widespread results that may or may not be sustained over time. On the assumption that the exact same programme was delivered (in terms of quality, dosage, fidelity to design, etc.), it is logical to assume that the differential effects are the result of differences between these schools (in terms of resourcing, capacities, commitment, leadership, parental support and many other factors).

The point is that outcomes vary by intervention, characteristics of the target group and context. Any study which presents a single estimate of impact – especially in the case of complex interventions – is likely to be viewed with some suspicion (as being overly simplistic). It is also likely to be less useful to potential users of the results than a study which demonstrates under which circumstances interventions are more effective, which target groups benefit most, and which environmental settings are helpful or not in achieving impact. Hence it could be shown that an educational intervention, such as installing a computer lab in a rural school, works – but only if teachers have acquired a certain level of computer proficiency, or if the internet connection to the school is stable and there is consistent technical support, or if the learners' parents are themselves educated and supportive of their children spending additional time in the lab.

In trying to make sense, then, of differential outcomes, it is essential that outcome evaluation studies (as all other forms of evaluation studies) should be based on an explicit and clear understanding of the underlying theory of change of the intervention. Stated differently, it is essential that the evaluator has some understanding of the underlying causal mechanisms at work in the intervention. We simply increase our confidence in making causal claims (and programme outcome claims are still causal claims) if we have some understanding of the causal mechanisms at work in a programme. A claim such as "drug resistance education programmes cause increased drug use" becomes considerably more convincing when at least one element in the causal chain is illuminated. Research has shown that students who have received such a programme not only reported higher levels of drug use in later years, but also considered drug use to be significantly more prevalent than those who had not taken the resistance programme. When Donaldson, Graham and Hansen (1994) tested the links between these elements and found them to be significant, the evidence in favour of a causal effect was enhanced further.

5.10.3 Proposition 3: Outcome evaluation studies have to be theory-based evaluations

Our third main proposition is that outcome evaluations have to be theory-based. This, of course, is the essence of the argument of all theory-based evaluation (TBE) proponents. TBE holds that theory is good: it helps to make sense of the world and of our intervention programmes. Theory does not replace good design but strengthens it. Both impact and outcome evaluation studies benefit from being informed by theory. Theory-based evaluation highlights the chain of causal events that are likely to produce certain outcomes – either prospectively or retrospectively ("how" theories). It also postulates certain causal mechanisms that would explain why these specific causes are/were likely to cause certain outcomes.

TBE foregrounds the theoretical assumptions underlying a programme by describing how particular actions should lead to particular changes. These assumptions about the causal connections in programmes are commonly known as "programme theory" (Weiss, 1998), originally called the "chain of objectives" (Suchman, 1967:177). TBE then tests the assumption that a particular activity (or set of activities) is linked to and will result in a specific objective (or set of objectives) being achieved.

Programme theory is defined differently by different evaluators, each with his/her own particular emphasis. Chen's definition (1994:43) focuses on the importance of specifying what must be done to achieve the programme's desired goals and the impact that may be anticipated, whereas Bickman (1990:5) stressed the construction of a plausible and sensible model of how a programme is meant to work. Bickman also emphasised that programme models are developed for a particular programme and do not represent an "off-the-shelf" use of a single, established social science theory. Weiss (1997:501) stressed the causal element of programme theory. Lipsey (1993:36) saw programme theory as a set of propositions regarding what happens in the black box of a programme during the transformation of input to outcome, i.e. how a bad situation is transformed into a better one through some intervention. By articulating the programme theory, the evaluation can follow the "mini steps" of the programme being evaluated (Birckmayer & Weiss, 2000:408).

Weiss (1997) made it clear that the notion of "theory" in the context of programme theory was meant to be quite specific and to relate to the causal chain of a particular intervention, and not to be as broad or generalisable as is usual in social science theory. These proponents promoted the notion of a theory "with a small t" in the context of TBE – that is, these theories should not have the same weighting or status as a widely accepted, well-researched or validated theory: "Of course the kind of theory we have in mind is not the global conceptual schemes of the grand theorists, but much more prosaic theories that are concerned with how human organizations work and how social problems are generated" (Chen & Rossi, 1983:285).

Chen and Rossi (1983:285) recommended that programme theory should be aligned to social science theory where possible: "Social science theory is not well enough developed that appropriate theoretical frameworks and schemas are ordinarily easily available 'off the shelf'. But the absence of fully developed theory should not prevent one from using the best of what is already at
hand. Most important of all, it is necessary to think theoretically, that is, to rise above the specific and the particular to develop general understandings of social phenomena".

Programme theories are the crux of TBE and are typically represented as graphical diagrams that specify relationships among programmatic actions, outcomes, and other factors, although they may also be expressed in tabular, narrative or other forms. Such representations vary widely in their complexity and level of detail. There has been some criticism of the ability of these diagrams or models a) to represent the often complex relationships in a programme, and b) to focus on the mechanisms of change rather than simply the components of the implementation process (Rogers, Petrosino, Huebner & Hacsi, 2000).

5.10.4 Proposition 4: The logic of outcome evaluation studies is closer to the logic of case studies than to experimental designs

In the final analysis, we would argue that outcome evaluation studies are best understood as falling in the broad category of case study designs. The main elements of case study designs are the following:
1. In contradistinction to the logic of generalisation, it embodies the logic of contextualisation.
2. In contradistinction to the typical experimental logic (and counterfactual thinking), it embodies the logic of "causal narration".
3. Consistent with theory-based evaluation approaches, the evaluative case study design is embedded in a clearly articulated theory of change.

The logic of contextualisation: Case study designs typically aim to capture the (unique) contextuality of the cases they investigate. These studies are sensitive to the peculiar and distinct nature of individual cases and aim to understand such cases within the context of particular circumstances. This logic ("contextualisation") is in contrast to the standard survey design logic of generalisation, where cases are studied as being representative of some larger population. Outcome evaluation studies aim to assess the effectiveness or impact of specific programmes within specific contexts (a major tenet of realistic evaluation approaches as well), rather than as instances of a larger set of similar interventions. This is even more noteworthy in relation to typically distinctive and even unique programmes. Hence, they require very context-specific explanations. One of the immediate consequences of the logic of contextualisation is the emphasis on comprehensive and "saturated" data collection. Evaluative case studies, therefore, as is the case for all case study designs, involve the application of multiple data-collection methods and the utilisation of all possible data sources. The more we know about the case, the more we are able to understand the case within its context.

The logic of causal narration: We have already shown that causality and law-likeness are two very different things. Causal explanations (and there are many examples) do not typically require explanation in terms of some law or generalisation. Within the realistic tradition, causal explanations are understood to refer to a specification of the causal mechanisms
that produce or generate observable behaviour. And these causal mechanisms need not conform to any law of human behaviour. The logic of causal narration – the converse of the logic of experimentation – does not attempt to isolate effects, but rather to show the logical linkages between effects (cause and effect chains). Postulating a causal narrative account of a series of events still requires rigorous and convincing evidence. It also employs 'counterfactual thinking': other possible reconstructions ('stories') of the chain of events have to be considered and argued against. Postulating the most plausible account means that alternative explanations (in this case alternative causal narratives) have been tested and discarded on the basis of the available evidence.

Theory-based evaluation: The evaluative case study design incorporates a theory-based approach to impact assessment. I do not subscribe to the view that case studies are typically a-theoretical. On the contrary, the best case study designs are grounded in some theory or conceptual framework that guides the search for the most appropriate evidence. This does not mean that case studies are necessarily theory-testing designs (although that is not ruled out). But there is no logical prohibition against case studies based on some set of theoretical postulates or assumptions: in the case of evaluative case study designs, these theoretical postulates are to be found in the implicit or explicit theory of change of the intervention. This means that evaluative case study designs should always include an element of clarificatory evaluation where the intervention's theory of change is clearly articulated up front. Once this theory of change has been articulated, it provides the evaluator with the most appropriate theoretical framework for undertaking the case study. We need to remind ourselves that the theory of change does not only specify what the objectives and expected outcomes of an intervention are. It also traces the causal paths from objectives, through activities and outputs, to outcomes. A clear theory of change, as represented in a logic model, is in fact nothing but a "causal hypothesis" that is waiting to be tested. This again is one of the main tenets of realistic approaches to evaluation research.

5.11 A concluding framework

We conclude with a final summary framework of the different evaluation designs discussed in this chapter (Table 5.6). The main propositions of this chapter are captured in the framework:
• Evaluation design (in distinction from evaluation methodology) has a distinct logic which is manifested in the different questions that it addresses.
• Different evaluation designs study different dimensions of interventions: clarificatory studies evaluate programme design and conceptualisation; process evaluation (and programme monitoring) studies evaluate programme delivery; impact evaluations evaluate the effects and impact of programmes and make strong claims of causal attribution; and outcome evaluations evaluate whether, and to what extent, programme outcomes have materialised.
• The framework captures the design principles of each of these evaluation design types in the last column.


Designing evaluation studies is not easy. Evaluation design is essentially a logical and conceptual exercise in which the evaluator must take into consideration a wide range of factors – the nature of the intervention, the characteristics of the target group, the expectations of stakeholders, resource constraints and many more – when designing a specific evaluation study. But it is arguably the most important task that evaluators have to master. Unless evaluations are well designed, there is little value in executing them in reality. Decisions about evaluation methodology (case selection, measurement, data collection and data analysis) are all conditional on logical and well-thought-through designs. These mostly generic applied research methods have not been dealt with in detail in this book.

Table 5.6: Final analytical framework for assessment of evaluation design types

Design logic and types

Design principles

Programme design

Clarificatory evaluation

Clarification of programme logic (goals/objectives/activities/ outputs/outcomes and impact) utilising qualitative conceptual tools such as the Logic Model/Logframe or Outcome Mapping and collaborative and participatory processes with programme staff.

Programme delivery

Process (implementation) evaluation. Programme monitoring studies.

Assessment of the fidelity of programme delivery through close investigation of implementation activities and deliverables (including targets). Reliance on qualitative ethnographic principles and observational and surveillance methods rather than selfreporting.

Programme impact

Experimental logic (logic of the counterfactual)

True experimental designs

Random assignment of subjects or units to experimental or intervention conditions. Experimental and control groups. Before (pre-test) and after (post-test) measures (customised to expected programme outcomes).

Quasiexperimental designs

No randomisation. No control groups but often matched comparison groups. Before and after measures (appropriate to expected programme outcomes). Various qualitative alternative approaches to deal with counterfactual (details in case annexe 6).

Evaluative case study designs (Multiple/ comparative/ longitudinal case study designs)

Case study principles (saturated data-collection using multiple quantitative and qualitative methods). Theory-based qualitative reconstruction and evaluation of programme implementation/delivery including – where possible – some experimental design principles (before and after measures and comparative cases).

Programme Outcome

Outcome evaluations (Case study logic)


References

Babbie, E. & Mouton, J. 2001. The practice of social research. Oxford University Press.
Bamberger, M. 2012. Introduction to Mixed Methods in Impact Evaluation. Guidance Note 3 on Impact Evaluation. http://www.interaction.org/impact-evaluation-notes. Retrieved: 15 March 2014.
Bamberger, M., Rao, V. & Woolcock, M. 2010. Using Mixed Methods in Monitoring and Evaluation: Experiences from International Development. Policy Research Working Paper 5245. Washington, DC: World Bank.
Bamberger, M., Rugh, J. & Mabry, L. 2012. Real World Evaluation: Working Under Budget, Time, Data, and Political Constraints. 2nd edition. Sage Publications Inc., Chapter 16.
Bickman, L. (ed.). 1990. Advances in Program Theory. New Directions for Program Evaluation, 47. San Francisco: Jossey-Bass.
Birckmayer, J.D. & Weiss, C.H. 2000. Theory-Based Evaluation in Practice: What Do We Learn? Evaluation Review, 24(4): 407-431.
Campbell, D.T. & Stanley, J.C. 1966. Experimental and quasi-experimental designs for research. Chicago: Rand McNally College Pub. Co.
Chen, H.T. 1994. Theory-Driven Evaluations. Berkeley: SAGE.
Chen, H.T. & Rossi, P.H. 1983. Evaluating with sense: The theory-driven approach. Evaluation Review, 7: 283-302.
Cook, T.D. & Campbell, D.T. 1979. Quasi-experimentation: design and analysis issues for field settings. Chicago: Rand McNally College Pub. Co.
Davidson, J. 2010. Why genuine evaluation must be value-based. http://genuineevaluation.com/why-genuine-evaluation-must-be-value-based/. Retrieved: 18 April 2014.
Donaldson, S.I., Graham, J.W. & Hansen, W.B. 1994. Testing the generalizability of intervening mechanism theories: Understanding the effects of adolescent drug use interventions. Journal of Behavioral Medicine, 17: 195-216.
De Chaisemartin, T. 2010. Evaluation of the Cape Teaching and Leadership Institute. JET Education Services.
Fleisch, B., Taylor, N., Herholdt, R. & Sapire, I. 2011. Evaluation of back to basics mathematics workbooks: A randomised control trial of the primary mathematics research project. South African Journal of Education, 31: 488-504.
Greene, J. 2007. Mixing methods in social enquiry. San Francisco: Jossey-Bass.
Mackie, J.L. 1988. The Cement of the Universe: A Study in Causation. Oxford, England: Clarendon Press.
Morell, J.A. 2010. Evaluation in the Face of Uncertainty: Anticipating Surprise and Responding to the Inevitable. New York: Guilford Press.
Morris, L.L. & Fitz-Gibbon, C.T. 1978. How to Measure Program Implementation. Berkeley: Sage.
Mouton, J. 2009. Assessing the impact of complex social interventions. The Journal of Public Administration, 44: 849-865.
Owen, J. & Rogers, P. 1999. Program Evaluation: Forms and Approaches. St Leonards, NSW: Allen & Unwin.
Patton, M.Q. 1997. Utilization-focused evaluation: The new-century text. Thousand Oaks, CA: Sage.
Pawson, R. & Tilley, N. 1997. Realistic Evaluation. London: Sage Publications.
Perrin, B. 2012. Linking Monitoring and Evaluation to Impact Evaluation. Guidance Note 2 on Impact Evaluation. http://www.interaction.org/impact-evaluation-notes. Retrieved: 15 March 2014.
Rabie, B. & Burger, A. 2014. Advanced Monitoring and Evaluation: Evaluation Research and Analysis Slide Presentation. Bellville: School of Public Leadership.
Rogers, P. 2008. Using Programme Theory to Evaluate Complicated and Complex Aspects of Interventions. Evaluation, 14(1): 29-48.
Rogers, P. 2012. Introduction to Impact Evaluation. Guidance Note 1 on Impact Evaluation. http://www.interaction.org/impact-evaluation-notes. Retrieved: 15 March 2014.
Rogers, P.J. 2000. Program Theory in Evaluation: Challenges and Opportunities. New Directions for Evaluation, 87. Jossey-Bass.
Rogers, P.J., Petrosino, A., Huebner, T.A. & Hacsi, T.A. 2000. Program theory evaluation: Practice, promise, and problems. New Directions for Evaluation, 87.
Ross, H.L., Campbell, D.T. & Glass, G.V. 1970. Determining the social effects of a legal reform: The British breathalyser crackdown of 1967. American Behavioral Scientist, 494-509, in Rossi, P.H., Freeman, H.E. & Lipsey, M. 2004: 294.
Rossi, P.H., Freeman, H.E. & Lipsey, M. 2004. Evaluation: A Systematic Approach. 7th edition. California: SAGE.
Rugh, J. & Bamberger, M. 2012. The Challenges of Evaluating Complex, Multi-component Programs, in European Evaluation Society Newsletter: Evaluation Connections. May 2012: 4-7.
Scriven, M. 1991. Evaluation Thesaurus. Berkeley: Sage.
Scriven, M. 1993. Hard Won Lessons in Program Evaluation. New Directions for Program Evaluation, 58. San Francisco: Jossey-Bass.
Stern, E., Stame, N., Mayne, J., Forss, K., Davies, R. & Befani, B. 2012. Broadening the range of designs and methods for impact evaluation. DFID Working Paper 38. London. www.dfid.gov.uk. Retrieved: 28 March 2013.
Suchman, E. 1967. Evaluative research: Principles and practice in public service and social action programs. New York: Russell Sage.
Weiss, C.H. 1997. How can theory-based evaluations make greater headway? Evaluation Review, 21: 501-524.
Weiss, C.H. 1998. Evaluation for decisions: Is anybody there? Does anybody care? Evaluation Practice, 9(1): 5-19.
Weiss, C.H. 1999. The interface between evaluation and public policy. Evaluation, 5(4): 468-486.
White, H. 2009. Some Reflections on Current Debates in Impact Evaluation. 3ie Working Paper 1. New Delhi: 3ie. http://www.3ieimpact.org/media/filer/2012/05/07/Working_Paper_1.pdf. Retrieved: 18 April 2014.
Woolcock, M. 2009. Toward a plurality of methods in project evaluation: a contextualised approach to understanding impact trajectories and efficacy. Journal of Development Effectiveness, 1(1): 1-14.


Chapter 6
Indicators for Evidence-based Measurement in Evaluation
Babette Rabie

6.1 Introduction

Chapter 1 explained that evaluation is all about measuring the nature and extent of change in societal conditions – change that interventions frequently attempt to bring about in order to improve those conditions in some or other way and thereby to achieve developmental goals. Change and development are complex societal phenomena and are always subject to normative interpretation and assessment. The complexity of societal change and development makes accurate measurement of these phenomena difficult. Evaluators therefore have to consider very closely which measuring approaches and instruments they are going to apply in order to reflect, as accurately as possible, on the evaluand.

The paradigm shift from opinion-influenced judgement to evidence-influenced and even evidence-based judgement was summarised in Chapters 1 and 2, which dealt with the nature and historical development of systematic policy, programme and project evaluation. Chapter 3 referred inter alia to the importance of identifying logical sequences of cause and effect between the problems to be addressed through policy, programme and project interventions, the envisaged goals to be achieved, and the resources, activities and outputs and their eventual results or consequences (the so-called "theories of change", structured in a systematic programme logic format). Chapter 4 summarised the different conceptual approaches that are implicit in choices of what to evaluate, and why and how it should be done. Chapter 5 dealt with different evaluation designs and methodologies that aim to maximise the scientific rigour and validity, and therefore the credibility, of the evaluation.

These discussions all emphasised the importance of measuring the evaluand as accurately as possible in order to make evidence-influenced or evidence-based judgements about the merit or worth of the intervention concerned, instead of subjectively informed, opinion-influenced judgements that might reflect one or more deliberate or subconscious biases of some sort. In addition to the considered use of the conceptual tools referred to above, the use of indicators as another measuring tool has achieved wide acceptance in evaluation circles. Policy, programme and project goals are not always directly measurable (e.g. alleviating poverty, improving quality of life, empowering a community, achieving equitable developmental goals, democratising society, etc.). Indicators are used to measure change and progress through more simplified
and indirect variables associated with the change process. Indicators articulate the most salient aspects of the envisaged change that can be regarded as evidence of the success or not of a policy, strategy, programme or project. This process of simplification is both an advantage and a drawback of indicators: whilst it provides a means of measuring change and progress that may otherwise be too complex to measure, simplifying the desired performance carries the inherent danger that the complexity of the development process may be reduced to singular measures which fail to capture or properly present the complexity of the development change desired.

This chapter discusses indicators as progress measurement tools. It differentiates between alternative measurement instruments, levels and types of indicators, as well as important considerations in designing indicators versus selecting from pre-designed indicators to ensure a balanced, reliable indicator set. Finally, some international indicator initiatives are summarised before concluding with the importance of data management and the value and limitations of indicators in practice.

6.2 Definitions and conceptualisation

Indicators are measurement instruments used to track and assess progress in the attainment of objectives and outcomes. Miles (1989:16) defines an indicator as "a measuring instrument used to give a concrete, measurable but indirect value to an otherwise immeasurable, intangible concept". Atkinson and Wellman (2003:6) regard indicators as "pointers" that show whether goals are achieved, whereas Smith (in Mathison, 2005:199) regards them as signifying performance by describing what "can be empirically observed that will signal the occurrence of the aspect or facet of the evaluand under study". Indicators are either quantitative or qualitative variables that provide a simple and reliable means to measure achievement, to reflect the changes connected to an intervention, or to help assess the performance of an organisation against the stated outcome (Kusek & Rist, 2004:65). Indicators may be internal, subjective perceptions or external, more objective measurements (Miles, 1989:17). Indicators may be used to track and evaluate the "impacts, outcomes, outputs, and inputs" of a project, programme or policy during and after the implementation process (Mosse & Sontheimer, 1996:1). Indicators are broken-down, partial dimensions of performance that we measure or observe to determine whether, and to what extent, progress is being made towards adopted goals (UNDP, n.d.:introduction). In this, they provide a definition of success by specifying how success will be measured or determined. Thus, while they are useful and critical in the evaluation process, they should be used with caution in order to avoid criticism related to possible normative bias or "quantificationism" (Carley, 1981:89,173).

Indicators in practice include three types of measuring instruments: direct measures, observable or verifiable change, and the measurement of associated variables (proxy indicators).
• Direct measures assess the performance of the evaluand directly. For example, if we wish to assess the performance of a housing programme, at output level we may adopt an indicator that measures the "number of houses constructed". This indicator directly measures the aspect that we are interested in (the number of houses constructed) and as such
is fairly reliable, in that it provides a valid measurement requiring little interpretation of the relationship between the indicator and the evaluand, and also fairly objective, in that the various persons required to populate the indicator (count the houses) will return the same value.
• Observable or verifiable change provides qualitative observations or verification that change has occurred as a result of the evaluand. For example, one may observe the change from prior conditions after implementing programmes that clean up refuse from vacant land or town streets, remove pollution or garbage from a water resource, remove graffiti from structures or remove alien vegetation from river beds. These observable changes, where evidence of the change may be pre- and post-intervention photographs demonstrating the effect of the evaluand, are often not readily quantifiable (so that direct measures become ill-suited) and complex to capture through associated variables (where the complexity of designing composite measures may not be worth the cost in terms of money and data to provide evidence of the change that has occurred).
• Associated measures or proxy indicators measure a variable associated with an otherwise immeasurable, intangible concept. For example, the quality of constructed houses refers to an intangible concept which cannot be measured directly. However, there is an associated relationship between the quality of the structure and adherence to building standards. Therefore, an associated variable that may reflect the quality of the house is "adherence to building standards". The accuracy of proxy indicators depends on the strength of the association between the evaluand that we are interested in (the quality of the house) and the adopted indicator (adherence to building standards). If the assumed associated relationship between the proxy indicator and the evaluand is incorrect, improvements reflected by the indicator will not be indicative of improvement in the evaluand.

Box 6.1 demonstrates some of the limitations in defining the associated relationship between the indicator and the evaluand.

Box 6.1: Measuring the perception of safety and security in a community

Possible indicator or measurable objective – Problem with the indicator/measurable objective
• Changes in crime statistics: Crime statistics reflect reported crime only; low crime statistics do not imply a greater perception of safety.
• Regularity of police patrol vehicles: May afford a greater sense of security, but may also lead to a greater sense of non-security (why are more regular patrols necessary/a priority in this area?).
• Crime deterrent infrastructure and private security: Is crime deterrent infrastructure an indication that people feel safe, or not safe (hence they invested in this infrastructure)? Is living in a security estate the result of not feeling safe, a status symbol, or an investment decision?
• Insurance premiums: Insurance may provide an indication of the risk perception of a community, but in lower income communities the lack of insurance cannot be associated with a lack of risk perception.
• Size of the dogs: Whilst bigger dogs may be associated with greater security, having big dogs may also just mean that the person likes big dogs.
• People walking about at night: Whilst this may provide an indication of a perception of safety, in poorer communities walking may be a necessity even when people do not feel that it is safe to do so.
(Source: Author)


Whilst some may prefer direct measures to other, more indirect measuring instruments because of their advantages in terms of objectivity and validity, all of these measurement instruments are useful, and the "best" measurement instrument should be determined by the nature of the evaluand rather than by preference for a specific type of tool.

In concluding the conceptualisation of indicators, it is also useful to discuss the difference between indicators and measurable objectives. Where the measurable objective describes what is to be measured in terms of the evaluand or envisioned change, it may not yet be formulated as a measurable concept. For example, the consequence (outcome) of a housing programme may be an improvement in living conditions (the measurable objective), whereas the measurement tool will be "the percentage improvement against the Human Development Index". This differentiation may, however, become less distinct when using qualitative indicators which rely on the verification or observation of the change, as described in the measurement objective. With qualitative indicators, a clear description of the observable change or the process of verification is required to ensure accurate measurement that change has indeed occurred as envisioned.

Indicators should also be differentiated from standards, which provide a universally or widely accepted, agreed upon, or established means of determining what something should be, as well as from benchmarks, which provide a reference point against which trends, progress or performance may be measured or compared. Measurable objectives, standards and benchmarks provide an indication of what should be measured, but are not necessarily formulated in a measurable, verifiable or observable manner. Finally, indicators are differentiated from baselines and targets: the indicator is always the value-neutral measurement instrument, with baselines providing the measured value against the indicator prior to the intervention, and targets providing the desired measured level against the indicator at a specified future point.
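The distinctions drawn above – between the indicator and the measurable objective, between direct, observable and proxy measures, and between the indicator, its baseline and its target – can be kept explicit in practice by recording them for every indicator. The sketch below is one hypothetical way of doing so; the field names and the housing examples are assumptions for illustration only.

```python
# Minimal sketch of an indicator record that keeps the value-neutral indicator separate
# from its baseline and target and records what type of measure it is. Hypothetical example.
from dataclasses import dataclass

@dataclass
class Indicator:
    objective: str       # the measurable objective the indicator speaks to
    name: str            # the value-neutral measurement instrument
    measure_type: str    # "direct", "observable/verifiable change" or "proxy"
    unit: str
    baseline: float      # measured value prior to the intervention
    target: float        # desired value at a specified future point
    data_source: str

houses_built = Indicator(
    objective="Provide adequate housing to the target community",
    name="Number of houses constructed",
    measure_type="direct",
    unit="houses",
    baseline=0,
    target=500,
    data_source="Project completion certificates",
)

quality_proxy = Indicator(
    objective="Improve the quality of constructed houses",
    name="Percentage of houses complying with building standards",
    measure_type="proxy",
    unit="%",
    baseline=62,
    target=95,
    data_source="Building inspection reports",
)

for ind in (houses_built, quality_proxy):
    print(f"{ind.name} [{ind.measure_type}]: baseline {ind.baseline} {ind.unit}, target {ind.target} {ind.unit}")
```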

6.3 Developing and formulating indicators

The process of developing indicators may be described in three steps. First, the adoption of an explicit programme theory of change and its concomitant programme logic model for the policy, programme or project. Secondly, the identification of a list of alternative indicators that may be used to measure or verify performance at the various stages of the policy, programme or project through a consultative process. The final step entails assessing the potential indicators against pre-determined criteria to select the most appropriate indicators that will ensure accurate, valid and reliable measurement of performance and change.

6.3.1 Clarifying the programme theory of change

The programme theory, as explained in Chapter 3, provides "a systematic and visual way to present and create understanding of the relationships among the resources of a program, its activities, and the changes or results it wishes to achieve" (Kellogg Foundation, 2004:9). Once the underlying programme theory has been clarified and tested as valid, performance indicators should be established for all levels of the system so that everything from inputs, activities and outputs through to outcomes and impact may be measured.


As explained in Chapter 3, the programme logic model links the outputs, activities and resources to the outcomes and impacts of the policy, programme or project in terms of the applicable theory of change. Figure 6.1 depicts in a visual manner how a list of expected resource requirements, activities, outputs and outcomes precedes the adoption of specific indicators with corresponding targets that will be used to assess progress and performance as the intervention proceeds.

Figure 6.1: The programme logic chain

[Figure 6.1 presents the programme action logic model: the situation, priorities and needs feed into inputs (what we invest: staff, volunteers, time, money, research base, materials, equipment, technology, partners), which lead to outputs (activities – what we do; participation – who we reach), which in turn lead to outcomes and impact in the short term (learning: awareness, knowledge, attitudes, skills, opinions, aspirations, motivations), medium term (action: behaviour, practice, decision-making, policies, social action) and long term (conditions: social, economic, civic, environmental). Assumptions and external factors influence the whole chain, and the evaluation focus covers data collection, analysis and interpretation, and reporting.]

(Source: Taylor-Powell, Jones & Henert 2003)
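To make the link between the logic model and indicator selection concrete, the sketch below shows one hypothetical way of attaching candidate indicators to each level of a programme logic chain. The housing-programme entries are invented for illustration and are not taken from the source.

```python
# Minimal sketch: a programme logic chain as an ordered mapping from each level to its
# result statements and candidate indicators. All content is hypothetical.
from collections import OrderedDict

logic_model = OrderedDict([
    ("inputs", {"results": ["Budget allocated", "Project staff appointed"],
                "indicators": ["Rand value of budget released", "Number of staff in post"]}),
    ("activities", {"results": ["Construct houses", "Engage the community"],
                    "indicators": ["Number of construction sites active", "Number of community meetings held"]}),
    ("outputs", {"results": ["Houses completed"],
                 "indicators": ["Number of houses constructed", "% of houses meeting building standards"]}),
    ("outcomes", {"results": ["Improved living conditions for beneficiaries"],
                  "indicators": ["% change in household overcrowding", "Beneficiary satisfaction score"]}),
    ("impact", {"results": ["Improved quality of life in the community"],
                "indicators": ["Change against a composite quality-of-life index"]}),
])

# Reading the chain in order mirrors reading the logic model from left to right.
for level, content in logic_model.items():
    print(level.upper())
    print("  results   :", "; ".join(content["results"]))
    print("  indicators:", "; ".join(content["indicators"]))
```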

6.3.2 Selecting indicators

Levels of indicators

Different indicators provide information on different aspects of the intervention, and at various stages of the intervention life-span. Input and process (activity) indicators allow managers to track performance, detect problems and take corrective action, and predict ultimate success, even in the early planning and implementation stages, through the early feedback of information. Indicators for outputs, outcomes and impact are essential to affirm the success of the intervention and to shape the form of future interventions based on what has been proven to succeed or fail in previous interventions, but this information is often only available in retrospect.

Measuring the different stages of the programme cycle separately provides the following advantages:
a. It allows for the monitoring of progress to detect early deviance, or to provide an early indication of possible success or failure at the end (e.g. insufficient resources provide
early indications of probable output and outcome failures, whilst good public participation and empowerment processes during the implementation process may provide an early indication of empowerment as an end-goal). This enables better performance management through early response to potential failures, and enhances possible successes.
b. It enables the evaluator to differentiate between programme failure (where the inputs and activities do not produce the planned outputs) and theory failure (where the produced outputs do not lead to the planned outcome or impact change). This allows for improvement of the theory of change by identifying the causes of programme failure.
c. Data at all levels of the system enable the evaluator to test for causality and the attribution of outcome results to the preceding outputs and activities, and to identify the effect of externalities. If the programme outcomes are caused by the outputs, one should see an increase in outcomes following an increase in outputs. Data for both outputs and outcomes over the time of implementation are required in order to assess this relationship.

The assessment of public management performance therefore comprises the monitoring and evaluation of implementation and outputs (the end result of administrative efforts to convert resources into concrete deliverables/results through specific actions), and the assessment of the consequences (outcomes and impacts) of these activities and concrete milestones (to determine to what extent the strategic purposes of the programme, policy or project have been achieved). It also comprises an assessment of the extent to which these deliverables and their outcomes, as well as the processes through which they came about, comply with the normative and empirical standards set for them.

The nature and focus of indicators at the various levels of the programme logic of the theory of change concerned may be summarised as follows:
• Input indicators: Measure the financial, physical, human, information and time resources that are fed into a project. "Input indicators measure the quantity (and sometimes the quality) of resources provided for project activities" (Mosse & Sontheimer, 1996:11). More intangible resources for a programme may also include political, strategic and community support (buy-in) and the quality of the strategies and plans that will guide implementation. Input indicators provide reflection on the feasibility of the envisioned theory of change, and specify what is required before the outputs and outcomes depicted in the programme theory of change can realistically be achieved.
• Process indicators: Focus on how a programme achieved its goals (Chen, 2005:10). These indicators track the conversion of resources into policy outputs and outcomes and reflect on the efficiency, productivity, compliance with good governance principles and normative considerations of the conversion processes. Implementation processes may be evaluated qualitatively with a description of the programme activities and perceptions, or quantitatively through client satisfaction and perception surveys (Görgens & Kusek, 2009:394).
• Output indicators: Measure the product or direct result of a specific process and project. "Output indicators measure the quantity … and quality of the goods or services created or provided through the use of inputs" (Mosse & Sontheimer, 1996:11). Measuring output
indicators is essential for administrative accountability to higher management levels and for oversight by the legislature and other institutions.
• Outcome indicators: Refer to the direct consequence or result of an activity, project or process (DPLG, 2000:8), including the objective short-term changes as a result of the project concerned, as well as the subjective reaction of the client (see also Poate, 1997). "Outcomes refer to direct consequences or results that follow from an activity or process" (Reese & Fasenfest, 1997; Görgens & Kusek, 2009:394); therefore outcome indicators focus on those changes observed under or by the beneficiaries of the programme and ascribed to the intervention that was undertaken. "Outcome indicators measure the quantity and quality of the results achieved through the provision of project goods and services" (Mosse & Sontheimer, 1996:12).
• Impact indicators: Reflect the longer term, broader societal implications and development outcomes of an intervention (DPLG, 2000:8; Cloete, Møller, Dzengwa & Davids, 2003:25), not necessarily limited to the direct beneficiaries of the programme. Distinction may be made between demographic, geographic, environmental, social, organisational, technological, financial and economic impact indicators. Impact indicators are the most difficult to measure, because of lag times and the difficulty of accurately ascribing the affected change to particular interventions. As impact is seldom attributable to single programmes, the combined effects of a number of programmes may be measured. For this reason, and because of the high cost, impact evaluations are usually commissioned by national authorities (Görgens & Kusek, 2009:394). The World Bank, however, emphasises the importance of impact indicators by recommending for its development projects that no more than a dozen indicators are measured, of which at least half should be impact indicators measuring progress against major development objectives (Mosse & Sontheimer, 1996:18).
• Results indicators: Measure the consequences of activities in relation to objectives at project or programme level (Mosse & Sontheimer, 1996:11; Boyle & Lemaire, 1999:25). Results indicators may include output, outcome and impact indicators.

The differing nature and foci of the above indicators are summarised in Table 6.1. The indicators at the various programme levels vary in the degree of difficulty and cost implications of measuring them accurately. Input and output indicators often track quantifiable, tangible resources and products with data readily available in the administrative domain, and are therefore simpler to measure. Process indicators relating to the efficiency and productivity of the resource conversion process are generic between different projects, and can draw on the research of efficiency and financial cost-effectiveness studies. Other process indicators track adherence to milestones, which again relies on data available in the administrative domain, but this becomes more complicated when service delivery is decentralised or sub-contracted. Outcome and impact indicators, however, often focus on intangible, unquantifiable improvements that require composite or multiple indicators to reflect various performance aspects (Rossi, Lipsey & Freeman, 2004:214).
These indicators mostly track changes outside of the administrative domain, and therefore the availability of data becomes a greater constraint, exacerbated by the time-lag in the manifestation of measurable results. Outcomes and impacts also depend on the contribution of multiple
role players from different units or departments, which complicates the attribution of results through the introduction of more externalities (see Smith in Modell & Grönlund, 2007:277).

Table 6.1: Focus of evaluation indicators

Strategic planning level | Policy-planning focus | Programme level | Evaluation focus | Evaluation indicator area | Time frame
Vision | Policy (what to do?) | Intangible impacts/outcomes | Final multi-sectoral/integrated goals/consequences | Empowerment, growth, equity, redistribution, democracy, stability, sustainability, etc. | Long term
Mission | Strategies (how to do it?) | Intangible outcomes and concrete progress | Intermediate sector-specific results/goal achievement | Improved education, health, economy, community, politics, culture, environment, etc. | Medium term
Programmes/Projects | Operational plan (what deliverables?) | Concrete outputs: deliverables/products/milestones | Quantity, diversity, quality | More and better ranges of houses, pass rates, clinics, roads, technology, harvests, jobs, electricity | Short to medium term
Activities | Process plan (what, who, when, how?) | Resource conversion processes | Relevant, appropriate levels of work | Efficiency, effectiveness, productivity, scheduling, participation, timeliness, costs, benefits | Short to medium term
Problems and Resources | Required resources for problem (with what?) | Inputs: concrete ingredients to address problem | Optimal use of resources | Availability, feasibility, risk, adequacy of funds, people, supplies, time, priorities, information | Short to medium term

(Source: Cloete & De Coning, 2011:208)

“These technical difficulties imply that outcome indicators are often associated with considerable ambiguity, which opens up the possibility of conflicting interpretations of the value of public service provision, which may contribute to politicizing control practices, especially because critical scrutiny of agency effectiveness may lead to questioning of their roles in society” (Modell, Pollitt, Stewart & Walsh summarised by Modell & Grönlund, 2007:277). The reality in public sector performance, however, is that “both (administrative) outputs and outcomes provide useful and important definitions of public value, and overemphasis on either can produce dysfunctional results” (Norman, 2007:538). The strengths and weaknesses of output and outcome indicators of public sector programmes are summarised by Norman in Table 6.2.

Table 6.2: Strengths and weaknesses of outputs and outcomes

 | Outputs | Outcomes
Strengths | Clear, measurable statements of results, defined by quality, quantity and timeliness indicators. They can be clearly linked to the ability of a particular organisation and chief executive to achieve, and provide a “no excuse approach” to accountability for results rather than inputs. | Purpose-orientated descriptions of the results, which take a broad and long-term perspective. They are potentially inspirational and motivational, and sufficiently broad to incorporate contributions from a number of organisations.
Weaknesses | The focus of measurement can shift toward that which can be measured and easily audited. The output can become the goal in a process of goal displacement, at the expense of longer-term and more meaningful achievements. | Outcomes can become so broad that they can literally mean all things to all people, with achievements being very difficult, if not impossible, to measure. Outcome statements can become window dressing that prevents outsiders from assessing how well an organisation is doing.

(Source: Norman, 2007:538)

The difficulty in measuring outcomes at times accentuates the importance of accurate output measurement, as an early indication of the probability of meeting the outcome. “Outcomes are at the top of the staircase ... and can only be tackled effectively once the core business, as defined by outputs, is under control. Achievement of outcomes relies on effective delivery of outputs and the maintenance processes of equity, consistency, and integrity ... (While) accountability for outputs is the bottom line of public sector management, the real gains are to be made from focusing on the top line of outcomes” (Hughes in Norman, 2007:545).

Box 6.2: Focus on Efficiency or Effectiveness
A 2005 Canadian study on local government performance measurement by Pollanen found that both efficiency indicators (measuring processes and outputs) and effectiveness indicators (measuring outcomes and impact) are used in Canadian municipalities. While difficulties regarding the accurate measurement of outcomes persist, the study indicated an increase in the adoption of effectiveness measures by municipalities when reporting on performance, both internally and to the external public. “These findings imply that both types (effectiveness and efficiency) of measures are now regarded as legitimate and potentially useful tools for various managerial and reporting purposes, and highlight the need to focus specifically on the further development of meaningful effectiveness measures” (Pollanen, 2005:15-16). The need for additional effectiveness/outcome indicators was especially related to the more intangible development deliverables of local governments, as the study found that effectiveness measures were mostly developed and used for engineering-related services (e.g. road maintenance, waste management and water supply) that produce physical results, rather than for the softer (social) services that present a greater challenge with regard to obtaining accurate outcome data. For these softer services, the use of efficiency measures was found to be significantly higher than the use of effectiveness measures.
Source: Pollanen (2005:12)

Types of indicators
In adopting indicators at the various levels of the intervention, it is also useful to consider the nature of the adopted indicators and the data required for measurement purposes, so as to enhance the validity of measurements and ensure a balanced perspective of the evaluand. When referring to the nature of indicators, the most common distinction is between quantitative and qualitative indicators, where quantitative indicators provide feedback on the actual progress that has been made in terms of numbers or percentages and qualitative indicators capture insights, attitudes, beliefs, motives and behaviours that describe perceptions of the progress and other intangible results (Kusek & Rist, 2004:69). Whilst qualitative indicators are often based on more subjective interpretations of performance results, they often provide richer information than quantitative measurements, which reflect only those aspects that are objectively measurable. Some of the useful distinctions in the nature of indicators are summarised below:
• Quantitative versus qualitative indicators: Quantitative indicators draw on numerical data such as numbers, percentages or ratios to answer questions relating to “what ...?” or “how much ...?” (UNDP, n.d.:signals and scales; Atkinson & Wellman, 2003:11-12). Qualitative indicators are based on narrative descriptions or categories, such as true/false, existence (yes/no) and classification (high/medium/low). They measure perceptions, attitudes and preferences typically linked to “why ...?” questions (UNDP, n.d.:signals and scales; Atkinson & Wellman, 2003:11-12).
• Objective versus subjective indicators: Indicators can measure either subjective opinion or objectively verifiable levels of achievement. To illustrate, a family’s “quality of life” can be measured subjectively by their own perceptions of their quality of life (e.g. the Australian Unity Well-being Index), or objectively against education, health, cultural, social, income and expenditure levels, as well as the accommodation, infrastructure and other facilities available to them (e.g. StatsSA survey data) (see also Schneider, 1976:297).
• Static versus dynamic indicators: Static indicators measure actual performance at a particular point in time (e.g. the crime rate in the second trimester), while dynamic indicators measure the degree of change between two static indicator measurements over time (e.g. the decrease in crime from the first to the second trimester). Dynamic indicators therefore rely on the availability of data from static measures, but provide a more direct indication of whether the desired trends envisioned during planning are being realised.
• Single versus composite indicators: Single indicators focus on one element of performance only (e.g. matriculation pass rate, income level) or within one policy sector only (e.g. the percentage of the population that has access to clean water). Composite indicators combine the scores on a basket of indicators from one or multiple policy sectors into a single measure or combined result (e.g. the Human Development Index, Livelihood Index, Corruption Index, Whole School Evaluation, the UN National Accounting System). Composite indicators are useful in collating performance information, allowing managers to track the combined result of multiple performance aspects in a single measure, which helps avoid information overload and enables the manager to identify the most critical interventions that require urgent attention. The disadvantage, however, is that combined scores may hide underperformance in particular performance aspects, which may then not reach the attention of the manager.
A combination of singular and composite indicators may help to overcome this dilemma.


• Effectiveness, efficiency and efficacy indicators: Effectiveness indicators measure “the ratio of outputs (or resources used to produce the outputs) per unit of project outcome or impact”, thus the extent to which activities and outputs contribute towards the intended outcome or impact and whether available resources are being applied appropriately (Mosse & Sontheimer, 1996:14). Efficiency indicators measure “the ratio of inputs needed per unit of output produced” and include service quality indicators such as timeliness, turnaround time, accuracy, thoroughness, accessibility, convenience, courtesy and safety of services delivered (Mosse & Sontheimer, 1996:14; Poister, 2004:100). Efficacy indicators demonstrate “how well the results at one level of project implementation have been translated into results at the next level: the efficiency of inputs, effectiveness of project outputs, and sustainability of project impact. They measure a project’s efficacy in achieving its objectives, rather than its results” (see Mosse & Sontheimer, 1996:14).
• Comprehensive indicators versus performance indicators: Performance indicators adopt an internal administrative perspective by measuring inputs and outcomes controlled by government agencies. In contrast, comprehensive indicators adopt a larger scope by measuring many aspects of the economy, environment and society. These indicators allow the capturing of aspects of quality of life and sustainability that may be the assumed, but untested, end result of administrative performance indicators (Greenwood & Holt, 2010:145). In this regard, sustainability indicators are adopted to measure “the persistence of project benefits over time, particularly after project funding ends” (Mosse & Sontheimer, 1996:15). A focus on sustainability necessitates that both policy products and processes be subject to systematic, integrative evaluation (Poate, 1997:53; Bell & Morse, 2003:20). Prominent international programmes that promote the measurement of sustainable development include the UN Commission on Sustainable Development and the UN Millennium Goals and Indicators Project.
• Data-driven versus theory-driven indicators: With the data-driven approach, the availability of data to populate the indicator is the central criterion, which has the advantage that all indicators can be populated with accurate baseline data at the start of the intervention. In contrast, the theory-driven approach focuses on identifying the best possible indicators on the basis of the programme logic, regardless of the availability of data at the start of the intervention (Niemeijer, 2002:91). In the absence of baseline data, an initial establishment of progress is impossible, but in subsequent measurements the indicator may provide more accurate and useful information. Niemeijer concludes that both approaches have the drawback that they “are based on assumptions on cause-effect relations and correlations that may not always be justified” (Niemeijer, 2002:101). One transparent way of overcoming this problem is to reduce the number of indicators to include only those for which accurate data (and theory assumptions) are available (Niemeijer, 2002:101).
• Self-developed versus pre-designed indicators: Self-developed indicators are adopted with a bottom-up approach to provide feedback on the specific performance information requirements of the policy, programme or project decision makers (Kusek & Rist, 2004:73-74).
In contrast, decision makers may opt to select indicators from those already used in the sector, thereby benefiting from the lessons learned elsewhere and expediting the process of developing reliable indicators. “Frameworks of objectives, indicators and targets have been standardised for some public service sectors, including agriculture, education and health because if everybody use(s) the same indicators it is possible to compare performance between departments and over time” (PSC, 2007:51). Duplication of existing activities is a costly and wasteful practice. Adopting and adapting indicators that have already been proven to be reliable is therefore recommended wherever it is feasible to do so. Pre-designed indicators also offer the advantage of benchmarking and comparison between projects, programmes and policies by comparing data from the same indicators. Self-designed indicators, however, become critical for performance aspects that are deemed important, but for which pre-designed indicators are not readily available.
The indicator classification system proposed by Marais, Human and Botes (2008) also provides for a differentiation of indicators based on the purpose of the indicator. As such, they refer to descriptive indicators that provide a description of facts; evaluative indicators that draw conclusions about the relationship between indicators; information indicators that describe a situation through a time series; predictive indicators that support models and systems; system indicators that combine individual measures with technical and scientific insight; and finally performance indicators that provide a tool for comparison by combining descriptive indicators with targets and target dates (Marais, Human and Botes, 2008:382).
The nature of the various types of indicators focuses attention on alternative performance aspects in measurement. As the Public Service Commission rightly concludes, “indicators are almost always proxies of the outcomes or concepts they measure … (and) the value of indicators lies in the fact that they are expected to correlate with the desired impact/outcome, but the correlation is rarely perfect” (PSC, 2007:Section 5.3). Adopting indicators with alternative foci enables a more balanced perspective on the performance of the evaluand. In this regard, it is perhaps useful to use indicators not in isolation, but rather in sets, as sets of indicators help to provide different perspectives and angles on the subject being evaluated (PSC, 2007:Section 5.3).

Limiting subjectivity through Rubrics
One of the main challenges with indicators is that their formulation is often fragmented, which leaves the interpretation of the indicator somewhat subject to the interpretation of data capturers or users of the indicator. There is also great subjectivity in the actual rating against the indicator. The traditional 3-, 5- or 7-point rating scales very seldom provide evaluators with an unbiased and clear understanding of the exact meaning of each point on the rating scale. This exposes the evaluation process to constraints associated with individual bias, with the individual strictness or leniency of respective evaluators complicating the synthesis of data from multiple sources to obtain an overall picture of combined progress and performance.
Rubrics provide descriptions of what the evidence should look like at different levels of performance, thereby enabling the evaluator to be systematic and transparent in performing the assessment (Davidson, 2011; Davidson, 2013). They comprise short descriptions of alternative evidence “scenarios”, where the evaluator then needs to match the actual presented evidence with each description to select the best fit.
The rubrics thereby provide a “definition of how good is good and how good is good enough” (Davidson, 2011). Rubrics are normally developed to fit the requirements of a specific evaluation, but may also be phrased to allow application to different situations, as demonstrated in the “Criteria for Rating Answers to Key Evaluation Questions” of the New Zealand Qualifications Authority (see example in Table 6.3).

Table 6.3: Criteria for rating answers to key evaluation questions

Rubric 1: Criteria for Rating Answers to Key Questions
Excellent: Performance is clearly exemplary in relation to the question. Very few or no gaps or weaknesses. Any gaps or weaknesses have no significant impact and are managed effectively.
Good: Performance is generally strong in relation to the question. Few gaps or weaknesses. Gaps or weaknesses have some impact but are mostly managed effectively.
Adequate: Performance is inconsistent in relation to the question. Some gaps or weaknesses have impact, and are not managed effectively. Meets minimum expectations/requirements as far as can be determined.
Poor: Performance is unacceptably weak in relation to the question. Significant gaps or weaknesses are not managed effectively. Does not meet minimum expectations/requirements.

(Source: New Zealand Qualifications Authority, 2014)

While eliminating bias and uncertainty, rubrics may also serve to guide the evaluand towards better performance. Through a self-assessment process, the evaluand may use the rubric to identify areas for improvement, and to set targets to migrate to the next evaluation level. Rubrics may therefore be regarded as “ladders of change” (Davidson, 2011). The South African Statistical Quality Assurance Framework (StatsSA, 2010) serves as a guideline to allow the producers of statistics to assess the quality of such statistics. The excerpt from the document (see Table 6.4) demonstrates how the evaluand, through a process of self-assessment, may use the framework to systematically improve the quality of produced statistics by incorporating criteria from the higher levels into the process of data production.

6.3.3 Refining indicators
The final step in developing indicators involves the review of adopted indicators to promote the relevance, validity and reliability of the indicators in reflecting the performance of the evaluand. Indicators are seldom perfect after the first attempt, with problems characterising the measurement of both newly designed indicators and those adopted from other contexts. Once the indicator is adopted, managers are often faced with implementation problems such as resource constraints in gathering and interpreting the required data, different interpretations of the same indicator by different data collectors presenting contradictory results, or unexpected findings where the indicator fails to properly present the actual performance of the evaluand, perhaps as a result of a limited performance focus. This necessitates revisiting indicators in order to refine their ability to measure performance accurately.


Table 6.4: Selected SASQAF indicators and standards for quality statistics

Indicator 2.1: Have both the internal and external users of the data been identified?
Standard 2.1.1: An up-to-date user database must exist.
Level 4 (Quality statistics): An up-to-date user database exists.
Level 3 (Acceptable statistics): A user database exists but is not up to date.
Level 2 (Questionable statistics): Users are known but not recorded in a database.
Level 1 (Poor statistics): Users have not been identified.

Indicator 2.2: Is there a process to identify user needs?
Standard 2.2.1: A process to identify user needs must exist.
Level 4 (Quality statistics): A process to identify user needs exists.
Level 3 (Acceptable statistics): N/A
Level 2 (Questionable statistics): N/A
Level 1 (Poor statistics): A process to identify user needs does not exist.

Indicator 2.3: Are user needs and the usage of statistical information analysed?
Standard 2.3.1: A report containing the findings of user needs and the usage of statistical information must be available.
Level 4 (Quality statistics): A report containing the findings of user needs and the usage of statistical information is available.
Level 3 (Acceptable statistics): User needs and the usage of statistical information are analysed, but a report is not available.
Level 2 (Questionable statistics): One of user needs or usage of statistical information is analysed, but a report is not available.
Level 1 (Poor statistics): User needs and usage of statistical information are not analysed.

Indicator 2.4: Changes are made as a result of user needs assessments.
Standard 2.4.1: The results of the user needs assessment must influence decisions on the statistical value chain of the survey or on administrative data collection systems, where feasible. Documented reasons for not implementing user needs must be provided as feedback to users.
Level 4 (Quality statistics): The results of the user needs assessment influence decisions on the statistical value chain of the survey or administrative data collection systems, where feasible. Documented reasons for not implementing user needs are provided as feedback to users.
Level 3 (Acceptable statistics): The results of the user needs assessment influence decisions on the statistical value chain of the survey or administrative data collection systems, where feasible. Documented reasons for not implementing user needs are not provided as feedback to users.
Level 2 (Questionable statistics): The results of the user needs assessment do not influence in any way decisions on the statistical value chain of the survey or on administrative data collection systems. Documented reasons for not implementing user needs are provided as feedback to users.
Level 1 (Poor statistics): The results of the user needs assessment do not influence in any way decisions on the statistical value chain of the survey or on administrative data collection systems. Documented reasons for not implementing user needs are not provided as feedback to users.

(Source: StatsSA, 2010:14-15)
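To illustrate how such a framework supports systematic self-assessment, the sketch below encodes the level descriptions of indicator 2.1 from Table 6.4 as a simple decision rule: the available evidence about the user database is matched against the descriptions and the corresponding assessment level is returned. This is a minimal illustration in Python; the function and argument names are hypothetical and not part of SASQAF itself.

# Minimal sketch: matching evidence against the SASQAF level descriptions
# for indicator 2.1 in Table 6.4. Function and argument names are illustrative.
def assess_user_database(users_identified: bool,
                         recorded_in_database: bool,
                         database_up_to_date: bool) -> int:
    """Return the SASQAF assessment level (1 to 4) for indicator 2.1."""
    if users_identified and recorded_in_database and database_up_to_date:
        return 4  # Quality statistics: an up-to-date user database exists
    if users_identified and recorded_in_database:
        return 3  # Acceptable statistics: a database exists but is not up to date
    if users_identified:
        return 2  # Questionable statistics: users known but not recorded
    return 1      # Poor statistics: users have not been identified

# Example: users are known and recorded, but the database is outdated (level 3).
print(assess_user_database(True, True, False))

Used in this way, the criteria of the next level (here, keeping the database up to date) become an explicit improvement target, in line with the notion of rubrics as ladders of change discussed above.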


Criteria that may be used to objectively assess developed indicators are:
• indicators must reflect the intent of the programme (Posavac & Carey, 1997:44-46)
• “performance indicators should be clear, relevant, economical, adequate and monitorable” (Kusek & Rist, 2004:68)
• indicators should be relevant, significant, original, legitimate, reliable, valid, objective, timely and usable (Cloete et al., 2006:259) and should be manageable by those responsible for data collection to sustainably provide accurate information
• indicators “should be as responsive as possible to the programme effects” by focusing specifically on the target population of the programme when obtaining data for M&E (Rossi, Lipsey & Freeman, 2004:225)
• indicators should be valid to the actual outcome, balanced, sensitive to change, equal to all groups, practical and time- and effort-efficient, clear and owned by stakeholders, and should not be open to manipulation (Vera Institute of Justice in the PSC, 2007:Section 5.5)
• indicators must respond to criteria for data quality, which include among other variables that the indicator is accessible and affordable, comparable (standardised), consistent and reliable (relating to the stability of the measurement: it measures the same thing in the same way in repeated tests), credible (defined as trustworthy or believable), measurable, relevant (in that it measures what matters) and valid (measuring what it is supposed to measure) (Morra Imas & Rist, 2009:293-294)
• indicators should also be regarded as useful to various stakeholders (including an ability to measure progress towards a goal, compel interest and excite, focus on resources and assets, focus on causes, not symptoms, so as to predict future problems, make linkages and relationships between issues, relate to the whole community and be understandable) (Baltimore Neighborhood Indicator Alliance, 2006:14)
• indicators should be “relevant to the … sectoral development objectives of the project and, if possible, to the overall country objectives” (Mosse & Sontheimer, 1996:18)
• indicators must enjoy buy-in from relevant stakeholders and must differentiate between aspects of the programme that can be influenced by the staff and those indicators reflecting performance beyond the control of staff (Posavac & Carey, 1997:44-46).
Developed indicators may be tested against the checklist from the United Way of America (see Table 6.5), the UN matrix (UNDP, n.d.:68) or the checklist developed by the New Zealand Ministry for the Environment (see Table 6.6), which provide useful guidelines for assessing the relevance, validity, cost-effectiveness and simplicity of developed indicators.

Table 6.5: Indicator checklist from United Way of America
▪ The indicator is a close reflection of the outcome
▪ The indicator is sufficiently precise to ensure objective measurement
▪ The indicator calls for practical, cost-effective data collection
▪ The indicator is sensitive to changes in the outcome but relatively unaffected by other changes
▪ The indicator can be disaggregated as needed during reporting

(Source: Kusek & Rist, 2004:71)


Table 6.6: New Zealand checklist for assessing developed indicators

Policy relevant
The indicator will monitor the key outcomes of policy or legislation, and measure progress towards goals.
The indicator will provide information to a level appropriate for policy decision making.

Analytically valid
The indicator is measurable.
The indicator is representative of the system being assessed.
The indicator is reproducible and based on critical attributes of the system.
The indicator was developed within a consistent analytical framework.
The indicator is credible and robust.
The indicator is helpful in relating causes, effects and responses.
The indicator is responsive to environmental change.
Data collection will use standard methodologies with known accuracy and precision (statistical accuracy).
The indicator is able to detect human-induced change from natural variations.
The indicator is responsive to environmental change, and allows trend analysis or provides a baseline for future trends.
The indicator has predictive capabilities.

Cost effective
The indicator requires limited numbers of parameters to be established.
The indicator uses existing data and information wherever possible.
The indicator is simple to monitor.

Simple and easy to understand
The indicator is simple to interpret, accessible and publicly appealing.
The indicator clearly displays the extent of the issue.

(Source: New Zealand, n.d.)

The United Nations proposes a matrix for assessing the relative usefulness of alternative indicators. Each indicator is scored individually against six criteria of good indicators, namely:
• the meaning of the indicator is clear
• data is easily available
• the effort to collect the data is within the power of the project team and does not require additional expertise
• the indicator sufficiently represents the intended outcome or result
• the indicator is tangible and can be observed
• the indicator is difficult to quantify, but so important that it should be considered regardless (UNDP, n.d.:68)
The final score of each proposed indicator is calculated by adding the respective scores for all six criteria together, enabling the selection of the one or two indicators with the highest score for each performance dimension of the evaluand. This method translates the qualitative assessment of each criterion into an overall quantitative score.
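The calculation itself is straightforward, as the sketch below illustrates. This is a minimal Python example: the 0-2 point scale per criterion, the candidate indicator names and the awarded scores are illustrative assumptions rather than part of the UNDP matrix. Each candidate indicator receives a score against each of the six criteria, the scores are summed, and the candidates are ranked by their totals.

# Minimal sketch of the indicator selection matrix described above.
# The six criteria follow the list above; the scoring scale (0-2 per criterion)
# and the example indicators and scores are illustrative assumptions.
CRITERIA = [
    "meaning is clear",
    "data easily available",
    "data collection within the project team's power",
    "sufficiently represents the intended result",
    "tangible and observable",
    "hard to quantify but too important to omit",
]

def total_score(scores: dict) -> int:
    """Add the scores awarded against all six criteria for one indicator."""
    return sum(scores[criterion] for criterion in CRITERIA)

candidates = {
    "% of households with access to clean water": dict.fromkeys(CRITERIA, 2),
    "community satisfaction with water services": dict.fromkeys(CRITERIA, 1),
}

# Rank candidate indicators by total score; the one or two highest-scoring
# indicators per performance dimension would then be retained.
ranking = sorted(candidates, key=lambda name: total_score(candidates[name]),
                 reverse=True)
print(ranking)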


These guidelines provide a simple means to test an indicator set and refine it over time to ensure accurate and reliable performance measurements.

6.3.4 Indicator meta-data
As indicators are often formulated as short fragments and presented as value-neutral to allow their adoption and continued relevance over time, the meta-data (or data about the data) that accompanies the indicator plays an important role in understanding the indicator. Meta-data typically provides the full definition of the indicator, as well as the past baseline and envisioned target measurement against which the trend on the indicator is measured over time. Figure 6.2 below compares the meta-data presented for the South African Mid-term Development Indicators with the meta-data provided for the sustainability indicators adopted by the UN Commission on Sustainable Development. The absence of detailed meta-data may result in different interpretations of the exact meaning of the indicator, different data sources being used to populate the indicator, and a lack of understanding of the implications of measurement, as the indicator measurement becomes meaningful only within the context of the baseline and target measurements.

Figure 6.2: Examples of useful indicator meta-data

Comparative indicator presentation structure and meta-data

UN-CSD Indicators of Sustainable Development (2007): Indicator; Definition; Unit of measurement; Position in indicator set; Policy relevance (purpose, relevance for SD, conventions, targets/standards, linkage to other indicators); Methodological description (concepts, methodological measurements, limitations, alternative indicators); Data assessments (data needed, data available and sources, data references); Agencies involved (lead agency, other).

SA Presidency Mid-term Development Indicators (2008): Indicator; Goal; Definition; Statistics; Graph; Trend analysis; Data source; Date note.

(Source: The Editors)
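To show how such meta-data might travel with an indicator in an M&E system, the sketch below defines a simple indicator record in Python. The selection of fields is an assumption loosely based on the elements listed in Figure 6.2 rather than a prescribed schema, and the example values are hypothetical; the point is that a reported measurement only becomes interpretable alongside its definition, baseline, target and data source.

# Minimal sketch: an indicator record that carries its own meta-data.
# Field names loosely follow the elements in Figure 6.2; values are hypothetical.
from dataclasses import dataclass

@dataclass
class IndicatorMetadata:
    name: str          # short indicator label
    definition: str    # full definition of what is measured and how
    unit: str          # unit of measurement
    data_source: str   # where the data are obtained
    baseline: float    # measurement at the start of the intervention
    target: float      # envisioned measurement at the end of the period

    def progress(self, measurement: float) -> float:
        """Share of the baseline-to-target distance achieved by a measurement."""
        return (measurement - self.baseline) / (self.target - self.baseline)

access = IndicatorMetadata(
    name="Access to clean water",
    definition="Percentage of households with piped water on site",
    unit="% of households",
    data_source="Household survey",
    baseline=60.0,
    target=80.0,
)
print(access.progress(70.0))  # 0.5: halfway from the baseline to the target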



6.4 Examples of indicator initiatives

6.4.1 International organisations indicator initiatives
Given the widespread adoption of indicators as performance measurement tools, countless sets of indicators have been developed for various purposes. These may serve as examples of indicator practice and indicator formulation, but also provide a source for identifying relevant indicators for policies, programmes and projects.

Figure 6.3: Selected indicators from the Millennium Development Goal Initiative, showing Millennium Development Goal indicators for Target 1.A (Source: UN, 2013:6, 43)

At international level the United Nations, an organisation that promotes the mutual interests of its constituent independent states, has developed and tracks indicators for measuring human development and the sustainability of development. The Millennium Development Goals programme identifies, for each of the eight development goals, a series of indicators and related targets against which development progress may be tracked at country level. Progress reports are produced at country level, based on a master set of data compiled by an Inter-Agency and Expert Group comprising representatives of international organisations such as the International Labour Organisation, the World Bank, the World Health Organisation, the continental Economic and Social Commissions, and various groups and programmes under the United Nations. The report presents data on the various indicators that were identified as appropriate for monitoring progress towards each of the adopted Millennium Development Goals (UN, 2013:preface). The latest country snapshots and global and regional progress towards the goals can be accessed at http://mdgs.un.org/unsd/mdg/Default.aspx. An example is also presented in Figure 6.3. Reports are produced annually and compare overall progress at international level and results between countries. Additional analyses for each development goal are also prepared at international, regional and country level, with further reflection and focus areas for further improvement. The Millennium Development Goals are regarded as one of the most successful and important international development initiatives, and the progress demonstrated against the selected indicators underscores the importance of a focused international attempt to coordinate local and external development initiatives. Their success also emphasises the importance of the use of indicators in focusing attention on what matters, which is especially critical when one deals with elusive concepts like “development”.
Another global indicator initiative is the World Bank’s World Development Indicators (WDI), covering 214 countries from 1960 to 2011 and presenting results for a selection of indicators for different development sectors, such as economic policy, labour protection and the private sector; progress in the education, health, agriculture, government services, energy and mining, and science and technology sectors; and broad development themes such as poverty, environment, climate change, social development and urban development. The specific indicators per sector and the latest progress reports are available at http://data.worldbank.org/indicator, and a small excerpt from the World Bank table on the indicator “Population below international poverty lines” is presented in Figure 6.4. This indicator initiative serves as a good example of the value of developing generic indicator definitions that may be adopted by different institutions, with the data from these indicators then enabling the comparison of trends up to the international level.
Adopting a focused financial perspective, the Financial Soundness Indicators prepared by the International Monetary Fund assist in the analysis of financial systems with the aim of identifying the strengths and vulnerabilities of financial systems. The financial soundness reports are prepared per country, with a meta-analysis of global financial stability and comparisons between countries also prepared at international level. Indicators included in the analysis focus on capital, income, expense and interest ratios, loans, assets and foreign exchange ratios. The most recent of these reports may be accessed online at http://fsi.imf.org/
The UN Commission on Sustainable Development provided the platform for the development of a series of international sustainable development goals, augmenting the Millennium Development Goals to include indicators for social, economic and environmental development and sustainability.
The fifteen most common indicators adopted as part of sustainable development sets include greenhouse gas emissions, educational attainment, GDP per capita, collection and disposal of waste, biodiversity, development assistance, unemployment rate, life expectancy, share of energy from renewable sources, risk of poverty, air pollution, energy use and intensity, water quality, central government net debt, and research and development expenditure (UN, 2009:34). The multi-sectoral nature of sustainable development indicator sets becomes evident from the combination of indicators listed here. Further indicators and publications on sustainable development can be found at http://sustainabledevelopment.un.org. Indicators are grouped into thematic links indicating the primary and proxy themes that they reflect on (see UN, 2007:15-20; and Figure 6.5).

Figure 6.4: Selected indicators from the World Bank’s World Development Indicators: poverty rates, showing the population below the international poverty lines of $1.25 a day and $2 a day for countries including São Tomé and Príncipe, Senegal, Serbia, Seychelles, Sierra Leone, Slovak Republic, Slovenia, South Africa, Sri Lanka, St. Lucia and Sudan. For each country the table reports the international poverty line in local currency (2005), and per survey year the percentage of the population below each line and the corresponding poverty gap (%).